Opened 11 years ago
Closed 10 years ago
Last modified 10 years ago
#11306 closed Patch - Bug Fix (Fixed)
0.26 (MASTER backend) core when (SLAVE backend) disconnects
Reported by: | Owned by: | Stuart Auchterlonie | |
---|---|---|---|
Priority: | minor | Milestone: | 0.27.1 |
Component: | MythTV - General | Version: | 0.26-fixes |
Severity: | medium | Keywords: | |
Cc: | Stuart Auchterlonie | Ticket locked: | no |
Description
I found a core dump from a MASTER mythbackend running 0.26-fixes which seems to be related to a SLAVE mythbackend going away. The result is a NULL this pointer for a MythSocket. Unfortunately I don't think I can recreate this easily, but it looks to me like the SLAVE backend disconnected and then the MASTER backend dumped core.
Attachments (6)
Change History (21)
Changed 11 years ago by
Attachment: | myOutput.txt added |
---|
comment:1 Changed 11 years ago by
Owner: | set to danielk |
---|---|
Status: | new → accepted |
Changed 11 years ago by
Attachment: | slave_shutdown_deadlock.patch added |
---|
An extremely dirty hack around the slave shutdown deadlock
comment:2 Changed 11 years ago by
Ok, for some reason I'm now also seeing this bug every single time the slave backend disconnects. The problem is that not much can be done within the current design: the playbacksock and encoderlink classes have no real locking at all, and the mythsocket callback is a huge design problem.
The problem case seems to be this:
Whenever the slave is requested to shut down, two threads race against each other:
Scheduler::run --> Scheduler::HandleIdleShutdown --> EncoderLink::IsBusy --> PlaybackSock::IsBusy --> PlaybackSock::SendReceiveStringList --> MythSocket::Lock ---> deadlock!
Slave connection is closed: MythSocketThread::run --> MythSocketThread::ReadyToBeRead --> MythSocket::close --> MainServer::connectionClosed --> EncoderLink::SetSocket --> QMutexLocker::QMutexLocker ---> deadlock!
I've done a truly horrible hack around the problem, but I've yet to figure out any reasonable workaround, as the locking in each of these classes is not consistent. The patch has been working for me since yesterday, though.
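For readers following along, the two call chains above read like a classic lock-order inversion: one thread appears to hold an EncoderLink-side lock while waiting for the socket lock, and the socket thread holds the socket lock while waiting for the EncoderLink mutex. Below is a minimal C++ sketch of that pattern together with one generic way to avoid it, namely acquiring both locks in a single std::lock call. This is not MythTV code and not what the attached patch does. The names Locks, schedulerPath, socketClosePath and runBothPaths are hypothetical, and the assumption that the scheduler path already holds the EncoderLink lock is an inference from the traces, not something the ticket confirms.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks in the traces; not MythTV classes.
struct Locks {
    std::mutex encoderLock;  // plays the role of the EncoderLink mutex
    std::mutex socketLock;   // plays the role of the MythSocket lock
    int busyChecks = 0;      // only modified while both locks are held
    int closes = 0;          // only modified while both locks are held
};

// Scheduler-like path: conceptually "encoder lock first, then socket lock".
void schedulerPath(Locks &l, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        // std::lock acquires both mutexes with deadlock avoidance,
        // so the order in which they are named does not matter.
        std::lock(l.encoderLock, l.socketLock);
        std::lock_guard<std::mutex> g1(l.encoderLock, std::adopt_lock);
        std::lock_guard<std::mutex> g2(l.socketLock, std::adopt_lock);
        ++l.busyChecks;
    }
}

// Socket-thread-like path: conceptually "socket lock first, then encoder lock".
// With two plain nested lock() calls in this order, the two paths could deadlock.
void socketClosePath(Locks &l, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock(l.socketLock, l.encoderLock);
        std::lock_guard<std::mutex> g1(l.socketLock, std::adopt_lock);
        std::lock_guard<std::mutex> g2(l.encoderLock, std::adopt_lock);
        ++l.closes;
    }
}

// Runs both paths concurrently and returns the total number of completed rounds.
int runBothPaths(int rounds) {
    assert(rounds >= 0);
    Locks l;
    std::thread a(schedulerPath, std::ref(l), rounds);
    std::thread b(socketClosePath, std::ref(l), rounds);
    a.join();
    b.join();
    return l.busyChecks + l.closes;
}
```

Calling runBothPaths(1000) lets both threads finish even though they name the locks in opposite order. In the real code the two locks are taken in different functions several calls apart, which is why this kind of fix is hard to apply there.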
Changed 11 years ago by
Attachment: | slave_shutdown_deadlock_v2.patch added |
---|
An extremely dirty hack around the slave shutdown deadlock v2
comment:3 Changed 11 years ago by
Can this patch please be included in 0.26-fixes? Currently, slave backend shutdown is unusable.
It seems this issue is also triggered (sometimes!) when the master backend sends a shutdown command to a slave backend. When the slave backend then successfully shuts down, and is thus disconnected, the master backend dumps core.
After the master backend is restarted, it doesn't know the slave is "sleeping" and just thinks it is "not connected". This means the master backend will not try to wake up the slave backend when required. To make the master backend see the slave backend again, the slave backend must be restarted manually.
When the sleep command configured for slave backends doesn't do anything (the slave backend stays up and running), both the master and slave backends keep running stably and don't crash.
comment:4 Changed 11 years ago by
It looks like not many people are suffering from this problem, but here is an updated patch anyway so that slave reconnects work properly. The needed change is basically a one-line modification to the v2 patch.
Changed 11 years ago by
Attachment: | slave_shutdown_deadlock_v3.patch added |
---|
Updated slave backend deadlock patch v3
Changed 10 years ago by
Attachment: | slave_shutdown_deadlock_for.v0.27.patch added |
---|
Updated slave backend deadlock patch for MythTV v0.27
comment:5 Changed 10 years ago by
I've added a patch for the current release v0.27 which fixes the problem originally reported in 0.26. The patch has been working without any problems for the past week.
slave_shutdown_deadlock_for.v0.27.patch
comment:6 Changed 10 years ago by
After upgrading my system (MBE + 2 combined SBE/FE + 1 FE) from 0.24 to 0.27, I quickly got hit by this issue, as my SBEs are configured to go to sleep when idle.
I've tried Tomi's patch (which applied only with a small modification) but the MBE would deadlock after handling a few requests.
So I've come up with my own patch, which has been running smoothly in production for two weeks. Instead of adding locking around code blocks, I've added proper refcounting each time a PlaybackSock is accessed from the EncoderLink.
I've sent a pull request (https://github.com/MythTV/mythtv/pull/63) for the master branch (the exact same patch applies on both master and fixes/0.27).
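As an illustration of the refcounting idea only: a caller takes its own counted reference to the socket under a short lock, then uses the socket without holding that lock, so a concurrent disconnect cannot free the object mid-request. The sketch below uses std::shared_ptr as a stand-in for reference counting. It does not reproduce the patch in the pull request, and FakeSock, FakeLink and survivesDisconnect are hypothetical names, not MythTV classes.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for PlaybackSock; not the MythTV class.
struct FakeSock {
    bool busy = false;
    bool IsBusy() const { return busy; }
};

// Hypothetical stand-in for EncoderLink.
class FakeLink {
  public:
    // Called on connect (with a socket) or on disconnect (with nullptr).
    void SetSocket(std::shared_ptr<FakeSock> s) {
        std::lock_guard<std::mutex> g(m_lock);
        m_sock = std::move(s);
    }

    // Hands out a counted reference; the lock is held only for the copy.
    std::shared_ptr<FakeSock> GetSocket() {
        std::lock_guard<std::mutex> g(m_lock);
        return m_sock;
    }

    // Uses the socket through its own reference, without holding m_lock.
    bool IsBusy() {
        std::shared_ptr<FakeSock> sock = GetSocket();
        return sock ? sock->IsBusy() : false;
    }

  private:
    std::mutex m_lock;
    std::shared_ptr<FakeSock> m_sock;
};

// Walks through a disconnect that happens while a caller still uses the socket.
bool survivesDisconnect() {
    FakeLink link;
    auto sock = std::make_shared<FakeSock>();
    sock->busy = true;
    link.SetSocket(sock);
    sock.reset();  // the link now holds the only reference

    std::shared_ptr<FakeSock> held = link.GetSocket();  // a caller mid-request
    assert(held != nullptr);
    link.SetSocket(nullptr);  // the disconnect drops the link's reference

    // The link reports idle, while the caller's object is still valid
    // and is released only when 'held' goes out of scope.
    return !link.IsBusy() && held->IsBusy() && held.use_count() == 1;
}
```

The design point is that no lock is held while the socket is in use, so the disconnect path never has to wait on a request in flight.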
Changed 10 years ago by
Attachment: | protect_pbs_with_proper_refcounting.patch added |
---|
Protect pbs with proper refcounting
comment:7 Changed 10 years ago by
Cc: | Stuart Auchterlonie added |
---|---|
Milestone: | unknown → 0.28 |
comment:8 Changed 10 years ago by
Type: | Bug Report - Crash → Patch - Bug Fix |
---|
comment:9 Changed 10 years ago by
I've been running Cedric's patch for the past 2-3 weeks and indeed it has been working flawlessly. Nice job!
comment:10 Changed 10 years ago by
Thank you Cedric! My home setup of a 24x7 master backend and a combined slave backend/frontend is working again as it should. Migrating to 0.27 was a big mistake. The system became completely unstable due to this bug. I don't understand why your fix is scheduled for 0.28 only. 0.27 users are left alone with a permanently crashing master backend that constantly fails to record shows. I really wonder what other MythTV users' setups look like and why there aren't more complaints about this fatal bug.
comment:11 Changed 10 years ago by
Owner: | changed from danielk to Stuart Auchterlonie |
---|---|
Status: | accepted → assigned |
This patch looks good, and with the positive feedback I'll push it to master and let it sit for a week or so. After that I'll backport it to 0.27.
comment:14 Changed 10 years ago by
Resolution: | → Fixed |
---|---|
Status: | assigned → closed |
comment:15 Changed 10 years ago by
Milestone: | 0.28 → 0.27.1 |
---|
Output from GDB and mythbackend.log