Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#11306 closed Patch - Bug Fix (Fixed)

0.26 (MASTER backend) core when (SLAVE backend) disconnects

Reported by: mythtvuser10@… Owned by: Stuart Auchterlonie
Priority: minor Milestone: 0.27.1
Component: MythTV - General Version: 0.26-fixes
Severity: medium Keywords:
Cc: Stuart Auchterlonie Ticket locked: no

Description

I found a MASTER mythbackend (core) dump in 0.26-fixes which seems to be related to (another) SLAVE mythbackend going away. The result is a NULL (this pointer) for a MythSocket?. Unfortunately I don't think I can recreate this easily? -- but it looks to me like the (SLAVE backend) disconnected and then (MASTER backend) core dumped.

Attachments (6)

myOutput.txt (4.4 KB) - added by mythtvuser10@… 11 years ago.
Output from GDB and mythbackend.log
slave_shutdown_deadlock.patch (23.8 KB) - added by tomi.orava@… 11 years ago.
A extremely dirty hack around the slave shutdown deadlock
slave_shutdown_deadlock_v2.patch (24.0 KB) - added by tomi.orava@… 11 years ago.
An extremely dirty hack around the slave shutdown deadlock v2
slave_shutdown_deadlock_v3.patch (23.9 KB) - added by Tomi Orava <tomi.orava@…> 11 years ago.
Updated slave backend deadlock patch v3
slave_shutdown_deadlock_for.v0.27.patch (17.4 KB) - added by Tomi Orava <tomi.orava@…> 11 years ago.
Updated slave backend deadlock patch for the mythtv v0.27
protect_pbs_with_proper_refcounting.patch (9.5 KB) - added by Cédric Schieli <cschieli@…> 10 years ago.
Protect pbs with proper refcounting

Download all attachments as: .zip

Change History (21)

Changed 11 years ago by mythtvuser10@…

Attachment: myOutput.txt added

Output from GDB and mythbackend.log

comment:1 Changed 11 years ago by danielk

Owner: set to danielk
Status: newaccepted

Changed 11 years ago by tomi.orava@…

A extremely dirty hack around the slave shutdown deadlock

comment:2 Changed 11 years ago by tomi.orava@…

Ok, for some reason I'm now also seeing this bug every single time the slave backend disconnects. The problem is that there is not much that can be done with the current design, as the playbacksock and encoderlink classes don't have real locking at all and the mythsocket callback is a huge design problem.

The problem case seems to be this:

Whenever the slave is requested to shutdown, two threads are racing against each other:

Scheduler::run
--> Scheduler::HandleIdleShutdown
--> EncoderLink::IsBusy
--> PlaybackSock::IsBusy
--> PlaybackSock::SendReceiveStringList
--> MythSocket::Lock
    ---> deadlock!



Slave Connection is closed:

--> MythSocketThread::run
--> MythSocketThread::ReadyToBeRead
--> MythSocket::close
--> MainServer::connectionClosed
--> EncoderLink::SetSocket
--> QMutexLocker::QMutexLocker
    ---> deadlock!

I've done a truly horrible hack around the problem, but I've yet to figure out any reasonable workaround, as the locking in each of these classes is not consistent. The patch has been working for me since yesterday, though.

Changed 11 years ago by tomi.orava@…

An extremely dirty hack around the slave shutdown deadlock v2

comment:3 Changed 11 years ago by johan@…

Can this patch please be included in 0.26-fixes? Currently (slave)backend shutdown is not usable anymore.

It seems this issue is also triggered (sometimes!) when the master backend sends a shutdown command to a slave backend. When the slave backend then succesfully shuts down, and thus is disconnected, the master backend dumps core.

After the master backend is restarted, it doesn't know the slave is "sleeping", but just thinks it is "not connected". This means the master backend will not try to wake-up the slave backend when required. In order to make the master backend see the slave backend again, the slave backend must be manually restarted.

When using a sleep-command for slave backends that doesn't do anything (the slave backend stays up and running), both the master and slave backends continue to run stable and don't crash.

comment:4 Changed 11 years ago by tomi.orava@…

It looks like there aren't that many suffering from this problem, but let's anyway add an updated patch so that slave reconnects work properly. Basically the needed change is one line modification since v2 patch.

Changed 11 years ago by Tomi Orava <tomi.orava@…>

Updated slave backend deadlock patch v3

Changed 11 years ago by Tomi Orava <tomi.orava@…>

Updated slave backend deadlock patch for the mythtv v0.27

comment:5 Changed 11 years ago by Tomi Orava <tomi.orava@…>

I've added a patch for the current release v0.27 which fixes the problem originally reported in 0.26. The patch has been working without any problems for the past week.

slave_shutdown_deadlock_for.v0.27.patch

comment:6 Changed 10 years ago by Cédric Schieli <cschieli@…>

After upgrading my system (MBE + 2 combined SBE/FE + 1 FE) from 0.24 to 0.27 I quickly got hit by this issue as my SBE are configured to go to sleep when idle.

I've tried Tomi's patch (which applied only with a small modification) but the MBE would deadlock after handling a few requests.

So I've come up with my own patch which has been running smoothly in production for two weeks. Instead of adding locking around code blocks I've added proper refcounting each time a PlaybackSock? is accessed from the EncoderLink?.

I've sent a pull request (https://github.com/MythTV/mythtv/pull/63) for master branch (the exact same patch applies both on master and fixes/0.27)

Changed 10 years ago by Cédric Schieli <cschieli@…>

Protect pbs with proper refcounting

comment:7 Changed 10 years ago by Stuart Auchterlonie

Cc: Stuart Auchterlonie added
Milestone: unknown0.28

comment:8 Changed 10 years ago by Karl Egly

Type: Bug Report - CrashPatch - Bug Fix

comment:9 Changed 10 years ago by tomi.orava@…

I've been running Cedric's patch for the past 2-3 weeks and indeed it has been working flawlessly. Nice job!

comment:10 Changed 10 years ago by andreas@…

Thank you Cedric! My home setup of a 7x24 master backend and a combined slave backend/frontend is working again as it should. Migrating to 0.27 was a big mistake. The system got completely unstable due to this bug. I don't understand why your fix is scheduled for 0.28 only. 0.27 users are left alone with a permanently crashing master backend that constantly fails to record shows. I really wonder how other MythTV users setup looks like and why there aren't more complaints about this fatal bug.

comment:11 Changed 10 years ago by Stuart Auchterlonie

Owner: changed from danielk to Stuart Auchterlonie
Status: acceptedassigned

This patch looks good, and with the positive feedback i'll push it to master and let it sit for a week or so. After that i'll backport it to 0.27

comment:12 Changed 10 years ago by Cédric Schieli <cschieli@…>

In 6f246c0d1567bd84e1d9d30de7def724fc75e8f6/mythtv:

Protect pbs with proper (aka locked) refcounting.

The PlaybackSock? must not be immediatly deleted on connectionClosed()
if some other requests are still using it. This leads to segfaults in
the MBE when the SBE disconnects (#11306).
(cherry picked from commit 7b305848b0584bcebc55d6fe5a670ef7d2f55360)

Refs #11306
Signed-off-by: Stuart Auchterlonie <stuarta@…>

comment:13 Changed 10 years ago by Cédric Schieli <cschieli@…>

In 27668b7280a3d55bf1c5540ae5b44f623f4d1892/mythtv:

Protect pbs with proper (aka locked) refcounting.

The PlaybackSock? must not be immediatly deleted on connectionClosed()
if some other requests are still using it. This leads to segfaults in
the MBE when the SBE disconnects (#11306).
(cherry picked from commit 7b305848b0584bcebc55d6fe5a670ef7d2f55360)

Refs #11306
Signed-off-by: Stuart Auchterlonie <stuarta@…>

(cherry picked from commit 6f246c0d1567bd84e1d9d30de7def724fc75e8f6)

comment:14 Changed 10 years ago by JYA

Resolution: Fixed
Status: assignedclosed

comment:15 Changed 10 years ago by JYA

Milestone: 0.280.27.1
Note: See TracTickets for help on using tickets.