Opened 9 years ago

Closed 9 years ago

#8526 closed defect (Fixed)

Deadlock in backend code when slave backend disconnects

Reported by: doug@…
Owned by: cpinkham
Priority: minor
Milestone: 0.24
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

I encountered a deadlock in the master backend that sometimes occurs when a slave backend disconnects. Because I use the feature that puts slave backends to sleep when they are idle, I was hitting it frequently.

What seems to happen is that code in scheduler.cpp is called from mainserver.cpp while the sockListLock is held. If the scheduler code then calls code in mainserver.cpp, a deadlock can occur.

I have attached a patch that, in the MainServer::connectionClosed() function, delays the calls to the scheduler until after the lock has been released.

So far this is working for me, but I'm not sure whether the code currently depends on side effects of the scheduler calls that are lost by delaying them.
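The general pattern described above, deferring the scheduler call until the lock is released, can be sketched as follows. This is a minimal, hypothetical model, not MythTV code: `ServerModel`, `AddSlave`, `SlaveCount`, and the `onSlaveGone` callback (standing in for the scheduler) are invented for illustration, and a plain `std::mutex` replaces the real lock. `ConnectionClosed()` records what needs to be done while holding `sockListLock`, then calls out only after the lock is dropped, so the callee may safely call back into the server.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified model of the master backend: a list of connected
// slave hosts guarded by a lock, plus a callback standing in for the scheduler.
class ServerModel {
public:
    // Stand-in for "notify the scheduler". The callback may call back into
    // this object (e.g. SlaveCount()), which re-acquires sockListLock.
    std::function<void(const std::string &)> onSlaveGone;

    void AddSlave(const std::string &host) {
        std::lock_guard<std::mutex> lock(sockListLock);
        slaves.push_back(host);
    }

    std::size_t SlaveCount() {
        std::lock_guard<std::mutex> lock(sockListLock);
        return slaves.size();
    }

    // Deferred-call pattern: update shared state under the lock, remember the
    // follow-up work, and only call out once the lock has been released.
    void ConnectionClosed(const std::string &host) {
        std::vector<std::string> gone;  // work to do once unlocked
        {
            std::lock_guard<std::mutex> lock(sockListLock);
            auto it = std::find(slaves.begin(), slaves.end(), host);
            if (it != slaves.end()) {
                slaves.erase(it);
                gone.push_back(host);
            }
        }  // sockListLock released here
        for (const auto &h : gone)
            if (onSlaveGone)
                onSlaveGone(h);  // safe: this thread holds no lock now
    }

private:
    std::mutex sockListLock;
    std::vector<std::string> slaves;
};
```

The trade-off is the one raised above: the callee now sees state as it is after the lock was released, so anything that relied on the scheduler running inside the critical section may behave differently.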

Attachments (2)

mainserver_slave_disconnect_deadlock.patch (1.7 KB) - added by doug@… 9 years ago.
Proposed patch
mythbackend_MainServer_BackendQueryDiskSpace_deadlock.patch (1.0 KB) - added by doug@… 9 years ago.
Patch for second described deadlock


Change History (6)

Changed 9 years ago by doug@…

Proposed patch

comment:1 Changed 9 years ago by cpinkham

Owner: changed from Isaac Richards to cpinkham
Status: new → assigned

comment:2 Changed 9 years ago by doug@…

Found one more deadlock that occurs when a slave backend disappears in an unclean manner.

This can happen when the MainServer::BackendQueryDiskSpace function is called. When PlaybackSock::GetDiskSpace is called from there, the readStringList call can time out. The timeout closes the socket, which invokes the socket-closed callback, MainServer::connectionClosed(); that function then blocks in QReadWriteLock::lockForWrite() (frame #4 below), creating the deadlock. Here is a backtrace showing the deadlocked thread:

Thread 2 (Thread 0xaa4fdb70 (LWP 23664)):
#0  0xb77c8424 in __kernel_vsyscall ()
#1  0xb44390e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xb459961b in QWaitCondition::wait(QMutex*, unsigned long) () from /usr/lib/qt4/libQtCore.so.4
#3  0xb459362b in QReadWriteLock::lockForWrite() () from /usr/lib/qt4/libQtCore.so.4
#4  0x080aa868 in MainServer::connectionClosed (this=0x91f66d0, socket=0x9202598) at mainserver.cpp:5140
#5  0xb6090fc0 in MythSocket::close (this=0x9202598) at mythsocket.cpp:199
#6  0xb608b72d in MythSocket::readStringList (this=0x9202598, list=..., timeoutMS=30000) at mythsocket.cpp:485
#7  0x0810122d in PlaybackSock::SendReceiveStringList (this=0x91b9200, strlist=..., min_reply_length=0) at playbacksock.cpp:105
#8  0x081059bf in PlaybackSock::GetDiskSpace (this=0x91b9200, o_strlist=...) at playbacksock.cpp:161
#9  0x080b980d in MainServer::BackendQueryDiskSpace (this=0x91f66d0, strlist=..., consolidated=false, allHosts=true) at mainserver.cpp:4021
#10 0x080bb525 in MainServer::GetFilesystemInfos (this=0x91f66d0, fsInfos=...) at mainserver.cpp:4123
#11 0x08077c63 in AutoExpire::CalcParams (this=0x91f4b30) at autoexpire.cpp:135
#12 0x0807aa58 in SpawnUpdateThread (autoExpireInstance=0x91f4b30) at autoexpire.cpp:1010
#13 0xb443542f in start_thread () from /lib/libpthread.so.0
#14 0xb429670e in clone () from /lib/libc.so.6

I have attached another patch which should work around this.
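One common way to avoid this kind of self-deadlock is to never do blocking network I/O while holding the list lock. The sketch below illustrates that general technique under stated assumptions; it is not the attached patch. It copies a reference-counted snapshot of the connection list under a read lock, releases the lock, and only then queries each slave, so a timeout that leads to `ConnectionClosed()` can take the write lock without waiting on itself. All names (`DiskSpaceModel`, `SlaveConn`, `TotalFreeKB`) are hypothetical, `alive == false` simulates the `readStringList()` timeout, and C++17 `std::shared_mutex` stands in for Qt's `QReadWriteLock`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a slave connection. alive == false simulates a
// readStringList() timeout, after which the socket would be closed.
struct SlaveConn {
    std::string host;
    bool alive = true;
    long long freeKB = 0;
};

class DiskSpaceModel {
public:
    void AddSlave(std::shared_ptr<SlaveConn> c) {
        std::unique_lock<std::shared_mutex> lock(sockListLock);
        slaves.push_back(std::move(c));
    }

    // The "socket closed" path: needs the write lock, as connectionClosed()
    // needs lockForWrite() in frame #4 of the backtrace.
    void ConnectionClosed(const SlaveConn *c) {
        std::unique_lock<std::shared_mutex> lock(sockListLock);
        slaves.erase(std::remove_if(slaves.begin(), slaves.end(),
                                    [c](const std::shared_ptr<SlaveConn> &p) {
                                        return p.get() == c;
                                    }),
                     slaves.end());
    }

    // Snapshot pattern: copy the list under the read lock, release it, and
    // only then perform the (potentially blocking) per-slave queries.
    long long TotalFreeKB() {
        std::vector<std::shared_ptr<SlaveConn>> snapshot;
        {
            std::shared_lock<std::shared_mutex> lock(sockListLock);
            snapshot = slaves;
        }  // read lock released before any I/O
        long long total = 0;
        for (const auto &c : snapshot) {
            if (c->alive)
                total += c->freeKB;
            else
                ConnectionClosed(c.get());  // simulated timeout: no lock held
        }
        return total;
    }

    std::size_t SlaveCount() {
        std::shared_lock<std::shared_mutex> lock(sockListLock);
        return slaves.size();
    }

private:
    std::shared_mutex sockListLock;
    std::vector<std::shared_ptr<SlaveConn>> slaves;
};
```

Holding `std::shared_ptr` copies in the snapshot keeps each connection object valid for the duration of the loop, even if another path removes it from the shared list in the meantime.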

Changed 9 years ago by doug@…

Patch for second described deadlock

comment:3 Changed 9 years ago by cpinkham

Milestone: unknown → 0.24

comment:4 Changed 9 years ago by robertm

Resolution: Fixed
Status: assigned → closed

(In [26827]) Avoid two deadlock conditions in mythbackend when slave backends are running. One occurs when a slave frequently disconnects, and the other when it disappears randomly and unpredictably. Either could leave the master backend (MBE) and slave backend (SBE) completely deadlocked until restarted. A week of testing, including some hard scripted interruption of the connections, has yielded no deadlocks, which I was seeing frequently before. cpinkham has read over the patches and agrees that they appear correct. The submitter has been running the patches with no ill effects for some time. Patches from Doug, fixes #8526.
