Opened 15 years ago
Closed 14 years ago
#8526 closed defect (Fixed)
Deadlock in backend code when slave backend disconnects
Reported by: | | Owned by: | cpinkham |
---|---|---|---|
Priority: | minor | Milestone: | 0.24 |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description
I encountered a deadlock in the master backend that sometimes occurs when a slave backend disconnects. Since I use the feature that puts slave backends to sleep when they are idle, I was hitting this frequently.
What seems to happen is that code in scheduler.cpp is called from mainserver.cpp while the sockListLock is held. If the scheduler code then calls code in mainserver.cpp, a deadlock can occur.
I have attached a patch which delays calls to the scheduler until after the lock has been freed in the MainServer::connectionClosed() function.
So far, this is working for me, but I'm not sure if the code currently depends on side effects from the scheduler that are removed by delaying the calls.
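The idea behind the first patch can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: the `Server` and `Scheduler` classes, `SlaveDisconnected()`, and `ConnectionCount()` are hypothetical stand-ins, and a plain `std::mutex` stands in for `sockListLock`. The point is only that the handler records what needs to be done while holding the lock and calls into the scheduler after the lock is released, so a callback into the server cannot block on the same lock.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

class Server;

// Hypothetical stand-in for the scheduler: it calls back into the server.
class Scheduler {
public:
    explicit Scheduler(Server &server) : m_server(server) {}
    void SlaveDisconnected(const std::string &host);
    std::vector<std::string> seen;  // record of notifications, for the demo
private:
    Server &m_server;
};

// Hypothetical stand-in for MainServer.
class Server {
public:
    Server() : m_scheduler(*this) {}

    void AddConnection(const std::string &host) {
        std::lock_guard<std::mutex> guard(m_sockListLock);
        m_hosts.push_back(host);
    }

    // Called by the scheduler; takes the same lock as ConnectionClosed().
    std::size_t ConnectionCount() {
        std::lock_guard<std::mutex> guard(m_sockListLock);
        return m_hosts.size();
    }

    void ConnectionClosed(const std::string &host) {
        bool notify = false;
        {
            std::lock_guard<std::mutex> guard(m_sockListLock);
            auto it = std::find(m_hosts.begin(), m_hosts.end(), host);
            if (it != m_hosts.end()) {
                m_hosts.erase(it);
                notify = true;  // remember the work, do not call out yet
            }
        }  // lock released here
        if (notify)
            m_scheduler.SlaveDisconnected(host);  // lock is no longer held
    }

    Scheduler &GetScheduler() { return m_scheduler; }

private:
    std::mutex m_sockListLock;
    std::vector<std::string> m_hosts;
    Scheduler m_scheduler;
};

inline void Scheduler::SlaveDisconnected(const std::string &host) {
    // Re-enters the server; this would block forever if the caller
    // still held m_sockListLock.
    seen.push_back(host + ":" + std::to_string(m_server.ConnectionCount()));
}
```

The cost of this pattern is the one raised above: anything that relied on the scheduler call happening before the lock was dropped now sees it slightly later.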
Attachments (2)
Change History (6)
Changed 15 years ago by
Attachment: | mainserver_slave_disconnect_deadlock.patch added |
---|
comment:1 Changed 15 years ago by
Owner: | changed from Isaac Richards to cpinkham |
---|---|
Status: | new → assigned |
comment:2 Changed 15 years ago by
Found one more deadlock that occurs when a slave backend disappears in an unclean manner.
This can happen when the MainServer::BackendQueryDiskSpace function is called. When PlaybackSock::GetDiskSpace is called from there, the readStringList call can time out, which causes the socket-closed callback to be invoked, which creates the deadlock. Here is a backtrace showing the deadlocked condition in the thread:
Thread 2 (Thread 0xaa4fdb70 (LWP 23664)):
#0 0xb77c8424 in kernel_vsyscall ()
#1 0xb44390e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb459961b in QWaitCondition::wait(QMutex*, unsigned long) () from /usr/lib/qt4/libQtCore.so.4
#3 0xb459362b in QReadWriteLock::lockForWrite() () from /usr/lib/qt4/libQtCore.so.4
#4 0x080aa868 in MainServer::connectionClosed (this=0x91f66d0, socket=0x9202598) at mainserver.cpp:5140
#5 0xb6090fc0 in MythSocket::close (this=0x9202598) at mythsocket.cpp:199
#6 0xb608b72d in MythSocket::readStringList (this=0x9202598, list=..., timeoutMS=30000) at mythsocket.cpp:485
#7 0x0810122d in PlaybackSock::SendReceiveStringList (this=0x91b9200, strlist=..., min_reply_length=0) at playbacksock.cpp:105
#8 0x081059bf in PlaybackSock::GetDiskSpace (this=0x91b9200, o_strlist=...) at playbacksock.cpp:161
#9 0x080b980d in MainServer::BackendQueryDiskSpace (this=0x91f66d0, strlist=..., consolidated=false, allHosts=true) at mainserver.cpp:4021
#10 0x080bb525 in MainServer::GetFilesystemInfos (this=0x91f66d0, fsInfos=...) at mainserver.cpp:4123
#11 0x08077c63 in AutoExpire::CalcParams (this=0x91f4b30) at autoexpire.cpp:135
#12 0x0807aa58 in SpawnUpdateThread (autoExpireInstance=0x91f4b30) at autoexpire.cpp:1010
#13 0xb443542f in start_thread () from /lib/libpthread.so.0
#14 0xb429670e in clone () from /lib/libc.so.6
I have attached another patch which should work around this.
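One general way to avoid this kind of self-deadlock is sketched below. It is an assumption-laden illustration and not necessarily what the attached patch does: `DiskSpaceServer`, `Slave`, and the `responsive` flag (which simulates a read timeout) are hypothetical, and C++17 `std::shared_mutex` stands in for `QReadWriteLock`. The query copies the connection list under the read lock, releases the lock, and only then talks to each slave, so a close callback that needs the write lock is never entered while the same thread still holds the read lock.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a slave backend connection.
struct Slave {
    std::string host;
    bool responsive;  // false simulates a read that times out
    long freeKB;
};

// Hypothetical stand-in for the part of MainServer that queries disk space.
class DiskSpaceServer {
public:
    void Add(const Slave &s) {
        std::unique_lock<std::shared_mutex> w(m_lock);
        m_slaves.push_back(s);
    }

    // Close callback: needs the write lock to drop the connection.
    void ConnectionClosed(const std::string &host) {
        std::unique_lock<std::shared_mutex> w(m_lock);
        m_slaves.erase(
            std::remove_if(m_slaves.begin(), m_slaves.end(),
                           [&](const Slave &s) { return s.host == host; }),
            m_slaves.end());
    }

    long QueryDiskSpace() {
        std::vector<Slave> snapshot;
        {
            // Hold the read lock only long enough to copy the list.
            std::shared_lock<std::shared_mutex> r(m_lock);
            snapshot = m_slaves;
        }
        long total = 0;
        for (const Slave &s : snapshot) {
            if (!s.responsive) {
                // A timeout closes the socket, which runs the callback.
                // No lock is held here, so the write lock can be taken.
                ConnectionClosed(s.host);
                continue;
            }
            total += s.freeKB;
        }
        return total;
    }

    std::size_t Count() {
        std::shared_lock<std::shared_mutex> r(m_lock);
        return m_slaves.size();
    }

private:
    std::shared_mutex m_lock;
    std::vector<Slave> m_slaves;
};
```

In the real backend the list holds reference-counted socket objects rather than plain structs, so a snapshot would also need to keep each entry alive until the I/O finishes.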
Changed 15 years ago by
Attachment: | mythbackend_MainServer_BackendQueryDiskSpace_deadlock.patch added |
---|
Patch for second described deadlock
comment:3 Changed 14 years ago by
Milestone: | unknown → 0.24 |
---|
comment:4 Changed 14 years ago by
Resolution: | → Fixed |
---|---|
Status: | assigned → closed |
(In [26827]) Avoid two deadlock conditions in Mythbackend when Slave Backends are running. One occurs when the slave frequently disconnects, and the other when it randomly and unpredictably disappears. These could render the MBE and SBE completely deadlocked until restart. A week of testing, including some hard scripted interruption of the connections, has yielded no deadlocks, which I was seeing frequently before. cpinkham has read over the patches and agrees that they appear correct. The submitter has been running the patches with no ill effect for some time. Patches from Doug, fixes #8526.
Proposed patch