Opened 5 years ago
Closed 5 years ago
Last modified 5 years ago
#13479 closed Bug Report - General (fixed)
mythbackend deadlock when editing schedule metadata on frontend
Reported by: | Peter Bennett | Owned by: | Peter Bennett |
---|---|---|---|
Priority: | minor | Milestone: | 31.0 |
Component: | MythTV - General | Version: | v30-fixes |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | no |
Description
This has happened a few times, but only since V30.
After performing a search for artwork in the schedule editor, the backend sometimes gets into a state where the frontends lose the connection and no frontend can get connected again.
Attachments (5)
Change History (15)
Changed 5 years ago by
Attachment: | gdb.core.1567104223.200.1464.mythbackend_v30.0-66-g83e27017203 added |
---|
Changed 5 years ago by
Attachment: | ver.gdb.core.1567104223.200.1464.mythbackend_v30.0-66-g83e27017203 added |
---|
Version
comment:1 Changed 5 years ago by
Version: | Unspecified → v30-fixes |
---|
comment:2 Changed 5 years ago by
This happened again, this time with no schedule or unusual actions on the frontend, just connecting the frontend and navigating to the recordings list.
comment:3 Changed 5 years ago by
This may be related to the report in the forum at https://forum.mythtv.org/viewtopic.php?f=36&t=3545
comment:4 Changed 5 years ago by
Owner: | set to Peter Bennett |
---|---|
Status: | new → assigned |
See the attached file thread_analysis.txt
Thread 34 and Thread 6 are both doing this
MainServer::connectionClosed calls sockListLock.lockForWrite MainServer::connectionClosed calls MainServer::UpdateSystemdStatus MainServer::UpdateSystemdStatus calls m_sched->GetAllPending(recordings) Scheduler::GetAllPending calls QMutexLocker lockit(&schedLock)
This causes
- lock on sockListLock
- lock on schedLock
Thread 22 is doing this
Scheduler::run calls QMutexLocker lockit(&schedLock); (l2070) Scheduler::run calls HandleIdleShutdown; (l2234) Scheduler::HandleIdleShutdown calls MainServer::isClientConnected :8358 MainServer::isClientConnected calls sockListLock.lockForRead
This causes
- lock on schedLock
- lock on sockListLock
* DEADLOCK *
If you are setting two or more locks, you need to always set them in the same order in every thread, otherwise deadlocks can happen.
There are many places where mainserver calls scheduler and fewer than 10 places where scheduler calls mainserver. All are potential deadlock scenarios. Perhaps whenever scheduler calls mainserver it should first unlock its mutex and lock it again upon return.
ANY COMMENTS FROM DEVELOPERS?
comment:5 Changed 5 years ago by
scheduler.cpp already has 9 places where calls to other routines are surrounded by schedLock.unlock() and schedLock.lock(). It looks like I only need to add those to the two places in scheduler.cpp where m_mainServer->isClientConnected() is called.
comment:6 follow-up: 7 Changed 5 years ago by
Regarding m_recListChanged - this can be used to check if the list changed. If the list changes during the call to m_mainServer->isClientConnected, that will not be a problem. In HandleIdleShutdown it does look through the list, and the list cannot change while it is looking through the list since it is locked at that time. I prefer to make as few changes as possible to reduce risk, so I plan to only unlock and relock at the two places m_mainServer->isClientConnected is called inside HandleIdleShutdown.
comment:7 Changed 5 years ago by
Replying to Peter Bennett:
Regarding m_recListChanged - this can be used to check if the list changed. If the list changes during the call to m_mainServer->isClientConnected, that will not be a problem. In HandleIdleShutdown it does look through the list, and the list cannot change while it is looking through the list since it is locked at that time. I prefer to make as few changes as possible to reduce risk, so I plan to only unlock and relock at the two places m_mainServer->isClientConnected is called inside HandleIdleShutdown.
Please do not do this. When m_recListChanged is true, stop and return. Always. It's not just that iterators might have been invalidated, but other assumptions like list ordering might have been invalidated too. You might be able to convince yourself that in this case at this time, it won't hurt anything. That might not always be the case, though, as an unrelated, future change might invalidate it. To avoid that and keep things simple, just stop and return.
Changed 5 years ago by
Attachment: | 20200204_1521_deadlock_v31.patch added |
---|
Updated V31 (master) patch
comment:10 Changed 5 years ago by
Milestone: | needs_triage → 31.0 |
---|
Full backtrace