Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#7528 closed defect (fixed)

Backend Deadlock during rescheduling

Reported by: Jelle Foks <jelle-mythtv@…> Owned by: gigem
Priority: minor Milestone: unknown
Component: MythTV - Scheduling Version: unknown
Severity: medium Keywords:
Cc: Ticket locked: no

Description

I've had deadlocks in the backend that made the backend appear completely unresponsive to frontends (can't get a list of recorded shows, can't start playback of a recording etc). The only way out was restarting mythbackend on the master backend.

After experiencing the deadlock regularly, more than just a couple of times, I saw that it seemed to occur during re-scheduling.

Examining the use of mutexes around mythtv made me dizzy, there are so many different ones and it wasn't very clear to me which locks are or could be held during which functions...

But I found the instance below that looked like a potential source of the deadlocks I had been encountering, because it would re-acquire reschedLock before releasing recordmatchLock, so the thread may be blocked while holding that lock, which would be a deadlock if another thread is trying to get recordmatchLock while holding reschedLock. And since I'm not sure if it's guaranteed by the C++ standard or g++ compiler that the QMutexLocker constructor is not run before the reschedLock.unlock() statement above it, that may be the case right there in the same function...

So I made the change below in my mythtv to make sure it wasn't a deadlock anymore.

I haven't seen the backend deadlocks anymore since applying the change below (tested for months now, while before the patch I would sometimes get the deadlock more than once in a week).

Maybe not the cleanest way to do it, but it worked for me.

Looking at the other call to UpdateMatches? and the other use of recordmatchLock, the use of the lock doesn't make sense to me anyway (it makes one call to a mysql exec exclusive to a call to UpdateMatches? elsewhere, but doesn't do that for all mysql exec calls not all UpdateMatches? calls, so I'm at a loss at what the mutex is trying to achieve).

Perhaps it really should be called UpdateMatchesLock? and be locked only inside of UpdateMatches?... Or perhaps the whole locking logic around the scheduler should be revisited?

--- mythtv/programs/mythbackend/scheduler.cpp   (revision 22704)
+++ mythtv/programs/mythbackend/scheduler.cpp   (working copy)
@@ -1610,8 +1610,10 @@
                         reschedQueue.clear();

                     reschedLock.unlock();
-                    QMutexLocker locker(&recordmatchLock);
-                    UpdateMatches(recordid);
+                    {
+                        QMutexLocker locker(&recordmatchLock);
+                        UpdateMatches(recordid);
+                    }
                     reschedLock.lock();
                 }
             }

Attachments (1)

7528-reschedlock1.patch (1.1 KB) - added by gigem 11 years ago.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 11 years ago by gigem

Status: newassigned

Changed 11 years ago by gigem

Attachment: 7528-reschedlock1.patch added

comment:2 Changed 11 years ago by gigem

The port to Qt4 and handling of non-thread-safe containers introduced an ABBA deadlock. It's odd that you seem to be the only one to have seen it occur. Please try the patch I attached. It's admittedly uglier than yours, but hopefully makes it clearer that reschedLock is only used to signal with reschedWait and to protect access to reschedQueue. recordmatchLock still protects access to the recordmatch table.

comment:3 Changed 11 years ago by Paul Kendall <paul@…>

I too saw this deadlock about 2 days ago. My system was recording one show and about to complete that and start recording two others. I was looking at the recorded shows in mythweb when I noticed that the single recording show was still recording and the others had not started. Also, mythweb was not generating preview images. I had to restart the backend then the two shows started to record as expected and the previews were generated.

comment:4 Changed 11 years ago by Jelle Foks <jelle-mythtv@…>

Thanks, the patch looks fine to me, I'll try it soon.

Maybe it needs a certain setup or recordings-schedule to trigger... Sometimes the bug would not trigger for weeks, then suddenly more than once in a week... I have multiple tuners and many recordings scheduled (e.g. record as many new movies as possible at the lowest priority, ditto local news broadcasts etc). My setup is almost never idle.

But what about the (other) call to UpdateMatches?() in FillRecordListFromDB() (which can be called from mainserver.cpp), shouldn't that then be done while holding the recordmatchLock too? And how about AddNewRecords?(), which has a "UPDATE recordmatch ", and is called from FillRecordList?(), which is called twice in scheduler.cpp without holding that lock.

Also: AddNotListed?() uses the table and is also called from FillRecordList?() without holding the lock.

comment:5 Changed 11 years ago by gigem

The call to UpdateMatches?() in FillRecordListFromDB() uses the private, temporary copy of recocrdmatch that it just made, so it doesn't need any protection. The access in AddNewRecords?() is updating columns that are only used by the scheduler thread, so I don't believe it needs any protection. All of the remaining unprotected uses of recordmatch are single, read-only queries, which I don't believe need protection either.

comment:6 Changed 11 years ago by gigem

Resolution: fixed
Status: assignedclosed

(In [22813]) Fixed an ABBA deadlock in the scheduler that crept in during the port to Qt4. Based on a patch by Jelle Foks who tracked down the problem. Fixes #7528.

comment:7 Changed 11 years ago by gigem

(In [22814]) Backport of [22813] to fix an ABBA deadlock in the scheduler that crept in during the port to Qt4. Based on a patch by Jelle Foks who tracked down the problem. Refs #7528.

Note: See TracTickets for help on using tickets.