Ticket #10870 (closed Bug Report - Hang/Deadlock: fixed)
Opened 11 months ago
Last modified 9 months ago
Race condition between StreamHandler::RemoveListener and StreamHandler::AddListener
| Reported by: | roger@… | Owned by: | danielk |
|---|---|---|---|
| Priority: | critical | Milestone: | 0.26 |
| Component: | MythTV - General | Version: | 0.25-fixes |
| Severity: | medium | Keywords: | AddListener RemoveListener |
| Cc: | Ticket locked: | no |
Description
There is a race condition when a thread calls StreamHandler::AddListener? when a parallel executing thread has already called AddListener::RemoveListener?. This results in both threads being deadlocked and never returning from these calls. The error message "Programmer Error: Start called before Stop finished" is also logged. The symptoms of this problem are recordings appearing in the completed recordings list which either have no file associated with them or a file with only a few bytes (usually 326 I think) in it.
I have attached a log showing this problem. This log is from mythbackend running on a quad core machine.
The backend is attempting to record two programs with the same start time on the same DTV multiplex. The capture cards are configured to allow multiple simultaneous recordings. The salient log lines are at
2012-06-26 21:00:02.354383 I [2501/2511] TVRecEvent streamhandler.cpp:97 (RemoveListener?) - SH(/dev/dvb/adapter201/frontend0): RemoveListener?(0xffffffffa1031e80) -- begin
The thread 2511 will end up blocked in StreamHandler::Stop and never return, and
2012-06-26 21:00:02.369020 I [2501/2706] RecThread? streamhandler.cpp:45 (AddListener?) - SH(/dev/dvb/adapter201/frontend0): AddListener?(0x972e448) -- begin
The thread 2706 will end up blocked in StreamHandler:Start and never return.
Attachments
Change History
Changed 11 months ago by roger@…
- Attachment mythbackendversion.txt added
comment:1 Changed 11 months ago by Roger James <roger@…>
The patch below is solely intended as a work around, and I have put it here for information only. I have regression tested it and done some very basic testing against the issue I reported. It appears to work. I don't like the idea of introducing extra locks to fix a bug, but it is the least intrusive change I could think of. A better fix would possibly be the implementation of a proper finite state machine with queued event handling.
Roger
$ git diff streamhandler.cpp
diff --git a/mythtv/libs/libmythtv/streamhandler.cpp b/mythtv/libs/libmythtv/str
index 2b0a7b6..28de368 100644
--- a/mythtv/libs/libmythtv/streamhandler.cpp
+++ b/mythtv/libs/libmythtv/streamhandler.cpp
@@ -133,7 +133,8 @@ void StreamHandler::RemoveListener(MPEGStreamData *data)
void StreamHandler::Start(void)
{
- QMutexLocker locker(&_start_stop_lock);
+ QMutexLocker start_stop_procedure_locker(&_start_stop_procedure_lock);
+ QMutexLocker start_stop_locker(&_start_stop_lock);
if (_running)
{
@@ -173,7 +174,8 @@ void StreamHandler::Start(void)
void StreamHandler::Stop(void)
{
- QMutexLocker locker(&_start_stop_lock);
+ QMutexLocker start_stop_procedure_locker(&_start_stop_procedure_lock);
+ QMutexLocker start_stop_locker(&_start_stop_lock);
do
{
$ git diff streamhandler.h
diff --git a/mythtv/libs/libmythtv/streamhandler.h b/mythtv/libs/libmythtv/strea
index 1db0584..ae0bc80 100644
--- a/mythtv/libs/libmythtv/streamhandler.h
+++ b/mythtv/libs/libmythtv/streamhandler.h
@@ -45,7 +45,7 @@ class PIDInfo
typedef QMap<uint,PIDInfo*> PIDInfoMap;
// locking order
-// _pid_lock -> _listener_lock -> _start_stop_lock
+// _pid_lock -> _listener_lock -> _start_stop_procedure_lock -> _start_stop_lock
class StreamHandler : protected MThread, public DeviceReaderCB
{
@@ -93,7 +93,7 @@ class StreamHandler : protected MThread, public DeviceReaderCB
virtual void RemoveNamedOutputFile(const QString &filename) {}
/// At minimum this sets _running_desired, this may also send
/// signals to anything that might be blocking the run() loop.
- /// \note: The _start_stop_lock must be held when this is called.
+ /// \note: The _start_stop_procedure_lock and the _start_stop_lock must be
virtual void SetRunningDesired(bool desired) { _running_desired = desired;
protected:
@@ -101,6 +101,7 @@ class StreamHandler : protected MThread, public DeviceReader
bool _needs_buffering;
bool _allow_section_reader;
+ mutable QMutex _start_stop_procedure_lock; /// Serialises calls to Start
mutable QMutex _start_stop_lock;
volatile bool _running_desired;
volatile bool _error;
$
comment:3 Changed 10 months ago by danielk
- Priority changed from minor to major
- Status changed from assigned to accepted
- Milestone changed from unknown to 0.26
comment:4 Changed 10 months ago by Daniel Thor Kristjansson <danielk@…>
- Status changed from accepted to closed
- Resolution set to fixed
comment:5 Changed 10 months ago by Daniel Thor Kristjansson <danielk@…>
comment:6 Changed 10 months ago by Daniel Thor Kristjansson <danielk@…>
comment:7 Changed 10 months ago by danielk
- Status changed from closed to new
- Resolution fixed deleted
comment:8 Changed 10 months ago by Daniel Thor Kristjansson <dkristjansson@…>
- Status changed from new to closed
- Resolution set to fixed
comment:9 Changed 10 months ago by Daniel Thor Kristjansson <dkristjansson@…>
comment:10 Changed 10 months ago by stuartm
- Priority changed from major to critical
- Status changed from closed to new
- Resolution fixed deleted
This doesn't appear to be fixed, just experienced the deadlock in StreamHandler::RemoveListener?(), BT attached.
Edit: Backtrace not attached, forgot to enable logging in gdb ...
Changed 10 months ago by stuartm
Partial BT, first dozen threads are missing including one stuck in StreamHandler::RemoveListener?()
comment:11 Changed 10 months ago by Daniel Thor Kristjansson <dkristjansson@…>
- Status changed from new to closed
- Resolution set to fixed
comment:12 Changed 10 months ago by danielk
stuartm, unless we have a backtrace with the deadlocked threads we can't really do anything. For now, I've backported the current fix to 0.25-fixes.
Changed 9 months ago by pedro@…
- Attachment mythbackend-backtrace.txt added
backtrace of deadlocked backend
comment:13 Changed 9 months ago by pedro@…
This problem doesn't seem fixed. I'm running the ubuntu packages of mythtv version 0.25.2+fixes.20120802.46cab93-0ubuntu1 that judging by the date should include the July 24th backport and yet the backend gets completely deadlocked when trying to record more than one show at a time from the same DVB-T card where all channels are available on the same frequency in a single stream.
To get the server back up I have to kill -9 the backend process and then restart it. I've attached a backtrace that seems to show both calls to StreamHandler::RemoveListener? and StreamHandler::AddListener? as this bug describes. I've setup recordings of 3 news shows at the same times twice a day and this seems to trip the bug every time, so if any other information is needed let me know.
comment:14 Changed 9 months ago by danielk
Perdo, please test the 0.26 beta.
comment:15 Changed 9 months ago by Pedro Côrte-Real <pedro@…>
I tested the packages from https://launchpad.net/~mythbuntu/+archive/0.26 version 0.26.0~master.20120816.7203f17-0ubuntu0mythbuntu2. With those I was able to record 3 shows at a time without it failing.

mythbackend version info