Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#6305 closed defect (invalid)

Incorrect QWaitCondition & mutex usage in multiple places

Reported by: tomimo@… Owned by: Isaac Richards
Priority: minor Milestone: unknown
Component: MythTV - General Version: head
Severity: medium Keywords: qwaitcondition race hang
Cc: Ticket locked: no

Description (last modified by Janne Grunau)

I've noticed a quite frequent hang due to frequent status queries:

Example:

Thread 12 (Thread 0x49396950 (LWP 11152)):
#0  0x00007fdd7ee1c2d9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007fdd7d4ebb5b in ?? () from /usr/lib/libQtCore.so.4
#2  0x00007fdd7d4e73cd in QMutex::lock () from /usr/lib/libQtCore.so.4
#3  0x0000000000438e31 in QMutexLocker::relock (this=0x49393780) at /usr/include/qt4/QtCore/qmutex.h:116
#4  0x000000000043bdd9 in QMutexLocker (this=0x49393780, m=0xa77af8) at /usr/include/qt4/QtCore/qmutex.h:98
#5  0x00007fdd833642a5 in TVRec::IsBusy (this=0xa77a20, busy_input=0x0, time_buffer=5) at tv_rec.cpp:2466
#6  0x00000000004422a7 in EncoderLink::IsBusy (this=0xa9a530, busy_input=0x0, time_buffer=5) at encoderlink.cpp:102
#7  0x00000000004ff7fa in GetCurrentMaxBitrate (encoderList=0x776cc8) at backendutil.cpp:47
#8  0x00000000005012e8 in BackendQueryDiskSpace (strlist=@0x49394ba0, encoderList=0x776cc8, consolidated=true, allHosts=true) at backendutil.cpp:193
#9  0x0000000000453e96 in HttpStatus::FillStatusXML (this=0x7fdd70029170, pDoc=0x493952d0) at httpstatus.cpp:337
#10 0x0000000000456fd3 in HttpStatus::GetStatusHTML (this=0x7fdd70029170, pRequest=0x7fdd7052ce10) at httpstatus.cpp:134
#11 0x0000000000457236 in HttpStatus::ProcessRequest (this=0x7fdd70029170, pRequest=0x7fdd7052ce10) at httpstatus.cpp:88
#12 0x00007fdd81a049be in HttpServer::DelegateRequest (this=0x7fdd7001d600, pThread=0x7fdd70009170, pRequest=0x7fdd7052ce10) at httpserver.cpp:171
#13 0x00007fdd81a05196 in HttpWorkerThread::ProcessWork (this=0x7fdd70009170) at httpserver.cpp:297
#14 0x00007fdd81a01879 in WorkerThread::WakeForWork (this=0x7fdd70009170) at threadpool.cpp:218
#15 0x00007fdd81a01c88 in WorkerEvent::customEvent (this=0xa968a0, e=0x7fdd68830530) at threadpool.cpp:135
#16 0x00007fdd7d5e3dfd in QObject::event () from /usr/lib/libQtCore.so.4
#17 0x00007fdd7dd8fc3d in QApplicationPrivate::notify_helper () from /usr/lib/libQtGui.so.4
#18 0x00007fdd7dd979ba in QApplication::notify () from /usr/lib/libQtGui.so.4
#19 0x00007fdd7d5d4d61 in QCoreApplication::notifyInternal () from /usr/lib/libQtCore.so.4
#20 0x00007fdd7d5d59fa in QCoreApplicationPrivate::sendPostedEvents () from /usr/lib/libQtCore.so.4
#21 0x00007fdd7d5fd4d3 in ?? () from /usr/lib/libQtCore.so.4
#22 0x00007fdd79e14d3b in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#23 0x00007fdd79e1850d in ?? () from /usr/lib/libglib-2.0.so.0
#24 0x00007fdd79e186cb in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#25 0x00007fdd7d5fd15f in QEventDispatcherGlib::processEvents () from /usr/lib/libQtCore.so.4
#26 0x00007fdd7d5d3682 in QEventLoop::processEvents () from /usr/lib/libQtCore.so.4
#27 0x00007fdd7d5d380d in QEventLoop::exec () from /usr/lib/libQtCore.so.4
#28 0x00007fdd7d4e93f8 in QThread::exec () from /usr/lib/libQtCore.so.4
#29 0x00007fdd81a0146d in WorkerThread::run (this=0x7fdd70009170) at threadpool.cpp:265
#30 0x00007fdd7d4ec362 in ?? () from /usr/lib/libQtCore.so.4
#31 0x00007fdd7ee183ea in start_thread () from /lib/libpthread.so.0
#32 0x00007fdd7ca4acbd in clone () from /lib/libc.so.6
#33 0x0000000000000000 in ?? ()

It looks there is multiple places where the QWaitCondition.wait/wakeall have been implemented incorrectly and therefore the code doesn't work as intended.

Example:

// Qt4 requires a QMutex as a parameter...
// not sure if this is the best solution.  Mutex Must be locked before wait.
     QMutex mutex;
     mutex.lock();
 
     while (encoding)
     {
         if (request_pause)
         {
            mainpaused = true;
            pauseWait.wakeAll();

No locks around this and no predicative set to make sure that the signal is detected in the receiver

            if (IsPaused() && tvrec)
                tvrec->RecorderPaused();
 
            unpauseWait.wait(&mutex, 100);

Uses a bogus mutex, doesn't have a proper locking around the wait-call and no predicative checking in order to detect if the wake-call has already been sent before we come to the wait()-call.

            if (cleartimeonpause)
                gettimeofday(&stm, &tzone);
            continue;

I'll attach a patch file which shows how I fixed the problem in recorderbase.cpp and .h. Because these pauseWait's and unpauseWait's are being used in several places, I'll upload a full patch after I've verified that my fix doesn't have anmy additional bugs.

Attachments (2)

condwait_usage.patch (4.4 KB) - added by tomimo@… 11 years ago.
A simple example of the fix I used for the described problem (full fix being prepared)
qttest.cpp (497 bytes) - added by tomimo@… 11 years ago.
Example test program displaying what is the problem with current conditionWait code.

Download all attachments as: .zip

Change History (6)

Changed 11 years ago by tomimo@…

Attachment: condwait_usage.patch added

A simple example of the fix I used for the described problem (full fix being prepared)

Changed 11 years ago by tomimo@…

Attachment: qttest.cpp added

Example test program displaying what is the problem with current conditionWait code.

comment:1 Changed 11 years ago by tomimo@…

Compile the attached qttest.cpp with the following command:

g++ -Wall -g -I/usr/include/qt4 -I/usr/include/qt4/QtCore qttest.cpp -o test -lQtCore

As the example code doesn't use any additional predicates either, it's very easy to see that a deadlock will occur if the consumer thread should act on every single wakeup-signal and not just hope that there will be another one soon ...

comment:2 Changed 11 years ago by Janne Grunau

Description: modified (diff)

I don't think the patch for Recorderbase is correct. The QWaitconditions are used as interuptible sleep. There is no need for useful mutexes in this usage case. Another problem with the patch is that it doesn't change the usage in derived classes.

the member variables paused and request_pause could probably be guarded by mutexes which should be used in the wait conditions in recorderbase.

I'm ready to accept that our standard Qt4 port of QWaitCondition with adding a otherwise unused mutex might create deadlocks in some cases but recorderbase is a bad example.

comment:3 Changed 11 years ago by Janne Grunau

Resolution: invalid
Status: newclosed

the QwaitCondition? usage in TVRec seems to be fine too and I can't reproduce the deadlock from above with running

while /bin/true; do wget "http://localhost:6544" -O /dev/null; sleep 0.1; done

three times simultaneously

comment:4 in reply to:  2 Changed 11 years ago by tomimo@…

Replying to janne:

I don't think the patch for Recorderbase is correct. The QWaitconditions are used as interuptible sleep. There is no need for useful mutexes in this usage case. Another problem with the patch is that it doesn't change the usage in derived classes.

The attached patch of course doesn't fix all the places, because I noticed that there exists at least 15 locations around the code which has the same QWaitCondition-usage flaw and I wanted to test my fixes as before committing it all. I did however want to bring some attention to this problem right a way.

I understand very well your comment about the purpose of the discussed code, but what I don't understand is that why would you want to keep the code which is obviously wrong ? This is a good example of QWaitCondition-usage where a wakeOne() or wakeAll() signal can be missed by mistake and even though the timedcondwait will eventually timeout and fix a potential deadlock situation, why shouldn't we fix the code to work properly in the first place ?

the member variables paused and request_pause could probably be guarded by mutexes which should be used in the wait conditions in recorderbase.

I'm ready to accept that our standard Qt4 port of QWaitCondition with adding a otherwise unused mutex might create deadlocks in some cases but recorderbase is a bad example.

I'm little bit amazed to see that you have closed this ticket, even though you admit yourself (and in the comments of the code) that the code in question is propably not correct. It's not that hard to fix the detected errors in order to make the MythTV more stable. I'm sure you that you know very well that the small loop wget test proves absolutely nothing about the stability of the code, especially as I've been (unfortunately) able to get the mythbackend hang in two different hosts once or twice a day without fixing these QWaitCondition cases.

I have no problem in submitting patches to MythTV, but of course I'd like to know that you'd be willing to accept them in some form as well.

I'm open to suggestions how to get these fixes done and I kindly ask you not to close this ticket just yet.

Note: See TracTickets for help on using tickets.