Opened 18 years ago

Closed 17 years ago

#2434 closed defect (fixed)

Workaround for hangs when ffwd/rewinding video

Reported by: matt-bulk-mythtv@… Owned by: danielk
Priority: minor Milestone: 0.21
Component: mythtv Version: head
Severity: medium Keywords:
Cc: mythtv@… Ticket locked: no

Description

I'm running myth-0.20-fixes on a VIA EPIA M10000 box, and I get regularly get hangs while fast forwarding/rewinding videos (live tv or recorded).

It seems to happen when seeking then resuming playback. In the logs I get loads of messages like:

2006-09-20 20:40:27.917 Waited too long for video out to pause

I've seen other people point out this issue too. Tracing it back, it seems to be getting stuck in PauseVideo?():

#0  0xb5df3440 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1  0xb64e04e0 in QWaitCondition::wait () from /usr/lib/libqt-mt.so.3
#2  0xb792fcba in NuppelVideoPlayer::PauseVideo (this=0x8f32db0, wait=true)
    at NuppelVideoPlayer.cpp:456
#3  0xb7943448 in NuppelVideoPlayer::StartPlaying (this=0x8f32db0)
    at NuppelVideoPlayer.cpp:3317
#4  0xb78deedf in SpawnDecode (param=0x8f32db0) at tv_play.cpp:256
#5  0xb5df0b63 in start_thread () from /lib/tls/libpthread.so.0
#6  0xb5ca918a in clone () from /lib/tls/libc.so.6

I think what is happening is that the seeking operation is trying to pause the video:

           if (fftime >= 5)
            {
                PauseVideo();

but there is a race between PauseVideo?() setting the pausevideo member to true:

void NuppelVideoPlayer::PauseVideo(bool wait)
{
    video_actually_paused = false;
    pausevideo = true;

and the user pressing 'play' causing pausevideo to become false in UnpauseVideo?()

void NuppelVideoPlayer::UnpauseVideo(void)
{
    pausevideo = false;
}

I think this is causing PauseVideo? to get stuck in an infinite loop will trying to seek.

I've attached a patch which works around this, but I've no idea if it's the correct thing to do. Feedback please :)

Attachments (8)

patch.txt (1.5 KB) - added by anonymous 18 years ago.
patch2.txt (2.0 KB) - added by Matt Holgate <matt-bulk-mythtv@…> 18 years ago.
New patch which addresses race issue more correctly.
mythplayback.diff (2.1 KB) - added by tomimo@… 17 years ago.
This patch fixes the incorrect locking of pausevideo-variable & videoThreadPaused-conditional.
2434-v1.patch (6.2 KB) - added by danielk 17 years ago.
Patch for NVP pause problem
gdb.txt (6.9 KB) - added by tomimo@… 17 years ago.
Test run with v2434-v1.patch (doesn't work)
2434-v2.patch (7.2 KB) - added by Matt Holgate <matt-bulk-mythtv@…> 17 years ago.
Modfied version of danielk's patch with additional lock
2434-v3.patch (7.3 KB) - added by danielk 17 years ago.
Cosmetic changes to 2434-v2.patch
mythplayback2.diff (3.3 KB) - added by tomimo@… 17 years ago.
Fixes a still existing race condition with Unpause/pause & lirc remote control

Download all attachments as: .zip

Change History (37)

Changed 18 years ago by anonymous

Attachment: patch.txt added

comment:1 Changed 18 years ago by Isaac Richards

Priority: majorminor

comment:2 Changed 18 years ago by g8ecj@…

Has anyone looked into this (especially to see if there are other instances)?

I'm finding that on an Epia front end its almost unusable without this patch as a moments impatience and its a reboot to recover (restarting just the frontend results in the XvMC display being messed up, often resulting in a green screen).

The patch applies cleanly to r11458.

comment:3 Changed 18 years ago by matt-bulk-mythtv@…

The only (serious) instance that I can immediately think of might be that pressing pause while seeking could cause the same thing. I don't know if this is a problem in practice though (this would happen because we wait for the video to be unpaused in a loop).

Also, maybe we should check that the video really is paused (by calling GetVideoPause?()) after other calls to PauseVideo?() in the same file (since PauseVideo? can now return with the video not actually paused with my patch).

Another option would be to add a lock to make sure that the setting of pausevideo and the wait for video_actually_paused to become true is atomic. In fact, a lock around the whole "Pause, DoFastForward?, Unpause", and the NuppelVideoPlayer::Pause() and NuppelVideoPlayer::Play() code might be a more robust way of fixing this, to ensure that user interactions don't interfere with programmaticly introduced pauses?

Isaac - any thoughts on this?

comment:4 Changed 18 years ago by anonymous

Hi, we used the patch, but the race still occurs.

Is there any update about this topic?

Tanks

comment:5 Changed 18 years ago by matt-bulk-mythtv@…

anonymous> What problem do you actually see? Is it when you press play while ffwd/rewinding?

comment:6 Changed 18 years ago by anonymous

My problem is that mythfrontend hangs as it arrives at beggining of autorecording while rewinding.

It happens also without pressing play. It simply hangs as it reaches the dummy recording which are empty. I tried to debug it a bit and it seems hanging because it needs at least one frame.

This problem is getting me crazy. Tanks of lots.

comment:7 Changed 18 years ago by matt-bulk-mythtv@…

I'm not sure this is the same issue. Are you getting lots of the following messages in your frontend log?

2006-09-20 20:40:27.917 Waited too long for video out to pause

comment:8 Changed 18 years ago by g8ecj@…

Version: 0.20head

Sorry to muddy the waters even more as the problem I have is different again - but the patch on this ticket fixes it!!

If I *skip* forwards (30secs) or backwards (8secs) without the patch whilst the OSD is repainting I get a hangup (i.e. only one button press per second if I'm lucky). With this patch I can go back/forwards as fast as I can press buttons on the remote.

comment:9 in reply to:  7 Changed 18 years ago by anonymous

Replying to matt-bulk-mythtv@holgate.org.uk:

I'm not sure this is the same issue. Are you getting lots of the following messages in your frontend log?

2006-09-20 20:40:27.917 Waited too long for video out to pause

I'll try to briefly explain what happens:

When I start viewing a channel if I try to rewind BEFORE the 0MB dummy record expires, that is before the following message appears in mythbackend log:

2006-10-12 10:12:48.232 Expiring Unknown from Thu Oct 12 10:09:45 2006, 0 MBytes, forced expire (LiveTV recording)

the rewind process goes till the beginning of the recording, then it jumps to the 0MB recording, in fact the following message appears "Rewind 3X 0:00 of 0:02" and remains there forever and the mythfrontend must be restarted. If mythfrontend is normally started (I mean without -v all) no message appears on the log; instead if -v all is enabled the log is quite long (where can I submit it?) Instead if I try to do rewind AFTER the appearance of

2006-10-12 10:12:48.232 Expiring Unknown from Thu Oct 12 10:09:45 2006, 0 MBytes, forced expire (LiveTV recording)

everything works fine. Have you idea of what's wrong on my Myth-Box?

comment:10 Changed 18 years ago by Matt Holgate <matt-bulk-mythtv@…>

I think the issue reported by anonymous above is unrelated to the issue here. As discussed off-ticket, please could you open a new ticket to cover this issue? Thanks!

comment:11 Changed 18 years ago by Matt Holgate <matt-bulk-mythtv@…>

Ok, I've had a go at trying to fix this properly.

The new patch I'm about to upload is a much better solution which prevents the race condition by ensuring the PauseVideo?() is only ever called from a single thread (the decoder thread). This ensures that pause/unpause operations are thread safe.

The only issues that I'm concerned about are:

  • it's no longer possible to call NVP::Pause with waitvideo==false (but I can't find any code that does this anyway. I've not changed the API to remove the parameter because I guess this would be a bad idea on the fixes branch
  • calling NVP::PauseDecoder? now always causes the video/audio to be paused as well (I've no idea if this is an issue or not).

I've only tested this patch very briefly, but it seems to fix the issue. I'll run with it for the next few days and see how it fares. If anyone else would like to test it too, that would be great.

Changed 18 years ago by Matt Holgate <matt-bulk-mythtv@…>

Attachment: patch2.txt added

New patch which addresses race issue more correctly.

comment:12 Changed 18 years ago by Matt Holgate <matt-bulk-mythtv@…>

[Ok, I just noticed that there's also some code in the video output thread which also modifies the pause state :(. But this does appear to be related DVD only, and I've no idea if it would actually cause a problem in practice...]

comment:13 Changed 18 years ago by danielk

Owner: changed from Isaac Richards to danielk

comment:14 Changed 17 years ago by anonymous

Cc: mythtv@… added

comment:15 Changed 17 years ago by hads

I'm getting this same issue on a friends install, it's using the .20 packages from Ubuntu Edgy. When I get a chance I will install from .20-fixes and test this patch.

Changed 17 years ago by tomimo@…

Attachment: mythplayback.diff added

This patch fixes the incorrect locking of pausevideo-variable & videoThreadPaused-conditional.

comment:16 Changed 17 years ago by tomimo@…

The attached "mythplayback.diff" patch fixes (at least for me) the incomplete locking of pausevideo-variable and videoThreadPaused-conditional which causes hangs described in this ticket. The patch applies to SVN-trunk (12283).

Regards, Tomi Orava

Changed 17 years ago by danielk

Attachment: 2434-v1.patch added

Patch for NVP pause problem

comment:17 Changed 17 years ago by danielk

Milestone: unknown0.21

Matt, Tomino, can you test the patch I just attached?

I've put it through about 200 pause/unpause cycles here, but I've only rarely experienced this problem so I'm not the best tester. The patch basically expands on what Tomino did in the last patch and completely protects writes to video_actually_paused and gets rid of some busy wait loops to boot.

comment:18 Changed 17 years ago by tomimo@…

Unfortunately the latest patch doesn't work. It seems that the frontend likes to hang (endless loop) on line 460 of NuppelVideoPlayer?.cpp in NuppelVideoPlayer::PauseVideo?(bool wait). The line is:

videoThreadPaused.wait(&pauseUnpauseLock, 250);

It seems that FC6's current gdb has some problems but the code does print the debug message line "VERBOSE(VB_IMPORTANT, "Waited too long for video out to pause");" over and over again, but is unable to break from the loop.

My personal test setup is one serial-lirc based remote controller, sticky fwd/rev keys enabled and the hang usually occurs whenever I'm seeking backwards and pressing the play to continue normal playback.

comment:19 Changed 17 years ago by danielk

Tomino, please compile with debugging symbols and run the frontend in gdb. Then hit Ctrl-C after it starts printing this, and create a backtrace with "thread apply all bt". Attach the backtrace to the ticket when your done. I'd like to know where the deadlock is occuring.

comment:20 Changed 17 years ago by tomimo@…

Hi Daniel,

There really seems to be a bug in current kernel or gdb 6.5 from FC6 ... Gdb just gives "Failed to read a valid object file image from memory." and fails (and applies to others as well as shown by Google). I'll get a fix/update for my machine as soon possible and try to get the backtrace of the hang even though this christmas thing is creating some extra work for me ATM :)

Changed 17 years ago by tomimo@…

Attachment: gdb.txt added

Test run with v2434-v1.patch (doesn't work)

comment:21 Changed 17 years ago by tomimo@…

Ok, I've run some tests with the latest patch and I must say I don't quite follow the logic of that patch. The problem seems to be that whenever I get into a situation where the NuppelVideoPlayer::Play() gets called _before_ the NuppelVideoPlayer::Pause() it's a deadlock, as the UnpauseVideo?()-method is not able to directly wakeup the PauseVideo?()-method ---> PauseVideo?() is waiting for video_actually_paused-variable to be true, but the UnpauseVideo?()-method doesn't touch that variable. Another thing I noticed was that the pausevideo-variable is being used without locking, which might create some races in some situations.

comment:22 in reply to:  16 Changed 17 years ago by maxchowhk@…

Type: patchdefect

The patch mythplayback.diff seems working for ffward then pause, but still fell into loop when do rewind then pause. Any idea?

comment:23 Changed 17 years ago by Matt Holgate <matt-bulk-mythtv@…>

Daniel, thanks for the new patch, and sorry to take so long to get back to you.

Anyway, I've tested the patch, and unfortunately it still doesn't fix the problem I was seeing. I have produced a modified version which I think /does/ fix it, at least in my limited testing so far.

The problem was that even though 'video_actually_paused' and 'pausevideo' are now protected by the pauseUnpauseLock, the original race can still occur given the following sequence of events:

  • ffwd/rewind code tries to pause the video by calling PauseVideo?(). PauseVideo?() waits on the condition variable for the video to /actually/ pause, meanwhile releasing the pauseUnpauseLock.
  • user presses 'play', resulting in UnpauseVideo?() being called. This thread is able to get the pauseUnpauseVideo lock, and pauseVideo is set to false.
  • Since 'pauseVideo' is now false, the video is never actually paused, and the ffwd/rewind code is not able to proceed.

The fix I have made adds an additional lock 'internalPauseLock'. This ensures that any internally generated (e.g. from the seeking code), {Pause/Do? something/Unpause} sequences are now atomic with respect to any user initiated pause/play operations. This means that an internally generated pause cannot ever interact badly with a user-generated one.

I've only briefly tested this patch, but I'll leave it running for the next few days to see how it fares. Any feedback really welcome, as it'd be great to get this one closed.

Changed 17 years ago by Matt Holgate <matt-bulk-mythtv@…>

Attachment: 2434-v2.patch added

Modfied version of danielk's patch with additional lock

Changed 17 years ago by danielk

Attachment: 2434-v3.patch added

Cosmetic changes to 2434-v2.patch

comment:24 Changed 17 years ago by ylee@…

I don't have an Epia, but tried the patch out on my 0.20-fixes system in hopes it would help out with the fast forwarding/rewinding with libmpeg2 getting stuck problem; sadly, no luck there. However, is there any harm in me continuing to use this patch? Is it something that will be incorporated into 0.20-fixes?

comment:25 Changed 17 years ago by danielk

ylee, there should be no harm in using the patch. This bug is present in 0.20-fixes as well, but I don't know if the fix will make it into 0.20-fixes only because it affects few people.

comment:26 Changed 17 years ago by matt-bulk-mythtv@…

I've just tested patch v3 - by hammering ffwd/rewind/play/pause for about five minutes during playback, and couldn't reproduce the problem. So I think this patch is good to go :)

Note, I only tested it on 0.20-fixes, as I do not have an svn-trunk setup (if it's possible to commit this to both branches, I'd be grateful).

Thanks for all your help!

comment:27 Changed 17 years ago by danielk

Resolution: fixed
Status: newclosed

(In [13457]) Fixes #2434. Fixes a locking issue with FF/Rew.

This mostly hit people with very slow CPU's like the VIA ones. Because we keep a binary paused/unpaused state rather than a count we need an additional lock for operations that pause, do something, then unpause; the rest of the changeset just makes this efficient by adding a QWaitCondition.

Tested by myself, Matt Holgate, Yeachang Lee, and Robin Gilks.

comment:28 Changed 17 years ago by tomimo@…

Resolution: fixed
Status: closedreopened

See attached PATCH[mythplayback2.diff]!

This particular bug can still be seen with the latest SVN trunk (13857). It seems that there still exists a race condition, where the user rewinds backwards and then presses PLAY to return to normal playback speed ---> it is possible that a LIRC-based remote control thread is able to call NuppelVideoPlayer::UnpauseVideo?() before the mythfrontend thread calls NuppelVideoPlayer::PauseVideo?(). Whenever this happens the mythfrontend hangs indefinetely with the error message "Waited too long for video out to pause" over and over.

The attached patch fixes this particular race condition by still introducing a single condition variable.

Regards, Tomi Orava

Changed 17 years ago by tomimo@…

Attachment: mythplayback2.diff added

Fixes a still existing race condition with Unpause/pause & lirc remote control

comment:29 Changed 17 years ago by danielk

Resolution: fixed
Status: reopenedclosed

Tomimo, please document the actual sequence of events in terms of actions sent by LIRC and functions and methods MythTV runs. It also sounds like this problem you are experiencing is on a higher level, because a pause is being called when you don't need one in the first place, so please open a new ticket for this problem.

Note: See TracTickets for help on using tickets.