Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#2889 closed defect (fixed)

Backend crashes during what appears to be preview generation

Reported by: michael.tiller@… Owned by: danielk
Priority: minor Milestone: unknown
Component: mythtv Version: head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

My backend regularly crashes due to a floating point exception. I'm using the latest SVN (12453) and running gdb to diagnose the problem. Here is a stack trace when the SIGFPE occurs:

#0 0xb5ed4597 in divdi3 () from /lib/libgcc_s.so.1
#1 0xb7540684 in av_reduce (dst_nom=0xaab564f4, dst_den=0xaab564f0, nom=0, den=0, max=1073741824) at rational.c:39
#2 0xb72d255c in mpeg_decode_frame (avctx=0xaab564d0, data=0xad7077bc, data_size=0xad707658, buf=0xa913e008 "", buf_size=166352) at mpeg12.c:2141
#3 0xb720d0b5 in avcodec_decode_video (avctx=0xaab564d0, picture=0xad7077bc, got_picture_ptr=0xad707658, buf=0xa913e008 "", buf_size=166352) at utils.c:903
#4 0xb7aedf6b in AvFormatDecoder::GetFrame (this=0xaab40c00, onlyvideo=1) at avformatdecoder.cpp:3195
#5 0xb7aa12fd in NuppelVideoPlayer::GetFrameNormal (this=0xaab02838, onlyvideo=1) at NuppelVideoPlayer.cpp:1224
#6 0xb7aa5727 in NuppelVideoPlayer::GetFrame (this=0xaab02838, onlyvideo=1, unsafe=false) at NuppelVideoPlayer.cpp:1305
#7 0xb7aaee1b in NuppelVideoPlayer::GetScreenGrab (this=0xaab02838, secondsin=64, bufflen=@0xad708170, vw=@0xad70816c, vh=@0xad708168, ar=@0xad708164) at NuppelVideoPlayer.cpp:5156
#8 0xb794586d in PreviewGenerator::GetScreenGrab (pginfo=0xaab00d78, filename=@0xaab00da8, secondsin=64, bufferlen=@0xad708170, video_width=@0xad70816c, video_height=@0xad708168, video_aspect=@0xad708164) at previewgenerator.cpp:387
#9 0x0809a7ce in MainServer::HandleGenPreviewPixmap (this=0x8214f20, slist=@0xad70831c, pbs=0x8225340) at mainserver.cpp:3561
#10 0x080b37de in MainServer::ProcessRequestWork (this=0x8214f20, sock=0x823aeb0) at mainserver.cpp:503
#11 0x080b57d2 in MainServer::ProcessRequest (this=0x8214f20, sock=0x823aeb0) at mainserver.cpp:303
#12 0x080bdc52 in ProcessRequestThread::run (this=0x821d158) at mainserver.cpp:140
#13 0xb6482101 in QThreadInstance::start () from /usr/lib/libqt-mt.so.3
#14 0xb5fe6504 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#15 0xb5e6951e in clone () from /lib/tls/i686/cmov/libc.so.6

The underlying problem is that I have an mpeg file (which I can provide if needed) that was captured using an HDHomerun. My guess is that the file is corrupted somehow. Myth does not seem to recognize that the file is corrupted and instead attempts to work with the data. A lack of checking in the code leads to a variety of problems (this division by zero is just the first issue...if you clear that up, lots of other NULL pointer issues start showing up).

I don't know enough about the internals to figure out how to recognize the problem properly and deal with it (the lack of documentation about return values from functions is quite problematic for me as well). I'll keep trying, but this is a pretty serious issue since it crashes the backend (and I'm pretty sure it crashes the frontend too although I have only anecdotal evidence to support this, no stack trace...yet).

I'll try to put together a patch, but even if I succeed it will probably be a huge hack and not committable. I hope this helps. Let me know if you need any other info (like the MPEG file that causes this...it's 803 MB).

Attachments (1)

stack_trace.txt (1.8 KB) - added by michael.tiller@… 13 years ago.
Stack trace

Change History (12)

comment:1 Changed 13 years ago by michael.tiller@…

Terribly sorry about the formatting. I forgot to include the stack trace as an attachment. I'll do that now.

Changed 13 years ago by michael.tiller@…

Attachment: stack_trace.txt added

Stack trace

comment:2 Changed 13 years ago by michael.tiller@…

Rereading the original ticket, I wanted to clarify something. This happens all the time for reasons I don't understand. I'm recording off of QAM and much of the time it is fine but for some reason during midday (when Sesame Street or Bob the Builder are on?!?) this problem crops up. This is not a case of a single corrupted file. I get these all the time. I just wanted to make that clear since it may have originally sounded like I had this problem with only one recording.

comment:3 Changed 13 years ago by danielk

Owner: changed from Isaac Richards to danielk

Can mplayer play this file from beginning to end without crashing?

comment:4 Changed 13 years ago by michael.tiller@…

I'm looking into whether mplayer can play it back. I'm using "-vo null" to test this. I hope that is OK? Also, when I go to play it I get a line like this during playback:

A: 355.7 (05:55.6) of 33459.8 ( 9:17:39.7) 0.4%

Note the "9:17:39.7". Am I correct in interpreting this as implying the length of the file is 9 hours, 17 minutes and 39.7 seconds? If so, something is clearly not correct there. It should be just a one hour show?

Is there a way to get mplayer to play back as fast as possible (since I'm rendering to a null output device)? Otherwise, I'll have to wait for 9 hours for it to go through the file when I'm sure it could probably go 5x faster.

Also, it is worth pointing out that it is already 5 minutes into the file and that the thumbnail previews (which are what is causing the crash) should be taken from the very beginning (I haven't watched the show so the default is like 20 seconds in isn't it?).

comment:5 Changed 13 years ago by michael.tiller@…

OK, more info. I ran mplayer (mplayer -vo null filename.mpg) and it ran for some time and then stopped. Here is the output:

Starting playback...
a52: CRC check failed!  459.8 ( 9:17:39.7)  0.4%
a52: error at resampling
a52: CRC check failed!  459.8 ( 9:17:39.7)  0.4%
TS_PARSE: COULDN'T SYNC3459.8 ( 9:17:39.7)  0.4%
A:2220.1 (37:00.1) of 33459.8 ( 9:17:39.7)  0.4%
alsa-uninit: pcm closed

Exiting... (End of file)

Is there something else I should do now?

comment:6 Changed 13 years ago by michael.tiller@…

Here are some patches that I think are actually quite reasonable. They do checks that should probably be done. The only issue, I suspect, is how they respond to detecting the bad conditions. Since I don't fully understand the flow of the code or the expected return types, I had to guess a bit.

As a result of these diffs, my main problem appears to be gone (I noticed a different kind of crash, related to a free() call, with another file in my ring buffer but I suspect that is more of a fluke than this). Here are the diffs (quite simple actually)...

Index: libs/libavcodec/mpeg12.c
===================================================================
--- libs/libavcodec/mpeg12.c    (revision 12454)
+++ libs/libavcodec/mpeg12.c    (working copy)
@@ -1468,6 +1468,9 @@
         }
     }

+    if (s->current_picture.mb_type==0) {
+      return -1;
+    }
     s->current_picture.mb_type[ s->mb_x + s->mb_y*s->mb_stride ]= mb_type;

     return 0;
Index: libs/libavutil/rational.c
===================================================================
--- libs/libavutil/rational.c   (revision 12454)
+++ libs/libavutil/rational.c   (working copy)
@@ -36,8 +36,12 @@
     int sign= (nom<0) ^ (den<0);
     int64_t gcd= ff_gcd(ABS(nom), ABS(den));

+    if (den==0) {
+        return den==0;
+    }
     nom = ABS(nom)/gcd;
     den = ABS(den)/gcd;
+
     if(nom<=max && den<=max){
         a1= (AVRational){nom, den};
         den=0;
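
For illustration only (this is not the patch that was committed, and it does not use ffmpeg's real types): the trace above shows av_reduce being called with nom=0 and den=0. A Euclidean gcd of (0, 0) is 0, so the divisions by gcd that follow are the likely source of the SIGFPE. The sketch below assumes hypothetical names (Ratio, gcd64, reduce_ratio) and an arbitrary -1 error return; it only shows the idea of rejecting a zero denominator before any division takes place.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a rational number; not ffmpeg's AVRational. */
typedef struct {
    int64_t num;
    int64_t den;
} Ratio;

/* Euclid's algorithm on non-negative inputs.
 * Note that gcd64(0, 0) returns 0, which is what makes an unguarded
 * "value / gcd" fault when both inputs are zero. */
static int64_t gcd64(int64_t a, int64_t b)
{
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce num/den to lowest terms.
 * Returns 0 on success, or -1 if den is zero (assumed error convention);
 * in the error case *out is left untouched.
 * INT64_MIN inputs are not handled in this sketch. */
static int reduce_ratio(Ratio *out, int64_t num, int64_t den)
{
    if (den == 0)
        return -1; /* reject before any division happens */

    int negative = (num < 0) ^ (den < 0);
    int64_t an = llabs(num);
    int64_t ad = llabs(den);
    int64_t g = gcd64(an, ad); /* g >= 1 because ad != 0 */

    out->num = negative ? -(an / g) : an / g;
    out->den = ad / g;
    return 0;
}
```

A caller would check the return value and skip the frame (or report a decode error) rather than use the result; how the real decoder should react is exactly the open question raised in this comment.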

comment:7 in reply to:  2 Changed 13 years ago by ylee@…

Replying to michael.tiller@gmail.com:

This is not a case of a single corrupted file. I get these all the time.

I have been seeing exactly what Michael describes with the last few releases of 0.20-fixes: clean recordings that kill mythbackend during thumbnail generation. This normally isn't a big deal since I have turned generation off in mythfrontend in favor of the video previews, but I had to turn them off in MythWeb as well, which is slightly unfortunate. I hope Michael's patch, or some variant thereof, makes it into the next 0.20-fixes batch and then to ATrpms.

comment:8 Changed 13 years ago by DannyCan@…

Version: changed from 0.20 to head

I am having this same problem as well. My backend log shows this when it crashes:

[mpeg2video @ 0xb7289008]ac-tex damaged at 80 42
[mpeg2video @ 0xb7289008]invalid mb type in I Frame at 0 43
[mpeg2video @ 0xb7289008]ac-tex damaged at 40 43
[mpeg2video @ 0xb7289008]ac-tex damaged at 80 43
[mpeg2video @ 0xb7289008]invalid mb type in I Frame at 0 44
[mpeg2video @ 0xb7289008]ac-tex damaged at 40 44
[mpeg2video @ 0xb7289008]ac-tex damaged at 80 44
[mpeg2video @ 0xb7289008]Warning MVs not available
[mpeg2video @ 0xb7289008]get_buffer() failed (stride changed)
2007-01-08 18:36:24.569 AFD Error: Unknown decoding error

I don't suppose this helps any...

comment:9 Changed 13 years ago by danielk

Resolution: fixed
Status: changed from new to closed

(In [12484]) Fixes #2889. Fix from Michael Tiller & Michael Niedermayer for a division by zero in ffmpeg.

Michael Tiller submitted this to ffmpeg and Michael Niedermayer has applied a version of this there (it is textually slightly different due to other changes in ffmpeg.)

comment:10 Changed 13 years ago by michael.tiller@…

I just wanted to add that there are two fixes. One is a division by zero error and the other is a null pointer error. Both are significant (i.e. cause crashes on my front end) but the comment implies that only one was fixed. Is that the case?

comment:11 Changed 13 years ago by anonymous

ffmpeg didn't accept the other one, and I'm going to want a backtrace before I accept it. I think the second problem could be a duplicate of #2822 and indicates a problem in MythTV proper.