id summary reporter owner description type status priority milestone component version severity resolution keywords cc mlocked
1076 mythbackend dies when writing big files (4 GB) - FAT32 filesystem buzz@… danielk "Buzz says: (I'm turning the -dev conversation into a ticket, as it's verifiably a bug when the backend segfaults.)
Scenario: Backend saves files to a FAT32 partition. Backend tries to exceed 4 GB (or thereabouts) while unattended. Backend dies with the error ""File size limit exceeded"" emitted by the OS. Backend's last message prior to dying was: ""TFW: safe_write(): funky usleep"" (message comes from ThreadedFileWriter.cpp)
---------------------------
Buzz says: Wouldn't it be reasonable if it barfed the recording partially/entirely WITHOUT crashing mythbackend, rather than crashing entirely as it does now?
---------------------------
Isaac says: Backtrace? I'm not going to want to add code specifically to handle FAT32, but certainly dying is bad.
---------------------------
Buzz says: Last 4 lines of 'mythbackend -v all' followed by backtrace:
2006-01-19 16:35:20.575 MSqlQuery: INSERT INTO recordedmarkup (chanid, starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' , '119754' , 9 , '4289249140' );
2006-01-19 16:35:20.576 MSqlQuery: INSERT INTO recordedmarkup (chanid, starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' , '119772' , 9 , '4289895672' );
2006-01-19 16:35:20.577 MSqlQuery: INSERT INTO recordedmarkup (chanid, starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' , '119790' , 9 , '4290548972' );
2006-01-19 16:35:25.630 TFW: safe_write(): funky usleep
Program received signal SIGXFSZ, File size limit exceeded.
[Switching to Thread -1336443984 (LWP 2872)]
0x00b8e402 in __kernel_vsyscall ()
(gdb)
(gdb) bt
#0 0x00b8e402 in __kernel_vsyscall ()
#1 0x007540bb in __write_nocancel () from /lib/libpthread.so.0
#2 0x00e93412 in safe_write (fd=20, data=0xaee70d90, sz=12920) at ThreadedFileWriter.cpp:57
#3 0x00e950a1 in ThreadedFileWriter::DiskLoop (this=0x89b69d0) at ThreadedFileWriter.cpp:367
#4 0x00e951a5 in ThreadedFileWriter::boot_writer (wotsit=0x89b69d0) at ThreadedFileWriter.cpp:93
#5 0x0074fb80 in start_thread () from /lib/libpthread.so.0
#6 0x02d969ce in clone () from /lib/libc.so.6
---------------------------
Buzz says:
*) The OS is sending a SIGXFSZ to the backend, and the backend is taking the default action, which is ""coredump and exit"".
Solution:
* capture SIGXFSZ, handle it gracefully.
---------------------------
Buzz says: Hi All. I'm working on a solution to this thread and have got the following steps working in my code (diff attached - catch_and_handle_SIGXFSZ_diff.txt):
1) OS sends SIGXFSZ to mythbackend
2) backend captures said signal, squirrels it into a global called ""LastSignal"", so anyone who wants to can look for it. (yes, I know globals are bad, but signal handlers are worse.)
3) ThreadedFileWriter.cpp has an existing function called ""safe_write"" that I've modified so that it checks for the signal (in the global) before trying to write to any file.
4) safe_write: if a SIGXFSZ signal was received it ""aborts"" the in-progress write (then-and-there, without flushing memory buffers to disk or anything), returning an error.
5) safe_write is called from inside ThreadedFileWriter::DiskLoop. The return value of safe_write is tested in DiskLoop, and it causes both threads (write and sync threads of ThreadedFileWriter) to be torn down, and the ThreadedFileWriter to enter a state of ""write error"".
6) next time the caller (RingBuffer.cpp) tries to call tfw->Write(...)
it fails, returning -1 up to the calling function (which is in RingBuffer.cpp - Write), and the tfw is torn down, closing the open file handle, and cleaning up.
7) RingBuffer.cpp already had the capability to return -1 or other errors, so it's been tweaked to look at the return status of the tfw->Write call too, and pass the error up if it occurs.
...now, I'm not sure where to take it from here. The signal is definitely being captured, and it's being passed all the way back up to the RingBuffer, so I know that's working, but nothing else (backend and/or frontend) seems to recognise that the recording failed. Should I go that far, or just barf the error message to the log, and leave it at that? I.e.: how do I make everything else recognise that the recording of this file has aborted/failed? Am I doing the right thing here, or is there an easier way?
---------------------------
Mark Weaver says: Just ignore the signal - write will return EFBIG and the recording should follow the usual failure path. You should be able to test it with ulimit -f, which will allow you to generate SIGXFSZ with smaller files.
---------------------------
Buzz says: The problem is that as it exists now in CVS, ThreadedFileWriter.cpp has no ""usual failure path"" from the 'write' command (in safe_write). safe_write returns a uint to indicate how much was written, and '0' is a legitimate amount to write, not an error case. I've changed the relevant places to allow it to return a negative value (failure), and pass the failure back up the calling chain to RingBuffer, where it emits an error to the log. The backend and frontend both still seem oblivious to the error condition that occurs when RingBuffer->Write() returns -1 during a recording. Other suggestions?
---------------------------
" patch closed minor 0.20 mythtv head medium fixed 0
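
Mark Weaver's suggestion in the ticket above (ignore SIGXFSZ so that write fails with EFBIG, and provoke the condition with a small file size limit) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached MythTV patch: `checked_write` and `errno_after_exceeding_limit` are hypothetical names, the temporary file path is made up, and `setrlimit(RLIMIT_FSIZE)` is used in-process as a stand-in for `ulimit -f`.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the MythTV safe_write): writes sz bytes and
// returns the number written, or -1 with errno set on a hard failure.
// A signed return type lets 0 stay a legitimate "nothing written" value.
static ssize_t checked_write(int fd, const char *data, size_t sz)
{
    size_t done = 0;
    while (done < sz)
    {
        ssize_t ret = write(fd, data + done, sz - done);
        if (ret < 0)
        {
            if (errno == EINTR)
                continue;   // interrupted by a signal, retry
            return -1;      // hard error such as EFBIG; errno is left set
        }
        if (ret == 0)
            break;          // no progress; avoid spinning forever
        done += static_cast<size_t>(ret);
    }
    return static_cast<ssize_t>(done);
}

// Hypothetical demo: lowers the soft file size limit, writes past it, and
// returns the errno reported by the failing write (0 if nothing failed).
static int errno_after_exceeding_limit()
{
    // With SIGXFSZ ignored, the default "coredump and exit" action never
    // runs; write() reports the problem through its return value instead.
    signal(SIGXFSZ, SIG_IGN);

    struct rlimit old_lim;
    getrlimit(RLIMIT_FSIZE, &old_lim);
    struct rlimit lim = old_lim;
    lim.rlim_cur = 4096;                    // tiny limit, for testing only
    setrlimit(RLIMIT_FSIZE, &lim);

    char path[] = "/tmp/efbig_demo_XXXXXX"; // assumed scratch location
    int fd = mkstemp(path);
    assert(fd >= 0 && "mkstemp failed");

    char buf[8192] = {0};                   // twice the limit set above
    int seen = 0;
    if (checked_write(fd, buf, sizeof buf) < 0)
        seen = errno;                       // capture before close/unlink

    close(fd);
    unlink(path);
    setrlimit(RLIMIT_FSIZE, &old_lim);      // restore the original soft limit
    return seen;
}
```

A caller in the position of DiskLoop would treat a negative return as a write error and pass it up the chain, for example `if (checked_write(fd, data, sz) < 0) { /* enter write-error state */ }`. On a system where the limit is enforced as described in the ticket, `errno_after_exceeding_limit()` is expected to report EFBIG.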