Opened 12 years ago

Closed 12 years ago

#10805 closed Bug Report - General (Invalid)

TFW - Taking too long to flush

Reported by: athroener@… Owned by: danielk
Priority: minor Milestone: unknown
Component: MythTV - General Version: 0.25-fixes
Severity: medium Keywords: Taking a long time to flush
Cc: Ticket locked: no

Description

I recently upgraded to 0.25 and have been getting a recurring error that cripples my system: the backend crashes and the system won't even reboot without a hard power off. I have verified that my RAID array is working and writing at around 100 MB per second, so I don't think it is an issue with write speed. Here is the error that I believe is locking up the system.

May 21 21:36:47 sloth mythbackend[2370]: W RecThread? ThreadedFileWriter?.cpp:301 (Flush) TFW(/fileshare/TV/Recorded/2422_20120521195900.mpg:52): Taking a long time to flush.. buffer size 134213200

I have included my backend log. Any help would be appreciated. This has been going on since I upgraded to 0.25 and is making my Myth system pretty much worthless.

Attachments (5)

mythbackend.log (270.1 KB) - added by athroener@… 12 years ago.
backend log
mythbackend.2.log (17.6 KB) - added by jens@… 12 years ago.
Backend log
iotop.txt (51.6 KB) - added by jens@… 12 years ago.
iotop output
ps.txt (20.8 KB) - added by jens@… 12 years ago.
ps -T PID output
vmstat.txt (15.7 KB) - added by jens@… 12 years ago.
/proc/vmstat output


Change History (10)

Changed 12 years ago by athroener@…

Attachment: mythbackend.log added

backend log

comment:1 Changed 12 years ago by beirdo

Owner: set to danielk
Status: new → assigned

comment:2 Changed 12 years ago by beirdo

Priority: critical → minor
Severity: high → medium

resetting priorities to defaults.

comment:3 Changed 12 years ago by danielk

Status: assigned → infoneeded

athroener, when this next happens, please attach to the process using gdb and do a 'thread apply all bt' to get a backtrace. Since the buffer size is so large and isn't shrinking during the flush, it would seem that no data is making it out to disk. If this is caused by a deadlock, the backtrace should help tell us where it is.
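For reference, something along these lines should work (the PID is just an example taken from the log line above; adjust as needed):

    # attach to the running backend (replace 2370 with the actual PID)
    sudo gdb -p 2370
    (gdb) thread apply all bt
    (gdb) detach
    (gdb) quit

    # or non-interactively, writing the backtrace straight to a file
    sudo gdb -p 2370 -batch -ex 'thread apply all bt' > mythbackend-bt.txt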

comment:4 Changed 12 years ago by jens@…

Since I upgraded from Ubuntu 11.10 (MythTV 0.24) to 12.04 (MythTV 0.25) I have been experiencing the same issue. When I try to record two or more channels simultaneously, the backend frequently locks up, sometimes with "Taking a long time to flush" messages, sometimes with "Maximum buffer size exceeded".

I'm not certain about this, but to me it seems the lock-ups occur preferentially when a recording on one channel finishes or switches from one show to the next. Otherwise I haven't found any way to reliably reproduce the effect; it seems to occur rather randomly, anywhere from a few minutes to a couple of hours or more after recordings start.

The issue does not seem to be related to excessive I/O load: I can wildly shuffle files around on the disks without triggering it, whereas normally nothing else is generating much load on the PC.

I have attached logs from one incident.

I'm recording to a RAID 5 array with five disks. After rebooting the PC the RAID is in an inconsistent state and needs to resync.

Switching to the mythbuntu repos (v0.25.1-58-g1d41f74) didn't help.

I tried to attach gdb to the process with "gdb /usr/bin/mythbackend PID", but unfortunately I can't get a backtrace after the lock-up. Before the lock-up gdb's output looks as expected, but afterwards gdb can't stop the process anymore, as the I/O threads are stuck in "uninterruptible sleep" (ps.txt).

Trying to kill -9 the backend leaves a zombie behind with the I/O threads still in "uninterruptible sleep".

Any idea how to get a backtrace or coredump anyway?
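One avenue I could try, assuming the kernel exposes thread stacks via /proc and SysRq is enabled, is reading the kernel-side stacks of the blocked threads directly:

    # kernel-side stack of every backend thread (requires root)
    sudo cat /proc/$(pidof mythbackend)/task/*/stack

    # or let the kernel log all blocked (D-state) tasks to dmesg
    echo w | sudo tee /proc/sysrq-trigger
    dmesg | tail -n 100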

As I don't want to risk corrupting my older recordings, and resyncing my RAID takes more than two hours, I have now set up a small extra RAID which I can use for tests. If you require more data or something, I can gather it there. Just tell me what you need.

I have also recorded I/O utilization with iotop. At 23:27:35 in the iotop log (iotop.txt) all threads stop writing, and in the ps log (ps.txt) you can see that the threads get blocked at the same time. In the backend log I can't see anything extraordinary around that time. This is what always happens: the writing threads fall into uninterruptible sleep and stop writing.
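For what it's worth, the data was collected roughly along these lines (the exact options here are only an illustration, not a transcript of what I ran):

    # per-thread I/O, batch mode, timestamps, only processes doing I/O
    sudo iotop -b -o -t -k > iotop.txt &

    # thread states of the backend once per second (D = uninterruptible sleep)
    while true; do date; ps -T -p $(pidof mythbackend); sleep 1; done > ps.txt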

After reading Warpme's message 'Re: [mythtv-users] recent "TFW -- took a long time" warnings' on the mythtv-users mailing list I also logged the /proc/vmstat values (vmstat.txt). The nr_dirty value always stays well below the thresholds. I hope this helps to track down the cause. As said above, if you require more information or data, please let me know.
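The vmstat sampling was done in the same spirit, roughly:

    # sample the dirty-page counters and thresholds once per second
    while true; do date; grep '^nr_dirty' /proc/vmstat; sleep 1; done > vmstat.txt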

Changed 12 years ago by jens@…

Attachment: mythbackend.2.log added

Backend log

Changed 12 years ago by jens@…

Attachment: iotop.txt added

iotop output

Changed 12 years ago by jens@…

Attachment: ps.txt added

ps -T PID output

Changed 12 years ago by jens@…

Attachment: vmstat.txt added

/proc/vmstat output

comment:5 Changed 12 years ago by danielk

Resolution: Invalid
Status: infoneeded → closed

If kill -9 leaves a zombie process with threads stuck in write, it means the problem is at the OS level. The TFW warning is just the messenger; we shouldn't be getting stuck in write for such a long time. Your best bet is reading up on RAID performance issues and asking for help on the mythtv-users mailing list.
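Typical starting points at the OS level would be the md array state and the usual RAID 5 / writeback tunables, for example (the md0 device name is just a placeholder):

    # array health and resync status
    cat /proc/mdstat

    # RAID 5/6 stripe cache, often too small by default for sustained writes
    cat /sys/block/md0/md/stripe_cache_size

    # kernel writeback thresholds
    sysctl vm.dirty_ratio vm.dirty_background_ratio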
