Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#12045 closed Bug Report - General (fixed)

Live TV after a short while always make the frontend hang

Reported by: JYA Owned by: Jean-Yves Avenard <jyavenard@…>
Priority: blocker Milestone: 0.27.1
Component: MythTV - Recording Version: 0.27-fixes
Severity: medium Keywords:
Cc: Ticket locked: yes

Description

I'm very puzzled with this one.

It only started since I've been back from a month away. I had never experienced it in the past.

At first, I thought the issue with with my RAID array where the recordings are stored, as smart status of one of the disk started to be very verbose. Following that, I instead used as recording storage a NFS share to a very fast machine over gigabit (easily saturate the gigabit link and having local write speed in excess of 400MB/s on the NFS server) that didn't solve the problem.

So I created a brand new backend, new disks, database on fast SSD (580MB/s read/write) the lot... still issue is just as bad, no matter the frontend.

When starting live TV after a short while (ranging from 10-15s to 3-4 minutes) the frontend will hang.

In the backend log, you can see usually:

Feb  4 00:16:33 mediaserver-surrey1 mythbackend: mythbackend[27099]: E ProcessRequest ringbuffer.cpp:1105 (WaitForAvail) RingBuf(/external/recordings/1009_20140203131345.mpg): Waited 16 seconds for data, aborting.

I attempted to increase the various wait time, to know available.

In fact, from the logs, it appears that once we enter a particular condition, we just never exit the waiting loop; it will always time out.

in the logs usually there are plenty of:

Feb  4 00:15:41 mediaserver-surrey1 mythbackend: mythbackend[27099]: I ProcessRequest ringbuffer.cpp:1098 (WaitForAvail) RingBuf(/external/recordings/1009_20140203131345.mpg): Waited 0.5 seconds for data #012#011#011#011to become available... 182840 < 229376

which is fine, playback is just normal

If suddenly the writer thread takes a little longer than usual for whatever reason, WaitForAvailable? will start waiting forever:

Feb  4 00:15:53 mediaserver-surrey1 mythbackend: mythbackend[27099]: W TFWWrite threadedfilewriter.cpp:539 (DiskLoop) TFW(/external/recordings/1009_20140203131345.mpg:71): write(65424) cnt 22 total 1362060 -- took a long time, 1425 ms
Feb  4 00:15:53 mediaserver-surrey1 mythbackend: mythbackend[27099]: I ProcessRequest ringbuffer.cpp:1098 (WaitForAvail) RingBuf(/external/recordings/1009_20140203131345.mpg): Waited 8.0 seconds for data #012#011#011#011to become available... 137524 < 229376

To accentuate the issue and cause it to occur even more often, I run a side process, writing to the NFS share (not fast 10MB/s)

There seems to be a racing condition somewhere, once the writer thread takes longer than usual once, then the reader thread will never recover.

Attach is a log with the wait changed to 48s (WaitForAvail?) and 20s (WaitForReadsAllowed?).

Attachments (3)

mythbackend.log (32.4 KB) - added by JYA 11 years ago.
12045.diff (11.4 KB) - added by JYA 11 years ago.
Attempted fix
12045-try2.diff (8.6 KB) - added by JYA 11 years ago.
another attempted fix

Download all attachments as: .zip

Change History (14)

Changed 11 years ago by JYA

Attachment: mythbackend.log added

comment:1 Changed 11 years ago by JYA

Resolution: Invalid
Status: newclosed

turned out that is was the nfs mount option used.

NFS was mounted with rsize=8192,wsize=8192

removed that option, and so far playing okay...

comment:2 Changed 11 years ago by JYA

Resolution: Invalid
Status: closednew

Going to reopen that one...

It keeps happening. When the frontend mount directly the disk over NFS, there's no such issue occurring. So at least I have a work around; but it's far from the perfect approach, and certainly makes mythtv setup even more complicated.

Changed 11 years ago by JYA

Attachment: 12045.diff added

Attempted fix

comment:3 Changed 11 years ago by JYA

In TFW; the actual sync to disk was made in a separate thread, and a sync called every second...

This seemed convoluted to me; also was a bit unsure on how the system would behave when you have one thread calling write and another sync() at the same time (as it happens from time to time).

So this patch changes the behaviour. Instead sync() is called after a write operation, but at a maximum rate of once every second.

Also changes the log so there's no log when a read takes more than 250ms but less than 500ms... This seriously reduces the verbosity on my system.

comment:4 Changed 11 years ago by JYA

I believe the issue is solved with this last attempt.

The problem occurs when you are close to EOF and call read() that would have read past the EOF and returns early. If you call read() immediately again before any new write occurred: read() could block.

Changed 11 years ago by JYA

Attachment: 12045-try2.diff added

another attempted fix

comment:5 Changed 11 years ago by Jean-Yves Avenard <jyavenard@…>

Owner: set to Jean-Yves Avenard <jyavenard@…>
Resolution: fixed
Status: newclosed

In 2eb5091b6b2058bf94cb11ef36b5972ef9a18fee/mythtv:

Fix fileringbuffer locking up on read()

From time to time, most often seen during LiveTV, FileRingBuffer::safe_read() would lock and not return for a long time (on my system, it would block in excess of 20s).
Upon investigation, the issue occurs while reading a file currently being written to, if a call to read() encountered EOF, returned early and a following call to read() was made, before another write. Under those conditions, read() wouldn't return for a long time, causing the backend to abort the read and dropping the frontend connection.

So instead we now check the size of the current file via a call to POSIX fstat and never try to read beyond the EOF position.
If the file being read isn't a regular, or fstat returned an error, the old behaviour is used: read until we get as much data as we asked.

Surprisingly, this has fixed most of the issues I've been having for years with liveTV:

  • Stuttering or temporary free when close to the live
  • Freeze at program transition.

Add debugging information while in file mode

Fixes #12045

comment:6 Changed 11 years ago by Jean-Yves Avenard <jyavenard@…>

In 20f7cd2858e79385ceec07b2c19c7e4f84714e18/mythtv:

Fix fileringbuffer locking up on read()

From time to time, most often seen during LiveTV, FileRingBuffer::safe_read() would lock and not return for a long time (on my system, it would block in excess of 20s).
Upon investigation, the issue occurs while reading a file currently being written to, if a call to read() encountered EOF, returned early and a following call to read() was made, before another write. Under those conditions, read() wouldn't return for a long time, causing the backend to abort the read and dropping the frontend connection.

So instead we now check the size of the current file via a call to POSIX fstat and never try to read beyond the EOF position.
If the file being read isn't a regular, or fstat returned an error, the old behaviour is used: read until we get as much data as we asked.

Surprisingly, this has fixed most of the issues I've been having for years with liveTV:

  • Stuttering or temporary free when close to the live
  • Freeze at program transition.

Add debugging information while in file mode

Fixes #12045

comment:7 Changed 11 years ago by Jean-Yves Avenard <jyavenard@…>

In a7458907252a03552e3be4bf8085f7107a521159/mythtv:

Fix fileringbuffer locking up on read()

From time to time, most often seen during LiveTV, FileRingBuffer::safe_read() would lock and not return for a long time (on my system, it would block in excess of 20s).
Upon investigation, the issue occurs while reading a file currently being written to, if a call to read() encountered EOF, returned early and a following call to read() was made, before another write. Under those conditions, read() wouldn't return for a long time, causing the backend to abort the read and dropping the frontend connection.

So instead we now check the size of the current file via a call to POSIX fstat and never try to read beyond the EOF position.
If the file being read isn't a regular, or fstat returned an error, the old behaviour is used: read until we get as much data as we asked.

Surprisingly, this has fixed most of the issues I've been having for years with liveTV:

  • Stuttering or temporary free when close to the live
  • Freeze at program transition.

Add debugging information while in file mode

Fixes #12045
(cherry picked from commit 20f7cd2858e79385ceec07b2c19c7e4f84714e18)

comment:8 Changed 11 years ago by Jean-Yves Avenard <jyavenard@…>

In 0b9470c87fe4d727e173cddf715270da7d18311e/mythtv:

Fix fileringbuffer locking up on read()

From time to time, most often seen during LiveTV, FileRingBuffer::safe_read() would lock and not return for a long time (on my system, it would block in excess of 20s).
Upon investigation, the issue occurs while reading a file currently being written to, if a call to read() encountered EOF, returned early and a following call to read() was made, before another write. Under those conditions, read() wouldn't return for a long time, causing the backend to abort the read and dropping the frontend connection.

So instead we now check the size of the current file via a call to POSIX fstat and never try to read beyond the EOF position.
If the file being read isn't a regular, or fstat returned an error, the old behaviour is used: read until we get as much data as we asked.

Surprisingly, this has fixed most of the issues I've been having for years with liveTV:

  • Stuttering or temporary free when close to the live
  • Freeze at program transition.

Add debugging information while in file mode

Fixes #12045

(cherry picked from commit 20f7cd2858e79385ceec07b2c19c7e4f84714e18)

comment:9 Changed 10 years ago by paulh

Milestone: unknown0.27.1

comment:10 Changed 10 years ago by cweemin@…

This issues comes up when mythtv is recording into a Btrfs filesystem with a default mount option. The default option (commit) for btrfs is set to 30 seconds, and this will cause the backend to freeze intermittently if the write have not commit back to the device. I made my changes to fileringbuffer.cpp to open file with O_NONBLOCK mode and remove the fstat calls. Seems to fix my problem. I see there's already code to ignore EAGAIN errors, so I should be safe.

comment:11 Changed 10 years ago by JYA

Ticket locked: set

please don't hijack a closed bug. Re-open a new one if required.

Writes will be forced, regardless of the filesystem default as myth calls regularly sync. If with BTRFS that doesn't force the data to be written to disk, it should be raised upstream

Note: See TracTickets for help on using tickets.