Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#13181 closed Bug Report - General (Fixed)

ExternalRecorder fails with "resource temporarily unavailable"

Reported by: Marc Rawji <mrawji@…> Owned by: jpoet
Priority: minor Milestone: 29.1
Component: MythTV - General Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

I've been working on an external recorder for a while. It works most of the time, but recently i've been getting new errors.

First, the sequence of calls for a successful play look like this: StartStreaming? StopStreaming? StartStreaming?

There's only a few milliseconds between the StopStreaming? and the 2nd StartStreaming?, so my code allows for it and that seems to work just fine... The StopStreaming? in the logs below is probably a red herring.

My problem below is that I am getting a "Resource temporarily unavailable (11)" when mythtv is trying to read from the stdin (the stdout of my program with the video stream)

This seems to happen from this call: https://github.com/MythTV/mythtv/blob/master/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp#L125

The STDIN stream is configured to be non blocking, so the errno 11 isn't unexpected: https://github.com/MythTV/mythtv/blob/master/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp#L343

I am not sure what to change to keep this from happening.... it fails about 50% of the time, which is annoying.

I am looking for advice on a fix, or how to troubleshoot. I am running 0.29 on Ubuntu 16.04.

Thanks! Marc

Trimmed logs from mythbackend:

Nov 21 21:23:30  mythbackend: mythbackend[1286]: I TVRecEvent recorders/ExternalStreamHandler.cpp:1077 (ProcessCommand) ExternalRec(/home/recorder1/record.sh): ProcessCommand('Version?') = 'OK:1.0'
Nov 21 21:23:30 mythbackend: mythbackend[1286]: W TVRecEvent recorders/ExternalStreamHandler.cpp:1066 (ProcessCommand) ExternalRec(/home/recorder1/record.sh): External Recorder did not respond to 'StopStreaming'
Nov 21 21:23:30  mythbackend: mythbackend[1286]: I TVRecEvent recorders/ExternalStreamHandler.cpp:1077 (ProcessCommand) ExternalRec(/home/recorder1/record.sh): ProcessCommand('StopStreaming') = 'OK:Stopped'
Nov 21 21:23:30  mythbackend: mythbackend[1286]: E TVRecEvent recorders/ExternalStreamHandler.cpp:130 (Read) Failed to read from External Recorder: #012#011#011#011eno: Resource temporarily unavailable (11)
Nov 21 21:23:30 everest mythbackend: mythbackend[1286]: E TVRecEvent recorders/ExternalStreamHandler.cpp:151 (GetStatus) ExternIO::GetStatus: already in error state: 'Failed to read from External Recorder: #012#011#011#011eno: Resource temporarily unavailable (11)'
Nov 21 21:23:30 mythbackend: mythbackend[1286]: I TVRecEvent recorders/ExternalStreamHandler.cpp:1014 (StopStreaming) ExternalRec(/home/recorder1/record.sh): Streaming stopped
Nov 21 21:23:30  mythbackend: mythbackend[1286]: E ExternSH recorders/ExternalStreamHandler.cpp:625 (run) ExternalRec(/home/recorder1/record.sh): Failed to read from External Recorder: Failed to read from External Recorder: #012#011#011#011eno: Resource temporarily unavailable (11)

I grabbed the latest V30 (master) and hacked up ExternalStreamHandler?.cpp with extra log messages. I replaced the Ready function to use select instead of poll and still had the same behavior. I then commented out this line:

  • m_error = "Failed to read from External Recorder: " + ENO;

+ m_error = "MR: Failed to read from External Recorder: " + ENO; preventing the hard failure, and got a functioning video stream.

In the snippet below, FD 62 is the "stdout" with the transport stream. The "Checking for data" line comes from the Ready function: 1 indicates there's data to read. The empty line in the log is the from the commented out m_error... It's when we get the unexpected "erno 11".

We can see there's data before and after the error... I am not sure it's a problem with the Ready function, but with the 'read'

2017-11-23 15:07:35.517510 I  ExternalRec(/home/mrawji/recorder/record.sh): ProcessCommand('StopStreaming') = 'OK:Stopped'
2017-11-23 15:07:35.517516 E  ExternIO::Read '256' bytes, buffer size 312
2017-11-23 15:07:35.517690 E  MR:  Checking for data, return value was: '1', for FD '62'
2017-11-23 15:07:35.517693 E  MR:  Checking for data, return value was: '1', for FD '62'
2017-11-23 15:07:35.517700 E
2017-11-23 15:07:35.517709 E  ExternIO::Read '256' bytes, buffer size 380
2017-11-23 15:07:35.517822 E  MR:  Checking for data, return value was: '1', for FD '62'
2017-11-23 15:07:35.517842 E  ExternIO::Read '256' bytes, buffer size 260
2017-11-23 15:07:35.517879 E  MR:  Checking for data, return value was: '1', for FD '62'
2017-11-23 15:07:35.517901 E  ExternIO::Read '256' bytes, buffer size 328

I am thinking that I could possibly add a counter on the failed reads and if we hit 2 (or n) in row, then fail... It also looks like it'll read "0 bytes" a few times despite the Ready function returning True. In the trace above, I see one 0 byte read, followed by the "erno 11". Then, everything keeps working again (9 microseconds after the error).

I am writing 256 bytes at a time from my python app... not sure if that means anything.

Attachments (1)

ExternalRecorder-EAGAIN.patch (2.2 KB) - added by jpoet 2 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 Changed 2 years ago by mrawji@…

I was chatting with a colleague at work and his take, was "yeah, it happens. the stream will report it's ready via poll when it isn't quite"..

I'll try coding up a non-fatal error check and see if that improves it.

comment:2 Changed 2 years ago by jpoet

Owner: set to jpoet
Status: newaccepted

comment:3 Changed 2 years ago by jpoet

Hi Marc,

This ticket prompted me to realize I had some uncommitted code for the External Recorder code. I just committed it, so if you could give it a try I would appreciate it.

If these changes don't fix your problem, then I will work with you to get this figured out.

John

comment:4 Changed 2 years ago by mrawji@…

Looking at your diff... I have a patch I am testing right now. I'll let you know where i end up (preliminary checks looked good). Assuming my testing works tonight, i'll back out my change, put in your change and see if the behavior is better. Thankfully, it's pretty easy for me to reproduce

comment:5 Changed 2 years ago by mrawji@…

Hey John... Your patch looks helpful, but with another bug I hit... every now and again i'd get an error saying my script hadn't responded to a command, but i saw a response the logs.

I just rebuilt the recent HEAD with your recent commits and I still have the same failure I did before. I am testing a patch (2 active recordings going on now) that gracefully handles the EAGAIN error when trying to read from the stream. Looks like one of my 2 recordings would have failed, without my patch, so I am encouraged.

Marc

comment:6 Changed 2 years ago by mrawji@…

Potential workaround here. I've successfully tested with 2 recordings tonight (on the fixes/29 branch), but ported to master here...

Pull Request: https://github.com/MythTV/mythtv/pull/161

comment:7 Changed 2 years ago by mrawji@…

Forgot to mention. I am super open to feedback on the pull request / code etc... this the first time i've contributed to a project like this. Thanks! Marc

comment:8 Changed 2 years ago by Peter Bennett

Reporter: changed from mrawji@… to Marc Rawji <mrawji@…>

comment:9 Changed 2 years ago by jpoet

Hi Marc,

You patch looks pretty good. The only feedback I would give, is that pre-increment is more efficient than post-increment (e.g. ++i is better than i++).

I did change the messages and "tighten" it up a bit. I am attaching a slightly modified version of your patch. Please verify that it works for you and that I have not broken it somehow.

Thanks,

John

Changed 2 years ago by jpoet

comment:10 Changed 2 years ago by mrawji@…

I just tried out your patch, and seems to work just fine. Thank for cleaning it up. Much more succinct than my original one. I am happy to merge it if you are. Can we backport this fixes/29 as well?

One more request, if I can.

Occasionally, my external recorder needs more than 2.5 seconds to start (I don't know why). Can you increase the timeout for the "Version?" command to let my app more time to start?

-    return ProcessCommand("Version?", 2500, result);
+    return ProcessCommand("Version?", 5000, result);

If you want to track this in another ticket, let me know and I can do that.

Thanks John! Marc

comment:11 Changed 2 years ago by jpoet

Resolution: Fixed
Status: acceptedclosed

comment:12 Changed 2 years ago by jpoet

Fixed by [af1126277ade] in fixes/29 and [77810afe6990] in master.

comment:13 Changed 2 years ago by Stuart Auchterlonie

Milestone: needs_triage29.1
Version: UnspecifiedMaster Head
Note: See TracTickets for help on using tickets.