Opened 12 months ago

Last modified 3 weeks ago

#13190 accepted Bug Report - General

ExternalRecorder - error reading STDERR

Reported by: mrawji@… Owned by: jpoet
Priority: minor Milestone: 29.2
Component: MythTV - General Version: v29-fixes
Severity: medium Keywords:
Cc: Ticket locked: no

Description

Hi,

I think this might be related to https://code.mythtv.org/trac/ticket/13181, but I am not sure.

I am running fixes/29 (v29.0-64-g39b2062).

I can't reproduce this at will (or, i don't know how). I've had it happen once that I've noticed. There a couple of things that are odd:

External Recorder did not respond

Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
Dec  5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'HasPictureAttributes?'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasPictureAttributes?') = 'OK:No'

Here's it's strange that it reports that the recorder did not respond, but it responded well within the timeout. I think it might be because the "Ready" method returns TRUE, but 'read' function returns 0 bytes, but I am just guessing.

same thing happens here:

Dec  5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'FlowControl?'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF'

However, it looks like it collected the right information:

Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF)

If this is just a logging quirk, I am not bothered.

Reading the same data twice from the STDIN

This segment picks up where the one above ended. The FlowControl? seems to answered correctly, but the XON/XOFF apprears to be re-read as the answer to BlockSize?.

Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:816 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): App opened successfully
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF)
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('BlockSize:6160384') = 'OK:XON/XOFF'

After that, all the responses are "off by 1"

Here's the tail of the log, with the clear example:

ProcessCommand('LockTimeout?') = 'OK'
Dec  5 16:30:02: E TVRecEvent recorders/ExternalSignalMonitor.cpp:206 (GetLockTimeout) ExternSigMon(/home/xbmc/rec/recorder1/record.sh): GetLockTimeout: invalid response 'OK'
Dec  5 16:30:02: I SignalMonitor recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('SignalStrengthPercent?') = 'OK:10000'

the OK:10000 is the response to "GetLockTimeout?"...

Logs

Here is the relevant part of the log:

Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('Version?') = 'OK:1.0'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
Dec  5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'HasPictureAttributes?'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasPictureAttributes?') = 'OK:No'
Dec  5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'FlowControl?'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:816 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): App opened successfully
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF)
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('BlockSize:6160384') = 'OK:XON/XOFF'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:478 (Get) ExternSH: Creating new stream handler /home/xbmc/rec/recorder1/record.sh for /home/xbmc/rec/recorder1/record.sh
Dec  5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('TuneChannel:1') = 'OK'
Dec  5 16:30:00: I TVRecEvent recorders/ExternalChannel.cpp:72 (Tune) ExternChan[1](/home/xbmc/rec/recorder1/record.sh): Tune(55)
Dec  5 16:30:00: I TVRecEvent recorders/ExternalChannel.cpp:85 (Tune) ExternChan[1](/home/xbmc/rec/recorder1/record.sh): Tuning to 55
Dec  5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'TuneChannel:55'
Dec  5 16:30:02: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('TuneChannel:55') = 'OK'
Dec  5 16:30:02: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('Version?') = 'OK'
ProcessCommand('LockTimeout?') = 'OK'
Dec  5 16:30:02: E TVRecEvent recorders/ExternalSignalMonitor.cpp:206 (GetLockTimeout) ExternSigMon(/home/xbmc/rec/recorder1/record.sh): GetLockTimeout: invalid response 'OK'
Dec  5 16:30:02: I SignalMonitor recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('SignalStrengthPercent?') = 'OK:10000'

Change History (14)

comment:1 Changed 12 months ago by jpoet

Owner: set to jpoet
Status: newassigned

Can you try an older version, to see if [faa7994416] is somehow causing this problem for you? [faa7994416] was actually designed to fix a problem like this, but maybe it doesn't handle all cases.

comment:2 Changed 12 months ago by mrawji@…

Hi John,

I don't want to go back to an older version because I need the changes in [af1126277a]... without this change I get a failure rate in the 40%.

I'll try building with a revert of [faa7994416], add in some extra logging and see what happens, but might take me some time to get back to you since I can't make it fail on a regular basis.

Thanks, Marc

comment:3 Changed 12 months ago by Marc Rawji <mrawji@…>

Hi John,

So I added a loop (1000x) around the "cheap" Externalcommands to see if I could force it to reproduce. I didn't get a straight up failure where it just stopped working, but was able to reproduce this:

2017-12-07 22:11:12.387166 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.387423 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.387679 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.387937 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.388193 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.388520 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.388748 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.388993 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.395628 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.395665 W  ExternalRec(/home/mrawji/recorder2/record.sh): External Recorder did not respond to 'Version?'
2017-12-07 22:11:12.430755 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
2017-12-07 22:11:12.433036 W  ExternalRec(/home/mrawji/recorder2/record.sh): External Recorder did not respond to 'Version?'
2017-12-07 22:11:12.444174 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'

I can make that happen with and without the [faa7994416] commit.

I am also sure I didn't get the EAGAIN error reading from the stream... it just read 0 bytes.

I also see this log:

2017-12-07 22:11:12.694597 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
2017-12-07 22:11:12.694793 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
2017-12-07 22:11:12.698246 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes
OK:Yes
OK:Yes
OK:Yes
OK:Yes'
2017-12-07 22:11:12.698771 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
2017-12-07 22:11:12.699396 I  ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'

but i don't know what to make of it... it shouldn't happen that 4 "lines" get read.

I think i'll try to put your commit back in, and force a "re-read"in case 0 bytes were returned by the read command, maybe with a usleep in there.

I am open to suggestions. random retries aren't my favorite way to fix problems.

Marc

comment:4 Changed 12 months ago by Marc Rawji <mrawji@…>

Ok, assuming my crappy 'for loops' around Version? and the BlockSize? were valid ways to reproduce, this patch seems to fix it. I haven't tested it on prod setup..

diff --git a/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp b/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp
index b2bf050..76017fd 100644
--- a/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp
+++ b/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp
@@ -1087,7 +1087,7 @@ bool ExternalStreamHandler::ProcessCommand(const QString & cmd, uint timeout,
             }
             // STATUS message are "out of band".
             // Ignore them while waiting for a responds to a command
-            if (!result.startsWith("STATUS"))
+            if (!result.startsWith("STATUS") && result.size() > 0)
                 break;
         }

essentially, if the size of the read data is zero, it'll try reading again.

Thoughts? Marc

comment:5 Changed 12 months ago by Stuart Auchterlonie

Milestone: needs_triage29.1
Version: Unspecifiedv29-fixes

comment:6 Changed 12 months ago by jpoet

Status: assignedaccepted

Marc,

Is your 'recorder' for low or standard def? Since it is taking so long to respond to commands, I assume it is fairly low bit-rate. Did you try increasing the timeout?

I am willing to commit the change to check the size of result. It does not seem to hurt anything. Have you been able to reproduce your problem, with that change in place?

John

comment:7 Changed 12 months ago by mrawji@…

Hi john,

No I haven't been able to reproduce but the failure rate looks like 4/1000, so low odds. No rush on the commit... I'll test more extensively, but I don't have a clear way to reproduce..

My recorder has both standard and high def inputs. I am not sure what you mean about it taking too long to respond... The logs above are all milliseconds apart and the timeouts are in seconds...

Marc

comment:8 Changed 12 months ago by Marc Rawji <mrawji@…>

Hi John...

I am doing some more debugging. I am not sure my patch actually fixes anything...

Some additional testing shows that sometimes the first character of the buffer is a 'newline', and the "O" from OK is missing, so readline reads an empty string.

I'll do some more research.

Marc

comment:9 Changed 11 months ago by Stuart Auchterlonie

Milestone: 29.10.28.2

Moving remaining open tickets to 0.28.2 milestone

comment:10 Changed 11 months ago by Stuart Auchterlonie

Milestone: 0.28.229.2

Moving remaining open tickets to 29.2 milestone

comment:11 Changed 5 weeks ago by jpoet

mrawji were you able to resolve this issue?

I have added a new "API" version which is more robust in keeping messages between mythbackend and the external recorder in sync. Please try it.

John

comment:12 Changed 5 weeks ago by mrawji

Hi John,

Funny to see this. After almost a year, I looked at the ticket this week. I haven't gone a whole lot deeper.

I found a case where the queries got out of sync if the request timedout in MythTv?, but the actual call continued in my external recorder.

I fixed it by having the request handler in a threadpool and keeping track of the current request. This seems to work, but it doesn't happen often so time will tell.

I am pretty sure we can close this. If I find something useful, i'll reopen and post a patch as well.

Can you share the link the new API? I am happy to give it a go!

Marc

comment:13 Changed 4 weeks ago by mrawji

Hey John,

I upgraded to the newest version of .29-fixes that includes your new API, but I think I am seeing issues with the ExternalRecorder? now... I've basically foobared my system at this point, so trying to recover it.

I'll post logs when I have everything setup again.

Marc

comment:14 Changed 3 weeks ago by mrawji

Hey John... just bumped up my system to Ubuntu 18.04 and my recorder to python3 instead of 2.7.

Everything seems to be running smoother. I'll try coding up the support for the V2 API in the next couple weeks of works calm down :)

Marc

Note: See TracTickets for help on using tickets.