Opened 7 years ago
Closed 3 years ago
#13190 closed Bug Report - General (Trac EOL)
ExternalRecorder - error reading STDERR
Reported by: | Owned by: | jpoet | |
---|---|---|---|
Priority: | minor | Milestone: | 29.2 |
Component: | MythTV - General | Version: | v29-fixes |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | no |
Description
Hi,
I think this might be related to https://code.mythtv.org/trac/ticket/13181, but I am not sure.
I am running fixes/29 (v29.0-64-g39b2062).
I can't reproduce this at will (or, i don't know how). I've had it happen once that I've noticed. There a couple of things that are odd:
External Recorder did not respond
Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes' Dec 5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'HasPictureAttributes?' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasPictureAttributes?') = 'OK:No'
Here's it's strange that it reports that the recorder did not respond, but it responded well within the timeout. I think it might be because the "Ready" method returns TRUE, but 'read' function returns 0 bytes, but I am just guessing.
same thing happens here:
Dec 5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'FlowControl?' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF'
However, it looks like it collected the right information:
Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF)
If this is just a logging quirk, I am not bothered.
Reading the same data twice from the STDIN
This segment picks up where the one above ended. The FlowControl? seems to answered correctly, but the XON/XOFF apprears to be re-read as the answer to BlockSize?.
Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:816 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): App opened successfully Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF) Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('BlockSize:6160384') = 'OK:XON/XOFF'
After that, all the responses are "off by 1"
Here's the tail of the log, with the clear example:
ProcessCommand('LockTimeout?') = 'OK' Dec 5 16:30:02: E TVRecEvent recorders/ExternalSignalMonitor.cpp:206 (GetLockTimeout) ExternSigMon(/home/xbmc/rec/recorder1/record.sh): GetLockTimeout: invalid response 'OK' Dec 5 16:30:02: I SignalMonitor recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('SignalStrengthPercent?') = 'OK:10000'
the OK:10000 is the response to "GetLockTimeout?"...
Logs
Here is the relevant part of the log:
Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('Version?') = 'OK:1.0' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes' Dec 5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'HasPictureAttributes?' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('HasPictureAttributes?') = 'OK:No' Dec 5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'FlowControl?' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('FlowControl?') = 'OK:XON/XOFF' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:816 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): App opened successfully Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:824 (OpenApp) ExternalRec(/home/xbmc/rec/recorder1/record.sh): Capabilities: tuner(yes) Picture attributes(no) Flow control(XON/XOFF) Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('BlockSize:6160384') = 'OK:XON/XOFF' Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:478 (Get) ExternSH: Creating new stream handler /home/xbmc/rec/recorder1/record.sh for /home/xbmc/rec/recorder1/record.sh Dec 5 16:30:00: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('TuneChannel:1') = 'OK' Dec 5 16:30:00: I TVRecEvent recorders/ExternalChannel.cpp:72 (Tune) ExternChan[1](/home/xbmc/rec/recorder1/record.sh): Tune(55) Dec 5 16:30:00: I TVRecEvent recorders/ExternalChannel.cpp:85 (Tune) ExternChan[1](/home/xbmc/rec/recorder1/record.sh): Tuning to 55 Dec 5 16:30:00: W TVRecEvent recorders/ExternalStreamHandler.cpp:1097 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): External Recorder did not respond to 'TuneChannel:55' Dec 5 16:30:02: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('TuneChannel:55') = 'OK' Dec 5 16:30:02: I TVRecEvent recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('Version?') = 'OK' ProcessCommand('LockTimeout?') = 'OK' Dec 5 16:30:02: E TVRecEvent recorders/ExternalSignalMonitor.cpp:206 (GetLockTimeout) ExternSigMon(/home/xbmc/rec/recorder1/record.sh): GetLockTimeout: invalid response 'OK' Dec 5 16:30:02: I SignalMonitor recorders/ExternalStreamHandler.cpp:1109 (ProcessCommand) ExternalRec(/home/xbmc/rec/recorder1/record.sh): ProcessCommand('SignalStrengthPercent?') = 'OK:10000'
Change History (15)
comment:1 Changed 7 years ago by
Owner: | set to jpoet |
---|---|
Status: | new → assigned |
comment:2 Changed 7 years ago by
Hi John,
I don't want to go back to an older version because I need the changes in [af1126277a]... without this change I get a failure rate in the 40%.
I'll try building with a revert of [faa7994416], add in some extra logging and see what happens, but might take me some time to get back to you since I can't make it fail on a regular basis.
Thanks, Marc
comment:3 Changed 7 years ago by
Hi John,
So I added a loop (1000x) around the "cheap" Externalcommands to see if I could force it to reproduce. I didn't get a straight up failure where it just stopped working, but was able to reproduce this:
2017-12-07 22:11:12.387166 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.387423 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.387679 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.387937 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.388193 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.388520 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.388748 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.388993 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.395628 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.395665 W ExternalRec(/home/mrawji/recorder2/record.sh): External Recorder did not respond to 'Version?' 2017-12-07 22:11:12.430755 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0' 2017-12-07 22:11:12.433036 W ExternalRec(/home/mrawji/recorder2/record.sh): External Recorder did not respond to 'Version?' 2017-12-07 22:11:12.444174 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('Version?') = 'OK:1.0'
I can make that happen with and without the [faa7994416] commit.
I am also sure I didn't get the EAGAIN error reading from the stream... it just read 0 bytes.
I also see this log:
2017-12-07 22:11:12.694597 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes' 2017-12-07 22:11:12.694793 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes' 2017-12-07 22:11:12.698246 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes OK:Yes OK:Yes OK:Yes OK:Yes' 2017-12-07 22:11:12.698771 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes' 2017-12-07 22:11:12.699396 I ExternalRec(/home/mrawji/recorder2/record.sh): ProcessCommand('HasTuner?') = 'OK:Yes'
but i don't know what to make of it... it shouldn't happen that 4 "lines" get read.
I think i'll try to put your commit back in, and force a "re-read"in case 0 bytes were returned by the read command, maybe with a usleep in there.
I am open to suggestions. random retries aren't my favorite way to fix problems.
Marc
comment:4 Changed 7 years ago by
Ok, assuming my crappy 'for loops' around Version? and the BlockSize? were valid ways to reproduce, this patch seems to fix it. I haven't tested it on prod setup..
diff --git a/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp b/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp index b2bf050..76017fd 100644 --- a/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp +++ b/mythtv/libs/libmythtv/recorders/ExternalStreamHandler.cpp @@ -1087,7 +1087,7 @@ bool ExternalStreamHandler::ProcessCommand(const QString & cmd, uint timeout, } // STATUS message are "out of band". // Ignore them while waiting for a responds to a command - if (!result.startsWith("STATUS")) + if (!result.startsWith("STATUS") && result.size() > 0) break; }
essentially, if the size of the read data is zero, it'll try reading again.
Thoughts? Marc
comment:5 Changed 7 years ago by
Milestone: | needs_triage → 29.1 |
---|---|
Version: | Unspecified → v29-fixes |
comment:6 Changed 7 years ago by
Status: | assigned → accepted |
---|
Marc,
Is your 'recorder' for low or standard def? Since it is taking so long to respond to commands, I assume it is fairly low bit-rate. Did you try increasing the timeout?
I am willing to commit the change to check the size of result. It does not seem to hurt anything. Have you been able to reproduce your problem, with that change in place?
John
comment:7 Changed 7 years ago by
Hi john,
No I haven't been able to reproduce but the failure rate looks like 4/1000, so low odds. No rush on the commit... I'll test more extensively, but I don't have a clear way to reproduce..
My recorder has both standard and high def inputs. I am not sure what you mean about it taking too long to respond... The logs above are all milliseconds apart and the timeouts are in seconds...
Marc
comment:8 Changed 7 years ago by
Hi John...
I am doing some more debugging. I am not sure my patch actually fixes anything...
Some additional testing shows that sometimes the first character of the buffer is a 'newline', and the "O" from OK is missing, so readline reads an empty string.
I'll do some more research.
Marc
comment:9 Changed 7 years ago by
Milestone: | 29.1 → 0.28.2 |
---|
Moving remaining open tickets to 0.28.2 milestone
comment:10 Changed 7 years ago by
Milestone: | 0.28.2 → 29.2 |
---|
Moving remaining open tickets to 29.2 milestone
comment:11 Changed 6 years ago by
mrawji were you able to resolve this issue?
I have added a new "API" version which is more robust in keeping messages between mythbackend and the external recorder in sync. Please try it.
John
comment:12 Changed 6 years ago by
Hi John,
Funny to see this. After almost a year, I looked at the ticket this week. I haven't gone a whole lot deeper.
I found a case where the queries got out of sync if the request timedout in MythTv?, but the actual call continued in my external recorder.
I fixed it by having the request handler in a threadpool and keeping track of the current request. This seems to work, but it doesn't happen often so time will tell.
I am pretty sure we can close this. If I find something useful, i'll reopen and post a patch as well.
Can you share the link the new API? I am happy to give it a go!
Marc
comment:13 Changed 6 years ago by
Hey John,
I upgraded to the newest version of .29-fixes that includes your new API, but I think I am seeing issues with the ExternalRecorder? now... I've basically foobared my system at this point, so trying to recover it.
I'll post logs when I have everything setup again.
Marc
comment:14 Changed 6 years ago by
Hey John... just bumped up my system to Ubuntu 18.04 and my recorder to python3 instead of 2.7.
Everything seems to be running smoother. I'll try coding up the support for the V2 API in the next couple weeks of works calm down :)
Marc
comment:15 Changed 3 years ago by
Resolution: | → Trac EOL |
---|---|
Status: | accepted → closed |
We have moved all bug tracking to github [1]
If you continue to have this issue, please open a new issue at github, referencing this ticket.
Can you try an older version, to see if [faa7994416] is somehow causing this problem for you? [faa7994416] was actually designed to fix a problem like this, but maybe it doesn't handle all cases.