Opened 9 years ago

Closed 9 years ago

#9123 closed defect (Fixed)

frontend and backend get stuck after live tv channel change using HD-PVR

Reported by: Johnny Stenback <mythtv-users@…>
Owned by: cpinkham
Priority: minor
Milestone: 0.24
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

This is somewhat of a complex problem in that the only way I'm able to attempt to reproduce this is in my full production system which consists of a master backend, a separate combined frontend/slave backend, and a separate frontend. My system setup is as follows:

Master backend: mythbackend running, no frontend. One PVR-250 installed in the box. This is my secondary encoder. Big beefy server (well, sort of)

Combined frontend/slave: mythfrontend running, and mythbackend running as a slave, with a HD-PVR attached and configured. This is my primary encoder. Zotac ION based system, low powered atom, using VDPAU for playback.

Frontend only: Identical hardware to the combined frontend, but no slave running here.

This system has worked very well on 0.23 since it was released, and on trunk builds from before then, for over a year now. Yesterday I took the plunge and updated to trunk builds as trunk was getting close to 0.24; in fact the branch had just been created when I pulled, and I ended up at rev 26886. Anyway, the system seems to work very well for recording and watching recordings, and for live tv as well, as long as I watch live tv from the PVR-250. But if I watch live tv from the HD-PVR, attached to the slave backend, then it stops working at the first channel change.

What happens is this. I start live tv on the combined frontend/slave and either end up on the HD-PVR input or change to it. I see the signal monitor doing its thing, I typically see the dialog saying "You should have received a signal by now..." (I have a 3 second sleep in my channel changing script), and eventually the channel change is done and everything's good. Video is playing, audio is fine, nothing wrong. Then when I change channels, I get the signal monitor again, and after a couple of seconds I get the dialog saying "You should have received a signal by now..." behind the signal monitor. At this point the frontend appears to lock up. No way out. But it's not deadlocked; it uses ~100% CPU at this point (~10% before this point), and the slave backend *also* is pegged, using ~100% CPU. No buttons on the remote get me out of this (short of the one I have set to kill the frontend to restart it if it gets locked up).

If I attach to the frontend process with gdb at the point when the frontend locks up and interrupt a bunch of times, I see stack traces that mostly look like this:

#0  0x00007f00dcebaaed in open64 () from /lib/libpthread.so.0
#1  0x00007f00dd21e42a in ?? () from /usr/lib/qt4/lib/libQtCore.so.4
#2  0x00007f00dd216879 in QFSFileEngine::open(QFlags<QIODevice::OpenModeFlag>) () from /usr/lib/qt4/lib/libQtCore.so.4
#3  0x00007f00dd1d774c in QFile::open(QFlags<QIODevice::OpenModeFlag>) () from /usr/lib/qt4/lib/libQtCore.so.4
#4  0x00007f00dd745ca0 in ?? () from /usr/lib/qt4/lib/libQtGui.so.4
#5  0x00007f00dd746b0b in QImageReader::supportsAnimation() const () from /usr/lib/qt4/lib/libQtGui.so.4
#6  0x00007f00de459d8f in MythUIImage::Load(bool, bool) () from /usr/lib/libmythui-0.24.so.0
#7  0x00007f00dea3664d in ?? () from /usr/lib/libmythtv-0.24.so.0
#8  0x00007f00de940b1a in TV::UpdateOSDSignal(PlayerContext const*, QStringList const&) () from /usr/lib/libmythtv-0.24.so.0
#9  0x00007f00de96f94b in TV::customEvent(QEvent*) () from /usr/lib/libmythtv-0.24.so.0
#10 0x00007f00dd25d29c in QObject::event(QEvent*) () from /usr/lib/qt4/lib/libQtCore.so.4
#11 0x00007f00de90efbd in TV::event(QEvent*) () from /usr/lib/libmythtv-0.24.so.0
#12 0x00007f00dd67ca04 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/qt4/lib/libQtGui.so.4
#13 0x00007f00dd682e2f in QApplication::notify(QObject*, QEvent*) () from /usr/lib/qt4/lib/libQtGui.so.4
#14 0x00007f00dd246fcc in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib/qt4/lib/libQtCore.so.4
#15 0x00007f00dd249ef4 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () from /usr/lib/qt4/lib/libQtCore.so.4
#16 0x00007f00dd2740d3 in ?? () from /usr/lib/qt4/lib/libQtCore.so.4
#17 0x00007f00d9626188 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#18 0x00007f00d9626768 in ?? () from /usr/lib/libglib-2.0.so.0
#19 0x00007f00d9626a0a in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#20 0x00007f00dd2744ff in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/qt4/lib/libQtCore.so.4
#21 0x00007f00dd71ea4e in ?? () from /usr/lib/qt4/lib/libQtGui.so.4
#22 0x00007f00dd24a2bf in QCoreApplication::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/qt4/lib/libQtCore.so.4

or variations thereof, but the common thing is that the frontend is pretty much always in something that's called from TV::UpdateOSDSignal(). And we're not locked up in anything there; we keep getting new messages that land us in the same place over and over. Attaching to the backend with gdb shows that it's doing various things, processing events, writing to a socket (MythSocket::writeStringList()), but nothing consistent or obviously interesting that I've seen yet. But I only got a couple of stacks from the backend.

All the while, the master backend seems mostly unaffected by any of this, at least to the point where a simultaneous live tv instance watching off of the PVR-250 keeps running in another frontend running in a small window on a separate unrelated linux box (for testing only). Though it does seem like things are a bit sluggish, so the master may be involved one way or another.

I'll attach logs as well (from a different instance than when I grabbed the stacks, but the behavior was identical). And filing this against 0.24, even if this was seen with the rev just past the branch point. Feel free to adjust milestone and priorities etc as I'm mostly guessing on those.

Attachments (5)

frontend.log (34.3 KB) - added by Johnny Stenback <mythtv-users@…> 9 years ago.
frontend log, -v playback
slave.log (47.0 KB) - added by Johnny Stenback <mythtv-users@…> 9 years ago.
slave.log, -v record, the relevant bits are after the second to last channel change.
master.log (9.4 KB) - added by anonymous 9 years ago.
mythtv-MythEvent-debug-diff (4.2 KB) - added by Johnny Stenback <mythtv-users@…> 9 years ago.
debugging diff used to create logs…
9123_no_event_socket_on_backends.diff (973 bytes) - added by cpinkham 9 years ago.
Don't open an event socket if a backend connects to the master via MythCoreContext


Change History (22)

Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Attachment: frontend.log added

frontend log, -v playback

Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Attachment: slave.log added

slave.log, -v record, the relevant bits are after the second to last channel change.

Changed 9 years ago by anonymous

Attachment: master.log added

comment:1 Changed 9 years ago by Johnny Stenback <mythtv-users@…>

The master log probably doesn't show anything of interest, but I attached it for completeness' sake.

And again, I can reproduce this at will, attach with gdb etc, so if anyone has ideas for other logs or other ways I could provide more information I'd gladly do that.

comment:2 Changed 9 years ago by robertm

Status: new → infoneeded_new

Cannot reproduce personally, and nobody else seems able to either; did you remove *all* traces of old Myth libraries, executables, and other content before installing trunk?

comment:3 Changed 9 years ago by robertm

Also, what is the hardware on the master and slave backends, and what is the type of network connection? Logs indicate an unstable connection.

comment:4 Changed 9 years ago by robertm

And as a final followup, what is the behavior on the standalone frontend? What happens on the slave/frontend if you turn off the backend there? Do you have the "always stream from master backend" setting on?

comment:5 Changed 9 years ago by robertm

Milestone: 0.24 → unknown
Owner: set to cpinkham
Priority: major → minor
Severity: medium → low
Version: Unspecified → Trunk Head

comment:6 Changed 9 years ago by robertm

In discussion with this user in IRC, we've found that the MBE is sending numerous MythEvent SIGNAL events per second to the SBE, which appear to be overrunning its ability to keep up and stream content. Toggling "always stream from master backend" to off has preliminarily helped the issue, but the "SIGNAL 1" events seem to be causing a problem. It appears they are dispatched from

void SignalMonitor::MonitorLoop()

The loop continues and overwhelms the SBE, which then starts spewing MythSocket errors, and the frontend is stuck in a loop.

I've asked the reporter to update the ticket with -v network logs from both backends. Reporter notes that killing the SBE seems to fix the issue, but the combination of an SBE + always stream from MBE should at least be accounted for, rather than just bringing the system to its knees.

comment:7 Changed 9 years ago by robertm

To clarify, turning off "always stream from backend" seems to avoid the problem, not fix it. The real issue is the overwhelming of the SBE with MythEvents, and subsequent socket errors interrupting communication between the BEs.

comment:8 Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Ok, trying to answer most questions here.

The hardware of the mbe is a 2.93GHz Intel Core 2 Duo with 4 GB of RAM and plenty of disks.

The hardware of the frontends (both identical) is a Zotac IONITX-based system with a single core Intel Atom 230 and Nvidia ION graphics.

The network is gigabit ethernet all the way.

I did a bunch more digging today, and while the behavior has changed to the point where I no longer see the lockup consistently (in fact I have not been able to reproduce it at all today), I still see issues which look like they're rooted in the same underlying problem.

What I'm seeing now (both on the standalone frontend and on the combined sbe and frontend) is that live tv starts out fine, but after the first channel change the OSD never goes away; it remains open after the video starts playing, saying that the signal level is 0%, while some other value flashes by at a pretty high frequency. This matches what I see in the logs: when this happens I see a flood of "MythEvent: SIGNAL 1" messages. And what's interesting is that these messages keep going in the logs even after I stop watching live tv.

My theory so far is that we somehow fail to exit the loop in SignalMonitor::MonitorLoop(); if that's the case I would expect the behavior I'm seeing, more or less. Per the logs, SignalMonitor::Stop() is called, but if for some reason the member "running" is not set at that point, we would fail to stop the monitor loop. I'm working on a way to prove this, but if anyone has other ideas I'm all ears.
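The theory above can be sketched with a toy model. This is illustrative only, not MythTV code: the names `MonitorModel`, `Stop`, and `MonitorLoop` are hypothetical stand-ins. It shows how a loop gated on a `running` flag can miss a stop request if `Stop()` clears the flag before the loop thread sets it.

```cpp
#include <atomic>

// Toy model of the "missed stop" theory (illustrative only, not MythTV
// code): if Stop() clears 'running' before MonitorLoop() has set it,
// the loop re-arms the flag and never sees the stop request.
struct MonitorModel {
    std::atomic<bool> running{false};
    long iterations = 0;

    void Stop() { running = false; }

    // Runs at most 'budget' iterations; returns true if the stop was seen.
    bool MonitorLoop(long budget) {
        running = true;                  // overwrites an early Stop()
        while (running && iterations < budget)
            ++iterations;                // stand-in for emitting SIGNAL events
        return !running;
    }
};
```

With this model, a Stop() that arrives before the loop starts is silently lost and the loop spins until its budget runs out, which would match a monitor that keeps emitting events indefinitely.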

I'll attach the logs from both the sbe and the mbe, the mbe happened to be running with -v general,important,network,record,channel, and the sbe with -v important,general,extra,network,channel,record.

Oh, and yes, "always stream from master backend" is on here. I had it off earlier, but that seemed to make it impossible to watch any live tv on my standalone frontend (I kept getting "Error opening jump program file buffer" and then dropped back to the main menu), but that's another issue for another ticket that I'll investigate more later if it persists.

comment:9 Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Ok, the logs are too big to go as attachments, but I've uploaded them at:

http://www.jstenback.com/myth/mbe-toby.log (master backend)
http://www.jstenback.com/myth/sbe-toby.log (slave backend)

comment:10 Changed 9 years ago by anonymous

Ok, more info here...

I added more logging code to the signal monitor and it's working as expected AFAICT: we tear it down properly, leave the monitor loop, and stop sending its messages. So that's not where the problem is.

What happens after the first channel change *while* watching live tv from my sbe is that the sbe gets the "SIGNAL 1" message it just sent *back* in MythCoreContext::readyRead() (from the mbe?) and then dispatches it again, and then we rinse and repeat. This is what triggers the flood of "SIGNAL 1" messages I'm seeing, we send it, receive it back, and send it again, and so on. But why, or what triggers this, I have no clue. I've learned this by adding more logging around the event dispatch code in various places...

Does any of this make any sense to anyone? Any more logs I can provide that would be helpful here?
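The send/receive-back/re-dispatch cycle described above can be modeled with a toy sketch. This is not actual MythTV code; the `Node`, `Dispatch`, and `Pump` names are hypothetical, and the sketch only models the message flow the logs suggest.

```cpp
#include <queue>
#include <string>

// Toy model (not MythTV code) of the echo described above: the slave
// dispatches an event, the master reflects it back to the slave, and
// the slave re-dispatches what it received, and so on forever.
struct Node {
    Node* peer = nullptr;
    std::queue<std::string> inbox;
    long dispatched = 0;

    void Dispatch(const std::string& ev) {
        ++dispatched;
        if (peer) peer->inbox.push(ev);   // forward to the other backend
    }
    // Process one queued event by re-dispatching it, which is what the
    // readyRead path appears to end up doing in this scenario.
    void Pump() {
        if (inbox.empty()) return;
        std::string ev = inbox.front();
        inbox.pop();
        Dispatch(ev);
    }
};
```

A single "SIGNAL 1" dispatched by the slave then bounces between the two nodes indefinitely, with both dispatch counters growing by one per round trip, which would produce exactly the kind of message flood and CPU load described in the ticket.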

comment:11 Changed 9 years ago by danielk

Milestone: unknown → 0.24
Priority: minor → blocker
Severity: low → high
Status: infoneeded_new → new

Recategorizing based on debugging provided this morning. The "SIGNAL 1" is not important; it's the retransmission of any MythEvent in a loop that is the issue. In the past this has been caused by a backend connecting an event socket to itself. There could be some new cause, but whatever the cause, it is a blocker.

Note: I also saw a report on the users mailing list, I believe, of 100% frontend and backend CPU usage, which in retrospect sounds like it could be the same problem.

comment:12 Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Yet more info, which will hopefully be helpful here...

Using a debug patch that I hacked up (I'll attach the patch) I've learned that the message storm that I'm seeing here goes from the sbe to the mbe and then from there back to the sbe (and out to the fe as well afaict). I have logs with my debug info added (to the sbe only), created with -v general,important,network,record,channel,socket (and extra too on the sbe, which probably isn't important). The logs are available at

http://www.jstenback.com/myth/mbe-event-debug.log
http://www.jstenback.com/myth/sbe-event-debug.log
http://www.jstenback.com/myth/fe-event-debug.log

and in those logs 10.0.0.5 (mythtv) is the mbe, 10.0.0.130 (toby) is the sbe, and 10.0.0.1 (gandalf) is a separate linux box I used as the fe for this test. The sequence of events in those logs is to start the mbe, start the sbe, start the fe, then in the fe hit live tv (channel 244 on the mbe), switch inputs (channel 244 on the sbe), change channel (to channel 276 on the sbe).

My belief is that the first occurrence of MythCoreContext::readyRead in the sbe log is the first event that comes back to the sbe.

Changed 9 years ago by Johnny Stenback <mythtv-users@…>

Attachment: mythtv-MythEvent-debug-diff added

debugging diff used to create logs...

Changed 9 years ago by cpinkham

Don't open an event socket if a backend connects to the master via MythCoreContext

comment:13 Changed 9 years ago by Johnny Stenback <mythtv-users@…>

The attached patch does seem to stop the flood of the events that I've been seeing! If more logs are of any use, just let me know.

comment:14 Changed 9 years ago by cpinkham

(In [26963]) Don't open an event socket when a backend connects back to the master backend via MythCoreContext::ConnectToMasterServer(). Normally this shouldn't be necessary since slaves should be using their existing master backend connection, but this fix will prevent an event loop should something cause a slave to connect back to the master.

Refs #9123. This fixes the event loop in #9123, but doesn't fix the actual bug that is causing the slave to connect back to the master, so I'm leaving the ticket open for now. I'm also lowering the status level of the ticket separately since #9123 should no longer be a blocker.
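The shape of that fix can be sketched as follows. This is a hypothetical illustration, not the actual patch or the MythTV API: `Connection`, `event_socket_open`, and the free-function signature are all invented for the example. The idea is simply that a backend reconnecting to the master skips the extra event socket, so events can no longer be reflected back to their sender.

```cpp
// Hypothetical sketch of the guard described in [26963]; names and
// signatures are illustrative, not the real MythTV code.
struct Connection {
    bool event_socket_open = false;
};

Connection ConnectToMasterServer(bool is_backend) {
    Connection c;
    // Backends already receive events over their existing master
    // connection, so only non-backend clients register an event socket.
    // This breaks the reflection path that caused the event loop.
    if (!is_backend)
        c.event_socket_open = true;
    return c;
}
```

Note this guards against the symptom (the event loop) rather than the root cause (whatever makes the slave connect back to the master in the first place), which matches why the ticket was left open at this point.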

comment:15 Changed 9 years ago by cpinkham

(In [26964]) Backports [26963] from trunk.

Here's the original commit message:

Don't open an event socket when a backend connects back to the master backend via MythCoreContext::ConnectToMasterServer(). Normally this shouldn't be necessary since slaves should be using their existing master backend connection, but this fix will prevent an event loop should something cause a slave to connect back to the master.

Refs #9123. This fixes the event loop in #9123, but doesn't fix the actual bug that is causing the slave to connect back to the master, so I'm leaving the ticket open for now. I'm also lowering the status level of the ticket separately since #9123 should no longer be a blocker.

comment:16 Changed 9 years ago by cpinkham

Milestone: 0.24 → unknown
Priority: blocker → minor
Severity: high → medium

comment:17 Changed 9 years ago by cpinkham

Milestone: unknown → 0.24
Resolution: Fixed
Status: new → closed

After discussion in IRC, the root cause of this was determined to be a missing mythpreviewgen binary on the slave backend. In this case, the slave punts to the master to generate the preview if it can.
