Ticket #10265 (new Bug Report - Hang/Deadlock)
Opened 16 months ago
Last modified 5 months ago
mythbackend main thread blocks soon after startup on CentOS 6.2
| Reported by: | J. Ali Harlow <ali@…> | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | unknown |
| Component: | MythTV - General | Version: | 0.24-fixes |
| Severity: | medium | Keywords: | |
| Cc: | | Ticket locked: | yes |
Description
Running fixes/0.24 (294968bea7546d5c0d27b7326401e79a7e67764b) plus http://code.mythtv.org/trac/attachment/ticket/9704/mythtv-0.24-backport_reconnect_fixes.patch and mysql-5.1.52-1.el6_0.1.x86_64 on CentOS 6.2
Whenever I start mythbackend, I have a short window to start the frontend during which time it will connect (and appears to work fine). After that short window closes, the backend no longer accepts any connections.
I wrote a simple script/program to show what mythbackend was watching (attached). When I run this during the window, I see something like the following:
22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} {fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mythbacke 22280 root 3r FIFO 0,8 0t0 117605 pipe
mythbacke 22280 root 7r FIFO 0,8 0t0 117615 pipe
mythbacke 22280 root 9u IPv4 117616 0t0 TCP *:lds-dump (LISTEN)
mythbacke 22280 root 12u IPv4 117619 0t0 UDP *:apc-6549
mythbacke 22280 root 13u IPv4 117620 0t0 UDP 239.255.255.250:ssdp
mythbacke 22280 root 14u IPv4 117621 0t0 UDP 255.255.255.255:ssdp
mythbacke 22280 root 26r FIFO 0,8 0t0 118008 pipe
mythbacke 22280 root 38u IPv4 118014 0t0 TCP *:lds-distrib (LISTEN)
so we can see that the main thread is watching some pipe (twice) and the two listening TCP ports.
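To make that reading of the output explicit, here is a minimal Python sketch that decodes one of the poll lines. It assumes the script prints the raw `pollfd.events` value and that the bit values are the Linux `<poll.h>` ones; `describe_poll_line` and the `LSOF_NAMES` table (copied by hand from the lsof listing above) are hypothetical helpers, not part of the attached script.

```python
import re

# Linux <poll.h> event bits. Assumption: the script prints the raw events mask.
POLL_BITS = {0x001: "POLLIN", 0x002: "POLLPRI", 0x004: "POLLOUT",
             0x008: "POLLERR", 0x010: "POLLHUP"}

# fd -> description, copied by hand from the lsof listing in the ticket.
LSOF_NAMES = {
    3: "pipe",
    7: "pipe",
    9: "TCP *:lds-dump (LISTEN)",
    26: "pipe",
    38: "TCP *:lds-distrib (LISTEN)",
}

ENTRY = re.compile(r"\{fd=(\d+), events=(\d+), revents=(\d+)\}")


def decode_events(mask):
    """Return the names of the event bits set in a poll events mask."""
    return [name for bit, name in sorted(POLL_BITS.items()) if mask & bit]


def describe_poll_line(line):
    """Split a 'TID: poll fds=[...]' line into (tid, [(fd, events, name), ...])."""
    tid, _, rest = line.partition(":")
    watched = [
        (int(fd), decode_events(int(events)), LSOF_NAMES.get(int(fd), "?"))
        for fd, events, _revents in ENTRY.findall(rest)
    ]
    return int(tid), watched


line = ("22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} "
        "{fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4")
tid, watched = describe_poll_line(line)
for fd, events, name in watched:
    print(tid, fd, events, name)
```

Under those assumptions, `events=1` is POLLIN on every descriptor, i.e. the main thread is only waiting for readable data on the pipe and for incoming connections on the two listening sockets.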
A little later, we get this:
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mythbacke 22280 root 7r FIFO 0,8 0t0 117615 pipe
mythbacke 22280 root 12u IPv4 117619 0t0 UDP *:apc-6549
mythbacke 22280 root 13u IPv4 117620 0t0 UDP 239.255.255.250:ssdp
mythbacke 22280 root 14u IPv4 117621 0t0 UDP 255.255.255.255:ssdp
mythbacke 22280 root 26r FIFO 0,8 0t0 118008 pipe
and strace tells us the main thread is blocked reading the pipe that it was watching:
# strace -p 22280
Process 22280 attached - interrupt to quit
read(3,
A backtrace at this point is attached.
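The comparison between the two snapshots can be sketched in Python as a set difference over the thread IDs that still have a poll/select line. The helper name `polling_threads` is hypothetical; the snapshot text is copied from the output above.

```python
import re


def polling_threads(snapshot):
    """Return the thread IDs that have a poll or select line in a snapshot."""
    pattern = re.compile(r"^(\d+): (?:poll|select)\b", re.MULTILINE)
    return {int(m.group(1)) for m in pattern.finditer(snapshot)}


BEFORE = """\
22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} {fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
"""

AFTER = """\
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
"""

# Threads that were waiting in poll/select before but are no longer doing so.
stalled = polling_threads(BEFORE) - polling_threads(AFTER)
print(sorted(stalled))
```

The only thread that drops out is 22280, the main thread, which matches the strace output showing it stuck in `read(3,`.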
Attachments
Change History
Changed 16 months ago by J. Ali Harlow <ali@…>
Script to see what mythbackend is watching
Changed 16 months ago by J. Ali Harlow <ali@…>
- Attachment read_mem.c added
Helper program for myth-poll
comment:1 Changed 16 months ago by wagnerrp
- Status changed from new to closed
- Resolution set to Duplicate
Duplicate of #10188
comment:2 Changed 16 months ago by wagnerrp
- Status changed from closed to new
- Resolution Duplicate deleted
scratch that...
comment:3 follow-up: ↓ 4 Changed 16 months ago by danielk
Ali Harlow, this appears to be an issue with dbus. 1/ Is dbus installed? 2/ Can you toggle whether it is installed and report back on whether that resolves the deadlock?
There probably isn't much we can do about dbus issues other than to just disable it, but it would be good to have some confirmation as to whether that is the real issue here.
PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 16 months ago by J. Ali Harlow <ali@…>
Replying to danielk:
- Ali Harlow, this appears to be an issue with dbus. 1/ Is dbus installed? 2/ can you toggle whether it is installed and report back if that resolves the deadlock.
Yes, dbus-1.2.24-5.el6_1.x86_64 is installed. I can't uninstall it (just about every GNOME package requires it). I tried killing the session daemon as a temporary test, but this just causes GNOME to log me out (and then gdm gets very confused).
PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.
Okay. Let me know if this would be useful and if rebuilding dbus-libs to get the debug symbols would be helpful.
Do we know what myth is doing to initiate the communication with dbus? I searched the source of mythbackend and the libs but didn't find it mentioned anywhere. (Wondering if I can disable it from there and achieve the same effect.)
comment:5 in reply to: ↑ 4 Changed 16 months ago by Ken Dreyer <ktdreyer@…>
I've been trying 0.23, 0.24.1, and 0.24-fixes on CentOS 6, all with very similar symptoms to this bug report.
I've found that if I disable EIT scanning (or just remove all tuners altogether), and don't connect any frontends, then mythbackend will respond on the HTTP status port (6544). After starting up the backend, as soon as I connect the first frontend client, then the backend immediately stops responding on 6544, and after that first client disconnects I cannot get a response on 6543 either. (I'm just using watch -n 0.5 'wget -O - http://localhost:6544' to test.) The backend is hosed from that point until I restart it.
Ali, does that match your symptoms?
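For anyone who wants to script the same check, here is a rough Python equivalent of the wget loop with an explicit timeout. A hung backend may still accept the TCP connection (the kernel completes the handshake for a listening socket) but never answer, so the sketch waits for an HTTP response rather than just connecting. The function name and the timeout are illustrative choices; only port 6544 comes from the comment above.

```python
import http.client


def status_port_responds(host="localhost", port=6544, timeout=2.0):
    """Return True if an HTTP GET on the status port gets any response in time."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
        return True
    except (OSError, http.client.HTTPException):
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("status port responding:", status_port_responds())
```

Calling this every half second before and after the first frontend connects should show the same transition from responding to not responding.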
comment:6 Changed 16 months ago by Ken Dreyer <ktdreyer@…>
Actually yes, my backtrace in gdb is identical to yours. Looks like I need to install the debuginfo RPMs for Qt / DBus to see the last part of the stack trace.
comment:7 follow-up: ↓ 8 Changed 16 months ago by kkuphal
I would just like to add in case it is needed for debugging that I'm running my backend and frontend on CentOS 6.0, kernel version 2.6.32-71 and an earlier version of trunk from around August 1st and I do not have this issue currently. I also have dbus installed at 1.2.24-3.
Changed 16 months ago by Ken Dreyer <ktdreyer@…>
- Attachment kdreyer-backtrace.txt added
backtrace with debugging info (this was on 0.23, but it is basically the same as Ali's)
comment:8 in reply to: ↑ 7 Changed 16 months ago by Ken Dreyer <ktdreyer@…>
Replying to kkuphal: Thanks, I'm on a fully-updated 6.2 system. I downgraded my dbus, qt, and glibc back to CentOS 6's earliest versions (dbus-1.2.24-3, qt-4.6.2-16, glibc-2.12-1.7) and rebooted, but it had no effect.
The problem seems to be related to GDM's interaction with dbus. When I log into Xfce through GDM, I see mythbackend hang upon the first connection. Then when I run "init 3", then "startx", I'm able to launch mythfrontend repeatedly, and mythbackend doesn't hang.
comment:9 Changed 16 months ago by ktdreyer@…
As an update, the problem occurs whenever the "messagebus" service is running, or more specifically, "dbus-daemon" with the "--system" flag. Disabling the messagebus service allows mythbackend to run without hanging.
comment:10 Changed 16 months ago by yoshihara@…
I've encountered the same issue and was able to recover from it by preventing the "messagebus" daemon from starting. My environment is as follows:
OS: Scientific Linux release 6.1 (Carbon)
Kernel: 2.6.32-220.4.1.el6.i686
MythTV version: 0.24.1
TV capture card: Earth-soft PT2
comment:11 Changed 16 months ago by yoshihara@…
After observing for a while, I found that the above solution had too many side effects on my environment; for example, the X Window System couldn't start. So I'm waiting for another solution to be provided.
comment:12 Changed 16 months ago by ktdreyer@…
Here's the workaround I'm using (posted to the users' mailing list earlier)
I set up a minimal CentOS 6 chroot within /chroot, then installed mythtv-backend into that. When I run mythbackend inside the chroot, it is unable to connect with the systemwide dbus daemon, but it can still communicate with mysqld and mythfrontend over TCP/IP.
Regarding disk space, I created a 4GB logical volume for the chroot, but everything is currently only taking up 850 MB, so I could have probably gone for 2GB without problems. (I have the mythtv "recordings" directory on an altogether separate logical volume that I also mount within the chroot.)
It's certainly not optimal, but I'm not a DBus/Qt hacker so I don't know what else to do.
comment:13 Changed 15 months ago by Github
Move fix from 7866a616c into MythCoreContext.
This moves the GetMasterHostName fix from changeset 7866a616c into MythCoreContext, in the event this is the cause of other similar stalls in the backend protocol server.
Refs #10265
Branch: master
Changeset: a0c9d003dee59ed03e52ea564d2c0a00787cb96b
comment:14 Changed 15 months ago by yoshihara@…
I've downloaded and installed the latest version of mythtv through Github, but in vain...
comment:15 Changed 15 months ago by brian.sanders@…
If it helps any, I just started a MythTV install on a fully updated CentOS 6. I pulled 0.24-fixes from git and have run into this problem. I basically can't run MythTV on CentOS at this time.
Just to test, I shut down messagebus and a connection was finally accepted. So this does seem to be the same bug. It is very easy to replicate: just install from a current CentOS build and you can't avoid it.
comment:16 Changed 15 months ago by Jonathan Martens <jonathan@…>
I have the same issues with current trunk as well, see #10302. I would love to investigate this further, but I am relatively new to this. Perhaps some of you can give some pointers on where to start?
comment:17 follow-up: ↓ 18 Changed 12 months ago by Andrey Zhunev <a-j@…>
I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).
comment:18 in reply to: ↑ 17 Changed 12 months ago by yoshihara@…
Replying to Andrey Zhunev <a-j@…>:
I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).
Hi Andrey,
I tried the work-around you discovered in my own environment, which consists of Scientific Linux 6.2 and CentOS 6.2.
The former had no problem, but on the latter, starting mythfrontend produced a segfault error with its latest kernel version.
Thus, I personally think the work-around is generally effective, but I would like the MythTV dev team to fix the root cause of this issue.
Regards, Yoshii
comment:19 Changed 12 months ago by yoshihara@…
I investigated further, and the problem on Scientific Linux's side was due to a corrupted installation of the libraries MythTV depends on. I completely removed and reinstalled it, and then succeeded in watching TV programs on it.
So, did qtwebkit cause this issue? I still don't know its root cause.
Regards, Yoshii
comment:20 Changed 12 months ago by yoshihara@…
Sorry for my typo.
(Wrong) Scientific Linux's side
(Correct) CentOS's side
Regards, Yoshii
comment:21 Changed 12 months ago by ktdreyer@…
Yoshii: what does "rpm -qv qtwebkit" say?
comment:22 Changed 12 months ago by yoshihara@…
Please see the following description:
(Scientific Linux 6.2)
[root@scientific6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.i686
(CentOS 6.2)
[root@centos6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.x86_64
I downloaded and installed each package from ATrpms repository.
Regards, Yoshii
comment:23 Changed 12 months ago by ktdreyer@…
I have no idea how qtwebkit could be affecting this, but to me the next step is to bisect the commits between qtwebkit 2.0 and 2.1.1, in order to understand what exactly could be causing the lockup.
comment:24 Changed 10 months ago by gronslet@…
Hi, I also have this issue - getting
E MythSocket(1676920:36): readStringList: Error, timed out after 7000 ms.
from mythfrontend.
My setup:
CentOS 6.2
Various kernels (e.g. kernel-2.6.32-220.23.1.el6.centos.plus.x86_64 and patched vanilla 3.4.4)
MythTV fixes/0.25 compiled from git (2012-07-13), and prebuilt binaries (0.25-6.el6.x86_64).
I am running a combined FE/BE (frontend/backend).
I was not able to get mythfrontend to talk to the backend until I applied the workaround suggested in earlier comments, i.e.
# wget http://dl.atrpms.net/all/qtwebkit-2.0-3.el6.x86_64.rpm
# wget http://dl.atrpms.net/all/qtwebkit-devel-2.0-3.el6.x86_64.rpm
# yum downgrade qtwebkit-2.0-3.el6.x86_64.rpm qtwebkit-devel-2.0-3.el6.x86_64.rpm
However, I ran into the problem again after restarting some services and/or mythfrontend/backend. So it seems it is not a permanent solution.
Big thanks for having this bugtracker - I have spent numerous hours on this, and this bug report certainly saved me some time. :)
comment:25 Changed 6 months ago by karlcz@…
I just encountered this on two up-to-date CentOS 6.3 + EPEL + RPM Fusion machines using the mythtv 0.25 packages from RPM Fusion.
A workaround that seems to work for now is to enable all of dbus, mythbackend, and GDM with auto-login as usual (for mythfrontend). Then in rc.local add this hack, which starts mythbackend while dbus is unavailable and then starts dbus again for the sake of GDM and friends:
service messagebus stop
service mythbackend stop
sleep 5
service mythbackend start
sleep 5
service messagebus start
Does anybody know how to tell mythbackend not to use dbus?
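For readers who prefer to drive this from a script, the same ordering can be sketched in Python. This is only a transcription of the rc.local hack above, with a dry-run mode added; `STEPS` and `run_steps` are hypothetical names, and the 5-second pauses are the ones from the comment.

```python
import subprocess
import time

# Order from the rc.local hack: mythbackend must start while messagebus is down,
# and messagebus is only brought back afterwards for GDM.
STEPS = [
    ["service", "messagebus", "stop"],
    ["service", "mythbackend", "stop"],
    5,
    ["service", "mythbackend", "start"],
    5,
    ["service", "messagebus", "start"],
]


def run_steps(steps=STEPS, dry_run=True, sleep=time.sleep):
    """Run (or, with dry_run=True, only list) the steps in order."""
    log = []
    for step in steps:
        if isinstance(step, list):
            log.append(" ".join(step))
            if not dry_run:
                subprocess.check_call(step)  # raises if a service command fails
        else:
            log.append("sleep %d" % step)
            if not dry_run:
                sleep(step)
    return log


if __name__ == "__main__":
    for entry in run_steps():
        print(entry)
```

With the default `dry_run=True` nothing is executed; the function only returns the sequence so the ordering can be checked before running it as root.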
comment:26 Changed 6 months ago by ktdreyer@…
From what I can tell, the D-Bus interaction is coming from the Qt event loop and is not necessarily unique to MythTV. Again, I'm not a Qt hacker, but I'm not sure there's even a way to disable the D-Bus interaction via Qt's API.
comment:27 Changed 5 months ago by hobbes1069@…
Well, I tried updating Qt from 4.6.2 to 4.6.4 in my CentOS 6.3 virtual machine, hoping that the problem was fixed upstream, but no luck.
comment:28 Changed 5 months ago by anonymous@…
I don't think the issue is Qt, but rather the ancient version of glib2 supplied with CentOS/RHEL. In the 2.22.5 version of gmain.c, the wakeup pipe is created blocking; in later versions the wakeup code is moved into a new gwakeup.c module and uses non-blocking pipes.
A possible workaround (I haven't tried this yet) would be to modify Qt's src/corelib/kernel/qsocketnotifier.cpp to force registered file descriptors to be non-blocking.
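The difference between the two pipe modes can be illustrated with a small Python sketch. This is not glib or Qt code, only a demonstration of the general behaviour: a read on an empty non-blocking pipe returns immediately with an error, whereas the same read on a blocking pipe would hang the way `read(3,` does in the strace output. The helper name `try_read` is hypothetical.

```python
import os


def try_read(fd, size=16):
    """Return the bytes read, or None if the read would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


r, w = os.pipe()            # pipes are created in blocking mode by default
os.set_blocking(r, False)   # switch the read end to non-blocking

empty = try_read(r)         # nothing has been written, so no data is available
os.write(w, b"x")           # a one-byte "wakeup"
woken = try_read(r)         # now the byte is available

os.close(r)
os.close(w)
print(empty, woken)
```

If the read end were left in blocking mode, the first `os.read` would never return, because nothing has been written to the pipe yet.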
comment:29 Changed 5 months ago by hobbes1069@…
Well if someone can come up with a patch I'll try it in my CentOS 6.3 virtual machine...
comment:30 Changed 5 months ago by kenni
- Ticket locked set
Please use the mythtv-users mailing list for discussions. There are far too many comments in this ticket that don't add any useful information at all. If the discussions on the mailing list lead to any conclusions, let's add the conclusions here afterwards.

Backtrace