Opened 13 years ago
Closed 9 years ago
#10265 closed Bug Report - Hang/Deadlock (Upstream Bug)
mythbackend main thread blocks soon after startup on CentOS 6.2
Reported by: | | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | unknown |
Component: | MythTV - General | Version: | 0.24-fixes |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description
Running fixes/0.24 (294968bea7546d5c0d27b7326401e79a7e67764b) plus http://code.mythtv.org/trac/attachment/ticket/9704/mythtv-0.24-backport_reconnect_fixes.patch and mysql-5.1.52-1.el6_0.1.x86_64 on CentOS 6.2
Whenever I start mythbackend, I have a short window to start the frontend during which time it will connect (and appears to work fine). After that short window closes, the backend no longer accepts any connections.
I wrote a simple script/program to show what mythbackend was watching (attached). When I run this during the window, I see something like the following:
22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} {fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mythbacke 22280 root 3r FIFO 0,8 0t0 117605 pipe
mythbacke 22280 root 7r FIFO 0,8 0t0 117615 pipe
mythbacke 22280 root 9u IPv4 117616 0t0 TCP *:lds-dump (LISTEN)
mythbacke 22280 root 12u IPv4 117619 0t0 UDP *:apc-6549
mythbacke 22280 root 13u IPv4 117620 0t0 UDP 239.255.255.250:ssdp
mythbacke 22280 root 14u IPv4 117621 0t0 UDP 255.255.255.255:ssdp
mythbacke 22280 root 26r FIFO 0,8 0t0 118008 pipe
mythbacke 22280 root 38u IPv4 118014 0t0 TCP *:lds-distrib (LISTEN)
so we can see that the main thread is watching some pipe (twice) and the two listening TCP ports.
A little later, we get this:
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mythbacke 22280 root 7r FIFO 0,8 0t0 117615 pipe
mythbacke 22280 root 12u IPv4 117619 0t0 UDP *:apc-6549
mythbacke 22280 root 13u IPv4 117620 0t0 UDP 239.255.255.250:ssdp
mythbacke 22280 root 14u IPv4 117621 0t0 UDP 255.255.255.255:ssdp
mythbacke 22280 root 26r FIFO 0,8 0t0 118008 pipe
and strace tells us the main thread is blocked reading the pipe that it was watching:
# strace -p 22280
Process 22280 attached - interrupt to quit
read(3,
A backtrace at this point is attached.
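For readers who want to compare the two snapshots mechanically, here is a minimal Python sketch. It is not the attached script; it only parses the per-thread `poll`/`select` lines exactly as they are quoted in the description, and the function names `watched_fds` and `stopped_watching` are made up for illustration.

```python
import re

# Per-thread lines copied from the two snapshots in the description.
BEFORE = """\
22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} {fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
"""
AFTER = """\
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
"""

POLL_LINE = re.compile(r"^(\d+): poll fds=\[(.*)\] nfds=")
SELECT_LINE = re.compile(r"^(\d+): select .*readfds=\[([\d ]*)\]")


def watched_fds(snapshot):
    """Map thread id -> set of descriptors it is polling or selecting on."""
    watched = {}
    for line in snapshot.splitlines():
        line = line.strip()
        match = POLL_LINE.match(line)
        if match:
            fds = {int(fd) for fd in re.findall(r"fd=(\d+)", match.group(2))}
        else:
            match = SELECT_LINE.match(line)
            if not match:
                continue  # lsof rows and anything else are ignored
            fds = {int(fd) for fd in match.group(2).split()}
        watched[int(match.group(1))] = fds
    return watched


def stopped_watching(before, after):
    """Return {thread id: descriptors} that were watched before but not after."""
    old, new = watched_fds(before), watched_fds(after)
    lost = {}
    for tid, fds in old.items():
        gone = fds - new.get(tid, set())
        if gone:
            lost[tid] = gone
    return lost
```

With the data above, `stopped_watching(BEFORE, AFTER)` returns `{22280: {3, 9, 38}}`: the main thread has dropped out of its poll loop entirely, which matches the strace output showing it stuck in `read(3, ...)`.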
Attachments (4)
Change History (47)
comment:3 follow-up: 4 Changed 13 years ago by
- Ali Harlow, this appears to be an issue with dbus. 1/ Is dbus installed? 2/ can you toggle whether it is installed and report back if that resolves the deadlock.
There probably isn't much we can do about dbus issues other than to just disable it, but it would be good to have some confirmation as to whether that is the real issue here.
PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.
comment:4 follow-up: 5 Changed 13 years ago by
Replying to danielk:
- Ali Harlow, this appears to be an issue with dbus. 1/ Is dbus installed? 2/ can you toggle whether it is installed and report back if that resolves the deadlock.
yes, dbus-1.2.24-5.el6_1.x86_64 is installed. I can't uninstall it (just about every gnome package requires it). I tried killing the session daemon to see if I could do this temporarily to test but this just causes gnome to log me out (and then for gdm to get very confused).
PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.
Okay. Let me know if this would be useful and if rebuilding dbus-libs to get the debug symbols would be helpful.
Do we know what myth is doing to initiate the communication with dbus? I searched the source of mythbackend and the libs but didn't find it mentioned anywhere. (Wondering if I can disable it from there and achieve the same effect.)
comment:5 Changed 13 years ago by
I've been trying 0.23, 0.24.1, and 0.24-fixes on CentOS 6, all with very similar symptoms to this bug report.
I've found that if I disable EIT scanning (or just remove all tuners altogether), and don't connect any frontends, then mythbackend will respond on the HTTP status port (6544). After starting up the backend, as soon as I connect the first frontend client, then the backend immediately stops responding on 6544, and after that first client disconnects I cannot get a response on 6543 either. (I'm just using watch -n 0.5 'wget -O - http://localhost:6544' to test.) The backend is hosed from that point until I restart it.
Ali, does that match your symptoms?
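For anyone who wants to script the same check, here is a rough Python equivalent of the `watch`/`wget` probe above. It is only a sketch: the function name `backend_status_responds` and the default timeout are assumptions, and it only covers the HTTP status port, not the protocol port 6543. A timeout matters because a hung backend may still complete the TCP connect (the kernel queues it) while never sending a response.

```python
import http.client
import urllib.error
import urllib.request

# Ignore any proxy environment variables; the probe targets a local port.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def backend_status_responds(url="http://localhost:6544", timeout=2.0):
    """Return True if any HTTP response arrives from url within timeout seconds."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still alive.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, reset, or no reply before the timeout expired.
        return False
```

Calling `backend_status_responds()` in a loop before and after the first frontend connects should show the transition from `True` to `False` described in this comment.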
comment:6 Changed 13 years ago by
Actually yes, my backtrace in gdb is identical to yours. Looks like I need to install the debuginfo RPMs for QT / DBus to see the last part of the stack trace.
comment:7 follow-up: 8 Changed 13 years ago by
I would just like to add in case it is needed for debugging that I'm running my backend and frontend on CentOS 6.0, kernel version 2.6.32-71 and an earlier version of trunk from around August 1st and I do not have this issue currently. I also have dbus installed at 1.2.24-3.
Changed 13 years ago by
Attachment: | kdreyer-backtrace.txt added |
---|
backtrace with debugging info (this was on 0.23, but it is basically the same as Ali's)
comment:8 Changed 13 years ago by
Replying to kkuphal: Thanks, I'm on a fully-updated 6.2 system. I downgraded my dbus, qt, and glibc back to CentOS 6's earliest versions (dbus-1.2.24-3, qt-4.6.2-16, glibc-2.12-1.7) and rebooted, but it had no effect.
The problem seems to be related to GDM's interaction with dbus. When I log into Xfce through GDM, I see mythbackend hang upon the first connection. Then when I run "init 3", then "startx", I'm able to launch mythfrontend repeatedly, and mythbackend doesn't hang.
comment:9 Changed 13 years ago by
As an update, the problem occurs whenever the "messagebus" service is running, or more specifically, "dbus-daemon" with the "--system" flag. Disabling the messagebus service allows mythbackend to run without hanging.
comment:10 Changed 13 years ago by
I've encountered the same issue and was able to recover from it by preventing the "messagebus" daemon from starting. My environment is as follows:
OS: Scientific Linux release 6.1 (Carbon)
Kernel: 2.6.32-220.4.1.el6.i686
Mythtv version: 0.24.1
TV Capture card: Earth-soft PT2
comment:11 Changed 13 years ago by
After observing for a while, I found that the above workaround has too many side effects in my environment; for example, the X Window System couldn't start. So I'm waiting for another solution.
comment:12 Changed 13 years ago by
Here's the workaround I'm using (posted to the users' mailing list earlier)
I set up a minimal CentOS 6 chroot within /chroot, then installed mythtv-backend into that. When I run mythbackend inside the chroot, it is unable to connect with the systemwide dbus daemon, but it can still communicate with mysqld and mythfrontend over TCP/IP.
Regarding disk space, I created a 4GB logical volume for the chroot, but everything is currently only taking up 850 MB, so I could have probably gone for 2GB without problems. (I have the mythtv "recordings" directory on an altogether separate logical volume that I also mount within the chroot.)
It's certainly not optimal, but I'm not a DBus/Qt hacker so I don't know what else to do.
comment:13 Changed 13 years ago by
Move fix from 7866a616c into MythCoreContext.
This moves the GetMasterHostName fix from changeset 7866a616c into MythCoreContext, in the event this is the cause of other similar stalls in the backend protocol server.
Refs #10265
Branch: master
Changeset: a0c9d003dee59ed03e52ea564d2c0a00787cb96b
comment:14 Changed 13 years ago by
I've downloaded and installed the latest version of MythTV from GitHub, but to no avail...
comment:15 Changed 13 years ago by
If it helps any, I just started a MythTV install on a fully updated CentOS 6. I pulled 0.24-fixes from git and have run into this problem. I basically can't run MythTV on CentOS at this time.
Just to test, I shut down messagebus and I was able to get a connection to finally be accepted. So this does seem to be the same bug. It is very easy to replicate, just install from a current CentOS build and you can't avoid it.
comment:16 Changed 13 years ago by
I have the same issues with current trunk as well, see #10302. I would love to investigate this further, but I am relatively new to this. Perhaps some of you can give some pointers on where to start?
comment:17 follow-up: 18 Changed 13 years ago by
I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).
comment:18 Changed 13 years ago by
Replying to Andrey Zhunev <a-j@…>:
I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).
Hi Andrey,
I tried the work-around you discovered in my own environments, which consist of Scientific Linux 6.2 and CentOS 6.2.
The former had no problem, but on the latter mythfrontend failed at startup with a segfault error under its latest kernel version.
Thus, personally I think the work-around is generally effective, but I would like the MythTV dev team to fix the root cause of this issue.
Regards, Yoshii
comment:19 Changed 13 years ago by
I investigated further since then, and the problem on Scientific Linux's side was due to a corrupted installation of the libraries MythTV depends on. I completely removed it and re-installed, and succeeded in watching TV programs on it.
So did qtwebkit cause this issue? I still don't know the root cause.
Regards, Yoshii
comment:20 Changed 13 years ago by
Sorry for my typo.
(Wrong) Scientific Linux's side
(Correct) CentOS's side
Regards, Yoshii
comment:22 Changed 13 years ago by
Please see the following description:
(Scientific Linux 6.2)
[root@scientific6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.i686
(CentOS 6.2)
[root@centos6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.x86_64
I downloaded and installed each package from ATrpms repository.
Regards, Yoshii
comment:23 Changed 13 years ago by
I have no idea how qtwebkit could be affecting this, but to me the next step is to bisect the commits between qtwebkit 2.0 and 2.1.1, in order to understand what exactly could be causing the lockup.
comment:24 Changed 12 years ago by
Hi, I also have this issue - getting
E MythSocket(1676920:36): readStringList: Error, timed out after 7000 ms.
from mythfrontend.
My setup:
CentOS 6.2
Various kernels (e.g. kernel-2.6.32-220.23.1.el6.centos.plus.x86_64 and patched vanilla 3.4.4)
MythTV fixes/0.25 compiled from git (2012-07-13), and prebuilt binaries (0.25-6.el6.x86_64).
I am running a combined FE/BE (frontend/backend).
I was not able to get mythfrontend to talk to the backend before I applied the workaround suggested in earlier comments, i.e.
# wget http://dl.atrpms.net/all/qtwebkit-2.0-3.el6.x86_64.rpm
# wget http://dl.atrpms.net/all/qtwebkit-devel-2.0-3.el6.x86_64.rpm
# yum downgrade qtwebkit-2.0-3.el6.x86_64.rpm qtwebkit-devel-2.0-3.el6.x86_64.rpm
However, I ran into the problem again after restarting some services and/or mythfrontend/backend. So it seems it is not a permanent solution.
Big thanks for having this bugtracker - I have spent numerous hours on this, and this bug report certainly saved me some time. :)
comment:25 Changed 12 years ago by
I just encountered this on two up to date CentOS 6.3 + EPEL + RPM Fusion machines using the mythtv 0.25 packages from RPM fusion.
A workaround that seems to work for now is to enable all of dbus, mythbackend, and GDM with auto-login as usual (for mythfrontend). Then in rc.local add this hack to get mythbackend started without dbus being available before restarting it for the sake of GDM and friends:
service messagebus stop
service mythbackend stop
sleep 5
service mythbackend start
sleep 5
service messagebus start
Does anybody know how to tell mythbackend not to use dbus?
comment:26 Changed 12 years ago by
From what I can tell, the D-Bus interaction is coming from Qt event loop, not necessarily unique to MythTV. Again, I'm not a Qt hacker, but I'm not sure there's even a way to disable the D-Bus interaction via Qt's API.
comment:27 Changed 12 years ago by
Well I tried updating QT from 4.6.2 to 4.6.4 in my CentOS 6.3 virtual machine hoping that the problem was fixed upstream but no luck.
comment:28 Changed 12 years ago by
I don't think the issue is Qt, but rather the ancient version of glib2 supplied with CentOS/RHEL. In the 2.22.5 version of gmain.c, the wakeup pipe is created blocking; in later versions the wakeup code is moved into a new gwakeup.c module and uses non-blocking pipes.
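To illustrate why the blocking/non-blocking distinction matters, here is a small Python sketch of a wakeup pipe. It is not glib's code and the helper names are invented; it only shows the general behaviour: a `read()` on an empty non-blocking pipe fails immediately with `EAGAIN`, whereas on a blocking pipe the same call would stall the calling thread, as the strace output in the description shows for fd 3.

```python
import os


def make_wakeup_pipe():
    """Create a pipe whose two ends are both non-blocking."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd


def drain_wakeup_pipe(read_fd):
    """Read whatever is pending on a non-blocking pipe without ever hanging."""
    drained = b""
    while True:
        try:
            chunk = os.read(read_fd, 16)
        except BlockingIOError:
            # Pipe is empty. With a blocking pipe the read would wait here
            # until another thread wrote a byte.
            return drained
        if not chunk:
            return drained  # the write end has been closed
        drained += chunk
```

If two code paths both try to consume the same wakeup byte, the second one finds the pipe empty. With a non-blocking pipe that is harmless; with a blocking one the thread is stuck until someone writes again.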
A possible circumvention (haven't tried this yet) would be to modify QT src/corelib/kernel/qsocketnotifier.cpp to force registered file descriptors non-blocking.
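The mechanism behind that idea is the standard `fcntl` `O_NONBLOCK` flag, which can be set on a descriptor that some other code created. The sketch below shows it in Python rather than in Qt's C++; `force_nonblocking` is a hypothetical helper, and nothing here is taken from qsocketnotifier.cpp.

```python
import fcntl
import os


def force_nonblocking(fd):
    """Set O_NONBLOCK on an already-open descriptor and return the new flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if not flags & os.O_NONBLOCK:
        # Keep the existing status flags and add O_NONBLOCK to them.
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return fcntl.fcntl(fd, fcntl.F_GETFL)
```

Whether doing this for every descriptor registered with the event loop is safe is a separate question, since code that expects blocking semantics on those descriptors would then have to handle `EAGAIN`.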
comment:29 Changed 12 years ago by
Well if someone can come up with a patch I'll try it in my CentOS 6.3 virtual machine...
comment:30 Changed 12 years ago by
Ticket locked: | set |
---|
Please use the mythtv-users mailing list for discussions. There are way too many comments in this ticket that don't add any useful information. If the discussions on the mailing list lead to any conclusions, let's add the conclusions here afterwards.
comment:31 follow-up: 32 Changed 11 years ago by
Status: | new → infoneeded_new |
---|---|
Ticket locked: | unset |
Is this still a problem? You might just have to use a more up to date OS that uses newer versions of the packages required by Myth rather than the ancient versions CentOS likes to use :)
comment:32 Changed 11 years ago by
Replying to paulh:
Is this still a problem?
I am still seeing this same problem on a fresh CentOS 6.4 install with MythTV installed from the RPM Fusion repo. Stopping the messagebus service allows mythbackend to respond to requests again. So nothing seems to have changed in the last 2 years.
comment:33 follow-up: 34 Changed 11 years ago by
Is that this, build date 30 May, http://download1.rpmfusion.org/free/el/updates/6/x86_64/mythtv-0.26.0-9.el6.x86_64.rpm
or perhaps this, from 'testing', build date 13 Oct but still 0.26? http://download1.rpmfusion.org/free/el/updates/testing/6/x86_64/mythtv-0.26.1-4.el6.x86_64.rpm
or did you build from the 0.27 SRPM, which hasn't yet, TTBOMK, reached the repo (and probably won't, AIUI, since it needs a version of Qt that doesn't meet the repo guidelines).
I have a 0.27-fixes i386 build under Scientific Linux 6 with extras, and I haven't seen this.
comment:34 Changed 11 years ago by
Replying to John Pilkington <J.Pilk@…>:
Is that this, build date 30 May, http://download1.rpmfusion.org/free/el/updates/6/x86_64/mythtv-0.26.0-9.el6.x86_64.rpm
Yes, I have this build installed. Waiting for the cmyth plugin for XBMC to support 0.27 before upgrading to that one.
comment:35 Changed 11 years ago by
The latest MythTV 0.27 source seems to need Qt 4.7 or later to compile, but CentOS itself still only ships Qt 4.6 packages...
comment:36 Changed 11 years ago by
We require 4.8.0 for 0.27, which is only 2 years old, 5 point versions behind the latest 4.x branch release and 1 major version behind the current QT 5.1.1. So obviously it will be another few years before CentOS feel it's obsolete enough to start shipping it.
comment:37 Changed 11 years ago by
I was glad to find this report since I've been preferring the centos releases after getting tired of upgrading my myth machines so often.
epel has qt5 available for el6. I'm not sure if it's appropriate for rpmfusion or atrpms to depend on epel or not.
Also, el7 should be released this year sometime, which should help the situation.
Off to try building 0.27 rpms with qt5...
comment:38 Changed 11 years ago by
After building 0.27 against qt5 on centos6 I'm still getting the bug.
comment:39 Changed 11 years ago by
Also, stopping dbus and then running mythbackend didn't work on my setup.
Mythtv 0.27(cb3b784) / qt5-qtbase-5.2.1-2 from epel
Centos 6.5 x86_64
comment:40 Changed 10 years ago by
I'm still getting the same problem with 0.27.3/qt5 with centos 6.5.
I do want to report that stopping "messagebus" before starting the backend works when running 0.26 from rpmfusion on centos 6.5.
I'll probably upgrade to centos 7 at this point.
comment:43 Changed 9 years ago by
Resolution: | → Upstream Bug |
---|---|
Status: | infoneeded_new → closed |
Thanks for reporting back.
Backtrace