Opened 12 years ago

Closed 8 years ago

#10265 closed Bug Report - Hang/Deadlock (Upstream Bug)

mythbackend main thread blocks soon after startup on CentOS 6.2

Reported by: J. Ali Harlow <ali@…>
Owned by:
Priority: minor
Milestone: unknown
Component: MythTV - General
Version: 0.24-fixes
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

Running fixes/0.24 (294968bea7546d5c0d27b7326401e79a7e67764b) plus http://code.mythtv.org/trac/attachment/ticket/9704/mythtv-0.24-backport_reconnect_fixes.patch and mysql-5.1.52-1.el6_0.1.x86_64 on CentOS 6.2

Whenever I start mythbackend, I have a short window to start the frontend during which time it will connect (and appears to work fine). After that short window closes, the backend no longer accepts any connections.

I wrote a simple script/program to show what mythbackend was watching (attached). When I run this during the window, I see something like the following:

22280: poll fds=[{fd=3, events=1, revents=0} {fd=9, events=1, revents=0} {fd=38, events=1, revents=0} {fd=3, events=1, revents=0}] nfds=0x4
22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
mythbacke 22280 root    3r  FIFO    0,8      0t0 117605 pipe
mythbacke 22280 root    7r  FIFO    0,8      0t0 117615 pipe
mythbacke 22280 root    9u  IPv4 117616      0t0    TCP *:lds-dump (LISTEN)
mythbacke 22280 root   12u  IPv4 117619      0t0    UDP *:apc-6549 
mythbacke 22280 root   13u  IPv4 117620      0t0    UDP 239.255.255.250:ssdp 
mythbacke 22280 root   14u  IPv4 117621      0t0    UDP 255.255.255.255:ssdp 
mythbacke 22280 root   26r  FIFO    0,8      0t0 118008 pipe
mythbacke 22280 root   38u  IPv4 118014      0t0    TCP *:lds-distrib (LISTEN)

so we can see that the main thread is watching some pipe (twice) and the two listening TCP ports.

A little later, we get this:

22285: poll fds=[{fd=7, events=1, revents=0}] nfds=0x1
22288: select nfds=0xf readfds=[12 13 14] writefds=0x0 exceptfds=0x0
22444: poll fds=[{fd=26, events=1, revents=0}] nfds=0x1
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
mythbacke 22280 root    7r  FIFO    0,8      0t0 117615 pipe
mythbacke 22280 root   12u  IPv4 117619      0t0    UDP *:apc-6549 
mythbacke 22280 root   13u  IPv4 117620      0t0    UDP 239.255.255.250:ssdp 
mythbacke 22280 root   14u  IPv4 117621      0t0    UDP 255.255.255.255:ssdp 
mythbacke 22280 root   26r  FIFO    0,8      0t0 118008 pipe

and strace tells us the main thread is blocked reading the pipe that it was watching:

# strace -p 22280
Process 22280 attached - interrupt to quit
read(3, 

A backtrace at this point is attached.
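For readers who want to repeat the fd lookup without lsof, here is a minimal sketch. It is not the attached myth-poll script; `describe_fds` is a hypothetical helper name, and it assumes a Linux system with /proc mounted and enough privileges to inspect the target process. It only reports what each descriptor points at, which is enough to confirm that the fd named in the strace output is a pipe.

```python
import os


def describe_fds(pid):
    """Map each open fd number of `pid` to its target (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "pipe:[117605]" or "socket:[117616]".
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


# Example (hypothetical PID taken from the output above):
#   describe_fds(22280).get(3)
```

Both ends of a pipe show the same `pipe:[inode]` target, so matching inodes across processes can also reveal who holds the other end.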

Attachments (4)

xxx (3.0 KB) - added by J. Ali Harlow <ali@…> 12 years ago.
Backtrace
myth-poll (2.0 KB) - added by J. Ali Harlow <ali@…> 12 years ago.
Script to see what mythbackend is watching
read_mem.c (2.5 KB) - added by J. Ali Harlow <ali@…> 12 years ago.
Helper program for myth-poll
kdreyer-backtrace.txt (8.6 KB) - added by Ken Dreyer <ktdreyer@…> 12 years ago.
backtrace with debugging info (this was on 0.23, but it is basically the same as Ali's)


Change History (47)

Changed 12 years ago by J. Ali Harlow <ali@…>

Attachment: xxx added

Backtrace

Changed 12 years ago by J. Ali Harlow <ali@…>

Attachment: myth-poll added

Script to see what mythbackend is watching

Changed 12 years ago by J. Ali Harlow <ali@…>

Attachment: read_mem.c added

Helper program for myth-poll

comment:1 Changed 12 years ago by Raymond Wagner

Resolution: Duplicate
Status: new → closed

Duplicate of #10188

comment:2 Changed 12 years ago by Raymond Wagner

Resolution: Duplicate
Status: closed → new

scratch that...

comment:3 Changed 12 years ago by danielk

Ali Harlow, this appears to be an issue with dbus. 1) Is dbus installed? 2) Can you toggle whether it is installed and report back whether that resolves the deadlock?

There probably isn't much we can do about dbus issues other than to just disable it, but it would be good to have some confirmation as to whether that is the real issue here.

PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.

comment:4 in reply to:  3 Changed 12 years ago by J. Ali Harlow <ali@…>

Replying to danielk:

Ali Harlow, this appears to be an issue with dbus. 1) Is dbus installed? 2) Can you toggle whether it is installed and report back whether that resolves the deadlock?

yes, dbus-1.2.24-5.el6_1.x86_64 is installed. I can't uninstall it (just about every gnome package requires it). I tried killing the session daemon to see if I could do this temporarily to test but this just causes gnome to log me out (and then for gdm to get very confused).

PS In the future when we ask for a backtrace in mythtv land we really mean a "thread apply all bt" but in this case I don't think that is necessary.

Okay. Let me know if this would be useful and if rebuilding dbus-libs to get the debug symbols would be helpful.

Do we know what myth is doing to initiate the communication with dbus? I searched the source of mythbackend and the libs but didn't find it mentioned anywhere. (Wondering if I can disable it from there and achieve the same effect.)

comment:5 in reply to:  4 Changed 12 years ago by Ken Dreyer <ktdreyer@…>

I've been trying 0.23, 0.24.1, and 0.24-fixes on CentOS 6, all with very similar symptoms to this bug report.

I've found that if I disable EIT scanning (or just remove all tuners altogether), and don't connect any frontends, then mythbackend will respond on the HTTP status port (6544). After starting up the backend, as soon as I connect the first frontend client, then the backend immediately stops responding on 6544, and after that first client disconnects I cannot get a response on 6543 either. (I'm just using watch -n 0.5 'wget -O - http://localhost:6544' to test.) The backend is hosed from that point until I restart it.

Ali, does that match your symptoms?
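The wget loop described above can also be expressed as a small probe with an explicit timeout, which distinguishes "port is listening but nobody answers" (the hang reported here) from a healthy backend. This is only a sketch: `port_responds` is a hypothetical helper, the request string is a generic HTTP/1.0 GET, and the port numbers 6544 and 6543 are the ones mentioned in this comment.

```python
import socket


def port_responds(host, port, request=b"GET / HTTP/1.0\r\n\r\n", timeout=2.0):
    """Return True if the server at host:port sends any bytes back in time."""
    try:
        # The timeout applies to connect(), sendall() and recv() alike.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            return len(sock.recv(1)) > 0
    except OSError:
        # Covers refused connections and timeouts.
        return False


# Example: port_responds("localhost", 6544) for the HTTP status port.
```

A hung backend still has its listening socket open, so the TCP connection itself succeeds and only the read times out. That is why the probe waits for a reply byte rather than just checking that the connect works.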

comment:6 Changed 12 years ago by Ken Dreyer <ktdreyer@…>

Actually yes, my backtrace in gdb is identical to yours. Looks like I need to install the debuginfo RPMs for QT / DBus to see the last part of the stack trace.

comment:7 Changed 12 years ago by kkuphal

In case it helps with debugging: I'm running my backend and frontend on CentOS 6.0 (kernel version 2.6.32-71) with an earlier version of trunk from around August 1st, and I do not currently have this issue. I also have dbus installed, at version 1.2.24-3.

Changed 12 years ago by Ken Dreyer <ktdreyer@…>

Attachment: kdreyer-backtrace.txt added

backtrace with debugging info (this was on 0.23, but it is basically the same as Ali's)

comment:8 in reply to:  7 Changed 12 years ago by Ken Dreyer <ktdreyer@…>

Replying to kkuphal:

Thanks, I'm on a fully-updated 6.2 system. I downgraded my dbus, qt, and glibc back to CentOS 6's earliest versions (dbus-1.2.24-3, qt-4.6.2-16, glibc-2.12-1.7) and rebooted, but it had no effect.

The problem seems to be related to GDM's interaction with dbus. When I log into Xfce through GDM, I see mythbackend hang upon the first connection. Then when I run "init 3", then "startx", I'm able to launch mythfrontend repeatedly, and mythbackend doesn't hang.

comment:9 Changed 12 years ago by ktdreyer@…

As an update, the problem occurs whenever the "messagebus" service is running, or more specifically, "dbus-daemon" with the "--system" flag. Disabling the messagebus service allows mythbackend to run without hanging.

comment:10 Changed 12 years ago by yoshihara@…

I've encountered the same issue and was able to recover from it by preventing the "messagebus" daemon from starting. My environment is as follows:

OS: Scientific Linux release 6.1 (Carbon)
Kernel: 2.6.32-220.4.1.el6.i686
MythTV version: 0.24.1
TV capture card: Earth-soft PT2

comment:11 Changed 12 years ago by yoshihara@…

After observing for a while, I found that the above solution had too many side effects in my environment; for example, the X Window System couldn't start. So I'm waiting for another solution to be provided.

comment:12 Changed 12 years ago by ktdreyer@…

Here's the workaround I'm using (posted to the users' mailing list earlier)

I set up a minimal CentOS 6 chroot within /chroot, then installed mythtv-backend into that. When I run mythbackend inside the chroot, it is unable to connect with the systemwide dbus daemon, but it can still communicate with mysqld and mythfrontend over TCP/IP.

Regarding disk space, I created a 4GB logical volume for the chroot, but everything is currently only taking up 850 MB, so I could have probably gone for 2GB without problems. (I have the mythtv "recordings" directory on an altogether separate logical volume that I also mount within the chroot.)

It's certainly not optimal, but I'm not a DBus/Qt hacker so I don't know what else to do.

comment:13 Changed 12 years ago by Github

Move fix from 7866a616c into MythCoreContext.

This moves the GetMasterHostName fix from changeset 7866a616c into MythCoreContext, in the event this is the cause of other similar stalls in the backend protocol server.

Refs #10265

Branch: master Changeset: a0c9d003dee59ed03e52ea564d2c0a00787cb96b

comment:14 Changed 12 years ago by yoshihara@…

I've downloaded and installed the latest version of mythtv through Github, but in vain...

comment:15 Changed 12 years ago by brian.sanders@…

If it helps any, I just started a MythTV install on a fully updated CentOS 6. I pulled 0.24-fixes from git and have run into this problem. I basically can't run MythTV on CentOS at this time.

Just to test, I shut down messagebus and I was able to get a connection to finally be accepted. So this does seem to be the same bug. It is very easy to replicate, just install from a current CentOS build and you can't avoid it.

comment:16 Changed 12 years ago by Jonathan Martens <jonathan@…>

I have the same issues with current trunk as well, see #10302. I would love to investigate this further, but I am relatively new to this. Perhaps some of you can give some pointers on where to start?

comment:17 Changed 12 years ago by Andrey Zhunev <a-j@…>

I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).

comment:18 in reply to:  17 Changed 12 years ago by yoshihara@…

Replying to Andrey Zhunev <a-j@…>:

I discovered that downgrading qtwebkit from qtwebkit-2.1.1-1.el6 to qtwebkit-2.0-3.el6 solves the mythbackend blocking issue on CentOS 6.2. Not sure how to dig it further though... But with qtwebkit-2.0-3 installed, mythbackend is perfectly stable already for several days in a row (working 24/7).

Hi Andrey,

I tried the workaround you discovered in my own environments, which consist of Scientific Linux 6.2 and CentOS 6.2.

The former had no problem, but on the latter mythfrontend failed at startup with a segfault error under its latest kernel version.

Thus, personally I think the workaround is generally effective, but I would like the MythTV dev team to fix the root cause of this issue.

Regards, Yoshii

comment:19 Changed 12 years ago by yoshihara@…

I investigated further, and the problem on Scientific Linux's side was due to a corrupted installation of the libraries MythTV depends on. I completely removed it and reinstalled, and was then able to watch TV programs on it.

So was qtwebkit the cause of this issue? I still don't know the root cause.

Regards, Yoshii

comment:20 Changed 12 years ago by yoshihara@…

Sorry for my typo.

(Wrong) Scientific Linux's side

(Correct) CentOS's side

Regards, Yoshii

comment:21 Changed 12 years ago by ktdreyer@…

Yoshii: what does "rpm -qv qtwebkit" say?

comment:22 Changed 12 years ago by yoshihara@…

Please see the following description:

(Scientific Linux 6.2)
[root@scientific6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.i686

(CentOS 6.2)
[root@centos6 ~]# rpm -qv qtwebkit
qtwebkit-2.0-3.el6.x86_64

I downloaded and installed each package from ATrpms repository.

Regards, Yoshii

comment:23 Changed 12 years ago by ktdreyer@…

I have no idea how qtwebkit could be affecting this, but to me the next step is to bisect the commits between qtwebkit 2.0 and 2.1.1, in order to understand what exactly could be causing the lockup.

comment:24 Changed 12 years ago by gronslet@…

Hi, I also have this issue - getting

E  MythSocket(1676920:36): readStringList: Error, timed out after 7000 ms.   

from mythfrontend.

My setup:

CentOS 6.2
Various kernels (e.g. kernel-2.6.32-220.23.1.el6.centos.plus.x86_64 and patched vanilla 3.4.4)
MythTV fixes/0.25 compiled from git (2012-07-13), and prebuilt binaries (0.25-6.el6.x86_64)

I am running a combined FE/BE (frontend/backend).

I was not able to get mythfrontend to talk to the backend until I applied the workaround suggested in earlier comments, i.e.

# wget http://dl.atrpms.net/all/qtwebkit-2.0-3.el6.x86_64.rpm
# wget http://dl.atrpms.net/all/qtwebkit-devel-2.0-3.el6.x86_64.rpm
# yum downgrade qtwebkit-2.0-3.el6.x86_64.rpm qtwebkit-devel-2.0-3.el6.x86_64.rpm

However, I ran into the problem again after restarting some services and/or mythfrontend/backend. So it seems it is not a permanent solution.

Big thanks for having this bugtracker - I have spent numerous hours on this, and this bug report certainly saved me some time. :)

comment:25 Changed 11 years ago by karlcz@…

I just encountered this on two up to date CentOS 6.3 + EPEL + RPM Fusion machines using the mythtv 0.25 packages from RPM fusion.

A workaround that seems to work for now is to enable all of dbus, mythbackend, and GDM with auto-login as usual (for mythfrontend). Then in rc.local add this hack to get mythbackend started without dbus being available before restarting it for the sake of GDM and friends:

service messagebus stop
service mythbackend stop
sleep 5
service mythbackend start
sleep 5
service messagebus start

Does anybody know how to tell mythbackend not to use dbus?

comment:26 Changed 11 years ago by ktdreyer@…

From what I can tell, the D-Bus interaction is coming from the Qt event loop and is not necessarily unique to MythTV. Again, I'm not a Qt hacker, but I'm not sure there's even a way to disable the D-Bus interaction via Qt's API.

comment:27 Changed 11 years ago by hobbes1069@…

Well I tried updating QT from 4.6.2 to 4.6.4 in my CentOS 6.3 virtual machine hoping that the problem was fixed upstream but no luck.

comment:28 Changed 11 years ago by anonymous@…

I don't think the issue is QT, but rather the ancient version of glib2 supplied with CentOS/RHEL. In the 2.22.5 version of gmain.c, the wakeup pipe is created blocking; in later versions the wakeup code is moved into a new gwakeup.c module and uses non-blocking pipes.

A possible circumvention (haven't tried this yet) would be to modify QT src/corelib/kernel/qsocketnotifier.cpp to force registered file descriptors non-blocking.
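To illustrate why the blocking flag matters, here is a minimal sketch of the difference. It is not glib or Qt code (those are C and C++); Python is used only for brevity, and `read_wakeup_byte` is a hypothetical helper name. With a blocking descriptor, a read on an empty pipe hangs the calling thread, which matches the `read(3, ` seen in the strace output in the description. With a non-blocking descriptor, the same read fails immediately with EAGAIN and the event loop can carry on.

```python
import os


def read_wakeup_byte(fd):
    """Read one byte from fd without blocking; return None if nothing is pending."""
    # Hypothetical helper: switch the descriptor to non-blocking mode first.
    os.set_blocking(fd, False)
    try:
        return os.read(fd, 1)
    except BlockingIOError:
        # EAGAIN: a blocking descriptor would have hung here instead.
        return None


r, w = os.pipe()
empty_result = read_wakeup_byte(r)  # nothing has been written yet
os.write(w, b"x")
ready_result = read_wakeup_byte(r)  # one byte is pending
os.close(r)
os.close(w)
```

Whether forcing descriptors non-blocking inside Qt would actually cure this hang is untested here, as the comment itself notes.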

comment:29 Changed 11 years ago by hobbes1069@…

Well if someone can come up with a patch I'll try it in my CentOS 6.3 virtual machine...

comment:30 Changed 11 years ago by Kenni Lund [kenni a kelu dot dk]

Ticket locked: set

Please use the mythtv-users mailing list for discussions. There are way too many comments in this ticket that don't add any useful information at all. If the discussions on the mailing list lead to any conclusions, let's add the conclusions here afterwards.

comment:31 Changed 11 years ago by paulh

Status: new → infoneeded_new
Ticket locked: unset

Is this still a problem? You might just have to use a more up to date OS that uses newer versions of the packages required by Myth rather than the ancient versions CentOS likes to use :)

comment:32 in reply to:  31 Changed 10 years ago by brian.johnson@…

Replying to paulh:

Is this still a problem?

I am still seeing this same problem on a fresh CentOS 6.4 install with MythTV installed from the RPM Fusion repo. Stopping the messagebus service allows mythbackend to respond to requests again. So nothing seems to have changed in the last 2 years.

comment:33 Changed 10 years ago by John Pilkington <J.Pilk@…>

Is that this, build date 30 May, http://download1.rpmfusion.org/free/el/updates/6/x86_64/mythtv-0.26.0-9.el6.x86_64.rpm

or perhaps this, from 'testing', build date 13 Oct but still 0.26? http://download1.rpmfusion.org/free/el/updates/testing/6/x86_64/mythtv-0.26.1-4.el6.x86_64.rpm

or did you build from the 0.27 SRPM, which hasn't yet, TTBOMK, reached the repo (and probably won't, AIUI, since it needs a version of Qt that doesn't meet the repo guidelines).

I have a 0.27-fixes i386 build under Scientific Linux 6 with extras, and I haven't seen this.

comment:34 in reply to:  33 Changed 10 years ago by brian.johnson@…

Replying to John Pilkington <J.Pilk@…>:

Is that this, build date 30 May, http://download1.rpmfusion.org/free/el/updates/6/x86_64/mythtv-0.26.0-9.el6.x86_64.rpm

Yes, I have this build installed. Waiting for the cmyth plugin for XBMC to support 0.27 before upgrading to that one.

comment:35 Changed 10 years ago by ta.chang1972@…

The latest MythTV 0.27 source seems to need Qt 4.7 or later to compile, but CentOS itself still only ships a Qt 4.6 RPM package...

comment:36 Changed 10 years ago by stuartm

We require 4.8.0 for 0.27, which is only 2 years old, 5 point versions behind the latest 4.x branch release and 1 major version behind the current QT 5.1.1. So obviously it will be another few years before CentOS feel it's obsolete enough to start shipping it.

Last edited 10 years ago by stuartm

comment:37 Changed 10 years ago by wberrier@…

I was glad to find this report since I've been preferring the centos releases after getting tired of upgrading my myth machines so often.

epel has qt5 available for el6. I'm not sure if it's appropriate for rpmfusion or atrpms to depend on epel or not.

Also, el7 should be released this year sometime, which should help the situation.

Off to try building 0.27 rpms with qt5...

comment:38 Changed 10 years ago by wberrier@…

After building 0.27 against qt5 on centos6 I'm still getting the bug.

comment:39 Changed 10 years ago by wberrier@…

Also, stopping dbus and then running mythbackend didn't work on my setup.

MythTV 0.27 (cb3b784) / qt5-qtbase-5.2.1-2 from EPEL
CentOS 6.5 x86_64

comment:40 Changed 10 years ago by wberrier@…

I'm still getting the same problem with 0.27.3/qt5 with centos 6.5.

I do want to report that stopping "messagebus" before starting the backend works when running 0.26 from rpmfusion on centos 6.5.

I'll probably upgrade to centos 7 at this point.

comment:41 Changed 8 years ago by paulh

Still a problem with centos 7?

comment:42 Changed 8 years ago by wberrier@…

Not a problem on centos 7, works as expected.

comment:43 Changed 8 years ago by paulh

Resolution: Upstream Bug
Status: infoneeded_new → closed

Thanks for reporting back.
