Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#9704 closed Bug Report - Hang/Deadlock (Fixed)

backend deadlocks up with Protocol version check failure.

Reported by: brian@…
Owned by:
Priority: minor
Milestone:
Component: MythTV - General
Version: 0.24-fixes
Severity: medium
Keywords:
Cc:
Ticket locked: yes

Description

For many months now I have been experiencing backend deadlocks that sometimes mean recordings don't happen, but always mean frontends fail to connect to the backend.

When this happens the following messages appear in the backend log:

2011-03-30 20:00:20.502 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-03-30 20:00:20.553 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-03-30 20:00:27.503 MythSocket(9c167a8:11): readStringList: Error, timed out after 7000 ms.
2011-03-30 20:00:27.504 Protocol version check failure.

The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure.

2011-03-30 20:00:27.554 MythSocket?(9c16ba0:12): readStringList: Error, timed out after 7000 ms.

These messages repeat on and on, and the situation can only be resolved by a backend restart, which at the very least means interrupted recordings if some were ongoing or, as I mentioned, recordings not even starting.

I realize this is not much to go on yet. The next time I see this (and I could probably pretty safely hold my breath waiting, so it likely won't be long) I will get a threaded backtrace of the backend and attach it to this bug along with a log snippet.

Attachments (18)

ThreadStacktrace.txt (104.0 KB) - added by brian@… 13 years ago.
threaded back trace of crash
mythbackend.log (21.0 KB) - added by brian@… 13 years ago.
backend log
gdb-mbe.txt (90.5 KB) - added by redpepperracing@… 13 years ago.
Master backend backtrace
gdb-sbe.txt (31.7 KB) - added by redpepperracing@… 13 years ago.
Slave backend backtrace
mythbackend-mbe.log (32.8 KB) - added by redpepperracing@… 13 years ago.
Master backend logfile
mythbackend-sbe.log (11.5 KB) - added by redpepperracing@… 13 years ago.
Slave backend logfile
mythfrontend-mfe.log (18.5 KB) - added by redpepperracing@… 13 years ago.
Remote frontend log
nongolfer05061.log (5.6 KB) - added by nongolfer@… 13 years ago.
mythcommflag error log
ThreadStacktrace.2.txt (86.0 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
ThreadStacktrace.3.txt (102.8 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
ThreadStacktrace.4.txt (87.6 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
ThreadStacktrace.5.txt (88.5 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads after EIT scanning fix
ThreadStacktrace.6.txt (82.9 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
ThreadStacktrace.7.txt (82.6 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
ThreadStacktrace.8.txt (89.9 KB) - added by brian@… 13 years ago.
stack trace of deadlocked threads
mythtv-0.24-backport_reconnect_fixes.patch (29.7 KB) - added by sphery 13 years ago.
Backports 063e45ed23, 4dfcdb8dd, and 1fb0a77fc from master to 0.24-fixes.
mythbackend-ga15740a.log.zip (96.2 KB) - added by warpme@… 13 years ago.
Deadlocked BE log
trace-ga15740a.txt.zip (3.4 KB) - added by warpme@… 13 years ago.
Deadlocked BE trace

Download all attachments as: .zip

Change History (71)

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.txt added

threaded back trace of crash

Changed 13 years ago by brian@…

Attachment: mythbackend.log added

backend log

comment:1 Changed 13 years ago by brian@…

Hardware is:

1 x PVR-150

1 x PVR-250

1 x PVR-350

1 x HVR-950Q

comment:2 Changed 13 years ago by brian@…

There was a suggestion on the list to run the backend with "-v most" to see if anything in the debug output correlated with this issue. I have now run with that for about 10 days and the problem does not seem to reproduce with that level of verbosity.

comment:3 Changed 13 years ago by brian@…

And within 24 hours of removing the "-v most" debugging on the backend, the problem is back:

2011-04-18 22:00:03.410 TVRec(1): Changing from None to RecordingOnly
2011-04-18 22:00:03.453 TVRec(1): HW Tuner: 1->1
2011-04-18 22:00:04.461 TVRec(1): rec->GetPathname(): '/storage1/2917_20110418220000.mpg'
2011-04-18 22:00:04.861 [mpeg2video @ 0x4059ac0]ac-tex damaged at 18 18
2011-04-18 22:00:04.884 [mpeg2video @ 0x4059ac0]Warning MVs not available
2011-04-18 22:00:05.003 [ac3 @ 0x4059ac0]incomplete frame
2011-04-18 22:00:09.266 Preview Error: Encountered problems running '/usr/bin/mythpreviewgen --size 0x0 --chanid 3824 --starttime 20110418210000  > /dev/null'
2011-04-18 22:00:15.800 ~MythContext waiting for threads to exit.
2011-04-18 22:00:40.985 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo:
2011-04-18 22:00:41.239 UPnpMedia: BuildMediaMap Done. Found 118 objects
2011-04-18 22:00:45.084 commflag: Commercial Detection Finished: "The Chicago Code":"St. Valentine's Day Massacre" recorded from channel 2864 at 2011-04-18T21:00:00 (5 commercial break(s))
2011-04-18 22:01:14.582 JobQueue: Commercial Detection Starting for "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00
2011-04-18 22:01:14.589 commflag: Commercial Detection Starting: "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00
2011-04-18 22:01:14.945 Using runtime prefix = /usr
2011-04-18 22:01:14.948 Using configuration directory = /home/mythtv/.mythtv
2011-04-18 22:01:14.950 Using localhost value of pvr
2011-04-18 22:01:14.974 New DB connection, total: 1
2011-04-18 22:01:14.990 Connected to database 'mythconverg' at host: localhost
2011-04-18 22:01:15.010 Closing DB connection named 'DBManager0'
2011-04-18 22:01:15.011 Connected to database 'mythconverg' at host: localhost
2011-04-18 22:01:15.014 Current locale EN_US
2011-04-18 22:01:15.015 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml
2011-04-18 22:01:15.064 Loading en_us translation for module mythfrontend
2011-04-18 22:01:15.272 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:01:22.275 MythSocket(987fa60:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:01:22.276 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:01:22.277 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:01:22.368 AFD: Opened codec 0x98db0c0, id(MPEG2VIDEO) type(Video)
2011-04-18 22:01:22.370 AFD: codec AC3 has 2 channels
2011-04-18 22:01:22.373 AFD: Opened codec 0x98e5bd0, id(AC3) type(Audio)
2011-04-18 22:01:28.174 AFD: Opened codec 0xb4d01250, id(MPEG2VIDEO) type(Video)
2011-04-18 22:01:28.176 AFD: codec AC3 has 0 channels
2011-04-18 22:01:28.177 AFD: Opened codec 0xb34e1530, id(AC3) type(Audio)
2011-04-18 22:01:28.180 AFD: codec AC3 has 0 channels
2011-04-18 22:01:28.182 AFD: Opened codec 0xb34e1900, id(AC3) type(Audio)
2011-04-18 22:01:28.183 [mpeg2video @ 0x33aaac0]mpeg_decode_postinit() failure
2011-04-18 22:01:28.184 AFD Error: Unknown decoding error
2011-04-18 22:01:28.185 [mpeg2video @ 0x33aaac0]mpeg_decode_postinit() failure
2011-04-18 22:01:28.186 AFD Error: Unknown decoding error
2011-04-18 22:01:29.280 MythSocket(987fed8:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:01:29.281 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:01:32.145 [mpeg2video @ 0x33aaac0]ac-tex damaged at 8 16
2011-04-18 22:01:32.148 [mpeg2video @ 0x33aaac0]Warning MVs not available
2011-04-18 22:01:40.919 ~MythContext waiting for threads to exit.
2011-04-18 22:02:11.231 commflag: Commercial Detection Finished: "Mike & Molly":"Samuel Gets Fired" recorded from channel 2917 at 2011-04-18T21:31:00 (3 commercial break(s))
2011-04-18 22:02:19.610 JobQueue: Commercial Detection Starting for "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00
2011-04-18 22:02:19.614 commflag: Commercial Detection Starting: "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00
2011-04-18 22:02:19.909 Using runtime prefix = /usr
2011-04-18 22:02:19.917 Using configuration directory = /home/mythtv/.mythtv
2011-04-18 22:02:19.919 Using localhost value of pvr
2011-04-18 22:02:19.948 New DB connection, total: 1
2011-04-18 22:02:19.959 Connected to database 'mythconverg' at host: localhost
2011-04-18 22:02:19.978 Closing DB connection named 'DBManager0'
2011-04-18 22:02:19.980 Connected to database 'mythconverg' at host: localhost
2011-04-18 22:02:19.982 Current locale EN_US
2011-04-18 22:02:19.985 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml
2011-04-18 22:02:20.007 Loading en_us translation for module mythfrontend
2011-04-18 22:02:20.203 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:02:20.252 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:02:27.208 MythSocket(85048b0:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:02:27.209 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:02:27.254 MythSocket(8505050:12): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:02:27.255 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:02:27.256 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:02:34.257 MythSocket(8502ac0:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:02:34.257 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:02:34.258 Unable to find active recorder for this recording, realtime flagging will not be enabled.
2011-04-18 22:02:34.276 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:02:41.282 MythSocket(84b6538:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:02:41.283 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:02:41.284 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:02:41.602 AFD: Opened codec 0x8510ea0, id(MPEG2VIDEO) type(Video)
2011-04-18 22:02:41.606 AFD: codec AC3 has 2 channels
2011-04-18 22:02:41.613 AFD: Opened codec 0x851cde0, id(AC3) type(Audio)
2011-04-18 22:02:41.614 AFD: codec AC3 has 0 channels
2011-04-18 22:02:41.615 AFD: Opened codec 0x851d560, id(AC3) type(Audio)
2011-04-18 22:02:48.286 MythSocket(8504610:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:02:48.286 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:06:32.325 [mpeg2video @ 0x1379ac0]ac-tex damaged at 27 13
2011-04-18 22:06:32.343 [mpeg2video @ 0x1379ac0]Warning MVs not available
2011-04-18 22:06:32.414 [ac3 @ 0x1379ac0]incomplete frame
2011-04-18 22:06:36.126 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:06:43.129 MythSocket(987e750:11): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:06:43.130 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:06:43.131 ~MythContext waiting for threads to exit.
2011-04-18 22:06:43.181 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 22:06:50.183 MythSocket(9854868:8): readStringList: Error, timed out after 7000 ms.
2011-04-18 22:06:50.183 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 22:07:13.269 commflag: Commercial Detection Finished: "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00 (2 commercial break(s))
2011-04-18 22:30:44.259 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo:
2011-04-18 22:30:44.446 UPnpMedia: BuildMediaMap Done. Found 118 objects
2011-04-18 22:49:13.373 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data 
			to become available... 22524 < 32768
2011-04-18 22:53:02.046 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data 
			to become available... 24396 < 32768
2011-04-18 22:55:06.255 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data 
			to become available... 11468 < 32768
2011-04-18 22:56:34.667 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data 
			to become available... 2468 < 32768
2011-04-18 22:59:24.626 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.5 seconds for data 
			to become available... 27868 < 32768
2011-04-18 22:59:49.135 AFD: Opened codec 0x852e150, id(MPEG2VIDEO) type(Video)
2011-04-18 22:59:49.136 AFD: codec AC3 has 0 channels
2011-04-18 22:59:49.231 AFD: Opened codec 0x852bee0, id(AC3) type(Audio)
2011-04-18 23:00:00.912 TVRec(1): Changing from RecordingOnly to None
2011-04-18 23:00:00.968 Finished recording Hawaii Five-0 "Ho'opa'i": channel 2917
2011-04-18 23:00:01.095 scheduler: Finished recording: Hawaii Five-0 "Ho'opa'i": channel 2917
2011-04-18 23:00:01.669 Finished recording Hawaii Five-0 "Ho'opa'i": channel 2917
2011-04-18 23:00:04.369 [mpeg2video @ 0x133dac0]ac-tex damaged at 29 19
2011-04-18 23:00:04.374 [mpeg2video @ 0x133dac0]Warning MVs not available
2011-04-18 23:00:04.522 [ac3 @ 0x133dac0]incomplete frame
2011-04-18 23:00:08.259 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 23:00:15.463 MythSocket(852b330:15): readStringList: Error, timed out after 7000 ms.
2011-04-18 23:00:15.464 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 23:00:15.465 ~MythContext waiting for threads to exit.
2011-04-18 23:00:15.515 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-18 23:00:22.517 MythSocket(852b330:8): readStringList: Error, timed out after 7000 ms.
2011-04-18 23:00:22.518 Protocol version check failure.
			The response to MYTH_PROTO_VERSION was empty.
			This happens when the backend is too busy to respond,
			or has deadlocked in due to bugs or hardware failure.
2011-04-18 23:00:43.997 commflag: Commercial Detection Finished: "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00 (0 commercial break(s))

comment:4 Changed 13 years ago by brian@…

And again tonight, failure to record programs and the same old errors:

2011-04-19 22:04:30.036 Using runtime prefix = /usr
2011-04-19 22:04:30.037 Using configuration directory = /home/mythtv/.mythtv
2011-04-19 22:04:30.039 Using localhost value of pvr
2011-04-19 22:04:30.048 New DB connection, total: 1
2011-04-19 22:04:30.057 Connected to database 'mythconverg' at host: localhost
2011-04-19 22:04:30.066 Closing DB connection named 'DBManager0'
2011-04-19 22:04:30.068 Connected to database 'mythconverg' at host: localhost
2011-04-19 22:04:30.070 Current locale EN_US
2011-04-19 22:04:30.071 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml
2011-04-19 22:04:30.146 Loading en_us translation for module mythfrontend
2011-04-19 22:04:30.209 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-19 22:04:30.261 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-19 22:04:37.214 MythSocket(892d960:11): readStringList: Error, timed out after 7000 ms.
2011-04-19 22:04:37.215 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-04-19 22:04:37.263 MythSocket(892e1e8:12): readStringList: Error, timed out after 7000 ms.
2011-04-19 22:04:37.263 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-04-19 22:04:37.264 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-19 22:04:44.266 MythSocket(892e350:11): readStringList: Error, timed out after 7000 ms.
2011-04-19 22:04:44.266 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-04-19 22:04:44.267 Unable to find active recorder for this recording, realtime flagging will not be enabled.
2011-04-19 22:04:44.292 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-19 22:04:51.295 MythSocket(88e29a0:11): readStringList: Error, timed out after 7000 ms.
2011-04-19 22:04:51.295 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-04-19 22:04:51.301 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5)
2011-04-19 22:04:51.601 AFD Warning: ScanATSCCaptionStreams() called with no PMT
2011-04-19 22:04:51.602 AFD: Opened codec 0x8949e90, id(MPEG2VIDEO) type(Video)
2011-04-19 22:04:51.605 AFD: codec MP2 has 2 channels
2011-04-19 22:04:51.609 AFD: Opened codec 0x894b760, id(MP2) type(Audio)
2011-04-19 22:04:58.304 MythSocket(88e2b48:11): readStringList: Error, timed out after 7000 ms.
2011-04-19 22:04:58.305 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-04-19 22:07:28.838 [ac3 @ 0x22d6ac0]incomplete frame
2011-04-19 22:07:33.211 ~MythContext waiting for threads to exit.
2011-04-19 22:07:54.693 commflag: Commercial Detection Finished: "Dancing With the Stars" recorded from channel 3129 at 2011-04-19T21:00:00 (6 commercial break(s))
2011-04-19 22:07:56.725 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo:
2011-04-19 22:07:57.034 UPnpMedia: BuildMediaMap Done. Found 118 objects
2011-04-19 22:08:44.361 DB Error (Resolution insert):
Query was:
INSERT INTO recordedmarkup    (chanid, starttime, mark, type, data)    VALUES ( ?, ?, ?, ?, ?);
Bindings were:
:CHANID=1040, :DATA=720, :MARK=15541, :STARTTIME=2011-04-19T22:00:00, :TYPE=30
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '1040-2011-04-19 22:00:00-30-15541' for key 'PRIMARY'

2011-04-19 22:08:44.363 DB Error (Resolution insert):
Query was:
INSERT INTO recordedmarkup    (chanid, starttime, mark, type, data)    VALUES ( ?, ?, ?, ?, ?);
Bindings were:
:CHANID=1040, :DATA=480, :MARK=15541, :STARTTIME=2011-04-19T22:00:00, :TYPE=31
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '1040-2011-04-19 22:00:00-31-15541' for key 'PRIMARY'

2011-04-19 22:10:55.879 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:55.883 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:55.909 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:55.914 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.020 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.112 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.188 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.273 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.277 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
2011-04-19 22:10:56.342 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe

Now that I've shown that this cannot be reproduced with -v most but is very reliably reproduced without it, what is next in terms of trying to get to the bottom of this?

*Every* *single* night there is some issue with mythtv whether it be the backend deadlocking like this and missing recordings or front-ends locking up because the backend is locked up. Sadly, it's just not so much fun using mythtv any more. :-(

It's frustrating trying to watch it and having problems every single night, and frustrating having to keep explaining to the family why their recordings just aren't being recorded with any reliability any more. :-(

comment:5 Changed 13 years ago by markk

This is not an area of the code I'm familiar with, but here's my not-so-quick but still quite dirty analysis of the backtrace (using the latest 0.24-fixes code):

Thread 1 (the main thread)

  • has received a MASTER_UPDATE_PROG_INFO message (mainserver.cpp @ 1001). I presume this is the result of a recording finishing, though I'm not too sure. In the process of handling that message, it asks for the recording status of the program referenced in the message. It gets this from the scheduler (scheduler.cpp, GetRecStatus @ 1534), which immediately tries to obtain a lock on access to the scheduler - a lock it presumably never obtains. The main loop is now deadlocked.

Thread 20 - the scheduler thread

  • is in the middle of the Scheduler::RunScheduler loop (scheduler.cpp @ 1688). This obtains the scheduler lock at line 1730 and is still holding it when it calls AutoExpire::Update at line 2152. AutoExpire::Update tries to get a lock on the global AutoExpire object - and again presumably fails.

Thread 18 - the AutoExpire thread

  • is in the middle of AutoExpire::RunExpirer (line 310) and is holding the global autoexpire lock. RunExpirer then proceeds down to the ProgramInfo class where it is accessing the database. It's not clear whether the database access is stalled for some reason or whether this is just where the interrupt happened to land.

So the main thread (1) is waiting on the scheduler lock, the scheduler thread (20) holds that lock but is waiting for the autoexpire lock, and the autoexpire thread (18) holds the autoexpire lock - but I have no idea whether it ever gets released or not.
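
To make the wait chain concrete, here is a minimal, self-contained C++ sketch of the pattern described above. The lock and function names (schedLock, autoExpireLock, stalledDatabaseCall) are hypothetical stand-ins rather than actual MythTV code: one thread blocks indefinitely while holding a lock, and the other two threads queue up behind it, so the whole process appears hung.

#include <chrono>
#include <mutex>
#include <thread>

std::mutex schedLock;       // stands in for the scheduler lock
std::mutex autoExpireLock;  // stands in for the global AutoExpire lock

void stalledDatabaseCall()
{
    // Stand-in for a database access that blocks indefinitely (here, a day).
    std::this_thread::sleep_for(std::chrono::hours(24));
}

void autoExpireThread()      // like Thread 18
{
    std::lock_guard<std::mutex> expire(autoExpireLock);
    stalledDatabaseCall();   // holds the autoexpire lock while blocked
}

void schedulerThread()       // like Thread 20
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::lock_guard<std::mutex> sched(schedLock);
    std::lock_guard<std::mutex> expire(autoExpireLock);  // waits on Thread 18
}

void mainThread()            // like Thread 1
{
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::lock_guard<std::mutex> sched(schedLock);        // waits on Thread 20
}

int main()
{
    std::thread t18(autoExpireThread), t20(schedulerThread), t1(mainThread);
    t1.join(); t20.join(); t18.join();  // effectively never completes; the process looks hung
}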

comment:6 Changed 13 years ago by redpepperracing@…

I am seeing the same issue, and I can reproduce it at will. I have an MBE with 2x HDHR tuners and 2x PVR-x50 tuners, and a slave backend with one HD-PVR tuner, saving to the MBE via an NFS-mounted file system. Using current trunk I can reproduce this by either starting live TV on a remote frontend, or by watching a recording on the remote FE and then scheduling a recording on a channel on the SBE via Mythweb; that is the scenario I have backtraced and logged, and the results should be attached to this ticket. Let me know if there is anything else I can do. As long as I leave the SBE down, the MBE is stable.

Changed 13 years ago by redpepperracing@…

Attachment: gdb-mbe.txt added

Master backend backtrace

Changed 13 years ago by redpepperracing@…

Attachment: gdb-sbe.txt added

Slave backend backtrace

Changed 13 years ago by redpepperracing@…

Attachment: mythbackend-mbe.log added

Master backend logfile

Changed 13 years ago by redpepperracing@…

Attachment: mythbackend-sbe.log added

Slave backend logfile

Changed 13 years ago by redpepperracing@…

Attachment: mythfrontend-mfe.log added

Remote frontend log

comment:7 Changed 13 years ago by brian@…

As an additional data point, my setup has an SBE also, but no tuner in it. It's there just to relieve some of the commflagging burden.

comment:8 Changed 13 years ago by redpepperracing@…

Apparently my issue is different from this one, so I have opened a new ticket. Please ignore my logs/backtraces.

comment:9 Changed 13 years ago by nongolfer@…

I suspect mythcommflag. I have what appears to be the same issue. I was experiencing deadlocks and crashes almost daily. Most crashes occurred approx. 15 minutes after digital recordings. I changed the options for the mythcommflag process to be "-v none,important,database --nopercentage" and the system ran perfectly for over a week. Today I removed the "database" option and the system crashed with the MYTH_PROTO_VERSION error within 2 hours. The crash occurred 5 minutes after a mythcommflag job started.

Changed 13 years ago by nongolfer@…

Attachment: nongolfer05061.log added

mythcommflag error log

comment:10 Changed 13 years ago by nongolfer@…

I failed to include my version information in my previous post...

MythTV Version : 0.24-7.fc14 (464fa28373)
MythTV Branch :
Network Protocol : 63
Library API : 0.24.20101129-1
QT Version : 4.7.1
Options compiled in:

linux release using_alsa using_jack using_oss using_pulse using_pulseoutput using_backend using_bindings_perl using_bindings_python using_crystalhd using_dvb using_firewire using_frontend using_hdhomerun using_hdpvr using_iptv using_ivtv using_joystick_menu using_libfftw3 using_lirc using_mheg using_opengl_video using_opengl_vsync using_qtdbus using_qtwebkit using_v4l using_x11 using_xrandr using_xv using_xvmc using_xvmc_vld using_xvmcw using_bindings_perl using_bindings_python using_mythtranscode using_opengl using_vdpau using_ffmpeg_threads using_live using_mheg

comment:11 Changed 13 years ago by nongolfer@…

After a couple of weeks running with -v none,important,database --nopercentage on mythcommflag, the system has been somewhat stable. It still crashes about once per week. The logs are different each time, but the crashes always occur during mythcommflag processing. I plan to monkey around with the verbosity to see if I can get any useful information, but this is a heavily used system and the WAF has gotten pretty low, so I'm reluctant to take big chances. It is my belief that the MYTH_PROTO error is just one possible symptom of an overall crash caused by something else. I see it in some of the crashes, but not all.

comment:12 Changed 13 years ago by brian@…

FWIW, the symptoms described in ticket #9773 are the same as what I experience here. I suppose though, many different kinds of deadlocks can have the same symptoms.

Is there any indication in any of the evidence provided here that this situation is the same as ticket #9773?

If there isn't enough evidence in this ticket, I'd be most happy to gather it if you just advise what you need/want.

comment:13 Changed 13 years ago by brian@…

Would more backtraces from the BE be helpful?

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.2.txt added

stack trace of deadlocked threads

comment:14 Changed 13 years ago by brian@…

As you can see, I've attached another stack trace of this problem in action in hopes that it helps debug it.

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.3.txt added

stack trace of deadlocked threads

comment:15 Changed 13 years ago by brian@…

I've attached another stack trace in hopes that it reveals something.

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.4.txt added

stack trace of deadlocked threads

comment:16 Changed 13 years ago by brian@…

I've attached another threaded stack trace. I'm not trying to be a PITA but honestly am just trying to provide as much information as might be needed to try to figure out where this deadlock is.

comment:17 Changed 13 years ago by robertm

Brian, no need for more and more stack traces; they make it harder to go through the ticket (and consequently make it less likely ever to get looked at). Unless they differ in some way or you have a diagnosis of the deadlock, it would be best to wait for a prompt for further information.

comment:18 Changed 13 years ago by robertm

Priority: critical → minor
Severity: high → medium

Re-setting priority and severity; please see the ticket howto, as these are for developer use.

comment:19 Changed 13 years ago by robertm

Status: new → infoneeded_new

Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.

Probable dupe of #9261.

comment:20 in reply to:  19 ; Changed 13 years ago by ian@…

Replying to robertm:

Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.

Probable dupe of #9261.

I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread. I am experiencing the same as this ticket (turning off commercial flagging stops the deadlocks), and my deadlock is occurring with no active EITScanner thread, although I do use EIT.

comment:21 in reply to:  20 ; Changed 13 years ago by ian@…

Replying to ian@…:

Replying to robertm:

Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.

Probable dupe of #9261.

I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread. I am experiencing the same as this ticket (turning off commercial flagging stops the deadlocks), and my deadlock is occurring with no active EITScanner thread, although I do use EIT.

To clarify, nongolfer is seeing the commercial flagging issue. Some of Brian's backtraces have the doFlagCommercials thread and some don't. So perhaps there are two (or more) bugs here with similar symptoms.

Brian, have you tried disabling commercial flagging?

comment:22 in reply to:  19 Changed 13 years ago by brian@…

Replying to robertm:

Brian, are you running EIT exclusively?

TBH, I'm not sure what that question means.

I don't think I even get any EIT information with my cable provider.

If not, is there a reason you have EIT scanning turned on?

Likely I turned it on when I first configured my digital tuner in hopes of getting some data from it and just never turned it off, just in case my provider does start sending it.

I would be interested in knowing what happens when you turn it off.

I can surely try that.

comment:23 in reply to:  21 Changed 13 years ago by brian@…

Replying to ian@…:

Brian, have you tried disabling commercial flagging?

I have not (yet). I'm trying to avoid doing that as skipping commercials while watching is one of the major reasons we use a PVR.

comment:24 in reply to:  20 ; Changed 13 years ago by robertm

Replying to ian@…:

I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread.

As do all of Brian's backtraces. This is his ticket. Please allow him to answer when we ask for information from him. Replies by other users, yourself included, only make it harder to troubleshoot.

Brian, please turn off all EIT scanning and report back when you can.

comment:25 Changed 13 years ago by nongolfer@…

FYI: I turned off commercial flagging a few days ago. So far, the system is stable, but I'll need to let it run this way for about a week to be sure. I'll report back. For what it's worth, I have always had EIT turned off on my tuners.

comment:26 in reply to:  24 ; Changed 13 years ago by brian@…

Replying to robertm:

Brian, please turn off all EIT scanning and report back when you can.

Yes, sorry for my lack of clarity in comment 19. I disabled it at that time. Have not really had much recording going on with the end of the winter season and whatnot. Hopefully I will get a feeling for the success of the disabling of EIT scanning soon.

comment:27 in reply to:  26 Changed 13 years ago by brian@…

Replying to brian@…:

Replying to robertm:

Brian, please turn off all EIT scanning and report back when you can.

Yes, sorry for my lack of clarity in comment 19. I disabled it at that time. Have not really had much recording going on with the end of the winter season and whatnot. Hopefully I will get a feeling for the success of the disabling of EIT scanning soon.

Sadly, I have to report that the backend locked up again last night, after having disabled EIT scanning 5 days ago. I know you said you didn't want any more stack traces, so I didn't grab one from last night's deadlock, but I really should have, just to ensure that the backend is honoring my disabling of EIT scanning.

Since EIT scanning doesn't seem to be the culprit, any more ideas? Would stack traces, now that I have disabled EIT scanning, be useful, or should I still not attach any more?

comment:28 Changed 13 years ago by robertm

Brian, go ahead and provide the BT and logging without EIT, thanks.

comment:29 in reply to:  28 Changed 13 years ago by brian@…

Replying to robertm:

Brian, go ahead and provide the BT and logging without EIT, thanks.

Hrm. So I grabbed some new stack traces in the last few days and they all have EIT functions in them. I am sure that I have disabled EIT scanning, to the best of my knowledge of how to:

mythtv-setup -> Video Sources -> Schedules Direct Digital -> Perform EIT scan (unchecked)

It's also unchecked in my analog schedule.

Is there any where else I have to make changes to completely disable EIT scanning?

Is there a database query you can provide to prove the state of EIT scanning?

comment:30 Changed 13 years ago by nongolfer@…

I have been running without commflagging for about two weeks now with no crashes. Other than a local power failure the system has been rock solid without commflagging. EIT has never been turned on on my system, so we may be looking at two completely different situations.

FYI: In the backend setup, for your DVB tuners, go into "Recording Options" and uncheck "Use DVB card for active EIT scan." At least that's what I did when I installed the cards, as recommended in some post.

comment:31 Changed 13 years ago by nongolfer@…

Sad to say, my system is still crashing and I can't seem to get any reasonable debugging information. Turning off commflag seems to have helped, but I've still got something wrong. Since I'm running the Fedora 14 RPM, I'm not overly confident of the setup. It defaults to running as root. I've switched it to run as a limited user and turned commflag back on. As far as I can tell at this point, I'm just muddying the waters on this issue and may be totally on the wrong track. If I do find something related, I'll report back. Best of luck to all!

comment:32 Changed 13 years ago by brian@…

Any (more) thoughts on why I'd still have EITScanner::RunEventLoop and EITThread::run threads running in my backend now that I have (supposedly) disabled EIT scanning?

comment:33 Changed 13 years ago by beirdo

The EITScanner should no longer run on disabled inputs as of 6e8dae55ef4fd162e78e on master.

comment:34 Changed 13 years ago by brian@…

Awesome! Will that be backported to 0.24-fixes, seeing as that's where I'm seeing all of this deadlocking that robertm suspects might be related to EIT scanning?

comment:35 Changed 13 years ago by beirdo

Backported to fixes/0.24 in 5e67f945d1ce3b15c0be9c44c5bb8e4d2bb9bdd5

comment:36 Changed 13 years ago by brian@…

Well, sadly, fixing the EIT scanning so that I can properly disable it has not been a solution. I got another lockup last night. I will attach the stack trace.

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.5.txt added

stack trace of deadlocked threads after EIT scanning fix

comment:37 Changed 13 years ago by brian@…

At the risk of being annoying, I am going to attach another couple of stack traces after having upgraded to v0.24.1-36-gd06878a, which prevents the EIT threads from running when EIT is disabled. Let me know if these are not of any use, or if there is another path you want to take.

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.6.txt added

stack trace of deadlocked threads

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.7.txt added

stack trace of deadlocked threads

comment:38 Changed 13 years ago by brian@…

There have been mentions of turning off commercial flagging to prevent the deadlocks. I am willing to try that just to add to the pool of data... But what I want to ask is: for those of you who are turning off commercial flagging, are you just living with non-flagged recordings, or are you flagging them in some non-real-time manner so as not to deadlock a recording backend?

comment:39 Changed 13 years ago by nongolfer@…

My mythcommflag theory did not pan out. I was running without flagging at all. Although my system was a little more stable, I still had the same freeze ups. I'm now suspecting hardware.

Changed 13 years ago by brian@…

Attachment: ThreadStacktrace.8.txt added

stack trace of deadlocked threads

comment:40 Changed 13 years ago by stuartm

The current theory is that this and two other similar tickets are related to the database, so to that end, could we know what version of mysql is being used and which distribution? We'd like to know whether there are any common factors.

comment:41 in reply to:  40 Changed 13 years ago by brian@…

Replying to stuartm:

The current theory is that this and two other similar tickets are related to the database,

I have noticed that most/all of my backtraces do have DB threads in them.

to that end could we know what version of mysql is being used

The distribution's version is 5.1.41-3ubuntu12.10. Mysql tells me:

# mysql -V
mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1

and which distribution?

Ubuntu Lucid.

comment:42 Changed 13 years ago by beirdo

Status: infoneeded_new → new
Type: Bug Report - General → Bug Report - Hang/Deadlock

comment:43 Changed 13 years ago by Github

Enable automatic reconnection for MySQL connections.

After 155138a127 changed the minimum supported Qt version to 4.6, we now have support for MYSQL_OPT_RECONNECT in QSqlDatabase::setConnectOptions().

Note that before MySQL 5, reconnecting connections were the default in the MySQL C API (so, were always enabled for the Qt-MySQL driver). Unfortunately, depending on how the application is written, reconnections may cause issues (since reconnections have effects on transactions, table locks, temporary tables, session and user variables, prepared statements, and such), so MySQL 5.0.3 changed the default to disable automatic reconnections, and MySQL 5.0.13 added MYSQL_OPT_RECONNECT to make reconnection an application-controllable option. Since the Qt-MySQL driver didn't have support for specifying MYSQL_OPT_RECONNECT, it was impossible to get reconnecting connections with MySQL 5.0.3+ in Qt applications. Finally in Qt4.6, Qt devs added support for specifying MYSQL_OPT_RECONNECT, so we can re-enable automatic reconnect.

Thanks to David Edwards for the reminder to set this option.

Refs #9704, #9773, #9792.

Branch: master Changeset: dd2d7a14d10c55f8bc8d12e944adb3b3c9ac8cea
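
For illustration, enabling this option from application code looks roughly like the following. This is a hedged sketch against the public Qt API (Qt 4.6 or later with the QMYSQL driver), not the actual MythTV change, and the connection name and credentials below are placeholders.

#include <QSqlDatabase>
#include <QSqlError>
#include <QtDebug>

bool openWithAutoReconnect()
{
    // Placeholder connection details, not MythTV's real configuration.
    QSqlDatabase db = QSqlDatabase::addDatabase("QMYSQL", "DBManager0");
    db.setHostName("localhost");
    db.setDatabaseName("mythconverg");
    db.setUserName("mythtv");
    db.setPassword("mythtv");

    // Must be set before open(). With MySQL 5.0.13+ this re-enables the
    // C API's automatic reconnection, which has been off by default
    // since MySQL 5.0.3.
    db.setConnectOptions("MYSQL_OPT_RECONNECT=1");

    if (!db.open())
    {
        qWarning() << "DB open failed:" << db.lastError().text();
        return false;
    }
    return true;
}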

comment:44 Changed 13 years ago by Github

Fix SQL reconnection logic. Refs #9704. Refs #9773. Refs #9792.

What this does is get rid of the old KickDatabase() logic and replace it with a simple reconnect if the DB socket is disconnected. It also gets rid of the recent re-enablement of automatic reconnects in [dd2d7a14d1] and, importantly, extends the reconnection logic in [528324d8ba] so that it resends the prepare() and rebinds the bindings when a reconnect happens in an exec().

This also makes the QSqlQuery parent of MSqlQuery private and adds some wrappers so we can prevent MSqlQuery users from accessing methods that may invalidate some assumption in MSqlQuery + it adds debugging wrappers for first(), last(), previous(), and seek() like we already had for next().

A little bit of history is in order.

In Qt3 the SQL driver automatically reconnected whenever a connection was lost, but the first query after this auto-reconnect always failed. So in 2003 mdz added KickDatabase(), which would recover from the disconnect by reconnecting automatically, and all was good with the world.

In Qt4, QSqlQuery::prepare() started to actually do prepared statements instead of emulating them. This caused a small problem, in that it was now possible for a disconnect to happen between the call to prepare() and exec(), and then the exec() would fail. But the odds of this were still pretty low for most uses of the database in MythTV. At some point Trolltech also got rid of the automatic reconnections which had caused us trouble in Qt3. To fix this we added reconnection logic in [528324d8ba], and this is where the real trouble began. While this reconnection logic does exactly the same thing that the auto-reconnection logic did in Qt3, the fact that prepare() doesn't just emulate prepared statements in Qt4 means that when we reconnect to the database in exec(), the prepared statement has not been sent to the database.

Note: This should be back-ported to 0.24-fixes after a reasonable burn-in time in master.

Branch: master Changeset: 4dfcdb8dd0c80d096ed95d5eccd489c692efa90d
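
As a rough sketch of the re-prepare/re-bind idea described in the commit message (a hypothetical wrapper, not the actual MSqlQuery implementation): remember the query text and bindings so that, if the connection turns out to be gone at exec() time, the statement can be re-prepared and re-bound on a fresh connection before retrying.

#include <QMap>
#include <QSqlDatabase>
#include <QSqlError>
#include <QSqlQuery>
#include <QString>
#include <QVariant>

class ReconnectingQuery
{
  public:
    explicit ReconnectingQuery(QSqlDatabase db) : m_db(db), m_query(db) {}

    bool prepare(const QString &sql)
    {
        m_sql = sql;                          // remember for replay
        return m_query.prepare(sql);
    }

    void bindValue(const QString &placeholder, const QVariant &val)
    {
        m_bindings[placeholder] = val;        // remember for replay
        m_query.bindValue(placeholder, val);
    }

    bool exec()
    {
        if (m_query.exec())
            return true;

        // Only retry when the failure looks like a lost connection.
        if (m_query.lastError().type() != QSqlError::ConnectionError)
            return false;

        m_db.close();
        if (!m_db.open())                     // one reconnect attempt
            return false;

        // The server-side prepared statement died with the old connection,
        // so both the prepare() and the bindings have to be re-sent.
        m_query = QSqlQuery(m_db);
        if (!m_query.prepare(m_sql))
            return false;
        for (QMap<QString, QVariant>::const_iterator it = m_bindings.begin();
             it != m_bindings.end(); ++it)
            m_query.bindValue(it.key(), it.value());
        return m_query.exec();
    }

  private:
    QSqlDatabase m_db;
    QSqlQuery    m_query;
    QString      m_sql;
    QMap<QString, QVariant> m_bindings;
};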

comment:45 Changed 13 years ago by Github

Few bits missing in [4dfcdb8dd]. Refs #9704. Refs #9773. Refs #9792.

Branch: master Changeset: 1fb0a77fc17089b5cd2bf36ef07998bf68461261

Changed 13 years ago by sphery

Backports 063e45ed23, 4dfcdb8dd, and 1fb0a77fc from master to 0.24-fixes.

comment:46 Changed 13 years ago by danielk

Note: until all the DB reconnection issues are resolved and this ticket is possibly fixed, I will only be referencing #9704 on reconnection-related commits.

comment:47 Changed 13 years ago by Github

Refs #9704. Make new DB code more forgiving of NULL strings in NOT NULL columns.

If the query fails with NULL bindings on strings, this tries to resend the query without any NULL strings and if that succeeds it prints an error message and proceeds.

Note: If you see the DB error message, it really does indicate code that needs fixing.

Branch: master Changeset: d85b750fdea94d46d30b8d501d5f82c2d68b3e0a
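
A hedged sketch of that fallback (a hypothetical helper, not the actual MythTV change): if an exec() fails, rebind any NULL string bindings as empty strings, retry once, and log loudly so the offending call site can still be found and fixed.

#include <QMap>
#include <QSqlQuery>
#include <QString>
#include <QVariant>
#include <QtDebug>

bool execForgivingNullStrings(QSqlQuery &query)
{
    if (query.exec())
        return true;

    bool rebound = false;
    const QMap<QString, QVariant> bindings = query.boundValues();
    for (QMap<QString, QVariant>::const_iterator it = bindings.begin();
         it != bindings.end(); ++it)
    {
        // Only touch bindings that are NULL strings; leave NULL ints etc. alone.
        if (it.value().type() == QVariant::String && it.value().isNull())
        {
            query.bindValue(it.key(), QString(""));
            rebound = true;
        }
    }

    if (!rebound || !query.exec())
        return false;

    qWarning() << "Query only succeeded after NULL string bindings were replaced"
               << "with empty strings; the calling code should be fixed so it"
               << "does not bind NULL to NOT NULL columns.";
    return true;
}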

comment:48 Changed 13 years ago by warpme@…

Hi, I'm on v0.25pre-3260-ga15740a-dirty-20110829. Yesterday I enabled active EIT scanning. This morning the BE was deaf to all FEs with "The response to MYTH_PROTO_VERSION was empty." Before enabling active EIT I was on 20110812-g50606cd with passive EIT scanning. That combination was stable for 2 weeks in a production environment. BE log and deadlock trace attached.

Changed 13 years ago by warpme@…

Attachment: mythbackend-ga15740a.log.zip added

Deadlocked BE log

Changed 13 years ago by warpme@…

Attachment: trace-ga15740a.txt.zip added

Deadlocked BE trace

comment:49 Changed 13 years ago by brian@…

From my perspective, this ticket can be closed, possibly after the fix lands in 0.24-fixes.

comment:50 Changed 12 years ago by vtp74@…

I had 0.24.1 and a lot of deadlocks. I applied patch "mythtv-0.24-backport_reconnect_fixes.patch" and changeset "d85b750fdea94d46d30b8d501d5f82c2d68b3e0a".

After building and installing the fixed version the system is stable now. No more deadlocks. In my opinion, 0.24.1 is unusable without these patches. I can't recommend MythTV to anyone because of this bug.

When will a fixed 0.24.1 be officially available?

comment:51 Changed 12 years ago by stuartm

Ticket locked: set

comment:52 Changed 12 years ago by danielk

Milestone: unknown → 0.24.2
Resolution: Fixed
Status: new → closed

comment:53 Changed 12 years ago by stuartm

Milestone: 0.24.2

Milestone 0.24.2 deleted

Note: See TracTickets for help on using tickets.