Opened 14 years ago
Closed 13 years ago
Last modified 13 years ago
#9704 closed Bug Report - Hang/Deadlock (Fixed)
backend deadlocks up with Protocol version check failure.
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | MythTV - General | Version: | 0.24-fixes |
Severity: | medium | Keywords: | |
Cc: | Ticket locked: | yes |
Description
I have been experiencing for many months now backend deadlocks that sometimes means recordings don't happen but always mean frontends fail to connect to the backend.
When this happens the following messages appear in the backend log:
2011-03-30 20:00:20.502 MythCoreContext?: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-03-30 20:00:20.553 MythCoreContext?: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-03-30 20:00:27.503 MythSocket?(9c167a8:11): readStringList: Error, timed out after 7000 ms. 2011-03-30 20:00:27.504 Protocol version check failure.
The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure.
2011-03-30 20:00:27.554 MythSocket?(9c16ba0:12): readStringList: Error, timed out after 7000 ms.
and these repeat on and on and the situation can only be resolved by a backend restart, which annoying in the least means interrupted recordings if some where ongoing or as I mentioned recordings not even starting.
I realize this is not much to go on yet. The next time I see this (and I could probably pretty safely hold my breath waiting, so it likely won't be long) I will get a threaded backtrace of the backend and attach it to this bug along with a log snippit.
Attachments (18)
Change History (71)
Changed 14 years ago by
Attachment: | ThreadStacktrace.txt added |
---|
comment:2 Changed 14 years ago by
There was a suggestion on the list to run the backend with "-v most" to see if anything in the debug output correlated with this issue. I have now run with that for about 10 days and the problem does not seem to reproduce with that level of verbosity.
comment:3 Changed 14 years ago by
And within 24 hours of removing the "-v most" debugging on the backend, the problem is back:
2011-04-18 22:00:03.410 TVRec(1): Changing from None to RecordingOnly 2011-04-18 22:00:03.453 TVRec(1): HW Tuner: 1->1 2011-04-18 22:00:04.461 TVRec(1): rec->GetPathname(): '/storage1/2917_20110418220000.mpg' 2011-04-18 22:00:04.861 [mpeg2video @ 0x4059ac0]ac-tex damaged at 18 18 2011-04-18 22:00:04.884 [mpeg2video @ 0x4059ac0]Warning MVs not available 2011-04-18 22:00:05.003 [ac3 @ 0x4059ac0]incomplete frame 2011-04-18 22:00:09.266 Preview Error: Encountered problems running '/usr/bin/mythpreviewgen --size 0x0 --chanid 3824 --starttime 20110418210000 > /dev/null' 2011-04-18 22:00:15.800 ~MythContext waiting for threads to exit. 2011-04-18 22:00:40.985 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo: 2011-04-18 22:00:41.239 UPnpMedia: BuildMediaMap Done. Found 118 objects 2011-04-18 22:00:45.084 commflag: Commercial Detection Finished: "The Chicago Code":"St. Valentine's Day Massacre" recorded from channel 2864 at 2011-04-18T21:00:00 (5 commercial break(s)) 2011-04-18 22:01:14.582 JobQueue: Commercial Detection Starting for "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00 2011-04-18 22:01:14.589 commflag: Commercial Detection Starting: "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00 2011-04-18 22:01:14.945 Using runtime prefix = /usr 2011-04-18 22:01:14.948 Using configuration directory = /home/mythtv/.mythtv 2011-04-18 22:01:14.950 Using localhost value of pvr 2011-04-18 22:01:14.974 New DB connection, total: 1 2011-04-18 22:01:14.990 Connected to database 'mythconverg' at host: localhost 2011-04-18 22:01:15.010 Closing DB connection named 'DBManager0' 2011-04-18 22:01:15.011 Connected to database 'mythconverg' at host: localhost 2011-04-18 22:01:15.014 Current locale EN_US 2011-04-18 22:01:15.015 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml 2011-04-18 22:01:15.064 Loading en_us translation for module mythfrontend 2011-04-18 22:01:15.272 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:01:22.275 MythSocket(987fa60:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:01:22.276 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:01:22.277 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:01:22.368 AFD: Opened codec 0x98db0c0, id(MPEG2VIDEO) type(Video) 2011-04-18 22:01:22.370 AFD: codec AC3 has 2 channels 2011-04-18 22:01:22.373 AFD: Opened codec 0x98e5bd0, id(AC3) type(Audio) 2011-04-18 22:01:28.174 AFD: Opened codec 0xb4d01250, id(MPEG2VIDEO) type(Video) 2011-04-18 22:01:28.176 AFD: codec AC3 has 0 channels 2011-04-18 22:01:28.177 AFD: Opened codec 0xb34e1530, id(AC3) type(Audio) 2011-04-18 22:01:28.180 AFD: codec AC3 has 0 channels 2011-04-18 22:01:28.182 AFD: Opened codec 0xb34e1900, id(AC3) type(Audio) 2011-04-18 22:01:28.183 [mpeg2video @ 0x33aaac0]mpeg_decode_postinit() failure 2011-04-18 22:01:28.184 AFD Error: Unknown decoding error 2011-04-18 22:01:28.185 [mpeg2video @ 0x33aaac0]mpeg_decode_postinit() failure 2011-04-18 22:01:28.186 AFD Error: Unknown decoding error 2011-04-18 22:01:29.280 MythSocket(987fed8:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:01:29.281 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:01:32.145 [mpeg2video @ 0x33aaac0]ac-tex damaged at 8 16 2011-04-18 22:01:32.148 [mpeg2video @ 0x33aaac0]Warning MVs not available 2011-04-18 22:01:40.919 ~MythContext waiting for threads to exit. 2011-04-18 22:02:11.231 commflag: Commercial Detection Finished: "Mike & Molly":"Samuel Gets Fired" recorded from channel 2917 at 2011-04-18T21:31:00 (3 commercial break(s)) 2011-04-18 22:02:19.610 JobQueue: Commercial Detection Starting for "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00 2011-04-18 22:02:19.614 commflag: Commercial Detection Starting: "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00 2011-04-18 22:02:19.909 Using runtime prefix = /usr 2011-04-18 22:02:19.917 Using configuration directory = /home/mythtv/.mythtv 2011-04-18 22:02:19.919 Using localhost value of pvr 2011-04-18 22:02:19.948 New DB connection, total: 1 2011-04-18 22:02:19.959 Connected to database 'mythconverg' at host: localhost 2011-04-18 22:02:19.978 Closing DB connection named 'DBManager0' 2011-04-18 22:02:19.980 Connected to database 'mythconverg' at host: localhost 2011-04-18 22:02:19.982 Current locale EN_US 2011-04-18 22:02:19.985 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml 2011-04-18 22:02:20.007 Loading en_us translation for module mythfrontend 2011-04-18 22:02:20.203 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:02:20.252 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:02:27.208 MythSocket(85048b0:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:02:27.209 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:02:27.254 MythSocket(8505050:12): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:02:27.255 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:02:27.256 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:02:34.257 MythSocket(8502ac0:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:02:34.257 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:02:34.258 Unable to find active recorder for this recording, realtime flagging will not be enabled. 2011-04-18 22:02:34.276 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:02:41.282 MythSocket(84b6538:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:02:41.283 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:02:41.284 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:02:41.602 AFD: Opened codec 0x8510ea0, id(MPEG2VIDEO) type(Video) 2011-04-18 22:02:41.606 AFD: codec AC3 has 2 channels 2011-04-18 22:02:41.613 AFD: Opened codec 0x851cde0, id(AC3) type(Audio) 2011-04-18 22:02:41.614 AFD: codec AC3 has 0 channels 2011-04-18 22:02:41.615 AFD: Opened codec 0x851d560, id(AC3) type(Audio) 2011-04-18 22:02:48.286 MythSocket(8504610:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:02:48.286 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:06:32.325 [mpeg2video @ 0x1379ac0]ac-tex damaged at 27 13 2011-04-18 22:06:32.343 [mpeg2video @ 0x1379ac0]Warning MVs not available 2011-04-18 22:06:32.414 [ac3 @ 0x1379ac0]incomplete frame 2011-04-18 22:06:36.126 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:06:43.129 MythSocket(987e750:11): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:06:43.130 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:06:43.131 ~MythContext waiting for threads to exit. 2011-04-18 22:06:43.181 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 22:06:50.183 MythSocket(9854868:8): readStringList: Error, timed out after 7000 ms. 2011-04-18 22:06:50.183 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 22:07:13.269 commflag: Commercial Detection Finished: "Cougar Town":Walls recorded from channel 3776 at 2011-04-18T21:31:00 (2 commercial break(s)) 2011-04-18 22:30:44.259 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo: 2011-04-18 22:30:44.446 UPnpMedia: BuildMediaMap Done. Found 118 objects 2011-04-18 22:49:13.373 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data to become available... 22524 < 32768 2011-04-18 22:53:02.046 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data to become available... 24396 < 32768 2011-04-18 22:55:06.255 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data to become available... 11468 < 32768 2011-04-18 22:56:34.667 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.2 seconds for data to become available... 2468 < 32768 2011-04-18 22:59:24.626 RingBuf(/storage1/2917_20110418220000.mpg): Waited 0.5 seconds for data to become available... 27868 < 32768 2011-04-18 22:59:49.135 AFD: Opened codec 0x852e150, id(MPEG2VIDEO) type(Video) 2011-04-18 22:59:49.136 AFD: codec AC3 has 0 channels 2011-04-18 22:59:49.231 AFD: Opened codec 0x852bee0, id(AC3) type(Audio) 2011-04-18 23:00:00.912 TVRec(1): Changing from RecordingOnly to None 2011-04-18 23:00:00.968 Finished recording Hawaii Five-0 "Ho'opa'i": channel 2917 2011-04-18 23:00:01.095 scheduler: Finished recording: Hawaii Five-0 "Ho'opa'i": channel 2917 2011-04-18 23:00:01.669 Finished recording Hawaii Five-0 "Ho'opa'i": channel 2917 2011-04-18 23:00:04.369 [mpeg2video @ 0x133dac0]ac-tex damaged at 29 19 2011-04-18 23:00:04.374 [mpeg2video @ 0x133dac0]Warning MVs not available 2011-04-18 23:00:04.522 [ac3 @ 0x133dac0]incomplete frame 2011-04-18 23:00:08.259 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 23:00:15.463 MythSocket(852b330:15): readStringList: Error, timed out after 7000 ms. 2011-04-18 23:00:15.464 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 23:00:15.465 ~MythContext waiting for threads to exit. 2011-04-18 23:00:15.515 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-18 23:00:22.517 MythSocket(852b330:8): readStringList: Error, timed out after 7000 ms. 2011-04-18 23:00:22.518 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-18 23:00:43.997 commflag: Commercial Detection Finished: "Hawaii Five-0":Ho'opa'i recorded from channel 2917 at 2011-04-18T22:00:00 (0 commercial break(s))
comment:4 Changed 14 years ago by
And again tonight, failure to record programs and the same old errors:
2011-04-19 22:04:30.036 Using runtime prefix = /usr 2011-04-19 22:04:30.037 Using configuration directory = /home/mythtv/.mythtv 2011-04-19 22:04:30.039 Using localhost value of pvr 2011-04-19 22:04:30.048 New DB connection, total: 1 2011-04-19 22:04:30.057 Connected to database 'mythconverg' at host: localhost 2011-04-19 22:04:30.066 Closing DB connection named 'DBManager0' 2011-04-19 22:04:30.068 Connected to database 'mythconverg' at host: localhost 2011-04-19 22:04:30.070 Current locale EN_US 2011-04-19 22:04:30.071 Reading locale defaults from /usr/share/mythtv//locales/en_us.xml 2011-04-19 22:04:30.146 Loading en_us translation for module mythfrontend 2011-04-19 22:04:30.209 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-19 22:04:30.261 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-19 22:04:37.214 MythSocket(892d960:11): readStringList: Error, timed out after 7000 ms. 2011-04-19 22:04:37.215 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-19 22:04:37.263 MythSocket(892e1e8:12): readStringList: Error, timed out after 7000 ms. 2011-04-19 22:04:37.263 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-19 22:04:37.264 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-19 22:04:44.266 MythSocket(892e350:11): readStringList: Error, timed out after 7000 ms. 2011-04-19 22:04:44.266 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-19 22:04:44.267 Unable to find active recorder for this recording, realtime flagging will not be enabled. 2011-04-19 22:04:44.292 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-19 22:04:51.295 MythSocket(88e29a0:11): readStringList: Error, timed out after 7000 ms. 2011-04-19 22:04:51.295 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-19 22:04:51.301 MythCoreContext: Connecting to backend server: 10.75.22.2:6543 (try 1 of 5) 2011-04-19 22:04:51.601 AFD Warning: ScanATSCCaptionStreams() called with no PMT 2011-04-19 22:04:51.602 AFD: Opened codec 0x8949e90, id(MPEG2VIDEO) type(Video) 2011-04-19 22:04:51.605 AFD: codec MP2 has 2 channels 2011-04-19 22:04:51.609 AFD: Opened codec 0x894b760, id(MP2) type(Audio) 2011-04-19 22:04:58.304 MythSocket(88e2b48:11): readStringList: Error, timed out after 7000 ms. 2011-04-19 22:04:58.305 Protocol version check failure. The response to MYTH_PROTO_VERSION was empty. This happens when the backend is too busy to respond, or has deadlocked in due to bugs or hardware failure. 2011-04-19 22:07:28.838 [ac3 @ 0x22d6ac0]incomplete frame 2011-04-19 22:07:33.211 ~MythContext waiting for threads to exit. 2011-04-19 22:07:54.693 commflag: Commercial Detection Finished: "Dancing With the Stars" recorded from channel 3129 at 2011-04-19T21:00:00 (6 commercial break(s)) 2011-04-19 22:07:56.725 UPnpMedia: BuildMediaMap VIDEO scan starting in :/video/mythvideo: 2011-04-19 22:07:57.034 UPnpMedia: BuildMediaMap Done. Found 118 objects 2011-04-19 22:08:44.361 DB Error (Resolution insert): Query was: INSERT INTO recordedmarkup (chanid, starttime, mark, type, data) VALUES ( ?, ?, ?, ?, ?); Bindings were: :CHANID=1040, :DATA=720, :MARK=15541, :STARTTIME=2011-04-19T22:00:00, :TYPE=30 Driver error was [2/1062]: QMYSQL3: Unable to execute statement Database error was: Duplicate entry '1040-2011-04-19 22:00:00-30-15541' for key 'PRIMARY' 2011-04-19 22:08:44.363 DB Error (Resolution insert): Query was: INSERT INTO recordedmarkup (chanid, starttime, mark, type, data) VALUES ( ?, ?, ?, ?, ?); Bindings were: :CHANID=1040, :DATA=480, :MARK=15541, :STARTTIME=2011-04-19T22:00:00, :TYPE=31 Driver error was [2/1062]: QMYSQL3: Unable to execute statement Database error was: Duplicate entry '1040-2011-04-19 22:00:00-31-15541' for key 'PRIMARY' 2011-04-19 22:10:55.879 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:55.883 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:55.909 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:55.914 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.020 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.112 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.188 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.273 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.277 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe 2011-04-19 22:10:56.342 [mpeg2video @ 0x5aafac0]warning: first frame is no keyframe
Now that I've proved that this cannot be reproduced with -v most and very reliably reproduced without it, what is next in terms of trying to get to the bottom of this?
*Every* *single* night there is some issue with mythtv whether it be the backend deadlocking like this and missing recordings or front-ends locking up because the backend is locked up. Sadly, it's just not so much fun using mythtv any more. :-(
It's frustration trying to watch it and having problems every single night and frustration having to keep explaining to the family why their recordings are just not being recorded with any reliability any more. :-(
comment:5 Changed 14 years ago by
This is not an area of the code I'm familiar with, but here's my not so quick but still quite dirty analysis of the backtrace (using latest 0.24-fixes code):-
Thread 1 (the main thread)
- has received a MASTER_UPDATE_PROG_INFO message (mainserver.cpp @ 1001). I presume this is the result of a recording finishing, though I'm not too sure. In the process of handling that message, it asks for the recording status of the program referenced in the message. It gets this from the scheduler - scheduler.cpp, GetRecStatus? @ 1534 and immediately tries to obtain a lock on access to the scheduler which it presumably never obtains. The main loop is now deadlocked.
Thread 20 - the scheduler thread
- is in the middle of the Scheduler::RunScheduler? loop (scheduler.cpp @ 1688). This obtains the scheduler lock at line 1730 and is still holding it when it calls AutoExpire::Update at line 2152. AutoExpire::Update tries to get a lock on the global AutoExpire? object - and again presumably fails.
Thread 18 - the AutoExpire? thread
- is in the middle of AutoExpire::RunExpirer? (line 310) and is holding the global autoexpire lock. RunExpirer? then proceeds down to the ProgramInfo? class where it is accessing the database. It's not clear whether the database access is stalled for some reason or whether this is just where the interrupt happened to land.
So the main thread (1) is waiting on the scheduler lock, the scheduler thread (20) holds that lock but is waiting for the autoexpire lock, the autoexpire thread (18) holds the autoexpire lock - but I have no idea if it's released or not.
comment:6 Changed 14 years ago by
I am seeing the same issue, I can reproduce it at will. I have an MBE with 2x HDHR tuners and 2x PVR-x50 tuners, and a slave backend with one HD-PVR tuner, saving to the MBE via NFS mounted file system. Using current trunk I can reproduce this by either starting live TV on a remote frontend, or by watching a recording on the remote FE and then scheduling a recording on a channel on the SBE via Mythweb, which is the scenario I have backtraced and logged, which should be attached to this ticket. Let me know if there is anything else I can do. As long as I leave the SBE down, the MBE is stable.
comment:7 Changed 14 years ago by
As an additional data point, my setup has an SBE also, but no tuner in it. It's there just to relieve some of the commflagging burden.
comment:8 Changed 14 years ago by
Apparently my issue is different from this one, so I have opened a new ticket. Please ignore my logs/backtraces.
comment:9 Changed 14 years ago by
I suspect mythcommflag. I have what appears to be the same issue. Was experiencing deadlocks and crashes almost daily. Most crashes occurred approx. 15 minutes after digital recordings. I changed the options for the mythcommflag process to be "-v none,important,database --nopercentage" and the system ran perfectly for over a week. Today I removed the "database" option and the system crashed with the MYTH_PROTO_VERSION error within 2 hours. The crash occurred 5 minutes after a mythcommflag job started.
comment:10 Changed 14 years ago by
I failed to include my version information in my previous post... MythTV Version : 0.24-7.fc14 (464fa28373) MythTV Branch : Network Protocol : 63 Library API : 0.24.20101129-1 QT Version : 4.7.1 Options compiled in:
linux release using_alsa using_jack using_oss using_pulse using_pulseoutput using_backend using_bindings_perl using_bindings_python using_crystalhd using_dvb using_firewire using_frontend using_hdhomerun using_hdpvr using_iptv using_ivtv using_joystick_menu using_libfftw3 using_lirc using_mheg using_opengl_video using_opengl_vsync using_qtdbus using_qtwebkit using_v4l using_x11 using_xrandr using_xv using_xvmc using_xvmc_vld using_xvmcw using_bindings_perl using_bindings_python using_mythtranscode using_opengl using_vdpau using_ffmpeg_threads using_live using_mheg
comment:11 Changed 14 years ago by
After a couple of weeks running with -v none,important,database --nopercentage on mythcommflag, the system has been somewhat stable. It still crashes about once per week. The logs are different each time, but always occur during mythcommflag processing. I plan to monkey around with the verbosity to see if I can get any useful information, but this is a heavily used system and the WAF has gotten pretty low, so I'm reluctant to take big chances. It is my belief that the MYTH_PROTO error is just one possible symptom of an overall crash caused by something else. I see it in some of the crashes, but not all.
comment:12 Changed 14 years ago by
FWIW, the symptoms described in ticket #9773 are the same as what I experience here. I suppose though, many different kinds of deadlocks can have the same symptoms.
Is there any indication in any of the evidence provided here that this situation is the same as ticket #9773?
If there isn't enough evidence in this ticket, I'd be most happy to gather it if you just advise what you need/want.
comment:14 Changed 14 years ago by
As you can see, I've attached another stack trace of this problem in action in hopes that it helps debug it.
comment:15 Changed 14 years ago by
I've attached another stack trace in hopes that it reveals something.
comment:16 Changed 14 years ago by
I've attached another threaded stack trace. I'm not trying to be a PITA but honestly am just trying to provide as much information as might be needed to try to figure out where this deadlock is.
comment:17 Changed 14 years ago by
Brian, no need for more and more stacktraces, it makes it harder to go through the ticket (and consequently less likely ever to get looked at). Unless they differ in some way or you have a diagnosis about the deadlock, it would be best to wait for a prompt for further information.
comment:18 Changed 14 years ago by
Priority: | critical → minor |
---|---|
Severity: | high → medium |
Re-setting priority and severity, please see ticket howto, these are for developer use.
comment:19 follow-ups: 20 22 Changed 14 years ago by
Status: | new → infoneeded_new |
---|
Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.
Probable dupe of #9261.
comment:20 follow-ups: 21 24 Changed 14 years ago by
Replying to robertm:
Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.
Probable dupe of #9261.
I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread. I am experiencing the same as this ticket (turn of commercial flagging stops the deadlocks) and my deadlock is occurring with no active EITScanner thread although I do use EIT.
comment:21 follow-up: 23 Changed 14 years ago by
Replying to ian@…:
Replying to robertm:
Brian, are you running EIT exclusively? If not, is there a reason you have EIT scanning turned on? I would be interested in knowing what happens when you turn it off.
Probable dupe of #9261.
I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread. I am experiencing the same as this ticket (turn of commercial flagging stops the deadlocks) and my deadlock is occurring with no active EITScanner thread although I do use EIT.
To clarify, nongolfer is seeing the commercial flagging issue. Some of Brian's backtraces have the doFlagCommercials thread and some don't. So perhaps there are two bugs (or more) bugs here with similar symptoms.
Brian, have you tried disabling commercial flagging?
comment:22 Changed 14 years ago by
Replying to robertm:
Brian, are you running EIT exclusively?
TBH, I'm not sure what that question means.
I don't think I even get any EIT information with my cable provider.
If not, is there a reason you have EIT scanning turned on?
Likely I turned it on when I first configured my digital tuner in hopes of getting some data from it and just never turned it of, just in case my provider does start sending it.
I would be interested in knowing what happens when you turn it off.
I can surely try that.
comment:23 Changed 14 years ago by
Replying to ian@…:
Brian, have you tried disabling commercial flagging?
I have not (yet). I'm trying to avoid doing that as skipping commercials while watching is one of the major reasons we use a PVR.
comment:24 follow-up: 26 Changed 14 years ago by
Replying to ian@…:
I don't think it is a dupe of #9261. The backtrace in #9261 has an EITScanner thread.
As do all of Brian's backtraces. This is his ticket. Please allow him to answer when we ask for information from him. Replies by other users, yourself included, only make it harder to troubleshoot.
Brian, please turn off all EIT scanning and report back when you can.
comment:25 Changed 14 years ago by
FYI: I turned off commercial flagging a few days ago. So far, the system is stable, but I'll need to let it run this way for about a week to be sure. I'll report back. For what it's worth, I have always had EIT turned off on my tuners.
comment:26 follow-up: 27 Changed 14 years ago by
Replying to robertm:
Brian, please turn off all EIT scanning and report back when you can.
Yes, sorry for my lack of clarity in comment 19. I disabled it at that time. Have not really had much recording going on with the end of the winter season and whatnot. Hopefully I will get a feeling for the success of the disabling of EIT scanning soon.
comment:27 Changed 14 years ago by
Replying to brian@…:
Replying to robertm:
Brian, please turn off all EIT scanning and report back when you can.
Yes, sorry for my lack of clarity in comment 19. I disabled it at that time. Have not really had much recording going on with the end of the winter season and whatnot. Hopefully I will get a feeling for the success of the disabling of EIT scanning soon.
Sadly, I have to report that the backend locked up again last night, after having disabled EIT scanning 5 days ago. I know you said you didn't want any more stack traces and so didn't grab one from last night's deadlock but I really should have just to ensure that the backend is honoring my disabling of EIT scanning.
Since EIT scanning doesn't seem to be the culprit, any more ideas? Would stack traces, now that I have disabled EIT scanning be useful or should I still not attach any more?
comment:28 follow-up: 29 Changed 14 years ago by
Brian, go ahead and provide the BT and logging without EIT, thanks.
comment:29 Changed 14 years ago by
Replying to robertm:
Brian, go ahead and provide the BT and logging without EIT, thanks.
Hrm. So I grabbed some new stack traces in the last few days and they all have EIT functions in them. I am sure that I have disabled EIT scanning to the best of my knowledge how to:
mythtv-setup -> Video Sources -> Schedules Direct Digital -> Perform EIT scan (unchecked)
It's also unchecked in my analog schedule.
Is there any where else I have to make changes to completely disable EIT scanning?
Is there a database query you can provide to prove the state of EIT scanning?
comment:30 Changed 14 years ago by
I have been running without commflagging for about two weeks now with no crashes. Other than a local power failure the system has been rock solid without commflagging. EIT has never been turned on on my system, so we may be looking at two completely different situations.
FYI: On the backend setup, for your DVB tuners, go into "Recording Options" and uncheck "Use DVB card for active EIT scan." At least that's what I've done when I installed the cards as recommended on some post.
comment:31 Changed 14 years ago by
Sad to say, my system is still crashing and I can't seem to get any reasonable debugging information. Turning off commflag seems to have helped, but I've still got something wrong. Since I'm running the Fedora 14 RPM, I'm not overly confident of the setup. It defaults to running as root. I've switched it to run as a limited user and turned commflag back on. As far as I can tell at this point, I'm just muddying the waters on this issue and may be totally on the wrong track. If I do find something related, I'll report back. Best of luck to all!
comment:32 Changed 14 years ago by
Any (more) thoughts on why I'd still have EITScanner::RunEventLoop? and EITThread::run threads running in my backend now that I have (supposedly) disabled EIT scanning?
comment:33 Changed 14 years ago by
the EITScanner should no longer run on disabled inputs as of 6e8dae55ef4fd162e78e on master.
comment:34 Changed 14 years ago by
Awesome! Will that be backported to 0.24-fixes, seeing as that's where I'm seeing all of this deadlocking that robertm suspects might be related to EIT scanning?
comment:35 Changed 14 years ago by
Backported to fixes/0.24 in 5e67f945d1ce3b15c0be9c44c5bb8e4d2bb9bdd5
comment:36 Changed 14 years ago by
Well, sadly, fixing the EIT scanning so that I can properly disable it has not been a solution. I got another lockup last night. I will attach the stack trace.
Changed 14 years ago by
Attachment: | ThreadStacktrace.5.txt added |
---|
stack trace of deadlocked threads after EIT scanning fix
comment:37 Changed 14 years ago by
At the risk of being annoying, I am going to attach another couple of stack traces after having upgraded to v0.24.1-36-gd06878a which prevents the EIT threads from running when EIT is disabled. Let me know if these are not being of any use, or if there is another path you want to take?
comment:38 Changed 14 years ago by
There have been mentions of turning off commercial flagging to prevent the deadlocks. I am willing to try that just to add to the pool of data... But what I want to ask is that for those of you that are turning off commercial flagging, are you just living with non-flagged recordings or are you flagging them in some non-real time manner so as to not deadlock a recording back end?
comment:39 Changed 14 years ago by
My mythcommflag theory did not pan out. I was running without flagging at all. Although my system was a little more stable, I still had the same freeze ups. I'm now suspecting hardware.
comment:40 follow-up: 41 Changed 14 years ago by
The current theory is that this and two other similar tickets are related to the database, to that end could we know what version of mysql is being used and which distribution? We'd like to know whether there are any common factors.
comment:41 Changed 14 years ago by
Replying to stuartm:
The current theory is that this and two other similar tickets are related to the database,
I have noticed that most/all of my backtraces do have DB threads in them.
to that end could we know what version of mysql is being used
The distribution's version is 5.1.41-3ubuntu12.10. Mysql tells me:
# mysql -V mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1
and which distribution?
Ubuntu Lucid.
comment:42 Changed 14 years ago by
Status: | infoneeded_new → new |
---|---|
Type: | Bug Report - General → Bug Report - Hang/Deadlock |
comment:43 Changed 14 years ago by
Enable automatic reconnection for MySQL connections.
After 155138a127 changed the minimum supported Qt version to 4.6, we now have support for MYSQL_OPT_RECONNECT in QSqlDatabase::setConnectOptions().
Note that before MySQL 5, reconnecting connections were the default in the MySQL C API (so, were always enabled for the Qt-MySQL driver). Unfortunately, depending on how the application is written, reconnections may cause issues (since reconnections have effects on transactions, table locks, temporary tables, session and user variables, prepared statements, and such), so MySQL 5.0.3 changed the default to disable automatic reconnections, and MySQL 5.0.13 added MYSQL_OPT_RECONNECT to make reconnection an application-controllable option. Since the Qt-MySQL driver didn't have support for specifying MYSQL_OPT_RECONNECT, it was impossible to get reconnecting connections with MySQL 5.0.3+ in Qt applications. Finally in Qt4.6, Qt devs added support for specifying MYSQL_OPT_RECONNECT, so we can re-enable automatic reconnect.
Thanks to David Edwards for the reminder to set this option.
Branch: master Changeset: dd2d7a14d10c55f8bc8d12e944adb3b3c9ac8cea
comment:44 Changed 14 years ago by
Fix SQL reconnection logic. Refs #9704. Refs #9773. Refs #9792.
What this does is get rid of the old KickDatabase?() logic and replaces it with a simple reconnect if the DB socket is disconnected, and also gets rid of a recent re-enablement of the automatic reconnects in [dd2d7a14d1] and importantly it extends the reconnection logic in [528324d8ba] so that it resends the prepare() and rebinds the bindings when a reconnect happens in an exec().
This also makes the QSqlQuery parent of MSqlQuery private and adds some wrappers so we can prevent MSqlQuery users from accessing methods that may invalidate some assumption in MSqlQuery + it adds debugging wrappers for first(), last(), previous(), and seek() like we already had for next().
A little bit of history is in order.
In Qt3 the SQL driver automatically reconnected whenever a connection was lost. But the first query after this auto-reconnect always failed. So in 2003 mdz added KickDatabase?() which would recover disconnect and reconnect an automatically all was good with the world.
In Qt4 the QSqlQuery::prepare() started to actually do prepared statements instead of emulating them. This caused a small problem in now it was possible that a disconnect happened between the call to prepare() and exec() and then the exec() would fail. But odds of this were still pretty low for most uses of the database in MythTV. But at some point trolltech also got rid of the automatic reconnections which caused us trouble in Qt3. To fix this we added reconnection logic in [528324d8ba] and this is where the real trouble began. While this reconnection logic really does exactly the same thing that the auto-reconnection logic did in Qt3, the fact that prepare() doesn't just emulate prepared statements in Qt4 means that it's now that when we reconnect to the database in exec() the prepared statement has not been sent to the database.
Note: This should be back-ported to 0.24-fixes after a reasonable burn-in time in master.
Branch: master Changeset: 4dfcdb8dd0c80d096ed95d5eccd489c692efa90d
comment:45 Changed 14 years ago by
Few bits missing in [4dfcdb8dd]. Refs #9704. Refs #9773. Refs #9792.
Branch: master Changeset: 1fb0a77fc17089b5cd2bf36ef07998bf68461261
Changed 14 years ago by
Attachment: | mythtv-0.24-backport_reconnect_fixes.patch added |
---|
Backports 063e45ed23 , 4dfcdb8dd , and 1fb0a77fc from master to 0.24-fixes.
comment:46 Changed 14 years ago by
Note: until all the db reconnection issues are resolved and this ticket is possibly fixed, I will only be referencing the #9704 on reconnection related commits.
comment:47 Changed 14 years ago by
Refs #9704. Make new DB code more forgiving of NULL strings in NOT NULL columns.
If the query fails with NULL bindings on strings, this tries to resend the query without any NULL strings and if that succeeds it prints an error message and proceeds.
Note: If you see the DB error message, it really does indicate code that needs fixing.
Branch: master Changeset: d85b750fdea94d46d30b8d501d5f82c2d68b3e0a
comment:48 Changed 13 years ago by
Hi, I'm on v0.25pre-3260-ga15740a-dirty-20110829. Yesterday I enabled EIT active scan. Today morning BE was deaf to all FE with "The response to MYTH_PROTO_VERSION was empty.". Before active EIT enablement I was on 20110812-g50606cd with passive EIT scan. This combination was stable during 2 weeks in production enviroment. BE log and deadlock trace attached.
comment:49 Changed 13 years ago by
From my perspective, this ticket can close, possibly after the fix is landed to 02.4-fixes.
comment:50 Changed 13 years ago by
I had 0.24.1 and a lot of deadlocks. I applied patch "mythtv-0.24-backport_reconnect_fixes.patch" and changeset "d85b750fdea94d46d30b8d501d5f82c2d68b3e0a".
After building and installing the fixed version the system is stable now. No more deadlocks. In my opinion, 0.24.1 is unusable without these patches. I can't recommend MythTV to anyone because of this bug.
When will fixed 0.24.1 be officially available?
comment:51 Changed 13 years ago by
Ticket locked: | set |
---|
comment:52 Changed 13 years ago by
Milestone: | unknown → 0.24.2 |
---|---|
Resolution: | → Fixed |
Status: | new → closed |
threaded back trace of crash