Opened 13 years ago
Closed 13 years ago
Last modified 13 years ago
#10771 closed Bug Report - Hang/Deadlock (fixed)
BE deadlocks after backporting #10541
Reported by: | Owned by: | danielk | |
---|---|---|---|
Priority: | major | Milestone: | 0.25.1 |
Component: | MythTV - EIT | Version: | 0.25-fixes |
Severity: | medium | Keywords: | EIT |
Cc: | Ticket locked: | no |
Description
Hi,
I had hopes backporting #10541 to 0.25-fixes solves issue from #10016. Currently as workaround #10016 I forced active EIT to work on night window where probability that livetv meets EIT scanner is almost zero (#10076). Yesterday I build v0.25-123-gd3a5b0a to give run for #10541 and turn-on continuous EIT active scan on my system (configured only on 2nd tuner).
I had following scenario:
-start BE.
-make sure EIT active scanner starts (configured on 2nd tuner).
-in meantime there was some recordings on 1st tuner.
-there was recording on 2nd tuner. EIT scanner stops.
-after end of recording on 2nd tuner EIT scanner NOT resumed (bad).
-after 1h of idle I manually scheduled recording on 2nd tuner.
-since start of recording on 2nd tuner BE is deadlocked.
BE trace attached.
Attachments (8)
Change History (40)
Changed 13 years ago by
Attachment: | BE-trace.txt added |
---|
comment:1 Changed 13 years ago by
Milestone: | unknown → 0.25.1 |
---|---|
Owner: | changed from Stuart Auchterlonie to danielk |
Priority: | minor → major |
Status: | new → assigned |
comment:3 Changed 13 years ago by
Warpme, are you running with some other patches? If so can you reproduce with [d3a5b0a] without the patches?
The backtrace says TVRec::run() was sitting on line 1535, but that function ends on line 1418... The backtrace for Thread 22 shows similar anomalies.
comment:4 Changed 13 years ago by
Status: | assigned → infoneeded |
---|
Another question, did you modify the live tv order from the default?
comment:5 Changed 13 years ago by
Daniel, thx for quick replay. Indeed I applied patch from #10076, but 3 weeks ago - when You introduce fix for #10541 in master - I do tests without #10076 by back-porting #10541 to 0.25-fixes. Results were exactly the same, but I decided not to bother You at that time. Regarding line numbers mismatch - sure I'll reproduce on vanilla [d3a5b0a] today evening. BR
comment:6 Changed 13 years ago by
Forgot to answer about livetv order: yes, it is changed. I have 3 tuners with 4 virt. each. Recording order is: 1,2,3,4,5,6,7,8,9,10,11,12. LiveTV is: 12,8,4,11,7,3,10,6,2,9,5,1
comment:7 Changed 13 years ago by
Daniel, I managed to get trace+log on vanilla [d3a5b0a]. Scenario was following:
1.start BE, wait till active EIT starts
2.created manual schedulle on tuner doing EIT
3.FE rec. list shows that recording starts but reported size was 0 during 3 min
4.killed ongoing recording, wait 1min
5.schedulled again new recording on this tuner
6.when recording should start - BE deadlocks
Observing correlation between BE & FE I see that recorder starts but not writing. It looks like it is blocked by previously working EIT scanner on this tuner. Scheduling another recording on this blocked tuner deadlocks BE. Log + trace attached.
Changed 13 years ago by
Attachment: | mythbackend.20120529182901.24284.log added |
---|
be log from vanilla d3a5b0a
comment:8 Changed 13 years ago by
Status: | infoneeded → assigned |
---|
Thanks.
Threads 27 and 28 are the main issue here. StopActiveScan?() in 28 is being run from within the event loop, but thread 27 was already in SetChannel?() and is waiting for the event loop to get to the end of the loop. StopActiveScan?() is waiting for SetChannel?() to complete, hence the deadlock.
Threads 2, 4, 9, 12, 23 and 24 are also blocked on waiting for thread 9 streamhandler.cpp:159, but I'm treating this as a red herring for now.
comment:9 Changed 13 years ago by
Status: | assigned → infoneeded |
---|
warpme, I've attached a patch which should address the deadlock if I diagnosed the cause correctly. The patch is against master but it will apply on top of vanilla [d3a5b0a].
Please test and report back if it works or doesn't.
Please attach another backtrace if it doesn't.
Thanks
comment:10 Changed 13 years ago by
Daniel, I applied patch over [d3a5b0a]. It looks like now active EIT scanner is not starting at all. I tried launch livetv for moment - maybe it will unblock scanner (it uses the same tuner). No go. BE log +trace log attached.
Changed 13 years ago by
Attachment: | mythbackend.20120529224205.8292.log added |
---|
BE log with 10771_patch_v1
comment:11 Changed 13 years ago by
Adding a comment here to indicate that I am also seeing similar symptoms with 0.25-fixes. Occasional deadlocks experienced whilst switching to LiveTV on a tuner. After a1d9015924480bb8a733452e294fb72ebbfce94a (from #10541) was applied to 0.25-fixes deadlocks occurred multiple times requiring the backend to be forcefully killed as it did not respond correctly to SIGTERM. Backend still running (housecleaning running, responds to mythbackend --printsched), but not responding to requests from frontends, built-in web server not responding. No trace available as I compiled the latest 0.25-fixes with the commit reverted.
comment:12 follow-up: 13 Changed 13 years ago by
@mayfields,
In my case only effective solution is workaround like described in ticket description. If You will apply patch from #10076, set EITCrawIdleStart to 3900. By this EIT active scanner will work from 1:05AM+60s*tuner_number and end 4h later. Assuming You are not watching livetv between 1AM-5AM - You shouldn't have deadlocks (oh well You can have sporadic deadlocks when in livetv You are changing channels exactly when recording starts. I had such 1 per month or two). -br
comment:13 Changed 13 years ago by
After following the above advice from warpme@... to restrict EIT scanning to specified hours where there are no recordings being made I have had no issues in the past 7 days.
comment:14 Changed 13 years ago by
Can you please test the following patch? http://pastebin.ca/2162527
comment:15 Changed 13 years ago by
Stuart, Is this patch for 0.25-fixes ?. I can't apply it cleanly over 0.25-fixes. Backporting manually show some different code in scheduler.cpp hunks.
comment:16 Changed 13 years ago by
The patch is against master, I'll create a version against -fixes but I'll wait a couple of days to see that there are no regressions before I go to the trouble.
comment:17 Changed 13 years ago by
I modified the patch for 0.25-fixes. Should this be applied along with the patch from 10541 (which was later reverted), or without?
comment:20 Changed 13 years ago by
Stuart, I'm really exited that reschedule blockiness was finally resolved. Believe me - those 10-20sec deaf periods i.e during LiveTV zapping are really annoying. I hope You still have plans to create this patch also for 0.25-fixes. Again - thx for working on this issue !
comment:21 Changed 13 years ago by
This fixes one cause of that particular issue, but I'm aware of others which are a little harder to fix.
I'll backport this change to 0.25-fixes when I'm satisfied that it doesn't introduce any new bugs which means a week or more of testing in master.
comment:22 Changed 13 years ago by
Oh, sure. perfectly clear. If You need 0.25 tester before releasing it officially I'm ready to apply it on my prod. system. PM me and I'll get You feedback how it works on 0.25-fixes. br
comment:28 Changed 13 years ago by
Stuart, #11d7795503h nicely improves user expierience as with this patch - when zapping LiveTV - I don't have deaf to Ch+/Ch- periods around full hours (when scheduler reschedules). Also "Opening video buffers failed too many times" when used zaps LiveTV around reschedule almost gone. Unfortunately it causes frequent BE deadlocks. So far I can describe 2 scenarios:
- I have applied #10076 with EIT scan window for 1:05AM-5:01AM. Scenario for deadlock is following:
-Start BE, wait whole night to EIT scan
-Launch LiveTV after night. It gets TLMSC but after 5-10sec of black screen going back to MainMenu? with "Opening video buffers failed too many times".
-Launching LiveTV again gives Popup "Connection to BE was lost" and BE stops to communicate with all FE (any action on FE gives 5-10sec stale and goes to MainMenu?)
- Turning-off EIT scan at time window (original way of gathering EIT), I can trigger deadlock in following way:
-Start BE and wait till EIT active scanner starts
-Start LiveTV few times
-After 1..3 LiveTV starts, I get "Opening video buffers failed too many times" and return to MainMenu?
-Now BE is in deadlock state where asking for LiveTV or recordings Playback gives immediate back to MainMenu?.
For this scenario I'm attaching BE.log & backtrace.
Generally - after 10 months of living with #10016 I think root cause is related to how active EIT scanner is stopped when recorder starts on given tuner. Doing some tests on scenario 2 shows me that deadlock occurs when given tuner is just grabbing EIT and grabber is preempted by LiveTV recorder.
If I'm reading code correctly - looking on eitscanner.cpp:83, main scanner loop can be ended in quite rare execution places. Lest imagine: if LiveTV recorder just tuning to new channel and eit scanner is just doing any action between eitscanner.cpp:86...151 - we have resource access conflict as tuner just grabbing EIT and in the same time is tuning to new channel. I believe issue described in #9480 comment #7 is exactly result of these conflicts (EIT is on Cyfra mplex while LiveTV tunes to Polsat).
I think we should look how exactly EIT scanner is tear down when recorder starts....
comment:29 Changed 13 years ago by
The bt provided doesn't show a deadlock, at least not obviously. It also doesn't involve the scheduler. That commit does mean we're not blocking for so long under certain conditions, which might mean it's exposing a race condition elsewhere but it seems to be a new problem. Can you open a new ticket and maybe get a couple of new backtraces for comparison?
comment:30 Changed 13 years ago by
Stuart, Now I revert 11d7795503h and will wait 3-4 days to see how 11d7795503h is related to problem described in comment#28. Regarding separate ticket - I already done it (some time ago). It is under #10016 - but currently first LiveTV launch causes deadlock while in past it was only LiveTV failure. I still think ideas expressed by me in comment#28 might be valid for #10016.
comment:31 Changed 13 years ago by
Resolution: | → fixed |
---|---|
Status: | infoneeded → closed |
BE trace