Opened 14 years ago
Closed 14 years ago
#9885 closed Bug Report - General (Fixed)
Deadlock on slave backend disconnect
Reported by: | | Owned by: | danielk |
---|---|---|---|
Priority: | major | Milestone: | 0.25 |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description
I have a setup with a master BE, 2 slave BEs and 1 - 3 frontends. I am running code compiled from git (v0.25pre-2145-gf199a84-dirty).
The behaviour is that one slave is dead and the master backend is deadlocked, so nothing works. The FEs don't work, and accessing the status port with a browser times out.
The slave death is accompanied by kernel syslog messages like:
kernel: [371375.689820] mythbackend: page allocation failure. order:0, mode:0x20
Maybe the slave death is due to a kernel bug, BUT the master should not deadlock!
The attached backtrace shows that master Thread 16 is trying, in SlaveDisconnected, to acquire schedLock when schedLock is already held by Scheduler::run further up the same thread's stack.
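The backtrace describes a self-deadlock: the thread running Scheduler::run already holds the scheduler's non-recursive lock when the disconnect handler tries to take it again. The C++ sketch below is a simplified, hypothetical model of that hazard and of one common way around it, namely deferring the work until the lock has been released. The class, the thread-local flag and all member names are assumptions for illustration; this is not MythTV's code or the patch that was eventually committed.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified model: names are illustrative, not MythTV's code.
// Per-thread flag: true only while *this* thread holds sched_lock_ inside
// RunOnce(). Assumes a single Scheduler instance (the flag is not per-object).
thread_local bool t_holds_sched_lock = false;

class Scheduler {
public:
    // Stands in for one pass of the scheduler loop. dead_slaves simulates
    // disconnects that are first noticed while the lock is held.
    void RunOnce(const std::vector<std::string>& dead_slaves) {
        // Re-entering RunOnce() with the lock held would self-deadlock.
        assert(!t_holds_sched_lock);
        std::vector<std::string> deferred;
        {
            std::lock_guard<std::mutex> lock(sched_lock_);
            t_holds_sched_lock = true;
            for (const auto& host : dead_slaves)
                SlaveDisconnected(host);  // queues instead of re-locking
            t_holds_sched_lock = false;
            deferred.swap(pending_);      // take the queue while still locked
        }
        // sched_lock_ is released: deferred disconnects can now lock normally.
        for (const auto& host : deferred)
            SlaveDisconnected(host);
    }

    // May be called from any thread, including from inside RunOnce().
    void SlaveDisconnected(const std::string& host) {
        if (t_holds_sched_lock) {
            // This thread already owns sched_lock_ and std::mutex is not
            // recursive, so record the event and handle it later.
            pending_.push_back(host);
            return;
        }
        std::lock_guard<std::mutex> lock(sched_lock_);
        handled_.push_back(host);  // placeholder for the real cleanup work
    }

    // Not synchronized: read only after all work has finished.
    const std::vector<std::string>& handled() const { return handled_; }

private:
    std::mutex sched_lock_;
    std::vector<std::string> pending_;  // guarded by sched_lock_
    std::vector<std::string> handled_;  // guarded by sched_lock_
};
```

In this model, a disconnect seen inside RunOnce() is only queued and is handled once sched_lock_ has been released, while a call from any other thread locks and handles it immediately. A recursive mutex would also avoid the hang, but it would let the handler run while the scheduler is in the middle of its critical section, which may leave the scheduler's state inconsistent.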
I saw exactly the same problem with an older version: 0.24-7.fc14 (464fa28373) but went to git head in the hope that this had been fixed :-(
I notice other deadlock tickets (especially #9745), but none seem to have quite the same description, and the version I am running has the #9745 fix included.
Attachments (5)
Change History (12)
Changed 14 years ago by
Attachment: | hex.backtrace added |
Changed 14 years ago by
Attachment: | slavedisconnect.patch added |
comment:1 Changed 14 years ago by
I have also seen a similar deadlock a few times on 0.24. I have been running the backend with the attached patch for a while now without any problems.
comment:2 Changed 14 years ago by
I haven't tried Jonatan's patch yet but will soon (I missed it until now).
Without the patch, the issue is still there as of version: v0.25pre-2563-ga41e965
See thread 21 in the attached backtrace.
Changed 14 years ago by
Attachment: | hex-a41e965.backtrace added |
Backtrace all threads as of commit a41e965
Changed 14 years ago by
Attachment: | hex-a41e965.log.gz added |
Master backend log as of commit a41e965
comment:3 Changed 14 years ago by
I have been running with Jonatan's patch for a week now and it seems to fix the problem. I deliberately tried to provoke it by killing and restarting the backend many times and never saw the deadlock.
Can this patch be applied?
comment:4 Changed 14 years ago by
Ian, Jonatan's patch is more of a debugging patch: it just disables a bit of code and can't be applied as is. But if it fixed the problem for you, it does show that you are both experiencing the same deadlock and that the same fix will help both of you.
comment:5 Changed 14 years ago by
Refs #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.
Keeping ticket open since this should be backported to 0.24-fixes.
Branch: master
Changeset: 1fae22a8bc56a5474375332f7799b0ee91bb6244
comment:6 Changed 14 years ago by
Milestone: | unknown → 0.25 |
---|---|
Owner: | set to danielk |
Status: | new → assigned |
comment:7 Changed 14 years ago by
Resolution: | → Fixed |
---|---|
Status: | assigned → closed |
Fixed in [3a5f78862a5be39e02ee549551d59e9c40fa575d]
Fixes #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.
This has been running in master for 4 weeks without reports of regression.