Ticket #9885 (closed Bug Report - General: Fixed)
Opened 2 years ago
Last modified 22 months ago
Deadlock on slave backend disconnect
| Reported by: | Ian Dall <ian@…> | Owned by: | danielk |
|---|---|---|---|
| Priority: | major | Milestone: | 0.25 |
| Component: | MythTV - General | Version: | Master Head |
| Severity: | medium | Keywords: | |
| Cc: | | Ticket locked: | no |
Description
I have a setup with a master BE, 2 slave BEs and 1 - 3 frontends. I am running code compiled from git (v0.25pre-2145-gf199a84-dirty).
The behaviour is that nothing works: one slave is dead and the master backend is deadlocked. FEs don't work, and accessing the status port with a browser times out.
The slave death is accompanied by kernel syslog messages like:

kernel: [371375.689820] mythbackend: page allocation failure. order:0, mode:0x20

Maybe the slave death is due to a kernel bug, BUT the master should not deadlock!
The attached backtrace shows that master Thread 16 is trying, in SlaveDisconnected, to get schedLock when schedLock is already held by Scheduler::run further up the same stack.
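The pattern described above is a self-deadlock: one thread asks for a non-recursive lock that it already holds. The following is a minimal sketch of that shape, not MythTV code. `ToyScheduler`, `RunOnce`, `owner_` and `disconnects_` are hypothetical names, and the owner check only stands in for the point where the real thread would block forever.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the scheduler; illustrates the lock pattern only.
class ToyScheduler {
public:
    // Models Scheduler::run: holds the lock while doing work that may
    // discover that a slave backend has gone away.
    bool RunOnce(bool slaveDied) {
        std::lock_guard<std::mutex> guard(lock_);
        owner_ = std::this_thread::get_id();
        bool ok = true;
        if (slaveDied)
            ok = SlaveDisconnected();   // re-enters while lock_ is held
        owner_ = std::thread::id();     // reset to "no owner"
        return ok;
    }

    // Models SlaveDisconnected: wants the same lock.
    bool SlaveDisconnected() {
        if (owner_.load() == std::this_thread::get_id()) {
            // A plain, non-recursive mutex would block here forever.
            // The sketch reports the problem instead of hanging.
            return false;
        }
        std::lock_guard<std::mutex> guard(lock_);
        ++disconnects_;
        return true;
    }

    int Disconnects() {
        std::lock_guard<std::mutex> guard(lock_);
        return disconnects_;
    }

private:
    std::mutex lock_;
    std::atomic<std::thread::id> owner_{};
    int disconnects_ = 0;
};
```

Calling `SlaveDisconnected()` from any other thread works as intended; only the call made from inside `RunOnce` hits the re-entry path.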
I saw exactly the same problem with an older version: 0.24-7.fc14 (464fa28373) but went to git head in the hope that this had been fixed :-(
I notice other deadlock tickets (esp #9745), but none seem to have quite the same description, and the version I am running has the #9745 fix included.
Attachments
Change History
Changed 2 years ago by Ian Dall <ian@…>
- Attachment hex.backtrace added
comment:1 Changed 2 years ago by Jonatan <mythtv@…>
I have also seen a similar deadlock a few times on 0.24. I have been running the backend with the attached patch for a while now without any problems.
comment:2 Changed 23 months ago by Ian Dall <ian@…>
I haven't tried Jonatan's patch yet but will soon (I missed it until now).
Without the patch, the issue is still there as of version: v0.25pre-2563-ga41e965
See thread 21 in the attached backtrace.
Changed 23 months ago by Ian Dall <ian@…>
- Attachment hex-a41e965.backtrace added
Backtrace all threads as of commit a41e965
Changed 23 months ago by Ian Dall <ian@…>
- Attachment hex-a41e965.log.gz added
Master backend log as of commit a41e965
comment:3 Changed 23 months ago by Ian Dall <ian@…>
I have been running with Jonatan's patch for a week now and it seems to fix the problem. I deliberately tried to provoke it by killing and restarting the backend many times and never saw the deadlock.
Can this patch be applied?
comment:4 Changed 23 months ago by danielk
Ian, Jonatan's patch is more of a debugging patch; it just disables a bit of code and can't be applied as is. But if it fixed the problem for you, it does show that you are both experiencing the same deadlock and that the same fix will help both of you.
comment:5 Changed 23 months ago by Github
Refs #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.
Keeping ticket open since this should be backported to 0.24-fixes.
Branch: master Changeset: 1fae22a8bc56a5474375332f7799b0ee91bb6244
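The changeset itself is not reproduced in this ticket, so the following is only a sketch of one common way to avoid this kind of re-entrant locking, and it is not necessarily what the committed patch does. The idea is to split the handler into a public entry point that takes the lock and an internal helper that assumes the lock is already held. All names here (`SplitLockScheduler`, `SlaveDisconnectedLocked`, `dead_`) are hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical sketch; not the MythTV implementation.
class SplitLockScheduler {
public:
    // Entry point for other threads: takes the lock, then does the work.
    void SlaveDisconnected(int cardId) {
        std::lock_guard<std::mutex> guard(lock_);
        SlaveDisconnectedLocked(cardId);
    }

    // Models the scheduler loop: it already holds the lock, so it calls
    // the helper directly instead of the locking entry point.
    void RunOnce(int deadCardId) {
        std::lock_guard<std::mutex> guard(lock_);
        if (deadCardId >= 0)
            SlaveDisconnectedLocked(deadCardId);
    }

    std::size_t DeadCount() {
        std::lock_guard<std::mutex> guard(lock_);
        return dead_.size();
    }

private:
    // Caller must hold lock_.
    void SlaveDisconnectedLocked(int cardId) { dead_.insert(cardId); }

    std::mutex lock_;
    std::set<int> dead_;
};
```

With this split, a disconnect first noticed from within the scheduler thread never tries to take the lock a second time, while callers on other threads still get the usual mutual exclusion.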
comment:6 Changed 23 months ago by danielk
- Owner set to danielk
- Status changed from new to assigned
- Milestone changed from unknown to 0.25
comment:7 Changed 22 months ago by danielk
- Status changed from assigned to closed
- Resolution set to Fixed
Fixed in [3a5f78862a5be39e02ee549551d59e9c40fa575d]
Fixes #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.
This has been running in master for 4 weeks without reports of regression.
