Opened 3 years ago

Closed 3 years ago

#9885 closed Bug Report - General (Fixed)

Deadlock on slave backend disconnect

Reported by: Ian Dall <ian@…>
Owned by: danielk
Priority: major
Milestone: 0.25
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

I have a setup with a master BE, 2 slave BEs and 1 to 3 frontends. I am running code compiled from git (v0.25pre-2145-gf199a84-dirty).

The symptom is that one slave dies and the master backend deadlocks, after which nothing works: the FEs can't connect, and accessing the status port with a browser times out.

The slave death is accompanied by kernel syslog messages like:
kernel: [371375.689820] mythbackend: page allocation failure. order:0, mode:0x20

Maybe the slave death is due to a kernel bug, BUT the master should not deadlock!

The attached backtrace shows that master Thread 16 is trying, in SlaveDisconnected, to acquire schedLock when schedLock is already held by Scheduler::run further up the same stack.
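To illustrate the pattern in the backtrace, here is a minimal sketch of a thread trying to re-acquire a non-reentrant lock it already holds. This is not MythTV code: the class, method, and attribute names are hypothetical stand-ins, and a timeout is used so that the demo terminates instead of hanging the way the real backend does.

```python
import threading


class Scheduler:
    """Toy stand-in (hypothetical) for a scheduler guarded by one lock."""

    def __init__(self, lock):
        self.sched_lock = lock
        self.disconnect_handled = False

    def slave_disconnected(self):
        # Takes the scheduler lock. A timeout is used here only so the
        # demo returns; a plain blocking acquire would wait forever.
        if not self.sched_lock.acquire(timeout=0.2):
            return False
        try:
            self.disconnect_handled = True
            return True
        finally:
            self.sched_lock.release()

    def run_once(self):
        # The scheduler loop holds the lock and, while holding it,
        # notices the disconnect and calls the handler on the same thread.
        with self.sched_lock:
            return self.slave_disconnected()


plain = Scheduler(threading.Lock())       # non-reentrant lock
reentrant = Scheduler(threading.RLock())  # reentrant lock

plain_ok = plain.run_once()          # inner acquire times out
reentrant_ok = reentrant.run_once()  # inner acquire succeeds

print("non-reentrant lock handled disconnect:", plain_ok)
print("reentrant lock handled disconnect:", reentrant_ok)
```

With the non-reentrant lock the inner acquire can never succeed, because the only thread that could release the lock is the one waiting for it. A reentrant lock is just one way out; another is to avoid calling the handler while the lock is held. The sketch does not claim either is what the eventual fix does.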

I saw exactly the same problem with an older version: 0.24-7.fc14 (464fa28373) but went to git head in the hope that this had been fixed :-(

I notice other deadlock tickets (esp #9745), but none seem to have quite the same description, and the version I am running has the #9745 fix included.

Attachments (5)

hex.backtrace (20.3 KB) - added by Ian Dall <ian@…> 3 years ago.
Backtrace all threads
hex.log.gz (25.1 KB) - added by Ian Dall <ian@…> 3 years ago.
Master backend log.
slavedisconnect.patch (3.7 KB) - added by Jonatan <mythtv@…> 3 years ago.
hex-a41e965.backtrace (27.6 KB) - added by Ian Dall <ian@…> 3 years ago.
Backtrace all threads as of commit a41e965
hex-a41e965.log.gz (201.9 KB) - added by Ian Dall <ian@…> 3 years ago.
Master backend log as of commit a41e965

Change History (12)

Changed 3 years ago by Ian Dall <ian@…>

Backtrace all threads

Changed 3 years ago by Ian Dall <ian@…>

Master backend log.

comment:1 Changed 3 years ago by Jonatan <mythtv@…>

I have also seen a similar deadlock a few times on 0.24. I have been running the backend with the attached patch for a while now without any problems.

comment:2 Changed 3 years ago by Ian Dall <ian@…>

I haven't tried Jonatan's patch yet but will soon (I missed it until now).

Without the patch, the issue is still there as of version: v0.25pre-2563-ga41e965

See thread 21 in the attached backtrace.

Changed 3 years ago by Ian Dall <ian@…>

Backtrace all threads as of commit a41e965

Changed 3 years ago by Ian Dall <ian@…>

Master backend log as of commit a41e965

comment:3 Changed 3 years ago by Ian Dall <ian@…>

I have been running with Jonatan's patch for a week now and it seems to fix the problem. I deliberately tried to provoke it by killing and restarting the backend many times and never saw the deadlock.

Can this patch be applied?

comment:4 Changed 3 years ago by danielk

Ian, Jonatan's patch is more of a debugging patch: it just disables a bit of code, so it can't be applied as is. But if it fixed the problem for you, that does show you are both experiencing the same deadlock, and the same fix will help both of you.

comment:5 Changed 3 years ago by Github

Refs #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.

Keeping ticket open since this should be backported to 0.24-fixes.

Branch: master
Changeset: 1fae22a8bc56a5474375332f7799b0ee91bb6244

comment:6 Changed 3 years ago by danielk

  • Milestone changed from unknown to 0.25
  • Owner set to danielk
  • Status changed from new to assigned

comment:7 Changed 3 years ago by danielk

  • Resolution set to Fixed
  • Status changed from assigned to closed

Fixed in [3a5f78862a5be39e02ee549551d59e9c40fa575d]

Fixes #9885. Fixes deadlock when a slave backend disconnect is first seen from within the Scheduler thread. Patch by Ian Dall.


This has been running in master for 4 weeks without reports of regression.
