Opened 9 years ago

Closed 8 years ago

#9682 closed Developer Task (Won't Fix)

Myth Backend handles loss of NFS mount poorly

Reported by: danielk
Owned by: danielk
Priority: minor
Milestone: unknown
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description (last modified by beirdo)

When an NFS server goes down in the middle of a recording, any of the backend's ThreadedFileWriters writing to that disk become permanently wedged and CPU usage shoots up to nearly 100%.

Using aio_write() instead of write() should allow us to handle this condition better. At a minimum we should not use 100% CPU; ideally we would continue recording to other disks, including with the recorder that is currently blocked.
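One way to read the aio_write() proposal is sketched below in C++: queue the write, then wait for it with a deadline using aio_suspend(), which sleeps instead of spinning, so a blocked disk costs no CPU and the caller regains control and could try another disk. This is a minimal sketch under assumptions, not MythTV code: `bounded_aio_write` and `WriteStatus` are hypothetical names, and on glibc older than 2.34 the program must be linked with `-lrt`. Two caveats bear directly on this ticket: a timed-out request is still in flight, so its `aiocb` and buffer must outlive it; and glibc implements POSIX AIO with helper threads that issue ordinary blocking writes, so those helper threads still wedge on a dead hard NFS mount.

```cpp
#include <aio.h>
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Outcome of one bounded asynchronous write (hypothetical type).
enum class WriteStatus { Done, TimedOut, Failed };

// Queue `len` bytes at `offset` with aio_write() and wait at most `timeout_ms`
// for completion. aio_suspend() sleeps instead of busy-polling.
// On TimedOut the request is still pending: `cb` and `buf` must stay valid
// until aio_error(&cb) no longer returns EINPROGRESS.
WriteStatus bounded_aio_write(aiocb &cb, int fd, const void *buf, size_t len,
                              off_t offset, long timeout_ms, ssize_t *written)
{
    std::memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = const_cast<void *>(buf);
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  // no signal; we wait explicitly

    if (aio_write(&cb) != 0)
        return WriteStatus::Failed;             // could not even queue the request

    const aiocb *const list[] = { &cb };
    timespec timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_nsec = (timeout_ms % 1000) * 1000000L;

    while (aio_error(&cb) == EINPROGRESS) {
        if (aio_suspend(list, 1, &timeout) == 0)
            continue;                           // request finished; loop re-checks
        if (errno == EAGAIN)
            return WriteStatus::TimedOut;       // disk did not respond in time
        if (errno != EINTR)
            return WriteStatus::Failed;
        // EINTR: a signal arrived; wait again (full timeout, for simplicity)
    }

    // aio_return() must be called exactly once for each completed request.
    ssize_t n = aio_return(&cb);
    if (n < 0)
        return WriteStatus::Failed;
    if (written)
        *written = n;
    return WriteStatus::Done;
}
```

On `TimedOut`, a recorder could mark that disk as unavailable and open its next file on other storage, leaving the stuck request to complete or fail once the mount returns.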

Change History (4)

comment:1 Changed 9 years ago by beirdo

Description: modified
Summary: changed from "Myth Backend handles loss of disk poorly" to "Myth Backend handles loss of NFS mount poorly"

First thing to do: if you are worried about NFS mounts disappearing, mount them with the options "soft,intr,retrans=6". The default is "hard,nointr,retrans=3". This carries some risk of lost data, but the NFS connection will no longer hang indefinitely.
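Whether a recording directory actually follows this advice can be checked from the kernel's mount table. The C++ sketch below parses one line in Linux `/proc/mounts` format (`device mountpoint fstype options dump pass`) and reports whether an NFS mount is soft and which `retrans` value it uses. `NfsMountInfo` and `parse_mount_line` are hypothetical names, not MythTV code; a backend could, for example, warn at startup when a storage directory sits on a hard NFS mount. Note that the nfs(5) man page documents `intr`/`nointr` as ignored since Linux 2.6.25, so on current kernels it is `soft`, together with `timeo` and `retrans`, that bounds how long an operation can hang.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Options of interest from one /proc/mounts line (hypothetical helper type).
struct NfsMountInfo {
    bool is_nfs = false;
    bool soft = false;   // false means "hard", the NFS default
    int retrans = -1;    // -1 if no retrans= option is listed
};

// Parse "device mountpoint fstype options dump pass" and extract NFS options.
NfsMountInfo parse_mount_line(const std::string &line)
{
    NfsMountInfo info;
    std::istringstream fields(line);
    std::string device, mountpoint, fstype, options;
    if (!(fields >> device >> mountpoint >> fstype >> options))
        return info;                          // malformed line: report "not NFS"

    info.is_nfs = (fstype == "nfs" || fstype == "nfs4");
    if (!info.is_nfs)
        return info;

    // Options are comma-separated, e.g. "rw,hard,timeo=600,retrans=2".
    std::istringstream opts(options);
    std::string opt;
    while (std::getline(opts, opt, ',')) {
        if (opt == "soft")
            info.soft = true;
        else if (opt == "hard")
            info.soft = false;
        else if (opt.rfind("retrans=", 0) == 0)
            info.retrans = std::stoi(opt.substr(8));  // throws on a non-numeric value
    }
    return info;
}
```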

In addition, aio_write() may be a good plan (is it portable, and will it have the desired effect on non-NFS writes?)

comment:2 Changed 8 years ago by Raymond Wagner

Status: changed from new to assigned

comment:3 Changed 8 years ago by Github

Refs #9682. This creates a runnable that updates the free space list on the master backend.

The problem with the existing code is that if any mount goes dead, each frontend causes one more worker thread to queue up every 15 seconds. With 3 frontends, 720 threads are started for every hour the mount is down. This change instead uses one runnable to update the total disk space every 15 seconds for use by all the frontends. If no frontend requests an updated disk space calculation for one minute, the runnable shuts down.
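The thread count follows from one queued worker per frontend per 15-second poll: 3 frontends × (3600 s ÷ 15 s) = 3 × 240 = 720 threads per hour. The pattern the commit describes (one shared refresher, started on demand and stopped after a period with no requests) can be sketched in standard C++ as below. This is not the code from changeset 394c2e82: MythTV uses Qt runnables, whereas this sketch uses `std::thread`, and `SharedDiskSpaceCache` and its members are hypothetical names. A dead mount can still block the refresher inside `compute`, but then only that one thread waits, and callers keep getting the last known value instead of queuing more threads.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One background refresher shared by all callers (hypothetical sketch).
class SharedDiskSpaceCache {
public:
    using Clock = std::chrono::steady_clock;

    SharedDiskSpaceCache(std::function<std::int64_t()> compute,
                         Clock::duration interval, Clock::duration idle_limit)
        : compute_(std::move(compute)), interval_(interval), idle_limit_(idle_limit) {}

    ~SharedDiskSpaceCache() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable())
            worker_.join();  // blocks if compute_ is stuck on a dead mount
    }

    // Called for every frontend request. Returns the last computed value
    // (-1 before the first refresh) and starts the refresher if it is idle.
    std::int64_t request() {
        std::lock_guard<std::mutex> lock(mu_);
        last_request_ = Clock::now();
        if (!running_ && !stopping_) {
            // A previous worker set running_ = false as its last step under mu_,
            // so it no longer needs the lock and joining it here cannot deadlock.
            if (worker_.joinable())
                worker_.join();
            running_ = true;
            worker_ = std::thread(&SharedDiskSpaceCache::run, this);
        }
        return cached_;
    }

    bool running() const { std::lock_guard<std::mutex> lock(mu_); return running_; }
    int refreshes() const { std::lock_guard<std::mutex> lock(mu_); return refreshes_; }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        // Keep refreshing only while someone has asked within idle_limit_.
        while (!stopping_ && Clock::now() - last_request_ <= idle_limit_) {
            lock.unlock();
            std::int64_t value = compute_();  // may block; only this thread waits
            lock.lock();
            cached_ = value;
            ++refreshes_;
            cv_.wait_for(lock, interval_, [this] { return stopping_; });
        }
        running_ = false;  // the next request() starts a fresh worker
    }

    std::function<std::int64_t()> compute_;
    const Clock::duration interval_;
    const Clock::duration idle_limit_;
    mutable std::mutex mu_;
    std::condition_variable cv_;
    Clock::time_point last_request_{};
    std::int64_t cached_ = -1;
    int refreshes_ = 0;
    bool running_ = false;
    bool stopping_ = false;
    std::thread worker_;
};
```

In the commit's terms, `interval` would be 15 seconds and `idle_limit` one minute.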

Branch: master Changeset: 394c2e82c8723605d939b02a789f068aa8c62d44

comment:4 Changed 8 years ago by danielk

Resolution: Won't Fix
Status: changed from assigned to closed

[394c2e82] fixes what appears to be the worst offender when a mount goes dead. It looks like a deadlock, but in fact the threads that are created all continue once the mount is restored.

Changing ThreadedFileWriter to use asynchronous writes is complex and wouldn't do much to improve liveness in the presence of blocked disk access, so I'm closing this as wontfix.
