Opened 13 years ago
Closed 13 years ago
#9682 closed Developer Task (Won't Fix)
Myth Backend handles loss of NFS mount poorly
Reported by: | danielk | Owned by: | danielk |
---|---|---|---|
Priority: | minor | Milestone: | unknown |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description (last modified by )
When an NFS server goes down in the middle of a recording, any of the backend's ThreadedFileWriters writing to that disk become permanently wedged and CPU usage shoots up to nearly 100%.
Using aio_write() instead of write() should allow us to handle this condition better. At minimum we should not use 100% CPU, and ideally we should continue recording to other disks, including from the recorder that is currently blocked.
Change History (4)
comment:1 Changed 13 years ago by
Description: | modified (diff) |
---|---|
Summary: | Myth Backend handles loss of disk poorly → Myth Backend handles loss of NFS mount poorly |
comment:2 Changed 13 years ago by
Status: | new → assigned |
---|
comment:3 Changed 13 years ago by
Refs #9682. This creates a runnable that updates the free space list on the master backend.
The problem with the existing code is that if any mount goes dead, each frontend causes one more worker thread to queue up every 15 seconds. If you have 3 frontends, 720 threads are started every hour the mount is down (3 frontends × 4 requests per minute × 60 minutes). This change instead uses one runnable to update the total disk space every 15 seconds for use by all the frontends. If no frontend requests an updated disk space calculation for one minute, the runnable shuts down.
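The pattern described above can be sketched roughly as follows. This is not the code from the changeset: it uses std::thread rather than a Qt runnable, and FreeSpaceUpdater, cached() and running() are hypothetical names for illustration. Frontend requests only read a cached value and record that someone asked; a single worker does the possibly blocking disk query and exits once nobody has asked for the idle period.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class FreeSpaceUpdater {
public:
    using Millis = std::chrono::milliseconds;
    using Clock  = std::chrono::steady_clock;

    // query: the (possibly blocking) free space calculation.
    FreeSpaceUpdater(std::function<long long()> query, Millis interval, Millis idle)
        : m_query(std::move(query)), m_interval(interval), m_idle(idle) {}

    ~FreeSpaceUpdater() {
        {
            std::lock_guard<std::mutex> lk(m_lock);
            m_stop = true;
        }
        m_wake.notify_all();
        if (m_worker.joinable())
            m_worker.join();   // note: waits if the query is stuck on a dead mount
    }

    // Called per frontend request. Never touches the disk itself; returns
    // the last cached value (-1 until the first update has finished).
    long long cached() {
        std::lock_guard<std::mutex> lk(m_lock);
        m_lastRequest = Clock::now();
        if (!m_running) {
            if (m_worker.joinable())
                m_worker.join();           // reap a worker that idled out
            m_running = true;
            m_worker = std::thread(&FreeSpaceUpdater::run, this);
        }
        return m_value;
    }

    bool running() {
        std::lock_guard<std::mutex> lk(m_lock);
        return m_running;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_lock);
        while (!m_stop && Clock::now() - m_lastRequest < m_idle) {
            lk.unlock();
            long long v = m_query();       // only this one thread can get wedged
            lk.lock();
            m_value = v;
            // Sleep until the next refresh, or wake early on shutdown.
            m_wake.wait_for(lk, m_interval, [this] { return m_stop; });
        }
        m_running = false;                 // idle timeout or shutdown
    }

    std::function<long long()> m_query;
    Millis m_interval;
    Millis m_idle;
    std::mutex m_lock;
    std::condition_variable m_wake;
    std::thread m_worker;
    Clock::time_point m_lastRequest;
    long long m_value = -1;
    bool m_running = false;
    bool m_stop = false;
};
```

With the numbers from the comment, this would be constructed with a 15 second interval and a 60 second idle timeout. However many frontends poll, at most one thread is ever blocked on the dead mount.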
Branch: master Changeset: 394c2e82c8723605d939b02a789f068aa8c62d44
comment:4 Changed 13 years ago by
Resolution: | → Won't Fix |
---|---|
Status: | assigned → closed |
[394c2e82] fixes what appears to be the worst offender when a mount goes dead. It looks like a deadlock, but in fact the threads that are created all continue once the mount is restored.
Changing ThreadedFileWriter to use asynchronous writes is complex and wouldn't do much to improve liveness in the presence of blocked disk access, so I'm closing this as wontfix.
The first thing to do, if you are worried about NFS mounts disappearing, is to mount them with the options "soft,intr,retrans=6". The default is "hard,nointr,retrans=3". Doing this carries some risk of lost data, but the NFS connection will no longer hang indefinitely.
In addition to that, aio_write() may be a good plan (is it portable, and will it have the desired effect on non-NFS writes?)