Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#6503 closed defect (fixed)

GetFilesystemInfos, TotalKB: -1 should be ignored

Reported by: nick@… Owned by: cpinkham
Priority: minor Milestone: 0.21.1
Component: MythTV - General Version: 0.21-fixes
Severity: medium Keywords:
Cc: rolo@… Ticket locked: no

Description

In short, GetFilesystemInfos reports -1 total KB and 0 free KB, and MythTV thinks it should mass expire, killing tons of shows. Meanwhile, the actual filesystem has loads of free space.

MythTV Version   : 18207
MythTV Branch    : branches/release-0-21-fixes
Library API      : 0.21.20080304-1
Network Protocol : 40
Options compiled in:
 linux profile using_oss using_alsa using_arts using_jack using_backend using_dbox2 using_dvb using_firewire using_frontend using_hdhomerun using_iptv using_ivtv using_joystick_menu using_libfftw3 using_lirc using_opengl_vsync using_opengl_video using_v4l using_x11 using_xrandr using_xv using_xvmc using_xvmcw using_xvmc_vld using_glx_proc_addr_arb using_bindings_perl using_bindings_python using_opengl using_ffmpeg_threads using_libavc_5_3 using_live
Distributor ID:	Ubuntu
Description:	Ubuntu 8.04.2
Release:	8.04
Codename:	hardy

pacifico - Master Backend, physical location of recordings, exported via NFS, mounted as /var/lib/mythtv/recordings via autofs control of /var/lib/mythtv

guinness - Slave Backend, /var/lib/mythtv/recordings mounted from pacifico via autofs control of /var/lib/mythtv

mysql> select * from storagegroup;
+----+-----------+----------+----------------------------+
| id | groupname | hostname | dirname                    |
+----+-----------+----------+----------------------------+
|  2 | Default   | pacifico | /var/lib/mythtv/recordings | 
+----+-----------+----------+----------------------------+

From mythtvbackend.log on pacifico:

--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Remote
     Drive ID: 1
     TotalKB : 878560256
     UsedKB  : 246081536
     FreeKB  : 632478720

Dir: guinness:/var/lib/mythtv/recordings
     Location: Local
     Drive ID: 1
     TotalKB : -1
     UsedKB  : -1
     FreeKB  : 0

--- GetFilesystemInfos directory list end ---

A few things to note here:

  • Dir: guinness sizes are obviously in error
  • Dir: pacifico sizes are correct
  • Dir: pacifico location is *Local* _not_ Remote as it reports
  • Dir: guinness location is *Remote*, and _not_ Local as it reports

Also, just pacifico can start (without guinness connecting) with incorrect data:

--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Local
     Drive ID: 1
     TotalKB : -1
     UsedKB  : -1
     FreeKB  : 0

--- GetFilesystemInfos directory list end ---

Either case triggers an expiration massacre, in which most of our shows are deleted.

It's obvious MythTV should not take action on the "TotalKB: -1" and "FreeKB: 0" scenario, but instead MythTV assumes it needs to free up space and whacks everything, making for a very sad wife :(

Suggestion: a) I understand autofs/NFS can be complex, and misconfigured mounts or startup timing can sometimes render a directory unavailable, but b) MythTV should correctly handle the case where the sizes reported by GetFilesystemInfos are obviously in error, and not trigger a serial expiration spree. c) GetFilesystemInfos is actually plain wrong here; why are we even getting to this erroneous state?

Attachments (1)

statfs.c (1.1 KB) - added by cpinkham 11 years ago.


Change History (14)

comment:1 Changed 11 years ago by anonymous

Additional info... without starting guinness, I can restart mythbackend on pacifico several times and get different output from GetFilesystemInfos:

--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Local
     Drive ID: 1
     TotalKB : -1
     UsedKB  : -1
     FreeKB  : 0

--- GetFilesystemInfos directory list end ---
--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Remote
     Drive ID: 1
     TotalKB : 878560256
     UsedKB  : 236649472
     FreeKB  : 641910784

--- GetFilesystemInfos directory list end ---

comment:2 Changed 11 years ago by cpinkham

(In [20452]) Don't try to AutoExpire items on a filesystem if we received invalid (-1) totalSpaceKB or usedSpaceKB results back from GetFilesystemInfos() for that filesystem.

References #6503, but does not address the root cause of the issue described in the ticket.

comment:3 Changed 11 years ago by cpinkham

(In [20453]) Carry over [20452] from trunk.

Don't try to AutoExpire items on a filesystem if we received invalid (-1) totalSpaceKB or usedSpaceKB results back from GetFilesystemInfos() for that filesystem.

References #6503, but does not address the root cause of the issue described in the ticket.

comment:4 Changed 11 years ago by cpinkham

Owner: changed from Isaac Richards to cpinkham
Status: changed from new to assigned

Can you replicate this issue currently?

Can you run both backends with "-v file,schedule" and see if that gives any additional clues to what is going on?

The local vs remote determination is based on the filesystem type. We assume local unless the filesystem type returned from statfs() is a network filesystem. In the years since this code went in, I haven't seen any issues like this so far. If you compiled from source, you could try inserting some debug VERBOSE comments around line 123 in programs/mythbackend/backendutil.cpp.
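The rule described above (assume local unless statfs() reports a network filesystem type) can be sketched roughly as follows. This is not the code from backendutil.cpp; the function names and the `SKETCH_*` constants are made up for illustration, and the list of network types is deliberately short. Linux only.

```c
#include <stdio.h>
#include <sys/vfs.h>   /* statfs(), Linux-specific */

/* Filesystem magic numbers as reported in statfs.f_type on Linux.
 * Defined locally (hypothetical names) so the sketch does not depend
 * on <linux/magic.h>. 0x6969 is the NFS type seen in this ticket. */
#define SKETCH_NFS_MAGIC 0x6969UL
#define SKETCH_SMB_MAGIC 0x517BUL

/* Returns 1 if f_type is one of the network filesystems listed above. */
int fs_type_is_network(unsigned long f_type)
{
    return f_type == SKETCH_NFS_MAGIC || f_type == SKETCH_SMB_MAGIC;
}

/* Returns 1 for remote, 0 for local (the default), -1 if statfs() fails. */
int path_is_remote(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0) {
        perror("statfs");
        return -1;
    }
    return fs_type_is_network((unsigned long)st.f_type);
}
```

Under this rule any directory that is reached through an NFS mount is classified as remote, whichever host actually holds the disk.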

Are you using the ghost option with automount? That could be causing this issue since the statfs() won't cause a mount, so we're unable to get the actual size and usage info.

comment:5 Changed 11 years ago by nick@…

Yes, using ghost w/autofs. Also, using nfsv4, so local mounts w/autofs are not done via a fs bind; by the logic above, may statfs() then always indicate it's a network fs? Example, on pacifico:

pacifico:recordings on /var/lib/mythtv/recordings type nfs4 (rw,sec=sys,addr=192.168.3.3,clientaddr=192.168.3.3)

With this additional info, let me know how you'd like me to proceed. I can reproduce by restarting the backend... every other time it seems to detect -1 with GetFilesystemInfos. Note, I can reproduce even with the slave backend out of the picture.

comment:6 Changed 11 years ago by Dibblah

Is the filesystem actually mounted at the time Mythbackend starts?

comment:7 Changed 11 years ago by nick@…

Right, seems obvious now... starting either backend before the fs is mounted results in -1, but if it's mounted (e.g., by me requesting an ls of the dir to trigger autofs), correct sizing is reported. This holds on both backends (see below). However, the problem of Myth not knowing which is local remains (is this important?). Local should be pacifico, but instead both backends think it's remote:

pacifico log:

2009-04-26 01:37:17.300 AutoExpire: ExpireRecordings()
2009-04-26 01:37:17.307 Maximal bitrate of busy encoders is 0 KB/min
--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Remote
     Drive ID: 1
     TotalKB : 878560256
     UsedKB  : 236894208
     FreeKB  : 641666048

Dir: guinness:/var/lib/mythtv/recordings
     Location: Remote
     Drive ID: 1
     TotalKB : 878560256
     UsedKB  : 236894208
     FreeKB  : 641666048

--- GetFilesystemInfos directory list end ---

guinness log:

2009-04-26 01:37:06.226 Enabled verbose msgs:  important general file schedule
2009-04-26 01:37:06.237 SG(): CheckAllStorageGroupDirs(): Checking All Storage Group directories
2009-04-26 01:37:07.239 Connecting to master server: 192.168.3.3:6543
2009-04-26 01:37:07.241 Connected successfully

So it seems the CAUSE of the -1 is statfs+autofs+ghost option, and the CAUSE of both being "Remote" is autofs+nfsv4 (no bind mount for local fs). Ok, I can switch to nfsv3 and nix ghosting as a next step.

However, the original bug report of Myth not correctly handling -1 from GetFilesystemInfos remains. I see some dups... has error handling been added to trunk?

Also, should GetFilesystemInfos get some attention around filesystem detection, or are the side effects of autofs+ghost and/or autofs+nfs4 assumed to be a sysadmin problem?

comment:8 Changed 11 years ago by cpinkham

I added code to the AutoExpirer in both trunk and -fixes to ignore filesystems with a TotalKB or UsedKB == -1. If you are compiling your own binaries and update to svn head then you should be able to see some error messages printed to the log when this situation is encountered.
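For readers who want to picture the guard, here is a minimal sketch of the idea, not the actual changeset. The struct, the function names, and the log wording are all hypothetical; the only thing taken from the ticket is the rule that a TotalKB or UsedKB of -1 means the filesystem is skipped.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-filesystem record; sizes are in KB and -1 marks
 * "could not be determined", as in the logs in this ticket. */
struct fs_usage {
    int       fs_id;
    long long total_kb;
    long long used_kb;
};

/* Returns 1 if both sizes are usable, 0 if either is the -1 marker. */
int fs_usage_is_valid(const struct fs_usage *fs)
{
    return fs->total_kb != -1 && fs->used_kb != -1;
}

/* Walks the list and skips filesystems with invalid info.
 * Returns how many filesystems were actually considered. */
size_t expire_pass(const struct fs_usage *list, size_t count)
{
    size_t considered = 0;

    for (size_t i = 0; i < count; i++) {
        if (!fs_usage_is_valid(&list[i])) {
            fprintf(stderr, "fsID #%d has invalid info, skipping\n",
                    list[i].fs_id);
            continue;
        }
        /* A real expirer would compare free space against its
         * threshold here and pick recordings to delete. */
        considered++;
    }
    return considered;
}
```

The point of the design is that an unknown size is never treated as "zero free space", so a failed statfs() can no longer trigger deletions.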

If you are using automount and the filesystem is not already mounted or automounted when the expirer runs, then there isn't much we can do about it in the code. I wonder if this situation would occur if you used a subdirectory like we recommend. We generally tell people not to put their recordings in the root directory of a mount whether it is nfs or a local drive. The reason for this is because if the filesystem is not mounted for some reason, the backend will happily put the files in the parent filesystem and then those files will be hidden when/if the real recording filesystem is mounted. We recommend putting recordings in a subdir of the mount. If you used a subdir, then autofs would probably cause the filesystem to be mounted when we call statfs() and we'd get the true usage numbers for use in the expirer.

With only one filesystem/directory, you may not be seeing issues in the scheduler, but if you had 2 of them, these invalid usage numbers would severely impact the Storage Groups disk scheduling code.

Can you compile and run the attached statfs.c program? Basically this:

gcc -o statfs statfs.c ; ./statfs /var/lib/mythtv/recordings

Try that with the filesystem mounted and unmounted, then try it again with the filesystem unmounted, but instead of giving it the recordings directory, create a temporary subdirectory underneath recordings and pass statfs the name of the subdirectory (ie, ./statfs /var/lib/mythtv/recordings/tmpsubdir) and see if that gives different results or if it causes the filesystem to be mounted (which it should). If this is the case, then using a subdir would fix the issue for you.
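The attachment itself is not reproduced in this ticket. As a rough idea of what such a test tool does, the sketch below calls statfs() and converts block counts to KB with the same `(blocks * block_size) >> 10` arithmetic that appears in the program's output. The function names are hypothetical, and using `f_bavail` for the free count is an assumption; the attached program may use a different field. Linux only.

```c
#include <stdio.h>
#include <sys/vfs.h>   /* statfs(), Linux-specific */

/* Converts a block count to kilobytes: (blocks * block_size) >> 10. */
unsigned long long blocks_to_kb(unsigned long long blocks,
                                unsigned long long block_size)
{
    return (blocks * block_size) >> 10;
}

/* Prints type and usage for the filesystem containing path.
 * Returns 0 on success, -1 if statfs() fails. */
int print_fs_usage(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0) {
        perror("statfs");
        return -1;
    }

    unsigned long long bsize    = (unsigned long long)st.f_bsize;
    unsigned long long total_kb = blocks_to_kb((unsigned long long)st.f_blocks, bsize);
    /* Assumption: blocks available to unprivileged users. */
    unsigned long long free_kb  = blocks_to_kb((unsigned long long)st.f_bavail, bsize);

    printf("Info for  : %s\n", path);
    printf("FS Type   : 0x%lx\n", (unsigned long)st.f_type);
    printf("TotalKB   : %llu KB\n", total_kb);
    printf("FreeKB    : %llu KB\n", free_kb);
    printf("UsedKB    : %llu KB\n", total_kb - free_kb);
    return 0;
}
```

Calling `print_fs_usage(argv[1])` from a small `main()` gives a command-line tool. As a check on the arithmetic, 857969 blocks of 1048576 bytes come out to 878560256 KB.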

comment:9 Changed 11 years ago by nick@…

Ok, current setup, from the slave backend... autofs is controlling /var/lib/mythtv, and --ghost is turned on; I force a umount and run statfs, then trigger autofs to mount (via ls), and run statfs again:

[nick@guinness tmp]$ ls /var/lib/mythtv
audio  music  pictures  recordings  tv_sync  videos
[nick@guinness tmp]$ sudo umount /var/lib/mythtv/recordings
[nick@guinness tmp]$ ./statfs  /var/lib/mythtv/recordings
Info for  : /var/lib/mythtv/recordings
FS Type   : 0x187
Tot Blks  : 0
Free Blks : 0
Block Size: 4096
TotalKB   :        0 KB ((0 * 4096) >> 10)
FreeKB    :        0 KB ((0 * 4096) >> 10)
UsedKB    :        0 KB (calculated from TotalKB - FreeKB)
[nick@guinness tmp]$ ls /var/lib/mythtv/recordings > /dev/null
[nick@guinness tmp]$ ./statfs  /var/lib/mythtv/recordings
Info for  : /var/lib/mythtv/recordings
FS Type   : 0x6969
Tot Blks  : 857969
Free Blks : 738951
Block Size: 1048576
TotalKB   : 878560256 KB ((857969 * 1048576) >> 10)
FreeKB    : 756685824 KB ((1752170496 * 1048576) >> 10)
UsedKB    : 121874432 KB (calculated from TotalKB - FreeKB)

And again with a tmpsubdir:

[nick@guinness tmp]$ sudo umount /var/lib/mythtv/recordings
[nick@guinness tmp]$ ./statfs /var/lib/mythtv/recordings/tmpsubdir
Info for  : /var/lib/mythtv/recordings/tmpsubdir
FS Type   : 0x6969
Tot Blks  : 857969
Free Blks : 738774
Block Size: 1048576
TotalKB   : 878560256 KB ((857969 * 1048576) >> 10)
FreeKB    : 756504576 KB ((1566572544 * 1048576) >> 10)
UsedKB    : 122055680 KB (calculated from TotalKB - FreeKB)

And finally, without --ghost

[nick@guinness tmp]$ ls /var/lib/mythtv
[nick@guinness tmp]$ ./statfs /var/lib/mythtv/recordings
Info for  : /var/lib/mythtv/recordings
FS Type   : 0x6969
Tot Blks  : 857969
Free Blks : 738452
Block Size: 1048576
TotalKB   : 878560256 KB ((857969 * 1048576) >> 10)
FreeKB    : 756174848 KB ((1228931072 * 1048576) >> 10)
UsedKB    : 122385408 KB (calculated from TotalKB - FreeKB)

You are correct, using a subdir gets around the problem. Also, so it seems, --ghost is indeed causing statfs to behave in seemingly unexpected ways. The FS Type of 0x187 that is returned when the fs is not mounted and --ghost is on is interesting, I don't see this in statfs(2), but I see in stat.c that it's defined as "S_MAGIC_AUTOFS", which makes sense. Perhaps this could be handled within GetFilesystemInfos somehow?
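One way the 0x187 case could be recognized is sketched below. It only illustrates the idea raised here and is not something MythTV is known to do; the enum, the function names, and the `SKETCH_AUTOFS_MAGIC` constant are hypothetical. The value 0x187 is the autofs type reported in the output above. Linux only.

```c
#include <sys/vfs.h>   /* statfs(), Linux-specific */

/* autofs placeholder type, as seen in the statfs output above. */
#define SKETCH_AUTOFS_MAGIC 0x0187UL

enum mount_state {
    MOUNT_ERROR   = -1,  /* statfs() itself failed */
    MOUNT_READY   = 0,   /* a real filesystem answered */
    MOUNT_PENDING = 1    /* autofs placeholder, real fs not mounted yet */
};

/* Classifies a raw f_type value. */
enum mount_state classify_fs_type(unsigned long f_type)
{
    return f_type == SKETCH_AUTOFS_MAGIC ? MOUNT_PENDING : MOUNT_READY;
}

/* Probes a path; size figures should be trusted only for MOUNT_READY. */
enum mount_state probe_mount(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return MOUNT_ERROR;
    return classify_fs_type((unsigned long)st.f_type);
}
```

A caller that gets MOUNT_PENDING could report the sizes as unknown rather than zero, or retry against a subdirectory, which according to the test above triggers the mount.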

Meanwhile, I'm going to nix --ghost on my machines, seems it causes more problems than it's worth.

BTW, thanks for the safety net fix in [20452]! I'll see if I can compile from source to test it.

comment:10 Changed 11 years ago by jyavenard@…

FWIW, using NFS here with Ubuntu 8.04 on a 6TB share:

Info for  : /data/videos
FS Type   : 0x6969
Tot Blks  : 22891717
Free Blks : 13895424
Block Size: 262144
TotalKB   : 5860279552 KB ((22891717 * 262144) >> 10)
FreeKB    : 3557228544 KB ((469762048 * 262144) >> 10)
UsedKB    : 2303051008 KB (calculated from TotalKB - FreeKB)

comment:11 Changed 11 years ago by nick@…

[20452] works great! I recreated the -1 problem, and waited for AutoExpire to be kicked off... and voilà:

2009-04-26 17:52:00.853 AutoExpire: ExpireRecordings()
2009-04-26 17:52:00.860 Maximal bitrate of busy encoders is 0 KB/min
--- GetFilesystemInfos directory list start ---
Dir: pacifico:/var/lib/mythtv/recordings
     Location: Local
     Drive ID: 1
     TotalKB : -1
     UsedKB  : -1
     FreeKB  : 0

Dir: guinness:/var/lib/mythtv/recordings
     Location: Remote
     Drive ID: 1
     TotalKB : 878560256
     UsedKB  : 107846656
     FreeKB  : 770713600

--- GetFilesystemInfos directory list end ---
2009-04-26 17:52:00.878 AutoExpire: FillDBOrdered: Adding deleted programs in FIFO order
2009-04-26 17:52:00.879 AutoExpire: FillDBOrdered: Adding expirable programs in Oldest First order
2009-04-26 17:52:00.885 fsID #1: Total:    -0.0 GB   Used:    -0.0 GB   Free:     0.0 GB
2009-04-26 17:52:00.886 AutoExpire Error: fsID #1 has invalid info, AutoExpire can not run for this filesystem.  Continuing on to next...
2009-04-26 17:52:00.887 Directories on filesystem ID 1:
2009-04-26 17:52:00.887     pacifico:/var/lib/mythtv/recordings
2009-04-26 17:52:00.888     guinness:/var/lib/mythtv/recordings

Changed 11 years ago by cpinkham

Attachment: statfs.c added

comment:12 Changed 11 years ago by cpinkham

Resolution: fixed
Status: changed from assigned to closed

After discussion here and on the mailing list, I'm going to mark this ticket as fixed. There isn't any sane way for us to handle a filesystem that is ghosted since we don't know whether it is local or remote, so it's best to use subdirectories for recordings which will allow statfs() to get the proper type and usage information. That has been our recommendation for other reasons as well, so this is just another confirmation that subdirs are best rather than putting recordings in the root of a mount.

comment:13 Changed 11 years ago by anonymous

Understood. I especially appreciate the safety net changeset for the case of invalid GetFilesystemInfos info. I'm sure this will save the recordings of others in the future as well.

Thanks!
