Opened 13 years ago

Closed 13 years ago

#3248 closed defect (fixed)

mytharchive lockfile issues

Reported by: anonymous Owned by: paulh
Priority: minor Milestone: unknown
Component: mytharchive Version: 0.20
Severity: medium Keywords:
Cc: hugh@… Ticket locked: no

Description

I posted this the mythtv-dev list on 2007 Mar 5 http://mythtv.org/pipermail/mythtv-dev/2007-March/054076.html (Trac didn't accept new tickets at that time).

There was no posted response but on 2007 March 18 paulh submitted changeset 13070 that addressed most of these concerns.

I'm adding this ticket record the issues and comment on the changeset.

Verbatim from email to list:


mytharchive has been acting up on me, so I did some investigation.

Sometimes a mythburn step fails (I'm vague on how or what), leaving the mythburn.lck lock file but no mythburn.py process. When this happens, cancel requests are ineffective, and any attempt to archive something just gets you into the logviewer program, whatching the log make no further process. Nothing will ever delete the lock.

For example, we tried to put too much on a DVD and got into that state.

So there are two problems: (1) a failure with no proper diagnostic or termination (2) and then no way to archive after this.

This message was prompted by investigation of problem 2.

I'm not the only one to have experienced this. See, for example http://threebit.net/mail-archive/mythtv-users/msg35922.html

The mythburn.lck lockfile logic in mythburn.py has a few problems.

  • The checking for and creation of a lock ought to be "atomic". Otherwise there can be a race condition. The conventional way of doing this on UNIX-like systems is to try to creat a lockfile using, in Python terms,

os.open(lockfilepath, O_WRONLY | O_CREAT | O_EXCL)

If there is already a lockfile, this fails Otherwise, it creates one. This is atomic.

  • the logic of mythburn.py's try ... finally block at lines 3571 through 3611 seems wrong. The finally part removes the lock file but it may not belong to this process. Luckilly, the program that invokes mythburn.py will not do so if the lock is present. So this case will not come up. Still, this seems quite wrong.
  • the lock file, when present, simply contains the string "lock". It is better to follow the convention that the lock file contains the process ID of the owning process. This allows people and processes to determine if a lock is stale: held by a now-dead process.
  • It seems to me that each place where the lock is checked should then check to see if the lock is stale and act accordingly (break the lock if stale, perhaps with a warning).

Once this last step is implemented, my problem 2 will go away.

I started to code changes to mythburn.py but before getting serious, I'd like to know if the relevant authorities think I'm on the right track.

Attachments (1)

mythburn.py.diff (1.9 KB) - added by hugh@… 13 years ago.
patch to mythburn.py 0.1.20070316-1 [untested]

Download all attachments as: .zip

Change History (5)

comment:1 Changed 13 years ago by anonymous

Thanks, PaulH, for changeset 13070.

There is still a bug in mythburn.py: the mythburn.lck file is removed in line 4039 even if the lockfile is not owned by this process (i.e. was not created by this process). The obvious fix is to move lines 3997 through 4000 outside the try that starts at line 3996.

In actual fact, those lines ought to be redundant. Line 4002, if done right, would eliminate the need for them. Unfortunately, it isn't quite right and leaves a small race condition.

The function call os.open(os.path.join(logpath, "mythburn.lck"), os.O_WRONLY | os.O_CREAT | os.O_EXCL) will fail if the lock file exists and create it if it does not exist (unless there is another problem such as the directory not existing, or not being writeable, or the file system being full, etc.). The correct form of this line should be something like the following (replacing lines 3998 through 4004:

import errno
lckpath = os.path.join(logpath, "mythburn.lck")
try:
    fd = os.open(lckpath, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    try:
        os.write(fd, "%d\n" % os.getpid())
        os.close(fd)
    except:
        os.remove(lckpath)
        raise
except OSError, e:
    if e.errno == errno.EEXIST:
        write("Lock file exists -- already running???")
        sys.exit(1)
    else:
        fatalError("cannot create lockfile: %s" % e)
# if we get here, we own the lock

The inner try is there in the hopes of making the code more bulletproof. The failure would probably only happen if the disk drive were full.

At this point, the "try" from line 3996 should be moved to after this block of code. That is the point at which we know the lockfile is ours and hence we should remove it when we are done.

I've used "fd" instead of "file" to hold the file descriptor. I think that this makes the code clearer.

I've used "lckpath" to hold the lock file path. Since this is used several places, I think it would be clearer and less subject to bitrot to form it in one place in the code. Other places should reference this variable too (lines 4038 and 4039). Actually, line 4038 ought to now be redundant: 4039 should be unconditional since we know that we've got the lock file.

I've added a \n to the format for the contents of the lockfile. If I correctly understand the code in mytharchive/mail.cpp:checkProcess, it seems to think that the lockfile contains a line.

Speaking of checkProcess, the test for the existence of a process is perhaps not the best. The normal way, in POSIX, is to test with kill(pid,0) system call and see if the result is ESRCH (meaning "no process can be found corresponding to that pid").

comment:2 Changed 13 years ago by paulh

If you create a patch with these changes I will apply it :-)

Changed 13 years ago by hugh@…

Attachment: mythburn.py.diff added

patch to mythburn.py 0.1.20070316-1 [untested]

comment:3 Changed 13 years ago by hugh@…

I've attached a patch for mythburn.py. I tried to run it in my .20 system but the mytharchive system has changed enough that a current mythburn.py cannot just be dropped into a .20 system. So testing was not done. I did some testing of the code block shown in my previous message in isolation.

I did not change mytharchive/mail.cpp:checkProcess because I don't know my way around the library and build system.

comment:4 Changed 13 years ago by paulh

Resolution: fixed
Status: newclosed

(In [13199]) Apply the patch from Hugh to improve the lock file handling in mythburn.py. Also implement his suggestion on how to detect if a process is running from its pid using kill(pid, 0). Closes #3248.

Note: See TracTickets for help on using tickets.