Opened 13 years ago

Closed 6 years ago

#9160 closed Bug Report - General (Won't Fix)

Improper insertion of jobs through MythWeb can block future commflag jobs

Reported by: Ken Emerson <kenneth.emerson@…>
Owned by: Raymond Wagner
Priority: minor
Milestone: 0.28.2
Component: Plugin - MythWeb
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description (last modified by Raymond Wagner)

MythWeb has no protection against duplicate insertion of new jobs into the jobqueue table. If two instances of the details page are open and the same job is queued from both, or the user hits the back button and triggers a repeat submission, a duplicate job can be added to the table. The second copy of the job is then left in the queue indefinitely, confusing subsequent host selection of tasks.
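
For illustration, a minimal sketch of the kind of duplicate check that a raw INSERT from the web interface skips. It uses Python with sqlite3; the column names (chanid, starttime, type, status) are modeled on the jobqueue rows pasted in the comments below, and the constants are only illustrative, not MythTV's real values.

{{{
#!python
import sqlite3

JOB_COMMFLAG = 2                   # illustrative job type
JOB_QUEUED, JOB_RUNNING = 1, 4     # illustrative status values

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobqueue (
    id INTEGER PRIMARY KEY, chanid INTEGER, starttime TEXT,
    type INTEGER, status INTEGER)""")

def queue_job_unchecked(chanid, starttime, jobtype):
    # What an unguarded web submission effectively does: always INSERT.
    conn.execute(
        "INSERT INTO jobqueue (chanid, starttime, type, status) VALUES (?,?,?,?)",
        (chanid, starttime, jobtype, JOB_QUEUED))

def queue_job_guarded(chanid, starttime, jobtype):
    # Refuse to insert if a job of the same type for the same recording
    # is already queued or running.
    dup = conn.execute(
        "SELECT 1 FROM jobqueue WHERE chanid=? AND starttime=? AND type=? "
        "AND status IN (?,?)",
        (chanid, starttime, jobtype, JOB_QUEUED, JOB_RUNNING)).fetchone()
    if dup:
        return False               # duplicate; leave the existing entry alone
    queue_job_unchecked(chanid, starttime, jobtype)
    return True

# A double submission (second browser tab, or the back button) adds two rows:
queue_job_unchecked(1081, "2010-10-27 21:01:00", JOB_COMMFLAG)
queue_job_unchecked(1081, "2010-10-27 21:01:00", JOB_COMMFLAG)
print(conn.execute("SELECT COUNT(*) FROM jobqueue").fetchone()[0])   # 2

# The guarded version rejects the repeat:
print(queue_job_guarded(1081, "2010-10-27 21:01:00", JOB_COMMFLAG))  # False
}}}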

This will be rendered a non-issue by the jobqueue rewrite, if not fixed by the time that gets merged in.

Attachments (3)

commflag_log.txt (8.9 KB) - added by Ken Emerson <kenneth.emerson@…> 13 years ago.
Requested backend log file of commflag start and segfault
commflag_log_2.txt (2.9 KB) - added by Ken Emerson <kenneth.emerson@…> 13 years ago.
backend logfile with job stuck in jobqueue with status=4
jobqueue.txt (4.6 KB) - added by Ken Emerson <kenneth.emerson@…> 13 years ago.
The entry of 6/24/2011 14:30 is currently running; the entry of 6/27/2011 14:30 is an orphan.


Change History (29)

comment:1 Changed 13 years ago by cpinkham

What's the status of these orphaned jobs? I looked at the code, and we should set the status to one of JOB_ERRORED, JOB_ABORTED, or JOB_FINISHED, no matter what reason mythcommflag exited. If it's not errored and not aborted, then we mark it as finished. Can you also tell me what is in the comment field for these jobs? Does that get updated, or does it still indicate flagging is underway?
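
For reference, a rough sketch of that decision in Python (not the actual JobQueue code; the status names mirror the constants mentioned above and the helper is hypothetical):

{{{
#!python
# Hypothetical helper: whatever the reason mythcommflag exited, the job
# should always leave the queue with one of these three terminal statuses.
JOB_FINISHED, JOB_ABORTED, JOB_ERRORED = "finished", "aborted", "errored"

def final_status(errored: bool, aborted: bool) -> str:
    if errored:
        return JOB_ERRORED
    if aborted:
        return JOB_ABORTED
    return JOB_FINISHED
}}}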

comment:2 Changed 13 years ago by cpinkham

Status: new → infoneeded_new

comment:3 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Status = 4

Comment = "99% Completed @ 450.573 fps. "

I had already deleted the two others, but looking back in my DB backup from last night the status for them was also 4 with similar "99% Completed" comments. I have one recording now that consistently produces a segfault when commflagging. After running (and failing) I have the following entry in the jobqueue table:

id:           7905
chanid:       1081
starttime:    2010-10-27 21:01:00
inserttime:   2010-10-29 13:29:27
type:         2
cmds:         0
flags:        4
status:       4
statustime:   2010-10-29 13:37:35
hostname:     mythtv
args:         [BLOB - 0B]
comment:      99% Completed @ 450.573 fps.
schedruntime: 2010-10-29 13:29:27

comment:4 Changed 13 years ago by cpinkham

Please paste in the relevant portion of your backend logs from when mythcommflag starts until after it segfaults. Specifically, I'm looking for lines that have the word "PID" on them, but include at least 30-40 lines around the point where mythcommflag segfaults. There should be a PID line when mythcommflag starts and another when it segfaults, showing that the process ended.

Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Attachment: commflag_log.txt added

Requested backend log file of commflag start and segfault

comment:5 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Interestingly enough, when I deleted the entry in the jobqueue table and restarted the commflag job, it still segfaulted but did not remain incomplete in the jobqueue. I've attached the snippet from the original job (on the 27th), although it looks identical to all the others.

comment:6 Changed 13 years ago by beirdo

That log snippet shows the process ending, although we misinterpreted the result as being 139 commercials. I would venture a guess that you are using Ubuntu, as I have seen the same weirdness with return values there (it should say that the process was killed with signal 11). However, it doesn't look like anything is hanging on the myth_system side.
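
An exit value of 139 is the shell-style encoding of death by signal 11 (128 + SIGSEGV). As a minimal sketch of the distinction, using standard Python process handling rather than MythTV's myth_system code:

{{{
#!python
import signal
import subprocess

# A child process that kills itself with SIGSEGV, standing in for a
# crashing mythcommflag run.
result = subprocess.run(
    ["python3", "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"])

if result.returncode < 0:
    # Python reports death-by-signal as a negative return code.
    print("killed by signal", -result.returncode)          # 11 (SIGSEGV)
elif result.returncode == 128 + signal.SIGSEGV:
    # A shell-style wrapper surfaces the same crash as 139 (128 + 11);
    # that number must not be read back as a real result such as a
    # commercial-break count.
    print("crashed with SIGSEGV (wrapper-style exit code 139)")
else:
    print("exited with code", result.returncode)
}}}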

I guess we need to investigate more. I assume this is repeatable in your setup?

comment:7 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

I am running Ubuntu 10.4. This particular recording will not complete a commflag job (always segfaults); therefore, yes it is repeatable. This recording must have something seriously wrong with it. I will keep it around until you tell me you are done with it.

comment:8 Changed 13 years ago by beirdo

The recording is likely corrupt. There's not much that can be done in that case; however, the job shouldn't be staying in the jobqueue, and that's the core of this bug.

comment:9 Changed 13 years ago by cpinkham

The fact that the preview generator ran indicates that the JobQueue does see mythcommflag end and starts its cleanup work. The preview generator is run after we've updated the job status in the database, though, so I don't see how the status could have been left at 'running' given that log file.

If you can reproduce the issue where the job is left in a '4' (running) status, please attach backend logs from around the time mythcommflag dies. Run the backend with "-v jobqueue,database,extra" so that I can see what the JobQueue is doing and what SQL commands are being sent, along with their results. You can start the log where the segfault occurs, but please include at least 200 lines after that point.

comment:10 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

I had to run the commflag job twice to get the output you asked for. Interesting results: The first run, the process segfaulted but completed; the second run, the process did not segfault, found six commercial breaks (seems reasonable), AND got stuck in the job queue with a status of 4. Log file attached.

Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Attachment: commflag_log_2.txt added

backend logfile with job stuck in jobqueue with status=4

comment:11 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Sometime after this (I can't say how long; I noticed it the next day), the jobqueue did get cleaned up and the "stuck" job disappeared.

comment:12 Changed 13 years ago by stuartm

Status: infoneeded_new → new

comment:13 Changed 13 years ago by Raymond Wagner

Owner: changed from cpinkham to Raymond Wagner
Status: new → accepted

The offending code will end up getting replaced by the jobqueue scheduler as part of #7990.

comment:14 Changed 13 years ago by Raymond Wagner

Status: accepted → infoneeded

Are you still having this problem after the MythSystem rewrite in December?

comment:15 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

I recently upgraded to master (5/18). Soon after that I had many commflag jobs fail. The failures occurred mainly on two TV series: The Burns and Allen Show (Antenna TV OTA) and Wheel of Fortune (NBC OTA). I believe the Burns and Allen Show was the first time I'd recorded on that channel and the recordings that had failed commflag jobs also would not play without segfaulting; however, the Wheel of Fortune shows played fine. With all of these failed jobs in the job queue, it appeared that no more commflag jobs were being scheduled (at one time there were as many as 8 or 9 failed jobs in the queue). When I manually cleared out the queue, commflag jobs began to be queued once again.

Currently I have 11 entries in the jobqueue table (mythweb only shows 6). Three of these are failed commflag jobs that date back to 6/23. I am still getting new jobs queued. I don't know if there is an upper limit of jobs in this queue that causes further queuing to cease. If you wish, I will leave the queue alone and wait for more failed jobs to be entered to see what happens.

comment:16 Changed 13 years ago by Raymond Wagner

There is no limit to the size of the jobqueue table. The only thing that prevents a job from being queued is if the same type of job is already running on that recording. The display in MythWeb is just passthrough from the backend. All errored and queued jobs should be listed, but completed jobs will only be shown for the past four hours.

I don't see anything in the code that would cause this, but I'll leave it open for now. Let me know if the behavior starts up again. If nothing else, it will be fixed automatically when the rewrite goes in.

comment:17 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

OK. After reading your explanation, I may have to admit to causing the problem via mythweb. I would select a recording from the RECORDINGS page and tell it to queue a commflag job. Instead of waiting for it to return, I would use the browser's back button to select another recording. Somehow, this queued two jobs for each of two recordings which then confused the scheduler leaving the second job of each recording in the queue forever. When I deleted these and carefully added more jobs, some of which always failed, the scheduler worked correctly. I'd say you can close this ticket as fixed (and color me embarrassed).

comment:18 Changed 13 years ago by Raymond Wagner

Component: MythTV - Mythjobqueue → Plugin - MythWeb
Description: modified (diff)
Owner: changed from Raymond Wagner to Rob Smith
Status: infoneeded → assigned
Summary: Failed commflag jobs can block future commflag jobs → Improper insertion of jobs through MythWeb can block future commflag jobs

comment:19 Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Some additional (and more useful) information: I was wrong about the browser back button (or at least that isn't what caused the duplicate entries in the job queue). The problem is repeatable:

If you have an entry in the job queue for commflagging that has errored out, and you then go to that recording in the "Recorded Programs" web page and requeue a commflag job (for the same episode), a new entry is placed in the job queue. When the scheduler sees the new job, both the new entry and the errored-out entry become active. When the commflag job completes, one of the active entries remains active (i.e. shows green and says 95% complete). This entry does not go away, which could eventually (if enough of these accumulate) block new jobs from running.

Attaching contents of jobqueue.

Changed 13 years ago by Ken Emerson <kenneth.emerson@…>

Attachment: jobqueue.txt added

The entry of 6/24/2011 14:30 is currently running; the entry of 6/27/2011 14:30 is an orphan.

comment:20 Changed 13 years ago by stuartm

Priority: minor → major
Type: defect → Bug Report - General

If we required jobs to be added through the protocol rather than inserted directly into the database, we could ensure that the input was validated and was not a duplicate of an existing entry.

comment:21 in reply to:  20 Changed 12 years ago by Raymond Wagner

Milestone: unknown → 0.26
Owner: changed from Rob Smith to Raymond Wagner
Priority: major → minor

Replying to stuartm:

If we required jobs to be added through the protocol rather than inserted directly into the database, we could ensure that the input was validated and was not a duplicate of an existing entry.

Unless someone else wants to fix it directly, I'll take this as part of #7990, since the above is one of the intended goals.

comment:22 Changed 12 years ago by Raymond Wagner

Milestone: 0.26 → 0.27

comment:23 Changed 11 years ago by Raymond Wagner

Milestone: 0.27 → 0.28

comment:24 Changed 8 years ago by Stuart Auchterlonie

Milestone: 0.28 → 0.28.1

Moving unresolved tickets to next point release

comment:25 Changed 7 years ago by Stuart Auchterlonie

Milestone: 0.28.1 → 0.28.2

Moving remaining open 0.28.1 tickets to 0.28.2

comment:26 Changed 6 years ago by Stuart Auchterlonie

Resolution: Won't Fix
Status: assigned → closed

Closing any remaining tickets for 0.28. If the issue persists, feel free to reopen and align to v29 or master.
