Opened 8 years ago

Closed 7 years ago

Last modified 6 years ago

#12758 closed Patch - Feature (fixed)

Feature patch: New feature for mythfilldatabase to optionally not use allatonce (--no-allatonce)

Reported by: Gary Buhrmaster <gary.buhrmaster@…> Owned by: Peter Bennett
Priority: minor Milestone: unknown
Component: MythTV - Mythfilldatabase Version: Master Head
Severity: medium Keywords:
Cc: Peter Bennett Ticket locked: no

Description

Feature patch: New feature for mythfilldatabase to optionally not use allatonce.

The option added to mythfilldatabase is '--no-allatonce' and is (sort of) the equivalent to a no-dd-grab-all option (if it existed), --dd-grab-all not being the default due to some systems being under-provisioned.

While for many grabbers that specify a preference for grabbing "allatonce" using it has advantages, for grabbers that return a large amount of data (example: 500+ channels for 21+ days with high details) mythfilldatabase due to its internal structure of parsing the entire xmltv file before processing can end up consuming vast amounts of virtual memory (6GB+ in one test case), which some backends may not be able to support well.

This patch adds in a --no-allatonce option to mythfilldatabase which will cause the fill processing to fall back to a day at a time grab/fill, trading off invoking the grabber multiple times (with increased CPU procesing workload) for a reduced memory requirement.

This patch should (likely) be removed when/if someone changes the entire mythfilldatabase processing to do chunked parsing and processing of the xmltv data (or the patch rejected in its entirely, in the case that that idea of chunked processing is on someones radar to be available RSN).

github ref: https://github.com/garybuhrmaster/mythtv/commit/76bf320504f0749bc73aeb28389d798416178671

github git-am ref: https://github.com/garybuhrmaster/mythtv/commit/76bf320504f0749bc73aeb28389d798416178671.patch

Change History (11)

comment:1 Changed 7 years ago by Peter Bennett

Cc: Peter Bennett added

comment:2 Changed 7 years ago by Peter Bennett

I have a backend with 4 GB memory, so I have been using a manual script which does 3 days at a time. I wonder if this patch can be changed to allow an option of x days at a time instead of all or nothing.

comment:3 Changed 7 years ago by Gary Buhrmaster <gary.buhrmaster@…>

I had thought (I thought I tested this explicitly, but it has been a long time) that the (proposed) --no-allatonce and --refresh <days> parameter would allow you to (sort of?) achieve what you are asking. Advanced grabbers optimize the data retrieval costs. Are you saying (the proposed) --no-allatonce and --refresh <days> combination does not work together?

comment:4 Changed 7 years ago by Peter Bennett

Sorry this is just me being lazy. I did not look into it. Let me try that and see how it works out.

comment:5 Changed 7 years ago by Gary Buhrmaster <gary.buhrmaster@…>

As I recall (again, a long time ago), mythfilldatabase does not optimize the --offset and --days capabilities that grabbers are supposed to support, so a --refresh 4-5 could (in theory) result in a --offset 4 --days 2 request to a grabber rather than a --offset 4 --days 1 followed by a --offset 5 --days 1 request, but there is code in mythfilldatabase which only supports one day at a time (and a comment (which I think is aged, but I am not going to dispute it) which says there is other code which depends on --days 1). And I think someone has mentioned some grabbers do not reliably support all options.

In the end, the better solution is (likely) a code refactor which supports processing the xmltv file in "chunks" rather than requiring it to be read (and stored in memory) entirely. This patch was always intended to be an stopgap until some motivated dev wanted to do that refactor (presumably because their system was memory constrained).

comment:6 Changed 7 years ago by Peter Bennett

Yes, the --refresh parameter does not help. I am trying to download 3 days at a time and avoid having to run mythfilldatabase 7 times. The code in filldata.cpp either does the whole lot at once or 1 day at a time. I think that a parameter to mythfilldatabase to specify the number of days to download at a time may work, as long as the grabber supports it. Maybe I will try that. I did not see the comment that says that the code depends on --days 1

FYI - Your patch fails because the latest filldata.cpp does not include below code that is in your version but not in MythTV. Your version is diverging. I am sure all your changes are in tickets but have not been applied.

    if (only_update_channels && source.xmltvgrabber_apiconfig)
    {
        command += " --list-channels";
    }
else 
Last edited 7 years ago by Peter Bennett (previous) (diff)

comment:7 Changed 7 years ago by Peter Bennett

Trying to increase the number of days done at a time will be difficult as much of the logic processes a day at a time. I have abandoned that idea.

I tested the patch and I found that in order for it to be effective it must be used in conjunction with --refresh 0-21. If I use --refresh all, or no refresh parameter, it only pulls tomorrow and future days for which it does not already have data. However, schedules change and I like to update all changes on every run.

This works well and uses minimal memory (see command below). It also seems better than creating individual files with an external run of the grabber and then feeding them into mythfilldatabase, because mythfilldatabase has some overhead on each run.

mythfilldatabase --no-allatonce --refresh 0-21

When used with json grabbers like tv_grab_zz_sdjson_sqlite which cache the data, there should be no impact on the schedules direct server from using -no-allatonce, since the data is cached in a local database and there should not be extra load on the upstream server.

comment:8 Changed 7 years ago by Peter Bennett

Owner: changed from stuartm to Peter Bennett
Status: newaccepted

comment:9 Changed 7 years ago by Gary Buhrmaster <gary.buhrmaster@…>

Resolution: fixed
Status: acceptedclosed

In ee7052fe93d43c65672995b01b0a21a8fa5e0537/mythtv:

New feature for mythfilldatabase to optionally not use allatonce.

The option added to mythfilldatabase is '--no-allatonce' and is (sort of)
the equivalent to a no-dd-grab-all option (if it existed), --dd-grab-all
not being the default due to some systems being under-provisioned.

While for many grabbers that specify a preference for grabbing
"allatonce" using it has advantages, for grabbers that return a
large amount of data (example: 500+ channels for 21+ days with high
details) mythfilldatabase due to its internal structure of parsing
the entire xmltv file before processing can end up consuming vast
amounts of virtual memory (6GB+ in one test case), which some backends
may not be able to support well.

This patch adds in a --no-allatonce option to mythfilldatabase which
will cause the fill processing to fall back to a day at a time grab/fill,
trading off invoking the grabber multiple times (with increased CPU
procesing workload) for a reduced memory requirement.

Fixes #12758

Signed-off-by: Peter Bennett <pbennett@…>

comment:10 in reply to:  7 Changed 7 years ago by Gary Buhrmaster <gary.buhrmaster@…>

Replying to pbennett: ....

When used with json grabbers like tv_grab_zz_sdjson_sqlite which cache the data, there should be no impact on the schedules direct server from using -no-allatonce, since the data is cached in a local database and there should not be extra load on the upstream server.

Just to close the loop here a bit, at least the tv_grab_zz_sdjson_sqlite grabber has a (very) slight additional overhead, as the grabber only retrieves necessary data for the particular request day, so rather than downloading any new/additional data for all days at once, you now end up retrieving the (approximately same amount of data) one day at a time. I presume (but have zero specific knowledge) that the upstream efficiency is slightly less. Noise in the grand scheme of things, I expect. If you really want to reduce the minor inefficiency, have a cron job runs that grabber with "--download-only" option to pull in all the data at once before running mythfilldatabase.

And as for (a previous comment) code divergence. Sorry about that. It is possibly an artifact due to too many patches in those related code areas over the last few years, and possibly an artifact of my aborted attempts with additional ideas/fixes/cleanup. Time (for me) to go review those things.

And while it is no longer relevant, when the patch was originally submitted there was a comment by one of the devs on irc that it seemed to be the wrong approach (which I agreed to, the better approach is a code refactor, but no one currently had that itch/time).

comment:11 Changed 6 years ago by Peter Bennett

Owner: changed from Peter Bennett to Peter Bennett
Note: See TracTickets for help on using tickets.