Opened 13 years ago

Closed 10 years ago

#10047 closed Developer Task (Won't Fix)

Optimize Mythweb Program Listing

Reported by: fracmak@… Owned by: Rob Smith
Priority: minor Milestone: unknown
Component: Plugin - MythWeb Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

So I've been working on what I thought would be a simple science project and has kind of taken on a life of it's own. I've been investing why the program listing page seems to take about 2-3 seconds to load on a Intel Core 2 Duo processor at 0 load. My setup has about 200 channels, which generates about 200k program entries, 2800 oldrecorded entries.

My investigation has shown what averages out to be about 2600 db queries on each page load. The flow seems to go as follows

1: get callsign list 2: for each call sign, request channel from db (200 queries) 3: for each channel, request a list of programs program (give or take 4 per channel, so 800 queries) 4: each program preloads a channel from the db (800 more requests) 5: each program needs to check the oldrecorded table (800 more requests)

Each query is fairly fast, taking between 1-4ms, but when you add all that up, you get almost 3 seconds of load time. The advantage of this method though is that you can start streaming the html to the client fairly fast, so you get the illusion of speed, but you quickly overwhelm the db server as you start navigating around the program listing.

I've rewritten this entire process so it boils down to 2 queries, which are much easier to optimize. The first query is the heavy lifter. It loads up the entire program listing of your view combined with the channel information for display in one full swoop. After much poking at it, I was able to get the performance of that query to come in at about 500ms on my machine, which still seems high, but is a full 300-400% improvement over the old method. There's one downside to this query, to get the performance this far down, I had to put an extra constraint on the query. The old query had a where clause of program_start_time < display_end_time && program_end_time > display_start_time which worked out well cause it no matter how long a program was, you'd capture it if it crossed the display you were looking at. Unfortunately this has a HUGE performance hit on the db due to the way the indexes work. I did some metrics on my data and found that no program is longer than 8 hours, so I put a cap on the start_time to be 12 hours, just to be safe. This cut down a 1.2s query to 500ms, which I think is worth it. If we find a program that's longer running, I can easily up that number. The second query is one of the oldrecorded queries that I found extremely hard to integrate into the main query without destroying the performance, so it's broken out into it's own query. I added these 2 queries as a new simplified version of load_all_program_data() called load_program_list(). I also rewrote list_data.php from a channel loop followed by a program loop, to just a program loop that has a variable to track when a channel changes. I've removed a bunch of logic in my attempt to overhaul the loop and simplify the code, so I'm certain I've messed something up for someone, so there's much testing to be done (but it seems to be working fairly well for my on my machine).

Now the bad news, to fix #4 above, I had to comment out an assumption that if a chanid is passed into the program object, it'll construct a channel object from the db. I don't particularly like this assumption, because it assumes you always want to have that db hit for each program entry you build (this assumption exists for Schedule and Recording objects as well), which is detrimental to the program listing performance. I've already found two function calls that depended on this assumption, the ajax get_show_details called from the program listing page, and the program details page.

So I decided to stop here, until I could detail out what I've done and get feedback. I'd love to dig in more to the mythweb project and try and clean up/streamline/simplify some of the functionality. I have a number of ideas on how to break apart the themes from the underlying code to make it easier to skin mythweb and make building new functionality easier, but I figured I'd get the blessings from the mythweb gods, and make certain that what I've done so far meets with their approval. I've uploaded my changes to a github fork of the mythweb project for everyone to see:

https://github.com/fracmak/mythweb/tree/listings_optimization_scratch

Change History (7)

comment:1 Changed 13 years ago by Jay Merrifield <fracmak@…>

I should note that I was able to get the master query down to 300-350ms by changing one of the indexes on the oldrecorded tables, but I'm not certain what the protocol is for making DB changes, who I need to go through, etc etc.

comment:2 Changed 13 years ago by beirdo

If you need an index changed, might as well give the details in the ticket here, and we'll work from that.

comment:3 Changed 13 years ago by Jay Merrifield <fracmak@…>

Sorry for the delayed response, the e-mail got lost in my mythtv filtering, the index I need is as follows:

create index "program_series_recstatus" on oldrecorded ("programid","seriesid","recstatus")

I know it looks very similar to the "recstatus" index ("recstatus","programid","seriesid") but it's a more efficient query by far because recstatus has low cardinality, and is almost as slow as doing a table search. By putting programid as being first, the query time is cut in half with the query I've created.

I'd love to get some feedback on the other optimizations I've made, I've been digging through the mythtv code to see how the scheduled/conflict decision making is done, and I found the same slow query there, so there's a chance some of this code can be reused there, but I'd rather focus on finding the best algorithm in the web interface before attempting something in the core code.

comment:4 Changed 11 years ago by Rob Smith

Status: newinfoneeded_new

Well, the concept of the split was to allow easier memory caching via APC/Memcache and thus not even hit the DB for these queries most of the time. That was never really finished.

Is this the true diff:

https://github.com/fracmak/mythweb/compare/master...listings_optimization_scratch ?

comment:5 Changed 11 years ago by fracmak@…

Yup, that is the diff. Took me a minute to remember what I was doing. The only really odd part was where I commented out the channel information from the Program.php. Don't remember why I left it in but I know why I took it out, which was because I was running the queries based on channels not programs and hence did not need the channel information set because it's being loaded by the query.

comment:6 Changed 11 years ago by paulh

Status: infoneeded_newnew

comment:7 Changed 10 years ago by Rob Smith

Resolution: Won't Fix
Status: newclosed

Given Mythweb is EOL, I'm closing new features. This should be done via the backend webserver now.

Note: See TracTickets for help on using tickets.