Opened 14 years ago

Closed 14 years ago

#333 closed defect (fixed)

mythfilldatabase retrieves data for every day for the second datadirect source

Reported by: bolek-mythtv@… Owned by: cpinkham
Priority: minor Milestone: unknown
Component: mythtv Version: 0.18.1
Severity: medium Keywords: mythfilldatabase datadirect
Cc: Ticket locked: no

Description (last modified by Oscar Carlsson)

I have two tuner cards and each uses a different source with a different channel lineup from the same datadirect account.

When I run mythfilldatabase (with no options), it downloads the data for tomorrow and the 14-th day for the first source (lineup) but then it downloads the data for every single day for the second source (which I understand is not the intended behavior)

Attachments (1)

mythfilldatabase.log (23.4 KB) - added by bolek-mythtv@… 14 years ago.
output from mythfilldatabase --refresh-today

Download all attachments as: .zip

Change History (21)

comment:1 Changed 14 years ago by greg

Owner: changed from Isaac Richards to greg

Does your second source contain a lot of channels with no scheduling data? If so then if the total number of programs for a given day is less than the number of channels * 4 it is assuming there is a possible problem with the low program count and forcing a grab of that days data.

I have a source here that has many channels with no data and as such I see the same thing. We could add a --dont-refresh-lowdata type option that would cause it to ignore days with "low" overall program data.

comment:2 Changed 14 years ago by anonymous

No, there is data, but the second source has only 5 channels in total (all OTA HDTV). They all have full guide data.

Attached is the log from mythfilldatabase --refresh-today (but the behavior is the same with no options).

comment:3 Changed 14 years ago by bolek-mythtv@…

Sorry, the previous comment was by me (bolek-mythtv@…), the reporter.

Changed 14 years ago by bolek-mythtv@…

Attachment: mythfilldatabase.log added

output from mythfilldatabase --refresh-today

comment:4 Changed 14 years ago by anonymous

I would like to know how many channels you have on your two sources, thanks.

comment:5 Changed 14 years ago by bolek-mythtv@…

The first source has 53 channels and the second source has 5 channels.

comment:6 Changed 14 years ago by Oscar Carlsson

Description: modified (diff)

I am kind of working on this one btw.

comment:7 Changed 14 years ago by bolek-mythtv@…

Hal Burch suggested the following test:

If you delete the first source, does it still grab all of the second?

Such a test would determine if it's a problem with having two sources or some issue particular to the second source.

I deleted the first source and mythfilldatabase still fetches every single day for the remaining source. So, it seems to be an issue with that source.

Here are the zap2it source parameters:

Zip Code: 01730 Provider: Local Broadcast Listings - Antenna Channels: 2-2, 4-1, 5-1, 7-1, 25-1

Hopefully it will be possible to reproduce the problem with this info...

comment:8 Changed 14 years ago by cpinkham

(In [7951]) Add in a little more debug info for mythfilldatabase to tell us why a day's data was refreshed/downloaded. Refs #333 but that report is against 0.18.1 so this commit won't help debug that.

comment:9 Changed 14 years ago by cpinkham

Can you run the following SQL queries and tell us what the result are? These are what mythfilldatabase would be running in order to see whether it needs to redownload a day's data. I'm looking at a patch by Oscar that he thinks might be a fix for this, but I want to see if this is why you're having to download all days.

SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '1 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 2 DAY); SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '2 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 3 DAY); SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '3 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY); SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '4 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 5 DAY);

comment:10 Changed 14 years ago by cpinkham

cut-and-paste formatting issues on that SQL, but the queries should still work fine.

comment:11 Changed 14 years ago by bolek-mythtv@…

This looks OK in the ticket, not sure about e-mail

mysql> SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE
    ->  sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '1 18'
    ->  DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 2 DAY);
+----------+
| COUNT(*) |
+----------+
|       45 |
+----------+
1 row in set (0.03 sec)

mysql> SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE
    ->  sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '2 18'
    ->  DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 3 DAY);
+----------+
| COUNT(*) |
+----------+
|       43 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE
    ->  sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '3 18'
    ->  DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY);
+----------+
| COUNT(*) |
+----------+
|       43 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(*) FROM program LEFT JOIN channel USING (chanid) WHERE
    ->  sourceid = 2 AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '4 18'
    ->  DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 5 DAY);
+----------+
| COUNT(*) |
+----------+
|       41 |
+----------+
1 row in set (0.00 sec)

mysql>

comment:12 Changed 14 years ago by anonymous

Chris: Yeah so with the chancnt from the first source it would trigger a refresh of that day since (53*4 > 45) but it wouldn't be triggered if chancnt was correct, (5*4 < 45).

comment:13 Changed 14 years ago by cpinkham

I know you said you only have 5 channels, but can you humor me and also run this query:

select count(*) from channel where sourceid = 2;

chancnt is declared inside the sources loop, and if this issue was related to chancnt not being reset between sources, then the problem would have gone away when you deleted the first source.

comment:14 Changed 14 years ago by bolek-mythtv@…

mysql> select count(*) from channel where sourceid = 2;
+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set (0.01 sec)

mysql>

I guess I am not sure what is is :-) The above is the total number of channels, including the invisible ones and channels with no data. The number of channels in the DataDirect? source is 5.

comment:15 Changed 14 years ago by cpinkham

Owner: changed from greg to cpinkham

That's the cause right there. You have 19 channels for sourceid 2, the low-limit threshold would be 19 * 4 = 76 and you only have 41-45 programs for those 19 channels. Can you do the following queries and they may help me work around this issue.

select c.chanid, c.visible, count(p.starttime) from channel c left join program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '3 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY) where c.sourceid = 2 group by c.chanid, c.visible;

select c.chanid, c.visible, count(p.starttime) from channel c left join program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL '3 12' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY) where c.sourceid = 2 group by c.chanid, c.visible;

select c.chanid, c.visible, count(p.starttime) from channel c left join program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(), INTERVAL 3 DAY) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY) where c.sourceid = 2 group by c.chanid, c.visible;

I think that we could put a little more logic in determining whether a channel had enough data, by looking at the day previous. For instance if the previous day only had 10 programs on a channel, there's no reason to expect the day we're looking at to have 5 in the last 6 hours of the day. I have 6 HD channels with no data in them for whole days, and you said you do also, so exlucind those from the average will help get a better idea if we need to refresh a day or not. If you run the above queries, I think I'll have enough info to get a fix in prior to the 0.19 release.

comment:16 Changed 14 years ago by bolek-mythtv@…

Note that there is one channel that has data but it's not visible. This is correct (I made it invisible because of poor reception).

mysql> select c.chanid, c.visible, count(p.starttime) from channel c left join
    ->  program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(),
    ->  INTERVAL '3 18' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(),
    ->  INTERVAL 4 DAY) where c.sourceid = 2 group by c.chanid, c.visible;
+--------+---------+--------------------+
| chanid | visible | count(p.starttime) |
+--------+---------+--------------------+
|   2000 |       1 |                  7 |
|   2001 |       0 |                  0 |
|   2002 |       0 |                  0 |
|   2003 |       1 |                  8 |
|   2004 |       0 |                  0 |
|   2005 |       0 |                  0 |
|   2006 |       1 |                  9 |
|   2007 |       1 |                  7 |
|   2008 |       0 |                  0 |
|   2009 |       0 |                  0 |
|   2010 |       0 |                  0 |
|   2011 |       0 |                  0 |
|   2012 |       0 |                  0 |
|   2013 |       0 |                  0 |
|   2014 |       0 |                  7 |
|   2015 |       0 |                  0 |
|   2016 |       0 |                  0 |
|   2017 |       0 |                  0 |
|   2018 |       0 |                  0 |
+--------+---------+--------------------+
19 rows in set (0.08 sec)

mysql> select c.chanid, c.visible, count(p.starttime) from channel c left join
    ->  program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(),
    ->  INTERVAL '3 12' DAY_HOUR) AND starttime < DATE_ADD(CURRENT_DATE(),
    ->  INTERVAL 4 DAY) where c.sourceid = 2 group by c.chanid, c.visible;
+--------+---------+--------------------+
| chanid | visible | count(p.starttime) |
+--------+---------+--------------------+
|   2000 |       1 |                 14 |
|   2001 |       0 |                  0 |
|   2002 |       0 |                  0 |
|   2003 |       1 |                 10 |
|   2004 |       0 |                  0 |
|   2005 |       0 |                  0 |
|   2006 |       1 |                 13 |
|   2007 |       1 |                 16 |
|   2008 |       0 |                  0 |
|   2009 |       0 |                  0 |
|   2010 |       0 |                  0 |
|   2011 |       0 |                  0 |
|   2012 |       0 |                  0 |
|   2013 |       0 |                  0 |
|   2014 |       0 |                 16 |
|   2015 |       0 |                  0 |
|   2016 |       0 |                  0 |
|   2017 |       0 |                  0 |
|   2018 |       0 |                  0 |
+--------+---------+--------------------+
19 rows in set (0.01 sec)

mysql> select c.chanid, c.visible, count(p.starttime) from channel c left join
    ->  program p on c.chanid = p.chanid AND starttime >= DATE_ADD(CURRENT_DATE(),
    ->  INTERVAL 3 DAY) AND starttime < DATE_ADD(CURRENT_DATE(), INTERVAL 4 DAY)
    ->  where c.sourceid = 2 group by c.chanid, c.visible;
+--------+---------+--------------------+
| chanid | visible | count(p.starttime) |
+--------+---------+--------------------+
|   2000 |       1 |                 33 |
|   2001 |       0 |                  0 |
|   2002 |       0 |                  0 |
|   2003 |       1 |                 21 |
|   2004 |       0 |                  0 |
|   2005 |       0 |                  0 |
|   2006 |       1 |                 26 |
|   2007 |       1 |                 32 |
|   2008 |       0 |                  0 |
|   2009 |       0 |                  0 |
|   2010 |       0 |                  0 |
|   2011 |       0 |                  0 |
|   2012 |       0 |                  0 |
|   2013 |       0 |                  0 |
|   2014 |       0 |                 27 |
|   2015 |       0 |                  0 |
|   2016 |       0 |                  0 |
|   2017 |       0 |                  0 |
|   2018 |       0 |                  0 |
+--------+---------+--------------------+
19 rows in set (0.01 sec)

comment:17 Changed 14 years ago by cpinkham

Status: newassigned

OK, Looks like we can work around this by looking at the previous day for a channel and if there is no data for that channel (when there is for thers), we exclude it from the average when checking to see if we need to refresh the day's data. I'll code something up. I have a channel or two with no data as well that I can use to test.

comment:18 Changed 14 years ago by Oscar Carlsson

Doesn't DD have built-in cache functions? IMO this would be better handled by the grabber itself since we _always_ want to upgrade to make sure we get last minute changes etc, which is impossible to detect without actually asking the server, for example tv_grab_se_swedb has built in cache routines which looks at the server timestamps to decide if it should download or not..

comment:19 Changed 14 years ago by anonymous

The data is downloaded in one xml file I believe and the file is built on-the-fly since zap2it knows what channels you have in your lineup. If we query them, they still have to go through the work of checking to see if any of our channels have changed. I'm not sure if they support that functionality, and our goal is to minimize impact on their servers, so for now I think we're doing the best we can. I didn't write the DD code though, so I'm not 100% sure what they support so I may be wrong.

comment:20 Changed 14 years ago by cpinkham

Resolution: fixed
Status: assignedclosed

(In [8048]) Add some more "do we need to refresh" logic to mythfilldatabase. We now check which channels had data for the prevous day so we can make a better guess at how much we should have for today. Closes #333. In the example given, we were refreshing because the user had 19 channels, but only had 6 with data. The chancnt * 4 test was failing because of all the channels that had no data. I purposefully do not count only visible channels because there can be cases where channels are visible but still have no data.

Also VERBOSE-ify parts of mythfilldatabase, default flags are important,general like most other programs, but specifying "-v important,general,channel" will give more debug information now. You can also turn off general and get only important info (suppresses wget, etc.). More cerr/cout cleanup will follow.

Note: See TracTickets for help on using tickets.