Opened 23 months ago

Last modified 11 months ago

#12989 new Patch - Feature

Feature patch: Use XMLTV dd_progid data to create seriesid

Reported by: Gary Buhrmaster <gary.buhrmaster@…> Owned by: stuartm
Priority: minor Milestone: 29.2
Component: MythTV - Mythfilldatabase Version: Master Head
Severity: medium Keywords:
Cc: Peter Bennett Ticket locked: no

Description

Feature patch: Use XMLTV dd_progid data to create seriesid

[I wanted to get this into Trac so the work is not lost]

Discussions with John Poet, which started with the matching of a series titled "taboo" (a title used by two substantially different shows), identified that the current seriesid processing for XMLTV grabbers, which uses a hash of the title, is non-optimal in the (Schedules Direct) case, where the seriesid can be derived more accurately from a supplied dd_progid. The internal DataDirect grabber creates such a seriesid from its feeds.

Research and discussions with Schedules Direct regarding the Gracenote upstream (raw) data show that a true seriesid is available, but passing it through to MythTV, and then adjusting existing data, has some challenges (some of them upstream). While there are some potential approaches, rather than wait for the perfect solution, we can do something now that is equivalent to the internal DD grabber.

This patch creates a better seriesid from an XMLTV source that provides a dd_progid, and that seriesid is compatible with the one created by the (internal) DataDirect grabber (a further step toward allowing the eventual removal of the internal DD grabber).

The following patch (based on some initial work by Mr. Poet; he should probably get credit/signoff for the code, should he concur with my conclusion as to his contribution) creates an appropriate seriesid.

Note to the dev reviewing this code:

The uniqueid value was added to the code in commit 347ea0319330cb06cf3e418e12e79602c3235bc0 (over a decade ago); however, it was never set and was always empty, resulting in some dead code. It appears that uniqueid may at some point have been intended to serve the same purpose as programid. This patch removes the (dead) uniqueid along the way as part of the larger set of fixes.

I have been running this patch (and the patches from #12742) in my production environment for some time now, and the results are quite acceptable for my use cases using the XMLTV grabber for Schedules Direct.

As this patch changes the seriesid for newly grabbed programs, users relying on "this series" matching based on the previous seriesid may run into some issues. The easiest solution to recommend is likely to recreate those rules.

diff --git a/mythtv/programs/mythfilldatabase/xmltvparser.cpp b/mythtv/programs/mythfilldatabase/xmltvparser.cpp
index 0089768..4197df8 100644
--- a/mythtv/programs/mythfilldatabase/xmltvparser.cpp
+++ b/mythtv/programs/mythfilldatabase/xmltvparser.cpp
@@ -283,8 +283,7 @@ static void parseAudio(QDomElement &element, ProgInfo *pginfo)
 
 ProgInfo *XMLTVParser::parseProgram(QDomElement &element)
 {
-    QString uniqueid, season, episode, totalepisodes;
-    int dd_progid_done = 0;
+    QString programid, season, episode, totalepisodes;
     ProgInfo *pginfo = new ProgInfo();
 
     QString text = element.attribute("start", "");
@@ -459,8 +458,11 @@ ProgInfo *XMLTVParser::parseProgram(QDomElement &element)
                     int idx = episodenum.indexOf('.');
                     if (idx != -1)
                         episodenum.remove(idx, 1);
-                    pginfo->programId = episodenum;
-                    dd_progid_done = 1;
+                    programid = episodenum;
+                    /* Only EPisodes and SHows are part of a series for SD */
+                    if (programid.startsWith(QString("EP")) ||
+                        programid.startsWith(QString("SH")))
+                        pginfo->seriesId = QString("EP") + programid.mid(2,8);
                 }
                 else if (info.attribute("system") == "xmltv_ns")
                 {
@@ -557,22 +559,20 @@ ProgInfo *XMLTVParser::parseProgram(QDomElement &element)
         && ProgramInfo::kCategorySeries != pginfo->categoryType)
         pginfo->airdate = current_year;
 
-    /* Let's build ourself a programid */
-    QString programid;
-
-    if (ProgramInfo::kCategoryMovie == pginfo->categoryType)
-        programid = "MV";
-    else if (ProgramInfo::kCategorySeries == pginfo->categoryType)
-        programid = "EP";
-    else if (ProgramInfo::kCategorySports == pginfo->categoryType)
-        programid = "SP";
-    else
-        programid = "SH";
-
-    if (!uniqueid.isEmpty()) // we already have a unique id ready for use
-        programid.append(uniqueid);
-    else
+    if (programid.isEmpty())
     {
+
+        /* Let's build ourself a programid */
+
+        if (ProgramInfo::kCategoryMovie == pginfo->categoryType)
+            programid = "MV";
+        else if (ProgramInfo::kCategorySeries == pginfo->categoryType)
+            programid = "EP";
+        else if (ProgramInfo::kCategorySports == pginfo->categoryType)
+            programid = "SP";
+        else
+            programid = "SH";
+
         QString seriesid = QString::number(ELFHash(pginfo->title.toUtf8()));
         pginfo->seriesId = seriesid;
         programid.append(seriesid);
@@ -610,8 +610,8 @@ ProgInfo *XMLTVParser::parseProgram(QDomElement &element)
                 programid.clear();
         }
     }
-    if (dd_progid_done == 0)
-        pginfo->programId = programid;
+
+    pginfo->programId = programid;
 
     return pginfo;
 }

Additionally, here is a proposed patch to adjust existing data in a few limited cases so that matches based on existing data and seriesid work better. I am unsure whether it might have some bad side effects, and it is mostly useful only for the (very, very?) few users who got Schedules Direct data through the internal DD grabber and then moved to the XMLTV grabber. That is a very small use case, but I have manually run the SQL commands against my production DB to verify that nothing unexpected happens.

For those who have always used Schedules Direct through the internal DD grabber, there is a set of SQL updates that might improve future accuracy in matching by seriesid. However, those updates depend heavily on the specifics of how a system was/is running, so they cannot be automated, would have to be run manually by interested users, and will not be provided in this ticket. Likewise, for those now primarily using Schedules Direct there is a method to convert existing recording rules to the new seriesid, but as this too is very dependent on the specifics of a system, that manual database update will not be provided either.

This patch has only been compile tested (caveat emptor).

My current thinking is that, since this impacts few users, this patch should *not* be committed, but I felt I should at least put it out there in case others have different opinions.

diff --git a/mythtv/libs/libmythtv/dbcheck.cpp b/mythtv/libs/libmythtv/dbcheck.cpp
index a0892ce..d2d0ab4 100644
--- a/mythtv/libs/libmythtv/dbcheck.cpp
+++ b/mythtv/libs/libmythtv/dbcheck.cpp
@@ -3331,6 +3331,37 @@ NULL
             return false;
     }
 
+    if (dbver == "1346")
+    {
+        const char *updates[] = {
+            /* Adjust (correct?) seriesid */
+            "UPDATE oldrecorded "
+            "  SET seriesid = CONCAT('EP', SUBSTR(seriesid, 3, 8)) "
+            "  WHERE seriesid like 'EP%' OR seriesid like 'SH%' ",
+            "UPDATE recorded "
+            "  SET seriesid = CONCAT('EP', SUBSTR(seriesid, 3, 8)) "
+            "  WHERE seriesid like 'EP%' OR seriesid like 'SH%' ",
+            "UPDATE record "
+            "  SET seriesid = CONCAT('EP', SUBSTR(seriesid, 3, 8)) "
+            "  WHERE seriesid like 'EP%' OR seriesid like 'SH%' ",
+            "UPDATE program "
+            "  SET seriesid = CONCAT('EP', SUBSTR(seriesid, 3, 8)) "
+            "  WHERE seriesid like 'EP%' OR seriesid like 'SH%' ",
+            /* Remove seriesid for non-series' */
+            "UPDATE oldrecorded SET seriesid = '' "
+            "  WHERE seriesid LIKE 'MV%' OR seriesid LIKE 'SP%' ",
+            "UPDATE recorded SET seriesid = '' "
+            "  WHERE seriesid LIKE 'MV%' OR seriesid LIKE 'SP%' ",
+            "UPDATE record SET seriesid = '' "
+            "  WHERE seriesid LIKE 'MV%' OR seriesid LIKE 'SP%' ",
+            "UPDATE program SET seriesid = '' "
+            "  WHERE seriesid like 'MV%' OR seriesid like 'SP%' ",
+            NULL
+        };
+        if (!performActualUpdate(&updates[0], "1347", dbver))
+            return false;
+    }
+
     /*
      * TODO the following settings are no more, clean them up with the next schema change
      * to avoid confusion by stale settings in the database

Change History (7)

comment:1 Changed 22 months ago by jpoet

The first part of this has been committed in [4e5cf2b4ef].

I am unsure about the second part. It would be fine for me, but I don't know how it would affect non-US users. Stuart, Karl, do you have thoughts?

comment:2 Changed 22 months ago by Stuart Auchterlonie

Milestone: unknown → 29.0

John,

I only use EIT data, so I can only comment on this use case.

Specifically, it isn't going to impact UK EIT users, as we don't have any existing seriesid data that matches the queries.

We would need someone who uses one of the other guide data sources to check what they get for these queries.

mysql> select distinct seriesid from oldrecorded WHERE seriesid like 'EP%' OR seriesid like 'SH%';
Empty set (0.00 sec)

mysql> select distinct seriesid from recorded WHERE seriesid like 'EP%' OR seriesid like 'SH%';
Empty set (0.00 sec)

mysql> select distinct seriesid from record WHERE seriesid like 'EP%' OR seriesid like 'SH%';
Empty set (0.00 sec)

mysql> select distinct seriesid from program WHERE seriesid like 'EP%' OR seriesid like 'SH%';
Empty set (0.00 sec)

Regards Stuart

comment:3 Changed 20 months ago by Peter Bennett

Cc: Peter Bennett added

comment:4 Changed 20 months ago by Peter Bennett

I have used SD DataDirect and now I use SD XMLTV.

select distinct seriesid from oldrecorded WHERE seriesid like 'EP%' OR seriesid like 'SH%';
Many results with EP, none with SH.

select distinct seriesid from recorded WHERE seriesid like 'EP%' OR seriesid like 'SH%';
None

select distinct seriesid from record WHERE seriesid like 'EP%' OR seriesid like 'SH%';
None

select distinct seriesid from program WHERE seriesid like 'EP%' OR seriesid like 'SH%';
None

comment:5 Changed 13 months ago by Stuart Auchterlonie

Milestone: 29.0 → 29.1

comment:6 Changed 11 months ago by Stuart Auchterlonie

Milestone: 29.1 → 0.28.2

Moving remaining open tickets to 0.28.2 milestone

comment:7 Changed 11 months ago by Stuart Auchterlonie

Milestone: 0.28.2 → 29.2

Moving remaining open tickets to 29.2 milestone
