Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#13607 closed Bug Report - General (fixed)

Program description not being extracted where title expands across additional dvb field

Reported by: bib1963 Owned by: Klaas de Waal
Priority: minor Milestone: 32.0
Component: MythTV - General Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

A program, "The League of Extraordinary Gentlemen", is being transmitted here in the UK.

The program description is not being extracted.

From the db:
MariaDB [mythtvdb]> select starttime,title,subtitle,description from program where title like "%gentlemen%" limit 1;
+---------------------+---------------------------------------+----------+-------------+
| starttime           | title                                 | subtitle | description |
+---------------------+---------------------------------------+----------+-------------+
| 2020-04-12 17:50:00 | The League of Extraordinary Gentlemen |          |             |
+---------------------+---------------------------------------+----------+-------------+
1 row in set (0.05 sec)

And from dvbsnoop, the relevant extract:

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 230 (0xe6)
              ISO639_2_language_code:  eng
            event_name_length: 30 (0x1e)
            event_name: "The League of Extraordinary..."  -- Charset: Latin alphabet
            text_length: 195 (0xc3)
            text_char: "...Gentlemen: (2003) Fantasy with Sean Connery. In an alternative Victorian age, Allan Quatermain, Dorian Gray, Captain Nemo, Mina Harker and the Invisible Man stop a world war. Violence.  [AD,S]"  -- Charset: Latin alphabet

I assume it breaks when it hits that colon at the end of "Gentlemen".
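
To make the layout in the dvbsnoop extract concrete, here is a minimal sketch of how a title that runs over from event_name into text_char could be rejoined. The helper name `join_split_title` and the rule "split the continuation at the first colon" are assumptions for illustration only; this is not the MythTV fixup code.

```python
def join_split_title(event_name: str, text: str) -> tuple[str, str]:
    """Rejoin a title that continues from event_name into text.

    Hypothetical helper. Assumes the broadcaster marks the split with a
    trailing "..." on the title and a leading "..." on the text, and that
    the rest of the title ends at the first ": " in the text.
    """
    if not (event_name.endswith("...") and text.startswith("...")):
        return event_name, text  # no continuation marker, leave untouched

    head = event_name[:-3].rstrip()
    tail, sep, description = text[3:].lstrip().partition(": ")
    if not sep:
        return event_name, text  # no colon found, do not guess

    return f"{head} {tail}", description


title, description = join_split_title(
    "The League of Extraordinary...",
    "...Gentlemen: (2003) Fantasy with Sean Connery.",
)
print(title)        # The League of Extraordinary Gentlemen
print(description)  # (2003) Fantasy with Sean Connery.
```

With this reading the colon is only a separator, so the description should survive the split.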

Change History (10)

comment:1 Changed 4 years ago by Klaas de Waal

Which channel is it? Is this on DVB-T/T2 (Freeview) or on Astra 28.2E satellite (Freesat)? If it is on Freesat I might be able to reproduce this.

comment:2 Changed 4 years ago by bib1963

That particular extraction was on DVB-T2, but I am sure I have also seen it on satellite.

comment:3 Changed 4 years ago by bib1963

Here are some more which seem to be missing descriptions:

2020-04-20 13:30:00 | Beyond Stardom                                     
2020-04-20 20:00:00 | Harbour Lights                                     
2020-04-21 00:10:00 | House                                              
2020-04-22 18:30:00 | Lawmen of the Old West                             
2020-04-18 09:45:00 | Tad the Lost Explorer and the Secret of King Midas 
2020-04-20 19:30:00 | Tales of the Unexpected                            
2020-04-18 18:50:00 | The League of Extraordinary Gentlemen              
2020-04-18 20:00:00 | World Without End  

I'm not sure all of them could be hit by data going across multiple fields. "House" is very short and would appear to have corrupted entries, or they are not using plain ASCII, yet it's the same entries. Here are the dvbsnoop details...

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 93 (0x5d)
              ISO639_2_language_code:  eng
            event_name_length: 5 (0x05)
            event_name: "House"  -- Charset: Latin alphabet
            text_length: 83 (0x53)
            text_char: "..215264.363l.313j_]351.M263376342ޛ222235336.333).8251246277251v202214303314327.347307341/363p327z.@210Ip.[.330E272351352246355356242e276<270256C.273`.3323s342.M257@"  -- Charset: reserved

comment:4 Changed 4 years ago by Stuart Auchterlonie

That looks suspiciously like it's been encoded as some of the Freesat stuff is
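
One way to check that suspicion on the raw descriptor bytes is sketched below. It assumes, as far as I know, that Freesat's compressed strings begin with the byte 0x1F followed by a table number of 1 or 2. The dvbsnoop dump above does not show the raw bytes, so this is unconfirmed for the "House" event, and the function name is hypothetical.

```python
def looks_freesat_compressed(raw: bytes) -> bool:
    """Heuristic: does this DVB text field look Freesat-compressed?

    Assumption: compressed strings start with 0x1F and then a table
    number (0x01 or 0x02). This only detects the prefix; it does not
    decode anything.
    """
    return len(raw) >= 2 and raw[0] == 0x1F and raw[1] in (0x01, 0x02)


print(looks_freesat_compressed(bytes([0x1F, 0x02, 0x8D, 0xB4])))  # True
print(looks_freesat_compressed(b"House"))                         # False
```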

comment:5 Changed 4 years ago by Klaas de Waal

Owner: set to Klaas de Waal
Status: new → assigned

comment:6 Changed 4 years ago by Klaas de Waal

Status: assigned → infoneeded

The issue with "The League of Extraordinary Gentlemen" has been reproduced on channel 300, Film4, on Astra-2 28.2E. A fix for this issue has been applied in master in commit c1fb397f7f6ad25845f6fe7cde0cead07e11c932.

Please give feedback on this, in particular on whether it fixes the "League" issue and whether it causes unwanted effects, i.e. regressions, on other programs.

comment:7 Changed 4 years ago by Klaas de Waal <kdewaal@…>

In c1fb397f7/mythtv:

UK EIT fixup fix for missing description

The UK EIT fixup did remove the description completely if there
was a year present anywhere in the subtitle or description.
If there was a year present in the title then the description was
replaced by the right part of the title starting with the year,
thereby also discarding the existing description.
This code is now deactivated and will be deleted when the correctness
of this fix is proven and there are no regressions.

Refs #13607

comment:8 Changed 4 years ago by Klaas de Waal

With additional debug code running for 24 hours receiving EIT from Astra-2 there are four occasions with two different programs where the description would be discarded because there was a year in the concatenated string, as shown here:

2020-04-21 02:01:59.505022 I  KdW UK EIT fixup fix #13607
    position1 m_ukYear 108
    strFull 'Hollywood's Brightest Bombshell: The Hedy Lamarr Story. Documentary about Hollywood wild-child Hedy Lamarr. [2017]'
    kdwfix t,s,d 'Hollywood's Brightest Bombshell' '' 'The Hedy Lamarr Story. Documentary about Hollywood wild-child Hedy Lamarr. [2017]'
    no_fix t,s,d 'Hollywood's Brightest Bombshell' '' ''
--
2020-04-21 02:13:04.503598 I  KdW UK EIT fixup fix #13607
    position1 m_ukYear 50
    strFull 'Teenage Mutant Ninja Turtles: Out of the Shadows: (2016) Part-animated superhero adventure. The quartet of crime-fighting friends try to stop their enemy Shredder from helping the alien Krang from conquering Earth.'
    kdwfix t,s,d 'Teenage Mutant Ninja Turtles: Out of the Shadows' '' '(2016) Part-animated superhero adventure. The quartet of crime-fighting friends try to stop their enemy Shredder from helping the alien Krang from conquering Earth.'
    no_fix t,s,d 'Teenage Mutant Ninja Turtles: Out of the Shadows' '' ''

The "no_fix" string is the title, subtitle, description as produced by the original code, which discards the description because there is a year in the concatenated string.

The "kdwfix" string is the title, subtitle, description with the fix applied. Note that the year in the description is removed by later processing so you do not see that in the guide.
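
The "position1 m_ukYear" values in the log can be reproduced with a small sketch. The pattern below, a four-digit year in round or square brackets, is an assumption chosen to match the two logged strings; it is not copied from the MythTV source, and `strip_year` only illustrates the kind of later processing mentioned above.

```python
import re

# Assumed pattern: a four-digit year in round or square brackets.
UK_YEAR = re.compile(r"[\[(]\d{4}[\])]")


def year_position(str_full: str) -> int:
    """Return the index of the first bracketed year, or -1 if none."""
    match = UK_YEAR.search(str_full)
    return match.start() if match else -1


def strip_year(description: str) -> str:
    """Remove the first bracketed year, as later processing might."""
    return UK_YEAR.sub("", description, count=1).strip()


hedy = ("Hollywood's Brightest Bombshell: The Hedy Lamarr Story. "
        "Documentary about Hollywood wild-child Hedy Lamarr. [2017]")
turtles = ("Teenage Mutant Ninja Turtles: Out of the Shadows: (2016) "
           "Part-animated superhero adventure.")

print(year_position(hedy))     # 108
print(year_position(turtles))  # 50
print(strip_year("(2016) Part-animated superhero adventure."))
# Part-animated superhero adventure.
```

Both positions agree with the logged values, which suggests the year match alone triggered the branch that emptied the description in the "no_fix" case.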

comment:9 Changed 4 years ago by Klaas de Waal <kdewaal@…>

Resolution: fixed
Status: infoneeded → closed

In 96a8372d11/mythtv:

Finalize UK EIT Fixup fix

Finalize the fix by removing the deactivated code
now that the fix shows to be correct after 24h testing.

Fixes #13607

comment:10 Changed 4 years ago by Stuart Auchterlonie

Milestone: needs_triage → 32.0
Version: Unspecified → Master Head