Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#3549 closed patch (fixed)

EIT doesn't handle á Á é É í Í ó Ó ú Ú ñ Ñ well. Spanish EPG

Reported by: josesuarez1983 <j.suarez.agapito@…>
Owned by: Janne Grunau
Priority: minor
Milestone: 0.21
Component: eit
Version: head
Severity: medium
Keywords: EIT, Special characters, Spanish Stations
Cc: j.suarez.agapito@…
Ticket locked: no

Description

Hello everybody,

I'm using svn revision 13559 and Kubuntu 7.04. The system is set to Spanish UTF-8 (the default setting for Ubuntu in Spanish). I've seen other people having trouble getting special characters to show up in the program guide. I've tried some of the hints that were given, such as putting the following in a .cnf file in the MySQL config directory:

[mysqld]
character-set-server=latin1

[mysql.server]
character-set-server=latin1

[mysqld_safe]
character-set-server=latin1

[mysql]
default-character-set=latin1

However, some special characters are still not shown correctly. I'm using the Spanish stations' EPG as the input for EIT. I don't know which character set the stations here use, because I see something very strange:

There are about 5 or 6 stations whose EPG is shown correctly in the MythTV Program Guide. That is, all of the following characters are shown correctly: á Á é É í Í ó Ó ú Ú ñ Ñ. However, the EPGs of all the other stations are not shown correctly. These are the problems I found. The first part is what's stored in the mythconverg database, and the arrow points to what should be shown:

DON QUIJOTE (1“ PARTE) --> DON QUIJOTE (1ª PARTE)
BÆsico --> Básico
FINAL CHAMPIONS: MILN - LIVERPOOL --> FINAL CHAMPIONS: MILÁN - LIVERPOOL
¿QuØ comemos hoy? --> ¿Qué comemos hoy?
AL SALIR DE CLASE: 'Jugando con espŦritus' --> AL SALIR DE CLASE: 'Jugando con espíritus'
AQU˝NO HAY QUIÉN VIVA --> AQUÍ NO HAY QUIÉN VIVA
AL SALIR DE CLASE: 'Alguien matð a la estrella de la radio' --> AL SALIR DE CLASE: 'Alguien mató a la estrella de la radio'
CLASIFICACI©N FORMULA 1: 'G.P. MONACO" --> CLASIFICACIÓN FORMULA 1: 'G.P. MONACO"
Aæos luz --> Años luz
CINE-MATRIX: SKY CAPTAIN Y EL MUNDO DEL MA¹ANA --> CINE-MATRIX: SKY CAPTAIN Y EL MUNDO DEL MAÑANA
A la œltima --> A la última

The odd thing is that I have a Siemens Gigaset set-top box, which uses VDR internally, and it displays all of these special characters correctly in every station's EPG. So I suppose something must be wrong with how mythbackend stores the information.

Finally, I'm attaching a file with the whole EIT data from the mythconverg database to show the errors. If you need any additional information, please tell me.

Thanks for the effort you are placing in the development of MythTV. It really is a terrific app!

Attachments (2)

mythtv eit.txt (237.7 KB) - added by josesuarez1983 <j.suarez.agapito@…> 12 years ago.
The EIT portion of the mythconverg database
t3549_dvbt_spain.diff (652 bytes) - added by Janne Grunau 12 years ago.
eitfixups for spanish dvb-t


Change History (22)

Changed 12 years ago by josesuarez1983 <j.suarez.agapito@…>

Attachment: mythtv eit.txt added

The EIT portion of the mythconverg database

comment:1 Changed 12 years ago by josesuarez1983 <j.suarez.agapito@…>

Oh, and by the way, these are the locales generated on my system:

jose@amd64:~$ sudo dpkg-reconfigure locales
Password:
Generating locales...
  en_AU.UTF-8... up-to-date
  en_BW.UTF-8... up-to-date
  en_CA.UTF-8... up-to-date
  en_DK.UTF-8... up-to-date
  en_GB.UTF-8... up-to-date
  en_HK.UTF-8... up-to-date
  en_IE.UTF-8... up-to-date
  en_IN.UTF-8... up-to-date
  en_NZ.UTF-8... up-to-date
  en_PH.UTF-8... up-to-date
  en_SG.UTF-8... up-to-date
  en_US.ISO-8859-1... up-to-date
  en_US.ISO-8859-15... up-to-date
  en_US.UTF-8... up-to-date
  en_ZA.UTF-8... up-to-date
  en_ZW.UTF-8... up-to-date
  es_AR.UTF-8... up-to-date
  es_BO.UTF-8... up-to-date
  es_CL.UTF-8... up-to-date
  es_CO.UTF-8... up-to-date
  es_CR.UTF-8... up-to-date
  es_DO.UTF-8... up-to-date
  es_EC.UTF-8... up-to-date
  es_ES.ISO-8859-1... up-to-date
  es_ES.ISO-8859-15... up-to-date
  es_ES.UTF-8... up-to-date
  es_GT.UTF-8... up-to-date
  es_HN.UTF-8... up-to-date
  es_MX.UTF-8... up-to-date
  es_NI.UTF-8... up-to-date
  es_PA.UTF-8... up-to-date
  es_PE.UTF-8... up-to-date
  es_PR.UTF-8... up-to-date
  es_PY.UTF-8... up-to-date
  es_SV.UTF-8... up-to-date
  es_US.UTF-8... up-to-date
  es_UY.UTF-8... up-to-date
  es_VE.UTF-8... up-to-date
Generation complete.

comment:2 in reply to:  1 Changed 12 years ago by Stuart Auchterlonie

Status: new → infoneeded_new

Is this still happening with the latest svn? There have been quite a few UTF-8 fixes committed.

Stuart

comment:3 Changed 12 years ago by Stuart Auchterlonie

Milestone: unknown → 0.21
Version: unknown → head

comment:4 Changed 12 years ago by anonymous

It looks like the email I sent in response to your questions never got approved (it may still be awaiting approval), so I'm including it here:

_

Yes, I'm using svn trunk from yesterday at approximately 15:00 GMT (I can't remember the svn revision). It still happens but, as explained previously, not on all channels, so I suppose some of them use different character sets. However, on the VDR 1.4.4 that my set-top box runs, all the EPG characters appear correctly. I am totally willing to give further information if needed. If you require an EPG table dump, tell me and I will provide one (if so, please give me directions on how to do it, because I know almost nothing about SQL).

Thanks a lot for spending some time on this bug.

Regards,

José.

Right now I'm using yesterday's svn version (around rev 15533) and these issues still happen. Do you need any information from the EIT table of mythconverg?

Thanks in advance. Oh, by the way, the multirec feature is a great improvement! Now MythTV is even better!

comment:5 Changed 12 years ago by Janne Grunau

It sounds as if those channels don't indicate the encoding of the text correctly. If someone attaches the output of "SELECT transportid, networkid, serviceid FROM dtv_multiplex,channel WHERE dtv_multiplex.mplexid = channel.mplexid ORDER BY networkid, transportid, serviceid;" for the broken channels, it can be fixed easily.
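The garbled strings in the description are consistent with this explanation. In DVB SI text (ETSI EN 300 468, Annex A), a string that does not begin with a character-table selector byte is decoded with the default table, which is based on ISO/IEC 6937. Suppose a broadcaster actually sends ISO 8859-15 (Latin-9) bytes without a selector. Then á (0xE1) is read as Æ, é (0xE9) as Ø, ñ (0xF1) as æ and ó (0xF3) as ð, which are exactly the substitutions reported. The C++ sketch below decodes the same bytes both ways under that assumption. It is not MythTV's decoder: the function names are hypothetical, the ISO 6937 table covers only the few spacing characters seen in this ticket, and ISO 6937 non-spacing diacritics (0xC1 to 0xCF, which probably account for MILN and AQU˝NO) are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration; this is not MythTV's decoder.

// Append one code point to a UTF-8 string. The characters in this ticket
// all lie in the Basic Multilingual Plane, so 4-byte sequences are not needed.
static void appendUtf8(std::string &out, char32_t cp)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// ISO 8859-15 (Latin-9): identical to ISO 8859-1 except at eight positions.
static char32_t latin9ToUnicode(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC; // euro sign
    case 0xA6: return 0x0160; // S with caron
    case 0xA8: return 0x0161; // s with caron
    case 0xB4: return 0x017D; // Z with caron
    case 0xB8: return 0x017E; // z with caron
    case 0xBC: return 0x0152; // ligature OE
    case 0xBD: return 0x0153; // ligature oe
    case 0xBE: return 0x0178; // Y with diaeresis
    default:   return b;      // same code point as ISO 8859-1
    }
}

// Subset of ISO/IEC 6937 covering only the spacing characters seen in this
// ticket. Any other byte above 0x7F maps to U+FFFD in this sketch.
static char32_t iso6937Subset(unsigned char b)
{
    switch (b) {
    case 0xAA: return 0x201C; // left double quotation mark (shown instead of ª)
    case 0xBF: return 0x00BF; // inverted question mark (same as Latin-9)
    case 0xD1: return 0x00B9; // superscript one (shown instead of Ñ)
    case 0xD3: return 0x00A9; // copyright sign (shown instead of Ó)
    case 0xE1: return 0x00C6; // AE (shown instead of á)
    case 0xE9: return 0x00D8; // O with stroke (shown instead of é)
    case 0xED: return 0x0166; // T with stroke (shown instead of í)
    case 0xF1: return 0x00E6; // ae (shown instead of ñ)
    case 0xF3: return 0x00F0; // eth (shown instead of ó)
    case 0xFA: return 0x0153; // oe (shown instead of ú)
    default:   return b < 0x80 ? char32_t{b} : char32_t{0xFFFD};
    }
}

using ByteTable = char32_t (*)(unsigned char);

// Decode a single-byte-encoded string into UTF-8 using the given table.
static std::string decodeBytes(const std::string &bytes, ByteTable table)
{
    std::string out;
    for (unsigned char b : bytes)
        appendUtf8(out, table(b));
    return out;
}
```

For example, decodeBytes("B\xE1sico", latin9ToUnicode) yields "Básico", while decodeBytes("B\xE1sico", iso6937Subset) yields "BÆsico", the string found in the database. Forcing ISO 8859-15 for the affected multiplexes amounts to choosing the first table for them.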

comment:6 Changed 12 years ago by Janne Grunau

Status: infoneeded_new → new

comment:7 Changed 12 years ago by Janne Grunau

Owner: changed from Stuart Auchterlonie to Janne Grunau
Status: new → accepted

comment:8 Changed 12 years ago by Janne Grunau

Status: accepted → infoneeded

comment:9 Changed 12 years ago by anonymous

Okay, I have managed to get some data from the mythconverg database. I checked the "program" table with mysql-admin and saved it to a local file. I found that each special character has more than one representation. Here are my findings:

Places where there should be an á: two symbols appear in the table, á and Æ. Only á shows correctly in the EIT timetable.

Places where there should be an é: é and Ø. Only é shows correctly in the EIT timetable.

Places where there should be an í: í- and Ŧ. Only í- shows as í in the EIT table in MythTV.

Places where there should be an ó: ó and ð. Only ó shows correctly in the EIT timetable.

Places where there should be an ú: ú and œ. Only ú shows correctly.

Places where there should be an ñ: ñ and æ. Only ñ shows correctly.

Places where there should be an Ñ: Ñ and ¹. Only Ñ shows correctly.

I still need to work out some characters: the capital letters, ¿, ¡, º and ª (and probably some others too). I hope this information helps. Should you need anything else, please ask.

Best regards,

José

comment:10 Changed 12 years ago by Janne Grunau

Status: infoneeded → assigned

I need the identifiers of the channels with broken characters.

SELECT transportid, networkid, serviceid, channame
FROM channel, dtv_multiplex
WHERE channel.mplexid = dtv_multiplex.mplexid AND
      channame IN ("broken channel 1", "broken channel 2", ...);

should give the needed information.

comment:11 Changed 12 years ago by anonymous

I will try to get that information tonight. I don't know much about MySQL, but I'll try to get it by browsing the tables with mysql-admin. Thank you, Janne.

comment:12 in reply to:  10 Changed 12 years ago by miguel.yarza@…

Hi Janne, here are the identifiers of the channels with broken characters on Spanish DVB-T, as I receive them in Bilbao (a Spanish city).
Spanish DVB-T has national, regional and local channels; the channels below are national channels with broken EIT characters.
EIT characters on the regional channels I watch in Bilbao are OK, but other regional channels could still have wrong characters; maybe someone can post them.

transportid networkid serviceid name
10 8916 100 Teledeporte
10 8916 260 VEO
10 8916 261 SET en VEO
10 8916 262 Tienda en VEO
10 8916 300 NET TV
12 8916 180 Telecinco
12 8916 181 T5 Estrellas
12 8916 182 T5 Sport
12 8916 301 Flymusic
13 8916 140 ANTENA 3
13 8916 141 Neox
13 8916 142 Nova
13 8916 341 Telehit

Thank you and regards

Miguel

Changed 12 years ago by Janne Grunau

Attachment: t3549_dvbt_spain.diff added

eitfixups for spanish dvb-t

comment:13 Changed 12 years ago by Janne Grunau

Type: defect → patch

Please try t3549_dvbt_spain.diff.

I suppose all channels on those three multiplexes are affected. You have to clear the program and eit_cache tables to get new data and test the patch.
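Janne's supposition follows from grouping the rows Miguel posted in comment 12 by multiplex. All thirteen broken services sit on just three (transportid, networkid) pairs, so the fix can be keyed per multiplex rather than per service. A minimal C++ sketch of that grouping follows. It uses the rows exactly as posted, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// One row of the channel list posted in comment 12 (hypothetical type).
struct ServiceRow {
    uint16_t transportId;
    uint16_t networkId;
    uint16_t serviceId;
};

// A multiplex identified by its (transportid, networkid) pair.
using MultiplexId = std::pair<uint16_t, uint16_t>;

// Collapse a list of broken services into the distinct multiplexes carrying them.
static std::set<MultiplexId> multiplexesOf(const std::vector<ServiceRow> &rows)
{
    std::set<MultiplexId> result;
    for (const ServiceRow &row : rows)
        result.insert(MultiplexId(row.transportId, row.networkId));
    return result;
}

// The national channels with broken EIT text, as received in Bilbao.
static const std::vector<ServiceRow> kBilbaoRows = {
    {10, 8916, 100}, {10, 8916, 260}, {10, 8916, 261}, {10, 8916, 262}, {10, 8916, 300},
    {12, 8916, 180}, {12, 8916, 181}, {12, 8916, 182}, {12, 8916, 301},
    {13, 8916, 140}, {13, 8916, 141}, {13, 8916, 142}, {13, 8916, 341},
};
```

multiplexesOf(kBilbaoRows) returns the three pairs (10, 8916), (12, 8916) and (13, 8916). The Telemadrid rows reported from Madrid would add a fourth pair, (6200, 8916).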

comment:14 Changed 12 years ago by anonymous

Sorry for the delay in posting back. In Madrid there are also the following channels with problems:

Transportid networkid Serviceid Name 6200 8916 421 Telemadrid 6200 8916 422 laotra 6200 8916 423 onda6

comment:15 Changed 12 years ago by anonymous

The formatting got a bit mangled. Here it is again:
Transportid networkid Serviceid Name
6200 8916 421 Telemadrid
6200 8916 422 laotra
6200 8916 423 onda6

comment:16 Changed 12 years ago by anonymous

I'm going to try the patch with the following line added:

fix[ 6200LL << 32 | 8916 << 16 ] = EITFixUp::kEFixForceISO8859_15;

I'll report back on whether it works. Thanks for the patch!
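The key in that line packs two identifiers into one 64-bit integer: the transportid is shifted left by 32 bits and the networkid by 16 bits. Because << binds more tightly than |, the expression needs no extra parentheses. The LL suffix makes the shift by 32 happen in a 64-bit long long, since shifting a 32-bit int by 32 would be undefined. Below is a minimal C++ sketch of the same packing with a per-multiplex lookup. It assumes that the low 16 bits form a serviceid slot, left at zero to mean "every service on this multiplex". The names fixupKey, fixupsFor and kForceISO8859_15 (including its value) are hypothetical stand-ins and are not taken from MythTV's eithelper code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical flag standing in for EITFixUp::kEFixForceISO8859_15.
constexpr unsigned kForceISO8859_15 = 0x1;

// Pack identifiers like "6200LL << 32 | 8916 << 16": transportid in bits 32-47,
// networkid in bits 16-31. Using the low 16 bits for a serviceid, with 0
// meaning "the whole multiplex", is an assumption of this sketch.
constexpr uint64_t fixupKey(uint16_t transportId, uint16_t networkId,
                            uint16_t serviceId = 0)
{
    return (uint64_t{transportId} << 32) | (uint64_t{networkId} << 16) | serviceId;
}

// Hypothetical lookup: an entry for the exact service wins; otherwise fall
// back to the entry for the service's multiplex; otherwise no fixups.
static unsigned fixupsFor(const std::map<uint64_t, unsigned> &fix,
                          uint16_t transportId, uint16_t networkId,
                          uint16_t serviceId)
{
    auto it = fix.find(fixupKey(transportId, networkId, serviceId));
    if (it == fix.end())
        it = fix.find(fixupKey(transportId, networkId));
    return it == fix.end() ? 0u : it->second;
}
```

For example, with fix[fixupKey(6200, 8916)] set to kForceISO8859_15, fixupsFor(fix, 6200, 8916, 421) returns that flag for Telemadrid. A service on a multiplex with no entry, such as (10, 8916, 100) in an otherwise empty map, gets 0.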

comment:17 Changed 12 years ago by anonymous

OK, I've just compiled MythTV with Janne's patch plus my Telemadrid addition, and:

1) The channels that are affected by janne's patch show all the special characters perfectly! Thanks, Janne and Miguel.

2) No EIT data has been "harvested" yet for the Telemadrid multiplex (the one I added with transportid 6200), so I can't tell yet. I will report whether everything shows correctly once EIT data arrives.

For now, I think Janne's patch could be proposed for merging into eithelper.

comment:18 Changed 12 years ago by anonymous

EIT data has just shown up for the Telemadrid multiplex (transportid 6200), and it shows the special characters perfectly too! :) (Sorry for posting so many times in such a short time.)

Thus, the following line could be added to Janne's patch:

fix[ 6200LL << 32 | 8916 << 16 ] = EITFixUp::kEFixForceISO8859_15;

Thanks

comment:19 Changed 12 years ago by Janne Grunau

Resolution: fixed
Status: assigned → closed

(In [16054]) Force ISO8859-15 encoding on a couple of spanish DVB-T multiplexes. Closes #3549

comment:20 Changed 12 years ago by Janne Grunau

(In [16055]) Merges revision [16054] from trunk: Force ISO8859-15 encoding on a couple of spanish DVB-T multiplexes. Closes #3549
