Opened 14 years ago

Closed 14 years ago

#3645 closed defect (fixed)

MythMusic CD-ripper: UTF8 and year issue

Reported by: otto at kolsi dot fi
Owned by: Nigel
Priority: minor
Milestone: unknown
Component: mythmusic
Version: head
Severity: medium
Keywords:
Cc:
Ticket locked: no


The MythMusic CD-ripper does not handle non-ASCII characters correctly. When ripping a CD that has e.g. 'ä' characters in both the album name and the song titles, there are two problems:

  • The CD-rip screen that shows song information displays two squares where there should be a special character
  • The ID3 tags and the Myth DB end up with corrupted information

This happens when the song/album info comes from FreeDB/CDDB. I haven't tried it, but it probably works correctly if you manually change all these square characters to the correct ones.

During song encoding, the following line is printed to the frontend log:

TagLib: String::prepare() - Unicode conversion error.

In addition, FreeDB/CDDB lookups do not seem to return the year of the album, although the year is present for the same albums when searching manually from:

Change History (6)

comment:1 Changed 14 years ago by otto at kolsi dot fi

The same character corruption also occurs in the MythMusic player when you play a CD whose song names contain non-ASCII characters (and the CD is successfully found in FreeDB/CDDB).

comment:2 Changed 14 years ago by Nigel

A few years ago, UTF-8 conversion was added to the CDDB lookup [5215]. I am still researching, but I think that was slightly wrong, because most CDDB data doesn't seem to be UTF-8. The most correct way to fix this would be to check the strings for UTF-8 multi-byte sequences, but here is my quick fix:

Index: cddecoder.cpp
--- cddecoder.cpp       (revision 13626)
+++ cddecoder.cpp       (working copy)
@@ -398,7 +398,7 @@
         return NULL;
-    compilation_artist = QString::fromUtf8(discdata.data_artist);
+    compilation_artist = discdata.data_artist;
     if (compilation_artist.lower().left(7) == "various")
@@ -406,7 +406,7 @@
-    album = QString::fromUtf8(discdata.data_title);
+    album = discdata.data_title;
     genre = cddb_genre(discdata.data_genre);
     if (!genre.isEmpty()) 
@@ -416,8 +416,8 @@
         genre = flet + rt;
-    title = QString::fromUtf8(discdata.data_track[tracknum - 1].track_name);        
-    artist = QString::fromUtf8(discdata.data_track[tracknum - 1].track_artist);
+    title  = discdata.data_track[tracknum - 1].track_name;
+    artist = discdata.data_track[tracknum - 1].track_artist;
     if (artist.length() < 1)
@@ -425,6 +425,14 @@
       compilation_artist = "";
+    if (CDDB_PROTOCOL_LEVEL > 5)  // Proto 6 (and above?) supposedly UTF8
+    {
+        compilation_artist = QString::fromUtf8(compilation_artist);
+        album              = QString::fromUtf8(album);
+        title              = QString::fromUtf8(title);
+        artist             = QString::fromUtf8(artist);
+    }
     if (title.length() < 1)
         title = QString(QObject::tr("Track %1")).arg(tracknum);

As for the year, libcdaudio was written before the year field was added to the CDDB database (protocol level 5?). My Darwin Qt CDDB lookup does the right thing, but it will be a while before that is used for Linux as well.

comment:3 Changed 14 years ago by Nigel

Owner: changed from Isaac Richards to Nigel
Status: changed from new to assigned

comment:4 Changed 14 years ago by Nigel

The quick fix is not good enough. Regardless of the protocol level specified, CDDBd seems to just return the data as stored in its files (i.e. it does not convert to UTF-8 for proto 6). So MythMusic will have to check the encoding of the returned strings itself.

comment:5 Changed 14 years ago by Nigel

(In [13988]) New function to detect whether a string contains UTF-8 sequences. For the rare situations where we don't know the format of data. See #3645
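
For readers who want to see what such a check can look like, here is a minimal sketch in plain C++. It is not the function committed in [13988]; the name hasUtf8Sequences and the use of std::string are assumptions made for illustration. It reports true only when the bytes are structurally valid UTF-8 and contain at least one multi-byte sequence, so pure ASCII and Latin-1 text both report false. It is deliberately simplified and does not reject every overlong or surrogate encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not the MythTV implementation).
// Returns true if `s` is structurally valid UTF-8 and contains at least one
// multi-byte sequence. Pure ASCII returns false, as does Latin-1 text,
// because a lone high byte such as 0xE4 is not a complete UTF-8 sequence.
bool hasUtf8Sequences(const std::string &s)
{
    bool sawMultiByte = false;
    std::size_t i = 0;

    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected

        if (c < 0x80)
        {
            ++i;                // plain ASCII byte
            continue;
        }
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;          // lead byte of a 2-byte sequence
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;          // lead byte of a 3-byte sequence
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;          // lead byte of a 4-byte sequence
        else
            return false;       // stray continuation or invalid lead byte

        if (i + extra >= s.size())
            return false;       // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;   // continuation bytes must look like 10xxxxxx
        }

        sawMultiByte = true;
        i += extra + 1;
    }

    return sawMultiByte;
}
```

With this sketch, the two bytes 0xC3 0xA4 (UTF-8 'ä') are reported as UTF-8, while the single byte 0xE4 (Latin-1 'ä') is not.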

comment:6 Changed 14 years ago by Nigel

Resolution: fixed
Status: changed from assigned to closed

(In [14003]) Make CDDB lookup automatically determine encoding. Closes #3645
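
As a rough illustration of the idea (not the code committed in [14003]), once a field is known to be UTF-8 or not, it can be normalized to a single encoding. The sketch below assumes that non-UTF-8 CDDB data is ISO-8859-1 (Latin-1), which this ticket does not state, and the names latin1ToUtf8 and normalizeCddbField are hypothetical. The isUtf8 flag stands for the result of whatever detection step is used.

```cpp
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
// Every Latin-1 byte >= 0x80 maps to exactly two UTF-8 bytes.
std::string latin1ToUtf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size() * 2);

    for (char ch : in)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));                 // ASCII is unchanged
        }
        else
        {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: return a CDDB field as UTF-8.
// ASSUMPTION: anything not detected as UTF-8 is treated as Latin-1.
std::string normalizeCddbField(const std::string &raw, bool isUtf8)
{
    return isUtf8 ? raw : latin1ToUtf8(raw);
}
```

Applying the decision per field rather than per protocol level matches the observation in comment:4 that the server returns the data as stored, whatever level is requested.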
