Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#10371 closed Bug Report - General (fixed)

mythcommandlineparser / mytharchivehelper fails on files with non-ascii-filenames

Reported by: t.brackertz@…
Owned by: Raymond Wagner
Priority: minor
Milestone: 0.25
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no

Description

Because mytharchivehelper uses mythcommandlineparser, it fails on files with non-ASCII characters in their filenames.

For example:

video@videobox:~$ mytharchivehelper --getfileinfo --infile "/video4/Videos_nicht_aus_myth/fertig/\"Arme Sau\" - Das Geschäft mit dem Erbgut_Phoenix_2006_am Anfang fehlt was.mpg" --outfile "/mythpuffer/mytharchive/work/2/streaminfo.xml" --method 0
2012-02-23 02:10:37.187859 C  mytharchivehelper version: master [v0.25pre-4389-gdc9aff8] www.mythtv.org
2012-02-23 02:10:37.187919 N  Enabled verbose msgs:  general jobqueue
2012-02-23 02:10:37.187942 N  Setting Log Level to LOG_INFO
2012-02-23 02:10:37.187993 I  Added logging to the console
2012-02-23 02:10:37.187999 I  Added database logging to table logging
2012-02-23 02:10:37.188064 N  Setting up SIGHUP handler
2012-02-23 02:10:37.188247 N  Using runtime prefix = /usr
2012-02-23 02:10:37.188285 N  Using configuration directory = /home/video/.mythtv
2012-02-23 02:10:37.196982 E  Error parsing: /home/video/.mythtv/config.xml at line: 1  column: 1
2012-02-23 02:10:37.196995 E  Error Msg: unexpected end of file
2012-02-23 02:10:37.213277 N  Empty LocalHostName.
2012-02-23 02:10:37.213289 I  Using localhost value of videobox
2012-02-23 02:10:37.213385 E  Error parsing: /home/video/.mythtv/config.xml at line: 1  column: 1
2012-02-23 02:10:37.213390 E  Error Msg: unexpected end of file
2012-02-23 02:10:37.239179 I  Database connection created: DBManager0
2012-02-23 02:10:37.239215 I  New DB connection, total: 1
2012-02-23 02:10:37.241140 I  Connected to database 'mythconverg' at host: localhost
2012-02-23 02:10:37.243256 I  Closing DB connection named 'DBManager0'
2012-02-23 02:10:37.243364 I  Database connection created: DBManager1
2012-02-23 02:10:37.243377 I  New DB connection, total: 1
2012-02-23 02:10:37.243795 I  Connected to database 'mythconverg' at host: localhost
2012-02-23 02:10:37.245368 I  Current locale DE_DE
2012-02-23 02:10:37.245475 N  Reading locale defaults from /usr/share/mythtv//locales/de_de.xml
2012-02-23 02:10:37.254605 I  getFileInfo(): Opening '/video4/Videos_nicht_aus_myth/fertig/"Arme Sau" - Das GeschÃ¤ft mit dem Erbgut_Phoenix_2006_am Anfang fehlt was.mpg'
2012-02-23 02:10:37.255003 E  getFileInfo(): Couldn't open input file
			eno: Datei oder Verzeichnis nicht gefunden (2)
2012-02-23 02:10:37.256981 I  Closing DB connection named 'DBManager1'

I tried to narrow down the cause, with the following results. Looking at main.cpp of mytharchivehelper:

  • getFileInfo() hasn't changed, so the new command-line parsing in main() is probably responsible
  • mytharchivehelper itself seems to be invoked correctly here

Therefore I think there is a bug in mythcommandlineparser.cpp. I have not been able to locate it so far, but:

As you can see in the example above, the character 'ä' gets replaced by the two characters 'Ã¤'. It looks as if a Unicode string (2 bytes for 'ä' in UTF-8) gets treated as ASCII (1 byte per character). Maybe something like QString::fromLocal8Bit() has been applied twice.
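The byte-level effect described here can be reproduced without Qt. The following is a minimal sketch in standard C++, not MythTV or Qt code: the hypothetical helper `latin1ToUtf8` treats every input byte as one Latin-1 character and re-encodes it as UTF-8. Feeding it text that is already UTF-8 models the wrong decode.

```cpp
#include <string>

// Hypothetical helper (not part of MythTV or Qt): reinterpret each input
// byte as a Latin-1 code point and re-encode that code point as UTF-8.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8.
            out += static_cast<char>(c);
        } else {
            // 0x80..0xFF: becomes a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Calling `latin1ToUtf8("\xC3\xA4")`, that is, on the UTF-8 bytes of 'ä', returns the four bytes C3 83 C2 A4, which a UTF-8 terminal displays as 'Ã¤'. Pure ASCII input passes through unchanged, which is why only filenames with non-ASCII characters are affected.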

Change History (9)

comment:1 Changed 8 years ago by Raymond Wagner

Milestone: unknown → 0.25
Owner: set to Raymond Wagner
Status: new → accepted
Version: Unspecified → Master Head

comment:2 Changed 8 years ago by Raymond Wagner

Status: accepted → infoneeded

Normally, if your environment language were incorrect, there would be a warning during startup. I don't see that here, but just for reference, could you post the output of locale in your terminal?

comment:3 Changed 8 years ago by t.brackertz@…

Here it is:

LANG=de_DE.UTF-8
LANGUAGE=de:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES=de_DE.UTF-8
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

The same happens if I set LC_ALL to de_DE.UTF-8.

With non-utf-charsets I see the warning.

There is no warning because my encoding is unicode. The bug is new. As described below it worked on the same system with the same files before (I try the same files again and again as I'm fixing bugs in mythburn.py at the moment.)

comment:4 Changed 8 years ago by Raymond Wagner

Status: infoneeded → assigned

Thanks. Actually, we've traced it down to an issue of the command line parser being run before QApplication is initialized. After init, the behavior of the stored QString is altered, resulting in the issue you're seeing.

comment:5 Changed 8 years ago by Raymond Wagner

Status: assigned → accepted

comment:6 Changed 8 years ago by t.brackertz@…

Ok, it seems to me that QApplication does a QString::fromLocal8Bit on the elements of argv. As QApplication::QApplication gets a pointer to argv this also affects the argv mythcommandlineparser works on. Then mythcommandlineparser does another QString::fromLocal8Bit on argv. So it is applied twice.

Am I right?

If so mythcommandlineparser could check if there is a QApplication-Object using QCoreApplication::instance () and only do QString::fromLocal8Bit if not. So mythcommandlineparser could be used with or without a QApplication-Object.

Maybe I'm completely wrong.

comment:7 Changed 8 years ago by Github

Resolution: fixed
Status: accepted → closed

Delay String processing until after QApplication has been initialized.

This stores strings, stringlists, and maps containing strings as QByteArrays instead, and waits until QApplication has been created and configured proper text behavior based off the system locale, before processing them into QString values. This resolves an issue where 8-bit unicode text passed on the command line would result in invalid strings when used.

This bumps the library version. Fixes #10371

Branch: master Changeset: 7adbd54074db3977b6291f38873e529193c57ae8

comment:8 in reply to:  6 Changed 8 years ago by Raymond Wagner

Replying to t.brackertz@…:

> If so mythcommandlineparser could check if there is a QApplication-Object using QCoreApplication::instance () and only do QString::fromLocal8Bit if not.

Very close, but not quite there. ::fromLocal8Bit() applies whatever text encoding is proper for that system, but until QApplication is available, it defaults to ::fromAscii(). For your specific case, the text must be processed using ::fromUtf8(), but we don't know that until QApplication is initialized and selects the proper locale.
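A comparable ordering issue exists in plain C++, which may help illustrate the point. This is only an analogy in the standard library, not a description of Qt's internals: a program starts in the "C" locale and learns the user's locale only once it asks for it. The function names below are hypothetical.

```cpp
#include <clocale>
#include <string>

// Name of the locale currently in effect for this process.
std::string currentLocaleName()
{
    // Passing nullptr queries the locale without changing it.
    const char *name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Adopt the locale named by the environment (LANG, LC_ALL, ...).
// Returns false if that locale is not available on the system.
bool adoptEnvironmentLocale()
{
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Before `adoptEnvironmentLocale()` is called, `currentLocaleName()` reports "C" regardless of what LANG says, so any text conversion done at that point cannot know that the input is UTF-8.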

So the options are:

  • Duplicate the Qt code for determining the proper locale, but that becomes a nuisance to independently maintain.
  • Perform all command line parsing after QApplication has done its thing, but lose out on whatever arguments QApplication has stripped out of the arrays.
    • or feed it empty arrays
    • or make a copy of those arrays beforehand
  • Do what the above patch does, storing those parsed values as QByteArrays until QApplication is available, and delay the handling of text encoding until the strings are actually needed.

comment:9 Changed 8 years ago by t.brackertz@…

Works now. Can be closed as fixed.

Thank you
