Opened 13 years ago
Closed 13 years ago
Last modified 13 years ago
#10371 closed Bug Report - General (fixed)
mythcommandlineparser / mytharchivehelper fails on files with non-ASCII filenames
Reported by:
Owned by: Raymond Wagner
Priority: minor
Milestone: 0.25
Component: MythTV - General
Version: Master Head
Severity: medium
Keywords:
Cc:
Ticket locked: no
Description
Since mytharchivehelper started using mythcommandlineparser, it fails on files with non-ASCII characters in their filenames.
For example:
    video@videobox:~$ mytharchivehelper --getfileinfo --infile "/video4/Videos_nicht_aus_myth/fertig/\"Arme Sau\" - Das Geschäft mit dem Erbgut_Phoenix_2006_am Anfang fehlt was.mpg" --outfile "/mythpuffer/mytharchive/work/2/streaminfo.xml" --method 0
    2012-02-23 02:10:37.187859 C mytharchivehelper version: master [v0.25pre-4389-gdc9aff8] www.mythtv.org
    2012-02-23 02:10:37.187919 N Enabled verbose msgs: general jobqueue
    2012-02-23 02:10:37.187942 N Setting Log Level to LOG_INFO
    2012-02-23 02:10:37.187993 I Added logging to the console
    2012-02-23 02:10:37.187999 I Added database logging to table logging
    2012-02-23 02:10:37.188064 N Setting up SIGHUP handler
    2012-02-23 02:10:37.188247 N Using runtime prefix = /usr
    2012-02-23 02:10:37.188285 N Using configuration directory = /home/video/.mythtv
    2012-02-23 02:10:37.196982 E Error parsing: /home/video/.mythtv/config.xml at line: 1 column: 1
    2012-02-23 02:10:37.196995 E Error Msg: unexpected end of file
    2012-02-23 02:10:37.213277 N Empty LocalHostName.
    2012-02-23 02:10:37.213289 I Using localhost value of videobox
    2012-02-23 02:10:37.213385 E Error parsing: /home/video/.mythtv/config.xml at line: 1 column: 1
    2012-02-23 02:10:37.213390 E Error Msg: unexpected end of file
    2012-02-23 02:10:37.239179 I Database connection created: DBManager0
    2012-02-23 02:10:37.239215 I New DB connection, total: 1
    2012-02-23 02:10:37.241140 I Connected to database 'mythconverg' at host: localhost
    2012-02-23 02:10:37.243256 I Closing DB connection named 'DBManager0'
    2012-02-23 02:10:37.243364 I Database connection created: DBManager1
    2012-02-23 02:10:37.243377 I New DB connection, total: 1
    2012-02-23 02:10:37.243795 I Connected to database 'mythconverg' at host: localhost
    2012-02-23 02:10:37.245368 I Current locale DE_DE
    2012-02-23 02:10:37.245475 N Reading locale defaults from /usr/share/mythtv//locales/de_de.xml
    2012-02-23 02:10:37.254605 I getFileInfo(): Opening '/video4/Videos_nicht_aus_myth/fertig/"Arme Sau" - Das GeschÃ¤ft mit dem Erbgut_Phoenix_2006_am Anfang fehlt was.mpg'
    2012-02-23 02:10:37.255003 E getFileInfo(): Couldn't open input file eno: Datei oder Verzeichnis nicht gefunden (2)
    2012-02-23 02:10:37.256981 I Closing DB connection named 'DBManager1'
I tried to narrow down the cause. Looking at main.cpp of mytharchivehelper:
- getFileInfo() hasn't changed, so the new command-line parsing in main() is probably responsible.
- mytharchivehelper itself appears to be invoked correctly.
Therefore I think there is a bug in mythcommandlineparser.cpp. I haven't been able to locate it so far, but:
As you can see in the example above, the character 'ä' gets replaced by the two characters 'Ã¤'. That looks as if a UTF-8 string (2 bytes for 'ä') is being treated as a single-byte encoding (1 byte per character). Maybe something like QString::fromLocal8Bit() has been applied twice.
Change History (9)
comment:1 Changed 13 years ago by
Milestone: unknown → 0.25
Owner: set to Raymond Wagner
Status: new → accepted
Version: Unspecified → Master Head
comment:2 Changed 13 years ago by
Status: accepted → infoneeded
comment:3 Changed 13 years ago by
Here it is:
    LANG=de_DE.UTF-8
    LANGUAGE=de:en
    LC_CTYPE="de_DE.UTF-8"
    LC_NUMERIC="de_DE.UTF-8"
    LC_TIME="de_DE.UTF-8"
    LC_COLLATE="de_DE.UTF-8"
    LC_MONETARY="de_DE.UTF-8"
    LC_MESSAGES=de_DE.UTF-8
    LC_PAPER="de_DE.UTF-8"
    LC_NAME="de_DE.UTF-8"
    LC_ADDRESS="de_DE.UTF-8"
    LC_TELEPHONE="de_DE.UTF-8"
    LC_MEASUREMENT="de_DE.UTF-8"
    LC_IDENTIFICATION="de_DE.UTF-8"
    LC_ALL=
The same happens if I set LC_ALL=de_DE.UTF-8.
With non-UTF-8 charsets, I do see the warning.
There is no warning because my encoding is UTF-8. The bug is new: it worked before on the same system with the same files (I keep testing with the same files because I'm currently fixing bugs in mythburn.py).
comment:4 Changed 13 years ago by
Status: infoneeded → assigned
Thanks. We've actually traced it to the command-line parser being run before QApplication is initialized. Initializing QApplication changes how 8-bit text is converted to QString, so strings converted before that point come out wrong, resulting in the issue you're seeing.
comment:5 Changed 13 years ago by
Status: assigned → accepted
comment:6 follow-up: 8 Changed 13 years ago by
OK, it seems to me that QApplication applies QString::fromLocal8Bit() to the elements of argv. Since the QApplication constructor receives a pointer to argv, this also affects the argv that mythcommandlineparser works on. mythcommandlineparser then applies QString::fromLocal8Bit() to argv again, so the conversion happens twice.
Am I right?
If so, mythcommandlineparser could check whether a QApplication object exists using QCoreApplication::instance() and only call QString::fromLocal8Bit() if it does not. That way mythcommandlineparser could be used with or without a QApplication object.
Maybe I'm completely wrong.
comment:7 Changed 13 years ago by
Resolution: → fixed
Status: accepted → closed
Delay String processing until after QApplication has been initialized.
This stores strings, stringlists, and maps containing strings as QByteArrays instead, and waits until QApplication has been created and has configured proper text handling based on the system locale before processing them into QString values. This resolves an issue where 8-bit Unicode (UTF-8) text passed on the command line would result in invalid strings when used.
This bumps the library version. Fixes #10371
Branch: master Changeset: 7adbd54074db3977b6291f38873e529193c57ae8
comment:8 Changed 13 years ago by
Replying to t.brackertz@…:
> If so mythcommandlineparser could check if there is a QApplication-Object using QCoreApplication::instance () and only do QString::fromLocal8Bit if not.
Very close, but not quite. ::fromLocal8Bit() uses whatever text encoding is proper for the system, but until QApplication is available, it defaults to ::fromAscii(). In your case, the text must be processed using ::fromUtf8(), but we don't know that until QApplication is initialized and selects the proper locale.
So the options are:
- Duplicate the Qt code for determining the proper locale, but that would be a nuisance to maintain independently.
- Perform all command-line parsing after QApplication has done its thing, but lose whatever arguments QApplication has stripped out of the arrays,
  - or feed QApplication empty arrays,
  - or make a copy of those arrays beforehand.
- Do what the above patch does: store those parsed values as QByteArrays until QApplication is available, and delay the handling of text encoding until the strings are actually needed.
Normally, if your environment language were incorrect, there would be a warning during startup. I don't see that here, but just for reference, could you post the output of
locale
in your terminal?