Opened 8 years ago

Closed 8 years ago

#9918 closed Bug Report - General (fixed)

Incorrect character encoding in xml status (patch provided)

Reported by: Ian Dall <ian@…>
Owned by: beirdo
Priority: minor
Milestone: 0.25
Component: MythTV - General
Version: Unspecified
Severity: medium
Keywords: xml encoding
Cc:
Ticket locked: no

Description

Fetching the XML status, e.g. by

wget http://mythmaster:6544/Status/xml

yields incorrect XML when non-ASCII characters appear in the relevant program descriptions. Here in Australia, channel 10 likes to assert copyright over the program information and puts the copyright symbol into its EIT data.

In HttpStatus::GetStatusXML() the output uses QDomDocument::stream() which defaults to the locale character encoding, whereas the default encoding for xml is UTF-8.

We need to use stream.setCodec("UTF-8") to set the encoding. Preferably we also want a

<?xml version="1.0" encoding="UTF-8"?>

declaration to specify the XML encoding. In principle another encoding could be chosen, but there seems to be no reason not to use UTF-8.
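
For illustration only, the intended result can be sketched in plain C++ without Qt: the body is encoded as UTF-8 and prefixed with a matching declaration. This is not the attached patch. The helper names latin1ToUtf8 and withXmlDeclaration are hypothetical, and the sketch assumes the locale encoding in question is ISO-8859-1.

```cpp
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 (Latin-1) bytes as UTF-8.
// Each Latin-1 byte maps to the Unicode code point with the same value.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            utf8 += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return utf8;
}

// Hypothetical helper: prefix a UTF-8 body with a matching XML declaration.
std::string withXmlDeclaration(const std::string &utf8Body)
{
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + utf8Body;
}
```

With these helpers, the Latin-1 copyright symbol (the single byte 0xA9) becomes the two-byte UTF-8 sequence 0xC2 0xA9, which agrees with the declared encoding.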

Attachments (1)

httpstatus.diff (896 bytes) - added by Ian Dall <ian@…> 8 years ago.
Patch to ensure xml created with the correct character encoding


Change History (6)

Changed 8 years ago by Ian Dall <ian@…>

Attachment: httpstatus.diff added

Patch to ensure xml created with the correct character encoding

comment:1 Changed 8 years ago by Raymond Wagner

Status: new → infoneeded_new

What happens if you set the locale character encoding to match your location? Add the following to the environment mythbackend operates in.

export LANG=en_AU.UTF-8
  --or--
setenv LANG en_AU.UTF-8

comment:2 Changed 8 years ago by Ian Dall <ian@…>

Thanks, setting LANG as above does result in UTF-8 encoded xml which parses without error.

The thing is, if LANG isn't *.UTF-8, then the XML is invalid, as any encoding except UTF-8 must have a "Text Declaration" (or a "Byte Order Mark" if it is UTF-16). [W3C XML 1.0, Section 4.3.3]
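
To make the failure concrete, here is a minimal sketch of the kind of check a parser applies when it assumes UTF-8 because no encoding is declared. The function name looksLikeUtf8 is hypothetical, and the check is simplified: it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` consists only of structurally valid UTF-8 sequences.
// Simplified: overlong encodings and surrogate code points are not rejected.
bool looksLikeUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;            // continuation bytes expected
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                // stray continuation or bad lead byte

        if (extra > s.size() - i - 1)
            return false;                 // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;             // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

A copyright symbol written in ISO-8859-1 is the lone byte 0xA9, which this check rejects, while the UTF-8 form 0xC2 0xA9 passes.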

So, I guess it is a low priority given there is a workaround, but I would maintain that either the UTF-8 encoding should be forced (as my patch does) or the text declaration should be added with the encoding which is actually used. Something like:

    QTextCodec *default_enc = QTextCodec::codecForLocale();
    QDomProcessingInstruction encoding = doc.createProcessingInstruction(
        "xml", "version=\"1.0\" encoding=\"" + default_enc->name() + "\"");
    doc.appendChild(encoding);

Unfortunately this doesn't work because QTextCodec::codecForLocale() always has a name of "System" in Qt 4.7, so I can't see any clean way to do this.

comment:3 Changed 8 years ago by beirdo

Status: infoneeded_new → new

comment:4 Changed 8 years ago by beirdo

Owner: set to beirdo
Status: new → assigned

comment:5 Changed 8 years ago by Github

Milestone: unknown → 0.25
Resolution: fixed
Status: assigned → closed

Force HTTP status to use UTF-8 in XML

XML should always be in UTF-8 (or at least a stated encoding) regardless of system locale. This change forces it to always use UTF-8.

Fixes #9918

Signed-off-by: Gavin Hurlbut <ghurlbut@…>

Branch: master Changeset: 77618fd71f6cd333d856e72cf12bbcf9116fd4f9
