Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#13315 closed Patch - Bug Fix (fixed)

Use fully-decoded URL paths when translating to local paths

Reported by: ijc Owned by: Peter Bennett
Priority: minor Milestone: 30.0
Component: MythTV - General Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

Please see https://github.com/MythTV/mythtv/pull/167. Not quite sure about the component, is this services API backend? I left as general. I set the version/milestone to master & 30, but it would be great to get this back ported to fixes/29 too please.

Quoting the PR description:

Previously it was not possible to access files with certain characters in them remotely (such as music tracks with quotes in the title) because the characters are URL %-escaped, e.g.:

        ProcessRequest storagegroup.cpp:608 (FindFile) SG(Music): FindFile: Searching for '/Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(Music): FindFileDir: Checking '/storage/music' for '/storage/music//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(Default): FindFileDir: Checking '/var/lib/mythtv/recordings' for '/var/lib/mythtv/recordings//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/Banners' for '/var/lib/mythtv/.mythtv/Banners//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/Coverart' for '/var/lib/mythtv/.mythtv/Coverart//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/backups/mythtv' for '/var/backups/mythtv//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/recordings' for '/var/lib/mythtv/recordings//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/Fanart' for '/var/lib/mythtv/.mythtv/Fanart//Enslaved - Isa/01-Intro: %22Gree  Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/storage/music' for '/storage/music//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/MusicArt' for '/var/lib/mythtv/.mythtv/MusicArt//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/storage/media' for '/storage/media//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/Screenshots' for '/var/lib/mythtv/.mythtv/Screenshots//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythtv/.mythtv/Trailers' for '/var/lib/mythtv/.mythtv/Trailers//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:639 (FindFileDir) SG(): FindFileDir: Checking '/var/lib/mythvideo' for '/var/lib/mythvideo//Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'
        ProcessRequest storagegroup.cpp:622 (FindFile) SG(Music): FindFile: Unable to find '/Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'!
        ProcessRequest mainserver.cpp:7968 (LocalFilePath) MainServer: ERROR: LocalFilePath unable to find local path for '/Enslaved - Isa/01-Intro: %22Green Reflection%22.flac'.

This is because the actual file on disk is not %-encoded, the actual filename is 01-Intro: "Green Reflection".flac. If the filename had been %-encoded, then the resulting %'s would, I beleive, have themselves been encoded as %25 by this point. To fix this switch from url.toString() to url.path(QUrl::FullyDecoded). Note that:

  • url.toString() does not accept QUrl::FullyDecoded. That produces the error message: QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL
  • at this point in the code the URL only contains a path element and a possible fragment (handled separately in the code). The is no scheme:// or hostname etc). I think the only reason it is a QUrl at this point and not a QString is the fragment handling.
  • a path (not a full URL) is what StorageGroup::FindFile and friends expect, they take QString not QUrl.

Also add QUrl::FullyDecoded to some existing uses of url.path() in these code paths.

There seems to be two very similar copies of this code (libmythprotoserver and in mythbackend/mainserver directly) so I have adjusted both. It's not entirely clear to me where they are each used from, but I have tested (on a backport to fixes/29):

  • recording playback (both preexisting recordings and live TV), the filenames here are autogenerated and are not expected to contain any quotable characters AFAIK.
  • mythvideo (filenames with and without quote characters).
  • mythmusic (filenames with and without quote characters).
  • old gallery plugin (only filenames without quotes).

Signed-off-by: Ian Campbell <ijc@…>

Attachments (2)

0001-Drop-internal-use-of-QUrl-in-FileTransfer.patch (13.6 KB) - added by ijc 6 years ago.
[PATCH 1/2] Drop internal use of QUrl in FileTransfer?
0002-Use-QUrl-internally-in-MythCoreContext-GenMythURL.patch (2.7 KB) - added by ijc 6 years ago.
[PATCH 2/2] Use QUrl internally in MythCoreContext::GenMythURL

Download all attachments as: .zip

Change History (20)

comment:1 Changed 6 years ago by ijc

New comment on PR:

Things which looked like fragments (bits after #) and queries (bits after ?) were being dropped, even though they were actually part of the filename, not the URL. In a few (but not all) places the fragment was being put back on, but the query never was. Handle this across the board, also be sure to decode any %escapes in those too.

This is working for me, but I'm now also playing with a path which ditches QUrl altogether and just uses a QString, we aren't treating these things as URLs in any meaningful way here and just end up undoing all the things which QUrl does for us.

comment:2 Changed 6 years ago by ijc

I posted on the PR in September but forgot to repeat here:

This is working for me, but I'm now also playing with a path which ditches QUrl altogether and just uses a QString, we aren't treating these things as URLs in any meaningful way here and just end up undoing all the things which QUrl does for us.

I've been running with this for a few days now and everything seems ok AFAICT. I've pushed it here now.

It's now been a few months not days and all remains fine AFAICT.

comment:3 Changed 6 years ago by Peter Bennett

Owner: set to Peter Bennett
Status: newassigned

comment:4 Changed 6 years ago by ijc

I went to squash and realised that I had only applied the change to remove QUrl in favour of QString from one of the two copies of this code. I will update ASAP with a version that does both halves.

comment:5 Changed 6 years ago by ijc

I've attached an updated (and squashed) patch which handles both sets of code. I also updated the PR with the same thing.

comment:6 Changed 6 years ago by Peter Bennett

There is a problem here. The patch fixes some things and breaks others.

I tested the following video file name :

  ticker_1080i%25.mkv

This plays correctly on a local frontend as well as a remote frontend. However with your patch the playback fails in both cases with the backend reporting it cannot find ticker_1080i%.mkv

Downloaded files frequently have url encoded file names and people may have such files. These would fail with the new patch.

The patch does fix the case of failing to play a file with a quote (") in the name.

comment:7 Changed 6 years ago by ijc

Hi Peter, I somehow missed the notification for your comment but I was coming here to report the same thing (or at least something similar) which is that handling of UTF-8 seems to be a bit off now too.

From what you are saying it seems like there is still some URL decoding going on somewhere i.e. a request for the filename ticker_1080i%25.mkv seems to have been decoded into ticker_1080i%.mkv somewhere along the line which is unexpected I think.

comment:8 Changed 6 years ago by ijc

So I have added a file with a %25 in it so I have:

$ ls /var/lib/mythvideo/temp/ticker_1080i%25.mkv 
/var/lib/mythvideo/temp/ticker_1080i%25.mkv

Which is in the database as:

MariaDB [mythconverg]> select title,subtitle,filename from videometadata where filename like "%ticker%";
+-----------------+----------+--------------------------+
| title           | subtitle | filename                 |
+-----------------+----------+--------------------------+
| ticker 1080i%25 |          | temp/ticker_1080i%25.mkv |
+-----------------+----------+--------------------------+
1 row in set (0.01 sec)

When playing (without my patches) I see:

mythfrontend[9987]: I CoreContext remotefile.cpp:518 (Exists) RemoteFile::Exists(): looking for remote file: myth://Videos@iranon/temp/ticker_1080i%25.mkv
mythfrontend[9987]: I MythSocketThread(-1) mythsocket.cpp:758 (WriteStringListReal) MythSocket(2891300:36): write -> 36 55      QUERY_FILE_EXISTS[]:[]temp/ticker_1080i%.mkv[]:[]Videos
mythfrontend[9987]: I MythSocketThread(-1) mythsocket.cpp:758 (WriteStringListReal) MythSocket(5222e40:76): write -> 76 191     ANN FileTransfer iranon 0 1 2000[]:[]/temp/ticker_1080i%.mkv[]:[]...

and

mythbackend[9874]: I MythSocketThread(96) mythsocket.cpp:959 (ReadStringListReal) MythSocket(2aced50:96): read  <- 96 55      QUERY_FILE_EXISTS[]:[]temp/ticker_1080i%.mkv[]:[]Videos
mythbackend[9874]: E ProcessRequest storagegroup.cpp:622 (FindFile) SG(Videos): FindFile: Unable to find 'temp/ticker_1080i%.mkv'!
mythbackend[9874]: I MythSocketThread(182) mythsocket.cpp:959 (ReadStringListReal) MythSocket(2ad0fb0:182): read  <- 182 191     ANN FileTransfer iranon 0 1 2000[]:[]/temp/ticker_1080i%.mkv[]:[...
mythbackend[9874]: I ProcessRequest mainserver.cpp:7994 (LocalFilePath) MainServer: LocalFilePath(/temp/ticker_1080i%25.mkv 'ticker_1080i%.mkv'), found file through exhaustive search at '/var/lib/mythvideo//temp/ticker_1080i%25.mkv'

So, I think it looks like something somewhere (on the client side, between the log for RemoteFile::Exists() and the one for WriteStringListReal) is decoding the URL escaping in the name and then relying on the code I removed to reencode it so it can find the file, which is rather backwards (if anything it should be further encoding it to ticker_1080i%2525.mkv so it can be decoded to ticker_1080i%25.mkv).

I think the code which messes with the encoding is spread through libs/libmythbase/remotefile.{cpp,h}.

I think the choices are to try fix the use of Qurl in remotefile.cpp (which must be there in this case because it actually is a URL, so we can't just excise it completely) or to roll back to my first attempt here which added fixes to the use of QUrl in mainserver.cpp etc (e.g. ensuring the fragment and query were put back, etc).

I prefer the first choice (although it might take a bit longer to get right). I'm going to have a play right now.

comment:9 Changed 6 years ago by ijc

OK, so I'm now fairly convinced that the issue is that myth is treating myth://Videos@iranon/temp/ticker_1080i%25.mkv as a URL referring to the file temp/ticker_1080i%25.mkv.

The correct URL to refer to that file would be myth://Videos@iranon/temp/ticker_1080i%2525.mkv (i.e. the literal % has been encoded).

I think we've got a situation where two wrongs are making a (partially) right in that the incorrect URL can sometimes be accepted because the backend is reencoding the path.

I'm debating what to do here -- finding all the places which call into RemoteFile::* and ensuring that they properly URL encode things correctly seems like a pretty large (if not impossible) task, but would be the correct thing to do I think.

Declaring that RemoteFile URLs are not exactly the same as regular ones, wrt %encoding and the handling of query&fragment parts might be simpler but feels very ugly :-/

comment:10 Changed 6 years ago by David Hampton

I agree that there shouldn't be two different flavors of url. Too confusing to both users and developers.

comment:11 Changed 6 years ago by ijc

Most (all?) of the RemoteFile methods seem to take some variant on QString url and then reasonably promptly do a QUrl qurl(url) to turn it into a structured URL.

I think it would make sense to push the use of QUrl into the method arguments, this will then propagate up the call chain to whichever places are currently failing to properly encode which can then be fixed. It also seems like the right thing to do -- if these things are going to be treated as URLs we should probably use the right type for them.

One wrinkle is that some (all?) of these methods also handle an argument which is a local path and not a URL at all. I think that's OK, all it means is these become QUrl's with a scheme of file and this all comes out in the wash.

I'll give this a go, I expect it'll be a big patch and take a little while to do...

comment:12 Changed 6 years ago by ijc

I'll give this a go, I expect it'll be a big patch and take a little while to do...

and indeed it is a pretty big patch (and I didn't even finish)...

However, in the process of looking at it I realised that we can get a large improvement in correctness by just having GenMythURL use QUrl internally but still return a QString (i.e. qurl.toString()) this ensures that everything in the returned stringified url is correctly escaped such that converting back to a QUrl later on (e.g. in remotefile.cpp) does the right thing.

The patch for this looks quite tractable and I'm testing it right now. I'm hopeful it might even be acceptable for the 30 release.

Once GenMythURL uses QUrl internally we can start to bubble that up the call chains (perhaps using MDEPRECATED as needed to break off even smaller chunks at a time) and eventually end up using QUrl everywhere (my initial mistake was to start from the remotefile.cpp functions and trying to tease it out from the other end -- which was backwards and left me with a big pile of yarn...)

Changed 6 years ago by ijc

[PATCH 1/2] Drop internal use of QUrl in FileTransfer?

Changed 6 years ago by ijc

[PATCH 2/2] Use QUrl internally in MythCoreContext::GenMythURL

comment:13 Changed 6 years ago by ijc

I've refreshed the original patch and added a second (cumulative) patch. This seems to be work for me in all the test cases I tried before, plus Peter's case with the % encoded file name.

I'm still looking into more widespread internal use of QUrl, but I think this pair could go in now.

comment:14 Changed 6 years ago by Peter Bennett

I tested files with quote (") and with %25

  • Unpatched backend and frontend - file with quote fails on remote frontend
  • Unpatched backend, patched frontend - file with quote fails on remote frontend
  • Patched backend, unpatched remote frontend - file with %25 fails on remote frontend
  • Patched backend and frontend - all works

This seems fine to me. Are you ready for me to commit it? There was some discussion on irc about adding some tests.

comment:15 Changed 6 years ago by ijc

I think the testing discussion on IRC is a longer term thing (and will be quite involved).

IMHO this can go in whenever you are ready.

comment:16 Changed 6 years ago by Peter Bennett

Resolution: Fixed
Status: assignedclosed

commit b18a1ef110500d1634891b4d3534682db61a2348 (HEAD -> master, origin/master)

Author: Ian Campbell <ijc@hellion.org.uk>
Date:   Sun Dec 9 18:34:40 2018 -0500

    Fix handling of special characters in remote files
    
    Change the usage of QUrl so that file names are correctly escaped
    when accessing files on a remote backend, for example videos with
    a quote as part of the name.
    
    Fixes #13315
    
    Signed-off-by: Peter Bennett <pbennett@mythtv.org>

comment:17 Changed 6 years ago by Ian Campbell <ijc@…>

Resolution: Fixedfixed

In b18a1ef11/mythtv:

Fix handling of special characters in remote files

Change the usage of QUrl so that file names are correctly escaped
when accessing files on a remote backend, for example videos with
a quote as part of the name.

Fixes #13315

Signed-off-by: Peter Bennett <pbennett@…>

comment:18 Changed 6 years ago by Ian Campbell <ijc@…>

In b18a1ef11/mythtv:

Fix handling of special characters in remote files

Change the usage of QUrl so that file names are correctly escaped
when accessing files on a remote backend, for example videos with
a quote as part of the name.

Fixes #13315

Signed-off-by: Peter Bennett <pbennett@…>

Note: See TracTickets for help on using tickets.