Opened 5 years ago
Closed 5 years ago
Last modified 5 years ago
#13514 closed Bug Report - General (Fixed)
ttvdb does not choose the best show when searching
Reported by: | mspieth | Owned by: | mspieth |
---|---|---|---|
Priority: | minor | Milestone: | 31.0 |
Component: | MythTV - General | Version: | Master Head |
Severity: | medium | Keywords: | |
Cc: | | Ticket locked: | no |
Description
Updates on the thetvdb site mean that a search request can return shows in a less-than-optimal order. A sorting algorithm is needed to pick the best show.
I propose adding Levenshtein text distance to choose the best match. This would add a new Python module, python-levenshtein, as a requirement for ttvdb.
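As an illustration of the proposal, here is a minimal sketch of ranking search results by Levenshtein distance, assuming the search returns a plain list of title strings. It uses a self-contained dynamic-programming implementation rather than the python-levenshtein module, and `best_match` is a hypothetical helper name, not part of ttvdb.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def best_match(query: str, titles: Sequence[str]) -> str:
    """Hypothetical helper: pick the title closest to the query.

    Comparison is case-insensitive; on a tie, min() keeps the earlier
    title, i.e. the order in which the search returned them.
    """
    q = query.casefold()
    return min(titles, key=lambda t: levenshtein(q, t.casefold()))
```

For example, `best_match("the office", ["The Office (US)", "The Office", "Office Space"])` picks "The Office" (distance 0) even though the search listed "The Office (US)" first.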
Attachments (1)
Change History (8)
Changed 5 years ago by
Attachment: | levenshtein_normalize_unicode.py added |
---|
comment:1 Changed 5 years ago by
You can use the levenshtein implementation from MythTV's Python bindings as a fallback.
It works best with normalized Unicode strings, and it also accepts UTF-8 encoded strings.
See the usage in the attached file 'levenshtein_normalize_unicode.py'. It works with both Python 2 and Python 3.
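The attached file is not reproduced on this page, so the following is only a sketch of the normalization step, assuming NFKC normalization followed by case-folding (the attachment may use a different normalization form). `normalize_title` is a hypothetical name; its output would then be passed to whichever levenshtein implementation is used.

```python
import unicodedata
from typing import Union


def normalize_title(value: Union[str, bytes], form: str = "NFKC") -> str:
    """Hypothetical helper: decode UTF-8 bytes if needed, then normalize and casefold.

    Normalizing first means a precomposed "é" (U+00E9) and "e" followed by a
    combining acute accent (U+0065 U+0301) become the same text, so an edit
    distance computed afterwards does not count them as two edits.
    """
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return unicodedata.normalize(form, value).casefold()
```

For example, `"Ame\u0301lie"` and `"Am\u00e9lie".encode("utf-8")` differ as raw inputs but produce the same normalized string.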
comment:2 follow-up: 3 Changed 5 years ago by
Is it better to use this or fuzzywuzzy?
Opinions wanted.
comment:3 Changed 5 years ago by
Replying to mspieth:
Is it better to use this or fuzzywuzzy?
Opinions wanted.
Well, since you asked....
Do you have enough samples of ttvdb's (poor) choices to support a meaningful comparison of the various algorithms and determine which is statistically better(*) for MythTV? Candidates might include Jaccard similarity, cosine similarity, and Levenshtein distance. If you have enough samples, you could use something like the textdistance library (which offers a choice of many algorithms) as the test platform for gathering your statistics.
As I recall, fuzzywuzzy is an implementation of Levenshtein distance, as is python-Levenshtein, so together with the existing internal matching they should all be expected to produce essentially equivalent results (and the internal function has the separate advantage of already being in place).
I suppose that requiring textdistance, with an optional user override of the algorithm, would provide the most flexibility. However, I have not researched how widely textdistance is packaged in the supported distros, and if (for example) Levenshtein turns out to be consistently good, using the internal functions might be good enough.
(*) The general problem of natural language searching is not yet a completely solved problem, of course, so "better" is as good as it gets.
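The comparison suggested above could be run with a small harness like the one below: it counts, for each scoring function, how often the top-scoring candidate is the title a human marked as correct. This is a sketch under stated assumptions. To avoid guessing at third-party APIs it uses only the standard library, with Jaccard similarity on word sets and `difflib.SequenceMatcher.ratio()` as stand-ins (the latter is a Ratcliff/Obershelp-style ratio, not Levenshtein distance). `hit_rate`, `SCORERS` and the sample titles are hypothetical; real samples would come from logged ttvdb searches.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def jaccard_words(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' case-folded word sets."""
    sa, sb = set(a.casefold().split()), set(b.casefold().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def sequence_ratio(a: str, b: str) -> float:
    """difflib similarity ratio in [0, 1]; a stdlib stand-in, not Levenshtein."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def hit_rate(scorer: Scorer, samples: Sequence[Tuple[str, List[str], str]]) -> float:
    """Fraction of samples where the highest-scoring candidate is the expected title."""
    hits = 0
    for query, candidates, expected in samples:
        chosen = max(candidates, key=lambda c: scorer(query, c))
        hits += chosen == expected
    return hits / len(samples)


# A user-selectable algorithm could simply be a key into a table like this.
SCORERS: Dict[str, Scorer] = {"jaccard": jaccard_words, "sequence_ratio": sequence_ratio}

if __name__ == "__main__":
    # Hypothetical hand-labelled samples: (query, search results, correct title).
    samples = [
        ("Doctor Who", ["Doctor Who (2005)", "Doctor Who", "Doctor Who Confidential"], "Doctor Who"),
        ("The Office", ["The Office (US)", "The Office", "Office Space"], "The Office"),
    ]
    for name, scorer in SCORERS.items():
        print(f"{name}: {hit_rate(scorer, samples):.2f}")
```

With a large enough labelled set, the per-algorithm hit rates give the kind of "better" evidence asked for above; the toy samples here only exercise the harness.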
comment:4 Changed 5 years ago by
Resolution: | → Fixed |
---|---|
Status: | assigned → closed |
After some feedback, this has been changed to use the Python bindings' copy of levenshtein.
Works just as well as before.
comment:7 Changed 5 years ago by
Milestone: | needs_triage → 31.0 |
---|---|
Version: | Unspecified → Master Head |
Usage of levenshtein within python bindings