Opened 17 years ago
Closed 16 years ago
#2734 closed defect (fixed)
scripts for fetching metadata for video files automatically
Reported by: | visit0r <gmail: pekka.jaaskelainen> | Owned by: | Anduin Withers |
---|---|---|---|
Priority: | minor | Milestone: | unknown |
Component: | mythvideo | Version: | 0.20 |
Severity: | low | Keywords: | |
Cc: | | Ticket locked: | no |
Description
I wrote two Python scripts that fetch metadata for video files automatically. They could be placed in MythVideo/scripts, or somewhere similar. We discussed the scripts briefly on IRC with xris and anduin.
find_meta.py [file/dir] scans the directory of the given video file for text files that could contain IMDb URLs (basically .nfo files). If one is found, it fetches metadata from IMDb using the imdbpy.py script. If not, it tries "intelligently" to find the IMDb entry using the video file name or the directory name (in the case of DVD backup dirs). If metadata is found, it creates [filename].metadata for video files and video.metadata for video directories (DVD backups/rips).
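As an illustration of the .nfo scanning step, here is a minimal sketch. The regular expression, the scanned extensions and the helper names `find_imdb_id_in_text` and `find_imdb_id` are assumptions made for this example; they are not taken from find_meta.py.

```python
import os
import re

# Hypothetical pattern: matches IMDb title URLs such as
# http://www.imdb.com/title/tt0797746/ and captures the numeric ID.
IMDB_URL_RE = re.compile(r"imdb\.com/title/tt(\d{7,})", re.IGNORECASE)

# Assumed set of text file extensions worth scanning (mostly .nfo files).
TEXT_EXTENSIONS = (".nfo", ".txt")


def find_imdb_id_in_text(text):
    """Return the first IMDb ID found in text, or None."""
    match = IMDB_URL_RE.search(text)
    return match.group(1) if match else None


def find_imdb_id(video_path):
    """Scan the text files next to video_path for an IMDb URL."""
    directory = os.path.dirname(os.path.abspath(video_path))
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(TEXT_EXTENSIONS):
            continue
        # latin-1 never fails to decode, which suits odd .nfo encodings.
        with open(os.path.join(directory, name), encoding="latin-1") as f:
            imdb_id = find_imdb_id_in_text(f.read())
        if imdb_id:
            return imdb_id
    return None  # caller would fall back to a title search
```

When `find_imdb_id` returns None, the script would fall back to the title-based search described above.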
Adding metadata to MythVideo this way is much easier: one can just execute find_meta.py -r [videoroot] and magically get metadata added to every video file the script could find metadata for. The current MythVideo UI is not really usable for adding metadata to a large number of videos at once. The script can also be run from crontab, as it does not overwrite existing metadata files by default.
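The file naming and the "don't overwrite by default" rule are what make a periodic `find_meta.py -r [videoroot]` cron job safe. A minimal sketch of that logic, assuming hypothetical helper names `metadata_target` and `needs_metadata` (the real script may check this differently, especially once it imports into MythDB instead of writing files):

```python
import os


def metadata_target(video_path):
    """Return the .metadata path for a video file or a DVD rip directory.

    Files get [filename].metadata; directories get video.metadata inside them.
    """
    if os.path.isdir(video_path):
        return os.path.join(video_path, "video.metadata")
    return video_path + ".metadata"


def needs_metadata(video_path, overwrite=False):
    """True if metadata should be (re)written; existing files are kept by default."""
    return overwrite or not os.path.exists(metadata_target(video_path))
```

Because `needs_metadata` returns False once a .metadata file exists, repeated runs only touch newly added videos unless overwriting is explicitly requested.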
The imdbpy.py script (which can also be used in MythVideo as an IMDb ripper, BTW) also supports fetching data for single episodes of TV series.
anduin agreed to implement importing of these metadata files into MythVideo. In my opinion, the .metadata files could also be parsed "on the fly" when browsing the files in MythVideo. That way, for example, burned videos could work nicely without having to insert the metadata from each burned disc into the DB before watching them.
Please test and comment. I hope support in MythVideo will be added soon. I can also help with that, if needed.
Attachments (19)
Change History (55)
Changed 17 years ago by
Attachment: metadata_scripts.tar.gz added
comment:1 Changed 17 years ago by
comment:2 Changed 17 years ago by
Changed 17 years ago by
Attachment: metadata_script_improvements.patch added
improvements to the scripts
comment:3 Changed 17 years ago by
The main improvement in the attached patch is the ability to import the metadata directly into MythDB. Some other improvements that I can't recall were also added.
This makes the script usable with 0.20 as well, as MythVideo does not have to parse the .metadata files "on the fly".
comment:4 Changed 17 years ago by
Changed 17 years ago by
Attachment: mythvideo_metadata_scripts.tar.gz added
The scripts as a whole for easier testing outside trunk.
Changed 17 years ago by
Attachment: metadata_script_improvements_against_trunk_07-03-03.patch added
Changes against trunk (note: the scripts are NOT tested with MythVideo on trunk, only with 0.20).
comment:5 Changed 17 years ago by
I finished the features I planned for the metadata scripts. Now the scripts truly kick ass. Please try them out. I've only tested them with 0.20, but they should also work on trunk (AFAIK there haven't been any MythVideo DB schema changes that would break the importing).
Some of the changes since the last upload of the scripts to this ticket:
find_meta.py:
- detect whether metadata already exists before starting to scan the file
- better detection for dvd rip directories
- add file chaining data for multifile titles (dvd rip dirs)
- do not write metadata files by default, import to MythDB by default
- allow giving multiple paths to scan as argument
- picks the first Runtime if imdbpy.py returns multiple
- importing of movie's genre
- fetch and import the cover image
imdbpy.py:
- removed the changelog from the comments (SVN logs are better for this)
- fixed the plot summary fetching
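To illustrate the "picks the first Runtime" behaviour listed above, here is a minimal sketch. It assumes imdbpy.py prints one `Key:value` pair per line, as in the metadata dump shown later in this ticket ("Runtime:45,60 (including commercials)"); the function names are hypothetical.

```python
def parse_metadata(text):
    """Parse 'Key:value' lines into a dict; the first occurrence of a key wins."""
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a Key:value line
        metadata.setdefault(key.strip(), value.strip())
    return metadata


def first_runtime(metadata):
    """Return the first runtime in minutes when several are listed, e.g. '45,60'."""
    runtime = metadata.get("Runtime")
    if not runtime:
        return None
    first = runtime.split(",")[0]
    digits = "".join(ch for ch in first if ch.isdigit())
    return int(digits) if digits else None
```

`setdefault` keeps the first of several `Runtime:` lines, and the comma split keeps the first of several values on one line; genres can be split on commas the same way.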
comment:6 Changed 17 years ago by
comment:7 Changed 17 years ago by
Severity: medium → low
Type: enhancement → defect
When running find_meta.py, it says:
Got multiple candidates for title search 'Goldfinger-Documentary'. Use the '-a' switch to choose the correct one.
As I understand it, shouldn't it say "Use the '-i' switch"?
comment:8 Changed 17 years ago by
You can use either one of them (-i or -a) to choose the correct imdb title.
Changed 17 years ago by
Attachment: interacive_mode_fixes.patch added
More informative output when multiple titles are found; also uses the 'readline' library, if available, to read lines in interactive mode.
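A minimal sketch of the interactive choice with optional readline support. The `?)` prompt mirrors the output shown later in this ticket; `parse_choice`, `choose_candidate` and the `read` parameter are hypothetical names, not the patch's code.

```python
try:
    # Importing readline, where available, gives input() line editing and history.
    import readline  # noqa: F401
except ImportError:
    readline = None


def parse_choice(answer, candidate_ids):
    """Return the typed IMDb ID if it is one of the candidates, else None."""
    answer = answer.strip()
    return answer if answer in candidate_ids else None


def choose_candidate(candidates, read=input):
    """Print (imdb_id, title) candidates and ask until a listed ID is entered."""
    for imdb_id, title in candidates:
        print("%s) %s" % (imdb_id, title))
    ids = [imdb_id for imdb_id, _ in candidates]
    while True:
        choice = parse_choice(read("?)"), ids)
        if choice:
            return choice
```

Passing `read` as a parameter keeps the prompt loop testable without a terminal.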
Changed 17 years ago by
Attachment: metadata_script_improvements_against_trunk_07-03-04.patch added
Improvements since yesterday (major addition: new poster fetching script).
Changed 17 years ago by
Attachment: mythvideo_metadata_scripts-07-03-04.tar.gz added
Updated scripts.
comment:9 Changed 17 years ago by
I had a boring Sunday and improved the scripts a bit further.
The major addition is a new poster grabber script (fetch_poster.py), which grabs posters from movieposter.com. These poster images seem to be of much better quality (larger images) than those from IMDb. If MoviePoster does not have the poster, the script falls back to fetching it from IMDb.
The script uses the Python Imaging Library to determine the image sizes of the poster candidates and chooses the largest one (that is also vertical).
In addition, there are several fixes here and there. For example, I merged the patch in ticket #3123, so Anduin, if you apply this, you can close that ticket too. I think this is the last update to these scripts for a while, unless bugs are found.
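The poster selection rule ("largest and vertical") can be sketched as follows. It assumes the sizes were already read, for example with PIL's `Image.open(path).size`, which returns `(width, height)`; measuring "largest" by pixel area and the name `choose_best_poster` are assumptions for this example.

```python
def choose_best_poster(candidates):
    """Return the URL of the largest portrait-oriented candidate, or None.

    candidates: list of (url, (width, height)) tuples.
    """
    vertical = [(url, size) for url, size in candidates if size[1] > size[0]]
    if not vertical:
        return None  # caller could fall back to the IMDb poster
    url, _ = max(vertical, key=lambda item: item[1][0] * item[1][1])
    return url
```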
comment:10 Changed 17 years ago by
(In [12934]) References #2734 Closes #3123
Thanks to hads for the patch in #3123
Updates from Pekka Jääskeläinen
"Major addition is a new poster grabber script (fetch_poster.py) which grabs posters from movieposter.com. These poster images seem to be much better quality (larger images) than those from IMDb. However, the script has a fallback to the IMDb fetching if MoviePoster does not have the poster.
The script uses Python Imaging Library to figure out the image sizes of the poster image candidates and chooses the one which is largest (and vertical)."
comment:11 Changed 17 years ago by
I've hit a problem with the latest version: when the season is missing, as for IMDb ID 0797746, the script just halts.
Scanning /var/lib/mythvideo/TopGear/Season8/Top Gear - 08x01 - 2006.05.07.nfo for IMDb ID
Found IMDb ID '0797746'
Querying IMDb for meta data for ID 0797746...
Traceback (most recent call last):
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 931, in ?
    main()
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 926, in main
    scan(path, options.imdb_id)
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 834, in scan
    scan_directory(root)
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 795, in scan_directory
    scan_file(video)
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 758, in scan_file
    metadata = find_metadata_for_video_path(pathName)
  File "/usr/share/mythtv/mythvideo/scripts/find_meta.py", line 620, in find_metadata_for_video_path
    metadata = imdbpy.metadata_search(imdb_id)
  File "/usr/share/mythtv/mythvideo/scripts/imdbpy.py", line 197, in metadata_search
    metadata += 'Title:' + episode_title_format % \
  File "/var/lib/python-support/python2.4/imdb/utils.py", line 797, in __getitem__
    rawData = self.data[key]
KeyError: 'season'
Changed 17 years ago by
Attachment: updates-07-03-09.patch added
Here's a patch fixing this problem and cleaning up the code a bit.
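The actual fix is in updates-07-03-09.patch. Purely as an illustration of the general pattern, the crash comes from indexing a `'season'` key that IMDb does not always provide; checking for the keys first avoids the KeyError. The format strings and the function name below are hypothetical, and a plain dict stands in for the IMDbPY episode object.

```python
# Hypothetical formats; the real imdbpy.py uses its own episode_title_format.
EPISODE_TITLE_FORMAT = "%(series)s %(season)sx%(episode)s - %(title)s"
FALLBACK_FORMAT = "%(series)s - %(title)s"


def format_episode_title(data):
    """Format an episode title, tolerating missing 'season'/'episode' keys.

    data must contain 'series' and 'title'; season and episode are optional.
    """
    if "season" in data and "episode" in data:
        return EPISODE_TITLE_FORMAT % data
    return FALLBACK_FORMAT % data
```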
comment:12 follow-up: 13 Changed 17 years ago by
Please test the patch. Anduin, can you apply this? Thanks.
comment:13 Changed 17 years ago by
Replying to Pekka Jääskeläinen <pekka.jaaskelainen@gmail.com>:
Please test the patch. Anduin can you apply this, thanks.
Hi, I'm having some trouble applying the patch to mythvideo_metadata_scripts-07-03-04.tar.gz:
root@myth:/usr/share/mythtv# patch -p0 < updates-07-03-09.patch
patching file mythvideo/scripts/imdbpy.py
Hunk #4 FAILED at 158.
1 out of 4 hunks FAILED -- saving rejects to file mythvideo/scripts/imdbpy.py.rej
comment:15 Changed 17 years ago by
comment:16 Changed 17 years ago by
comment:17 Changed 17 years ago by
I ran into some errors using the script, so I downloaded the most recent version from trunk (13018), and I'm still seeing the following errors:
mythtv@myth:/usr/share/mythtv/mythvideo/scripts$ ./find_meta.py -w -i -r /video/video_imports/
Database connection successful.
Scanning directory /video/video_imports/...
Scanning directory /video/video_imports/TV Shows...
Scanning directory /video/video_imports/TV Shows/Heroes...
Found 1 videos.
Scanning file Heroes s01e18 - Parasite.avi...
Title search 'Heroes s01e18 - Parasite'
Got multiple candidates for title search 'Heroes s01e18 - Parasite'.
0813715) Heroes (2006)
0802147) Heroes (2006)
0452723) Heroes (1984)
0291617) Heroes (1990)
0422413) Heroes (1985)
?)0813715
Chose 0813715
Querying IMDb for meta data for ID 0813715...
Metadata:
Title:Heroes
Runtime:45,60 (including commercials)
Year:2006
Director:Greg Beeman
Plot:They thought they were like everyone else... until they woke with incredible abilities.
Runtime:45,60 (including commercials)
Genres:Drama,Fantasy
Countries:USA
IMDb:0813715
Inserting metadata to MythDB for /video/video_imports/TV Shows/Heroes/Heroes s01e18 - Parasite.avi.
No metadata in MythDB, creating a new one.
./find_meta.py:397: Warning: Field 'title' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'director' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'rating' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'inetref' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'year' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'userrating' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'length' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'showlevel' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
./find_meta.py:397: Warning: Field 'coverfile' doesn't have a default value
c.execute("""INSERT INTO videometadata(filename) VALUES(%s)""", (videopath,))
Traceback (most recent call last):
  File "./find_meta.py", line 947, in ?
    main()
  File "./find_meta.py", line 942, in main
    scan(path, options.imdb_id)
  File "./find_meta.py", line 850, in scan
    scan_directory(root)
  File "./find_meta.py", line 811, in scan_directory
    scan_file(video)
  File "./find_meta.py", line 776, in scan_file
    save_metadata(pathName, metadata_target, metadata)
  File "./find_meta.py", line 547, in save_metadata
    save_metadata_to_mythdb(videopath, metadata)
  File "./find_meta.py", line 316, in save_metadata_to_mythdb
    return save_video_metadata_to_mythdb(videopath, metadata)
  File "./find_meta.py", line 412, in save_video_metadata_to_mythdb
    coverfile = find_poster_image(title, inetref)
  File "./find_meta.py", line 494, in find_poster_image
    posters = fetch_poster.find_best_posters(\
  File "/usr/share/mythtv/mythvideo/scripts/fetch_poster.py", line 215, in find_best_posters
    new_posters = fetcher.fetch(title, imdb_id)
  File "/usr/share/mythtv/mythvideo/scripts/fetch_poster.py", line 108, in fetch
    image_url = self.find_poster_image_url(url)
  File "/usr/share/mythtv/mythvideo/scripts/fetch_poster.py", line 123, in find_poster_image_url
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(poster_page_url))
  File "/usr/lib/python2.4/urllib.py", line 82, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.4/urllib.py", line 187, in open
    return self.open_unknown(fullurl, data)
  File "/usr/lib/python2.4/urllib.py", line 199, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: 'mailto'
mythtv@myth:/usr/share/mythtv/mythvideo/scripts$
Am I using it incorrectly?
comment:18 Changed 17 years ago by
I found a solution for my errors, but I've never programmed in Python before, so I'm not sure what the best solution would be. The important thing is that it works for me. It turned out that there were actually two separate issues.
The first was the warnings caused by the lack of a default for some of the fields in the videometadata table. I just used SQL to alter the table to give every field a default, and the warnings went away. I'm guessing the only reason I even encountered these warnings is that I'm working entirely from the command line and hadn't initialized the table by visiting the module in the frontend.
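An alternative to altering the table is to have the script supply a value for every NOT NULL column itself. A minimal sketch, assuming the column list from the warnings above; the placeholder values and the name `build_insert` are assumptions, not MythVideo's own defaults. The `%s` placeholders match the DB-API "format" paramstyle that MySQLdb uses.

```python
# Columns reported as lacking defaults in the warnings above.
# The values are assumed placeholders, not MythVideo's real defaults.
VIDEOMETADATA_DEFAULTS = {
    "title": "",
    "director": "",
    "rating": "",
    "inetref": "",
    "year": 0,
    "userrating": 0.0,
    "length": 0,
    "showlevel": 1,
    "coverfile": "",
}


def build_insert(videopath, metadata):
    """Return (sql, params) for an INSERT that sets every listed column.

    Only known column names from metadata are used, so arbitrary keys
    can never end up in the SQL text.
    """
    row = dict(VIDEOMETADATA_DEFAULTS)
    row.update((k, v) for k, v in metadata.items() if k in VIDEOMETADATA_DEFAULTS)
    row["filename"] = videopath
    columns = sorted(row)
    sql = "INSERT INTO videometadata(%s) VALUES(%s)" % (
        ", ".join(columns), ", ".join(["%s"] * len(columns)))
    return sql, tuple(row[c] for c in columns)
```

The result would be passed as `cursor.execute(sql, params)`, letting the driver quote the values.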
The second was an error that caused the script to die any time it came across a mailto link on the movie posters site. I guess this is primarily an issue for TV shows, and possibly only for shows that haven't released any DVDs yet. A simple if statement that tests for a mailto link and skips the invalid URL request fixes it. I don't know what a valid patch file should look like, so I apologize for the crude attempt below:
mythvideo/scripts/fetch_poster.py, around line 123:

         """ Parses the given poster page and returns an URL pointing to the poster image. """
+        if poster_page_url[:6] == 'mailto':
+            return None
         soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(poster_page_url))
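A slightly more general variant of the same check is to resolve each link and accept only HTTP(S) URLs, which also skips `javascript:` and similar links. This sketch uses Python 3's `urllib.parse` (the 2007 scripts used Python 2's `urllib`); the helper name `fetchable_url` is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def fetchable_url(page_url, href):
    """Resolve href against page_url; return None for non-HTTP links such as mailto:."""
    absolute = urljoin(page_url, href)
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

The poster fetcher would then only call `urlopen` on non-None results.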
Changed 17 years ago by
Attachment: ignore_mailto_links.patch added
ignore mailto: links in MoviePoster grabbing
comment:19 Changed 17 years ago by
Thanks for reporting. I couldn't reproduce your default value issue, but the latter issue is fixed in the attached patch.
Changed 17 years ago by
Attachment: metadata_scripts_prune.patch added
Adds a '-p' switch which prunes old metadata from MythDB. Also includes the previous patch.
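What "old metadata" means exactly is defined by the patch. Assuming it means rows whose video file no longer exists on disk, the pruning step could look like the following sketch; `prune_statements` and the injectable `exists` check are hypothetical.

```python
import os


def prune_statements(filenames, exists=os.path.exists):
    """Yield (sql, params) DELETEs for videometadata rows whose files are gone."""
    for name in filenames:
        if not exists(name):
            yield "DELETE FROM videometadata WHERE filename = %s", (name,)
```

Each yielded pair would be run with a DB-API cursor; `exists` is a parameter so the logic can be checked without touching the file system.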
comment:20 Changed 17 years ago by
Changed 17 years ago by
Attachment: add_leading_zero_to_episode_number.diff added
adds a leading zero to the series and episode numbers in the title when they are less than 10
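The padding itself is a `%02d` conversion. A minimal sketch, assuming an `NNxNN` layout like the "08x01" file name earlier in this ticket; the function name is hypothetical.

```python
def episode_code(season, episode):
    """Format season and episode with a leading zero below 10, e.g. 08x01."""
    return "%02dx%02d" % (season, episode)
```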
comment:21 Changed 17 years ago by
comment:22 Changed 17 years ago by
Normally I burn my movies like this:
A directory for every movie, containing:
- the video file
- the cover file
- the metadata file
So when I insert the CD, I can see the cover file without importing the video into the DB. *nice*
This would be perfect combined with your idea:
In my opinion, the .metadata files could be also parsed "on the fly" when browsing the files in MythVideo.
The only change to MythVideo would be: if MythVideo finds only one video file in a directory, it should skip showing the contents of the directory and go directly to the next step (playing or showing the info).
So every CD/DVD would look as if it had been imported into the DB.
Even if MythVideo does not parse the metadata "on the fly", this change could be cool, because I would write an autostart script that executes "find_meta.py -r [cd]" when it finds a specific empty text file (like "importit.txt") on the CD.
MythVideo could also recognize several files with practically the same name as one movie:
- Movie.CD1.avi
- Movie.CD2.avi
-> Movie
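The grouping suggested above could be approximated by stripping a trailing part marker from each file name. This is a heuristic sketch; the pattern and the name `group_parts` are assumptions. Requiring a separator before the marker keeps titles like "Counterpart 2" intact.

```python
import os
import re

# Hypothetical part marker: ".CD1", " cd2", "-part3", "_disc1", ... at the end of the stem.
PART_RE = re.compile(r"[ ._-]+(?:cd|disc|disk|part)\s*(\d+)$", re.IGNORECASE)


def group_parts(filenames):
    """Group 'Movie.CD1.avi' and 'Movie.CD2.avi' under the title 'Movie'."""
    groups = {}
    for name in sorted(filenames):
        stem, _ext = os.path.splitext(name)
        match = PART_RE.search(stem)
        title = stem[:match.start()] if match else stem
        groups.setdefault(title, []).append(name)
    return groups
```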
comment:23 Changed 17 years ago by
Yes, all of this has been planned. Especially the idea that you see movie titles (and don't have to select files in file system mode) even when they consist of multiple discs would be nice, but it has its problems on the implementation side (bookmarks etc.). I welcome your patches, though :)
comment:24 follow-up: 27 Changed 17 years ago by
Some updates to the metadata scripts.
imdbpy.py
- detection for series titles in format "Sopranos 612"
- fixed a crash when the IMDb had no year for the queried title
- some misc. desperate unicode fixes (I really hate those!)
- fetch AKA info (names for the title in different countries), if available
- misc code cleanups and bug fixes
find_meta.py
- a switch which disables fetching of the poster image (-s)
- a switch which allows defining the language code of the AKA to use for the title string (-t)
- misc code cleanups and bug fixes
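The series title detection could be sketched with two patterns: the "s01e18" style seen elsewhere in this ticket and the compact "Sopranos 612" style, where the last two digits are the episode. These patterns and `parse_series_title` are assumptions, not imdbpy.py's code.

```python
import re

# "Heroes s01e18 - Parasite" style.
SXXEYY_RE = re.compile(
    r"^(?P<title>.*?)[ ._-]+s(?P<season>\d{1,2})e(?P<episode>\d{1,2})\b",
    re.IGNORECASE)
# "Sopranos 612" style: the last two digits are the episode, the rest the season.
COMPACT_RE = re.compile(r"^(?P<title>.*?)[ ._-]+(?P<season>\d{1,2})(?P<episode>\d{2})$")


def parse_series_title(name):
    """Return (title, season, episode), or None if no episode marker is found."""
    for pattern in (SXXEYY_RE, COMPACT_RE):
        match = pattern.search(name)
        if match:
            return (match.group("title"), int(match.group("season")),
                    int(match.group("episode")))
    return None
```

Note that the compact form is ambiguous: a title ending in a four-digit year such as "2049" would also match, so a real implementation needs further checks.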
Changed 17 years ago by
Attachment: mythvideo_metadata_script_updates-07-05-05.diff added
my updates as of 2007-05-05 (against trunk)
comment:25 Changed 17 years ago by
(In [13423])
Thanks to Pekka Jääskeläinen for updating his script:
misc code cleanups and bug fixes
imdbpy.py
- fixed a crash when IMDb had no year for the queried title
- some misc. desperate unicode fixes (I really hate those!)
- fetch AKA info (names for the title in different countries), if available
find_meta.py
- a switch which disables fetching of the poster image (-s)
- a switch which allows defining the language code of the AKA to use for the title string (-t)
comment:26 Changed 17 years ago by
Unicode doesn't seem to be supported. Grabbing Aeon Flux (the movie) from IMDb crashed the script with Unicode errors. It's the "Æ" ligature that seems to be messing it up.
comment:27 Changed 17 years ago by
Replying to visit0r:
- a switch which allows defining the language code of the AKA to use for the title string (-t)
AKA sometimes looks like this:
Reine Nervensache 2::(Austria)::(Germany)::[de]
because the title is the same in Austria and in Germany.
In this case the name imported to the DB is
Reine Nervensache 2::(Austria)(Analyze That)
instead of
Reine Nervensache 2 (Analyze That)
BTW: isn't the switch -l rather than -t?
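Based only on the single example above, an AKA string appears to be `::`-separated with the title first and a bracketed language code last. Under that assumption, a sketch of extracting the parts (all function names hypothetical):

```python
def aka_title(aka):
    """Return just the title from e.g. 'Reine Nervensache 2::(Austria)::(Germany)::[de]'."""
    return aka.split("::", 1)[0].strip()


def aka_language(aka):
    """Return the bracketed language code at the end, e.g. 'de', or None."""
    last = aka.rsplit("::", 1)[-1].strip()
    if last.startswith("[") and last.endswith("]"):
        return last[1:-1]
    return None


def display_title(aka, original_title):
    """Combine the localized title with the original one: 'AKA (Original)'."""
    return "%s (%s)" % (aka_title(aka), original_title)
```

Splitting on `::` instead of stripping a single country suffix handles any number of countries.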
Changed 17 years ago by
Attachment: mythvideo_metadata_script_updates-07-05-12.diff added
fixes to the metadata scripts
comment:28 Changed 17 years ago by
The attached patch fixes both reported problems and a couple of others. In addition, it moves the MythDB-related functions in MythTV.py out of the MythTV class, so find_meta.py works with 0.20 again (no need to worry about mythbackend protocol upgrades).
comment:29 Changed 17 years ago by
Changed 17 years ago by
Attachment: mythdb_class_fix.diff added
Fix for MythDB class after last update
comment:30 Changed 17 years ago by
Changed 17 years ago by
Attachment: mythtv.py_additions.diff added
A couple of additions to the MythTV class and doc string tidy up
comment:31 Changed 17 years ago by
I just attached a diff with some doc string tidy-ups, some added __str__ and __repr__ methods, and a couple of additions to the MythTV class:
- getRecorderList - This complements getFreeRecorderList; it gets all the recorders, free or not.
- getRecorderDetails - Gets the details of a tuner card (type and video device etc). A small Recorder class was added for this.
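To illustrate the `__str__`/`__repr__` split on a class like the Recorder described above, here is a hypothetical stand-in. The attribute names are guessed from "type and video device" plus the hostname property mentioned later; they are not taken from MythTV.py.

```python
class Recorder(object):
    """Hypothetical stand-in for the small Recorder class described above."""

    def __init__(self, recorder_id, cardtype, videodevice, hostname):
        self.recorder_id = recorder_id
        self.cardtype = cardtype
        self.videodevice = videodevice
        self.hostname = hostname

    def __str__(self):
        # Human-readable form, used by print() and str().
        return "Recorder %d: %s on %s (%s)" % (
            self.recorder_id, self.cardtype, self.videodevice, self.hostname)

    def __repr__(self):
        # Unambiguous form for debugging, mirroring the constructor call.
        return "Recorder(%r, %r, %r, %r)" % (
            self.recorder_id, self.cardtype, self.videodevice, self.hostname)
```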
I accidentally included a CSS adjustment for mythweb in my patch and I can't overwrite it. Sorry, please ignore that bit of the diff.
Changed 17 years ago by
Attachment: mythtv.py_additions.2.diff added
Updated patch, removed unrelated stuff and fixed strftime string
comment:32 Changed 17 years ago by
I updated the patch: I added another method to the MythTV class, isActiveBackend, and also added the hostname property to the Recorder class.
comment:33 Changed 17 years ago by
Resolution: → fixed
Status: new → closed
comment:34 Changed 17 years ago by
Resolution: fixed (deleted)
Status: closed → reopened
Trivial fix to allow an IP address in the backend hostname setting (the parser tokenized on '.', so basically only one-word host names such as 'localhost' worked).
Removed redundant init_db() from find_meta.py.
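The general idea behind the fix is to split a setting line only at the first `=` and keep the value whole, so dotted values such as IP addresses survive. A sketch, assuming a simple `Key=Value` settings file; the key names in the example are illustrative and `parse_settings` is hypothetical.

```python
def parse_settings(text):
    """Parse 'Key=Value' lines, keeping values (dots included) intact."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

With this approach, `DBHostName=192.168.1.10` yields the full address rather than only `192`.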
Changed 17 years ago by
Attachment: backendip.patch added
Changed 17 years ago by
Attachment: backendip2.patch added
comment:35 Changed 17 years ago by
Sorry, the first one had some debugging modifications in it. The latter patch is the correct one.
comment:36 Changed 16 years ago by
Resolution: → fixed
Status: reopened → closed
the scripts