Compare commits

..

385 Commits

Author SHA1 Message Date
80b9bbce86 release 2013.11.13 2013-11-13 11:09:04 +01:00
d37936386f Credit @saper for tvp IE (#1730) 2013-11-13 11:08:07 +01:00
c3a3028f9f [tvp] Minor improvements (#1730) 2013-11-13 11:06:53 +01:00
6c5ad80cdc Merge remote-tracking branch 'saper/tvp' 2013-11-13 11:03:49 +01:00
b5bdc2699a Credit @jelly for gamekings extractor (#1759) 2013-11-13 10:52:22 +01:00
384b98cd8f [gamekings] Minor fixes (#1759) 2013-11-13 10:51:00 +01:00
eb9b5bffef Add extractor for gamekings.tv 2013-11-13 10:38:47 +01:00
8b8cbd8f6d [vine] Fix uploader extraction 2013-11-12 20:50:52 +01:00
72b18c5d34 FFmpegMetadataPP: don't enclose the values with " (fixes #1756) 2013-11-12 20:38:13 +01:00
eb0a839866 [common] Simplify og_search_property 2013-11-12 10:36:23 +01:00
1777d5a952 release 2013.11.11 2013-11-11 18:28:17 +01:00
d4b7da84c3 Clarify -c. Do not pass it in if you don't know what you're doing
Suggested in #1743
2013-11-11 14:21:14 +01:00
801dbbdffd Use avconv for downloading with m3u8 manifests if it's available (fixes #1735) 2013-11-10 16:47:03 +01:00
0ed05a1d2d Use the 'rtmp_live' field for the live parameter of rtmpdump 2013-11-10 12:45:17 +01:00
1008bebade Merge remote-tracking branch 'rzhxeo/rtmpdump_live' 2013-11-10 12:38:40 +01:00
ae84f879d7 Merge all the subtitles test into a single file
They reuse a base class
2013-11-10 12:28:21 +01:00
be6dfd1b49 [ted] Return a single info_dict for talks urls
It failed with the --list-subs option
2013-11-10 12:09:12 +01:00
231516b6c9 Merge pull request #1705 from iemejia/master
[ted] support for subtitles
2013-11-10 11:54:18 +01:00
fb53d58dcf Merge pull request #1726 from saper/escaped
Fix AssertionError when og property not found
2013-11-10 02:51:52 -08:00
2a9e9b210b Fix the documentation of '--autonumber-size' (#1743)
it's '--auto-number' not '--autonumber'
2013-11-09 19:21:30 +01:00
897d6cc43a Improve format listing for long format ids
Now arte.tv videos have quite long ids.
2013-11-09 19:07:34 +01:00
f470c6c812 [arte] Improve the format sorting
Also use the bitrate.
Prefer normal version and sourds/mal version over original version with subtitles.
2013-11-09 19:05:19 +01:00
566d4e0425 [arte] Make sure the format_id is unique (closes #1739)
Include the bitrate and use the height instead of the quality field.
2013-11-09 19:01:23 +01:00
81be02d2f9 [cnn] Accept www.cnn.com urls (fixes #1740) 2013-11-09 18:16:32 +01:00
c2b6a482d5 [brightcove] the format function requires to specify the index in python2.6 2013-11-09 18:10:11 +01:00
12c167c881 [soundcloud] Allow to download tracks marked as not 'streamable'
They use the rtmp protocol but if the are marked as 'downloadable' it can use the direct download link.
2013-11-09 18:08:03 +01:00
20aafee7fa [kankan] Fix the video url
It now requires two additional parameters, one is a timestamp we get from the getCdnresource_flv page and the other is a key we have to build.
2013-11-09 16:51:11 +01:00
be07375b66 Don't recode the video with m3u8 downloads (fixes #1741) 2013-11-09 16:40:00 +01:00
dd5bcdc4c9 [brightcove] Set the 'Referer' header if the url has the 'linkBaseUrl' parameter (fixes #1553) 2013-11-07 21:06:48 +01:00
6161d17579 release 2013.11.07 2013-11-07 11:06:34 +01:00
4ac5306ae7 Fix the report progress when file_size is unknown (#1731)
The report_progress function will accept eta and percent with None value and will set the message to 'Unknow ETA' or 'Unknown %'.
Otherwise the values must be numbers.
2013-11-07 08:03:35 +01:00
b1a80ec1a9 [xnxx] Accept urls that start with 'www' (fixes #1734) 2013-11-06 23:45:01 +01:00
672fe94dcb release 2013.11.06.1 2013-11-06 22:11:46 +01:00
51040b72ed [brightcove] Support redirected urls from bcove.me (fixes #1732)
'bctid' needs to be changed to '@videoPlayer', and 'bckey' to 'playerKey'.
2013-11-06 22:03:00 +01:00
4f045eef8f [youtube:channel] Fix the extraction
The page don't include the 'load more' button anymore, now we directly get the 'c4_browse_ajax' pages.
2013-11-06 21:42:33 +01:00
5d7b253ea0 Add an extractor for eitb.tv (fixes #1608)
The BrighcoveExperience object doesn't contain the video id, the extractor adds it and passes the url to BrightcoveIE.
2013-11-06 20:06:14 +01:00
b0759f0c19 [brightcove] Extract all the available formats 2013-11-06 19:05:41 +01:00
065472936a Add an extractor for space.com (fixes #1718)
It uses Brightcove, but requires some special process for getting a url with the playerKey field in some videos
2013-11-06 17:37:39 +01:00
fc4a0c2aec [brightcove] Change the 'videoId' or 'videoID' field to '@videoPlayer' (fixes #1697)
It seems to be needed when using the htmlFederated page
2013-11-06 17:31:47 +01:00
eeb165e674 [brightcove] Add the extraction of the url from generic 2013-11-06 16:58:03 +01:00
9ee2b5f6f2 tests: don't run the test if any of the extractors listed in the 'add_ie' field is marked as not working 2013-11-06 16:43:26 +01:00
da54be877a release 2013.11.06 2013-11-06 14:02:52 +01:00
50a886b7ab Fix reporting when file size is unkown (Fixes #1731) 2013-11-06 14:02:33 +01:00
76e67c2cb6 Clean up imports 2013-11-06 14:01:43 +01:00
5137ebac0b [tvp] Telewizja Polska: new extractor for tvp.pl, fixes #1719
Thanks-To: mplonski

https://github.com/mplonski/linux/blob/master/tvp-dl.py
2013-11-05 23:47:40 +01:00
a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
2013-11-05 23:19:29 +01:00
4ed3e51080 [ted] fixed error in case of no subtitles present
I created a test, but I leave it commented since TED videos get
new subtitles frequently.
2013-11-05 12:00:13 +01:00
7f34001d57 Merge pull request #1724 from rzhxeo/generic_youtube
[GenericIE] Also detect youtube if src url of iframe is embedded in ' instead of "
2013-11-04 23:00:46 -08:00
2dcf7d8f99 [GenericIE] Also detect youtube if src url of iframe is embedded in ' instaed of " 2013-11-05 02:08:02 +01:00
19b0668251 [canal2c] Accept more urls (fixes #1723)
The url only needs to have the 'idVideo' field in the query, in any position.
We have to set the 'void=oui' in the webpage url, so that we get the file name.
2013-11-04 22:26:19 +01:00
e7e6b54d8a [teamcoco] Parse the xml file and extract all the formats 2013-11-03 17:48:12 +01:00
2a1a8ffe41 Merge pull request #1693 from alexvh/teamcoco_fix
[teamcoco] Fix video url extraction for some videos
2013-11-03 17:19:51 +01:00
08fb86c49b [youtube] Add description for YoutubeSearchDateIE (#1710) 2013-11-03 15:59:10 +01:00
3633d77c0f Merge remote-tracking branch 'CBGoodBuddy/ytsearchtime' 2013-11-03 15:56:55 +01:00
165e179764 release 2013.11.03 2013-11-03 15:50:36 +01:00
12ebdd1506 [viddler] Support non-digit IDs (Fixes #1714) 2013-11-03 15:49:59 +01:00
1baf9a5938 Merge pull request #1698 from rzhxeo/cinemassacre
[CinemassacreIE] Support more embed urls
2013-11-03 05:17:12 -08:00
a56f9de156 Style fixes for extractors: remove spaces around (,),{ and } 2013-11-03 14:06:47 +01:00
fa5d47af4b Merge pull request #1679 from rzhxeo/mofosex
Add support for http://www.mofosex.com
2013-11-03 05:04:14 -08:00
d607038753 Merge pull request #1677 from rzhxeo/xtube
Add support for http://www.xtube.com
2013-11-03 03:28:02 -08:00
9ac6a01aaf Merge pull request #1676 from rzhxeo/extremetube
Add support for http://www.extremetube.com
2013-11-03 03:25:46 -08:00
be97abc247 Set the 'extractor_key' field in the info_dict
It's the string returned by the class method 'ie_key', which allows to retrieve the extractor with 'get_info_extractor'
2013-11-03 12:14:44 +01:00
9103bbc5cd Add the 'webpage_url' field to info_dict
The url for the video page, it must allow to reproduce the result.
It's automatically set by YoutubeDL if it's missing.
2013-11-03 12:11:13 +01:00
b6c45014ae Set the extra_info inside YoutubeDL.process_ie_result and set only if the keys are missing 2013-11-03 11:57:04 +01:00
a3dd924871 Add YoutubeSearchDateIE extractor to youtube.py & __init__.py, which searches by publication date. 2013-11-02 22:40:48 -04:00
137bbb3e37 [XTubeIE] Add description to TEST 2013-11-02 22:45:48 +01:00
86ad94bb2e [ExtremeTubeIE] Set age_limit to 18 and fix uploader extraction 2013-11-02 22:33:49 +01:00
3e56add7c9 Merge pull request #1678 from rzhxeo/keezmovies
[KeezMoviesIE] Detect URLs with numbers in the SEO part correct
2013-11-02 14:15:52 -07:00
f52f01b5d2 [brightcove] Don't set the extension
If the video only has the 'FLVFullLengthURL' key, it can still be an mp4 file.
2013-11-02 21:20:46 +01:00
98d7efb537 [exfm] skip tests
The site is down too often.
2013-11-02 20:51:09 +01:00
cf51923545 [youtube] Remove vevo test
The video is no longer available and it seems that vevo video don't use encrypted signatures anymore.
2013-11-02 20:46:26 +01:00
38fcd4597a Merge remote-tracking branch 'iemejia/master' 2013-11-02 19:56:06 +01:00
165e3bb67a [bambuser] Add an extractor for channels (closes #1702) 2013-11-02 19:50:57 +01:00
38db46794f Merge branch 'ted_subtitles' 2013-11-02 19:50:45 +01:00
a9a3876d55 [ted] Added support for subtitle download 2013-11-02 19:48:39 +01:00
1f343eaabb [subtitles] refactor to support websites with subtitle information the
webpage.

I added the parameter webpage, so now it's similar to the way automatic
captions are handled. This is an improvement needed for websites like
TED.
2013-11-02 19:29:25 +01:00
72a5b4f702 Add an extractor for bambuser.com (#1702) 2013-11-02 19:01:01 +01:00
0a43ddf320 [CinemassacreIE] Add live paramter to extracted info as a workaround 2013-11-02 18:08:35 +01:00
31366066bd Add support for live parameter to rtmpdump 2013-11-02 18:08:16 +01:00
aa2484e390 release 2013.11.02 2013-11-02 11:21:36 +01:00
8eddf3e91d [youtube] Encode subtitle track name in request (Fixes #1700) 2013-11-02 11:21:05 +01:00
60d142aa8d Add an extractor for vk.com (closes #1635) 2013-11-01 22:34:18 +01:00
66cf3ac342 [metacafe] Fix support for age-restricted videos (fixes #1696)
The 'Content-Type' header must be set for disabling the family filter.
The 'flashversion' cookie  is only needed for AnyClip videos.
Added tests for standard metacafe videos and for age-restricted videos.
Also set the 'age_limit' field.
2013-11-01 11:56:15 +01:00
ab4e151347 [CinemassacreIE] Support more embed urls 2013-11-01 01:24:23 +01:00
ac2547f5ff [teamcoco] Fix video url extraction for some videos
Video url extraction failed for some videos,
e.g. http://teamcoco.com/video/old-time-baseball

The url extracted was also occasionally suboptimal quality,
e.g. http://teamcoco.com/video/louis-ck-interview-george-w-bush
2013-10-31 15:41:14 -04:00
5f1ea943ab [livestream] fix the extraction of events
It now uses a json dictionary from the webpage.
2013-10-31 08:07:26 +01:00
0ef7ad5cd4 Fix the test for dailymotion subtitles
The extractor returns a single info_dict now.
2013-10-31 07:55:03 +01:00
9f1109a564 [dailymotion] Fix support for age-restricted videos (Fixes #1688) 2013-10-31 00:20:49 +01:00
33b1d9595d release 2013.10.30 2013-10-30 01:17:20 +01:00
7193498811 Use index in formt string (Fixes vevo test on Python 2.6) 2013-10-30 01:17:00 +01:00
72321ead7b [vevo] Readd support for SMIL (Fixes #1683) 2013-10-30 01:14:17 +01:00
b5d0d817bc Remove superfluous space 2013-10-30 01:09:44 +01:00
94badb2599 Fix output indenting for --list-formats 2013-10-30 01:09:26 +01:00
b9a836515f Update the Vimeo test vector md5
confirmed that this is indeed the first 10241 (we went off by one with
byte range 0-10240) of the full, playing mp4, so they probably
reencoded or something
2013-10-29 16:44:35 -04:00
21c924f406 [arte] Download the 'Originalversion' version if it's the only one available (fixes #1682) 2013-10-29 20:58:49 +01:00
e54fd4b23b [vevo] Add more format details 2013-10-29 15:10:09 +01:00
57dd9a8f2f Nicer --list-formats output 2013-10-29 15:09:45 +01:00
912cbf5d4e [vevo] Fix timestamp handling
( / 1000 is implicit float division )
2013-10-29 14:00:23 +01:00
43d7895ea0 release 2013.10.29 2013-10-29 06:48:39 +01:00
f7ff55aa78 Merge remote-tracking branch 'origin/master' 2013-10-29 06:48:18 +01:00
795f28f871 [youtube] Fix login (Fixes #1681) 2013-10-29 06:45:54 +01:00
f6cc16f5d8 [tests] a HTTP 503 is a transient issue 2013-10-28 19:07:16 -04:00
321a01f971 [mtv] Remove the templates from the mediagen url 2013-10-28 23:37:01 +01:00
646e17a53d Fix YouTubeDL test 2013-10-28 23:18:13 +01:00
dd508b7c4f [tests] don't fail on network errors
This is suboptimal, but at least this way we will need to look at the logs
only to check for network errors that happen too often, instead of
parsing a ton of lines each time to see if there is some true test failing
2013-10-28 18:03:26 -04:00
2563bcc85c Add an extractor for MySpace (closes #1666) 2013-10-28 22:02:17 +01:00
702665c085 tests: build the filename from the info_dict if the 'file' key is missing
It will need to have the 'id' and 'ext' keys to work.
2013-10-28 22:01:37 +01:00
dcc2a706ef Add support for http://www.xtube.com 2013-10-28 19:23:48 +01:00
2bc67c35ac [KeezMoviesIE] Detect URLs with numbers in the SEO part correct 2013-10-28 18:22:55 +01:00
77ae65877e Add support for http://www.mofosex.com 2013-10-28 18:18:58 +01:00
32a35e4418 Add support for http://www.extremetube.com 2013-10-28 17:35:01 +01:00
369a759acc setup.py: Make sure the setuptools_available variable is set
Otherwise it would crash if it can't import setuptools.
2013-10-28 16:54:48 +01:00
79b3f61228 Merge pull request #1675 from rzhxeo/fix
Check if description and thumbnail are None to prevent crash
2013-10-28 08:35:40 -07:00
216d71d001 Check if description and thumbnail are None to prevent crash 2013-10-28 16:28:35 +01:00
78a3a9f89e Make "requested format not available" expected (#1655) 2013-10-28 11:41:59 +01:00
a7685f3bf4 mixcloud does not do any format selection 2013-10-28 11:41:32 +01:00
f088ea5486 release 2013.10.28 2013-10-28 11:34:21 +01:00
1003d108d5 [vimeo] Support hash in URL (Fixes #1669) 2013-10-28 11:32:22 +01:00
8abeeb9449 Nicer --list-formats output 2013-10-28 11:31:12 +01:00
c1002e96e9 Let extractors omit ext in formats 2013-10-28 11:28:02 +01:00
77d0a82fef [addanime] Use new formats system 2013-10-28 11:24:47 +01:00
ebc14f251c Merge remote-tracking branch 'origin/master' 2013-10-28 10:44:13 +01:00
d41e6efc85 New debug option --write-pages 2013-10-28 10:44:02 +01:00
8ffa13e03e [Instagram] get the non-https link, as they are serving Akamai cert from a instagram.com domain 2013-10-28 02:34:29 -04:00
db477d3a37 Merge pull request #1620 from jaimeMF/console_script
Use the console_scripts entry point if setuptools is available
2013-10-27 23:08:59 -07:00
750e9833b8 Add the missing age_limit tags; added a devscript to do a superficial check for porn sites without the age_limit tag in the test 2013-10-28 01:50:17 -04:00
82f0ac657c Merge pull request #1657 by @rzhxeo
[YouPornIE] Extract all encrypted links and remove doubles at the end
2013-10-28 01:45:52 -04:00
eb6a2277a2 Merge pull request #1659 by @rzhxeo
Add support for http://www.tube8.com
2013-10-28 01:38:28 -04:00
f8778fb0fa Merge pull request #1663 by @rzhxeo
Add support for http://www.spankwire.com
2013-10-28 01:35:11 -04:00
e2f9de207c Merge pull request #1664 by @rzhxeo
Add support for http://www.keezmovies.com
2013-10-28 01:25:46 -04:00
a93cc0d943 Merge pull request #1661 by @rzhxeo
Add support for http://www.pornhub.com
2013-10-28 00:50:39 -04:00
7d8c2e07f2 [Exfm] replace the failing Soundcloud test vector (broken also in browser) 2013-10-28 00:33:43 -04:00
efb4c36b18 Merge pull request #1660 from pyed/master
[addanime] try to download HQ before normal
2013-10-27 21:14:19 -07:00
29526d0d2b Merge pull request #1656 from rzhxeo/xhamster
[XHamsterIE] Extract SD and HD video
2013-10-27 10:12:59 -07:00
198e370f23 [addanime] better regex. 2013-10-27 19:48:02 +03:00
c19f7764a5 [generic] Detect bandcamp pages that use custom domains (closes #1662)
They embed the original url in the 'og:url' property.
2013-10-27 14:40:25 +01:00
bc63d9d329 [rtlnow] Change the test for rtlnitronow 2013-10-27 14:26:19 +01:00
aa929c37d5 [generic] Fix test video's checksum 2013-10-27 14:21:37 +01:00
af4d506eb3 [faz] Use a regex for getting the description
The page cannot be parsed in python2.6 with the html parser.
2013-10-27 14:18:55 +01:00
5da0549581 [KeezMoviesIE] Correct return value for embedded videos 2013-10-27 12:48:09 +01:00
749a4fd2fd [facebook] Don't recommend to report the issue if the video is private. 2013-10-27 12:13:55 +01:00
6f71ef580c [facebook] Report a more meaningful message if the video cannot be accessed (closes #1658) 2013-10-27 12:09:46 +01:00
67874aeffa [facebook] Fix the login process (fixes #1244) 2013-10-27 12:07:58 +01:00
3e6a330d38 [addanime] fix md5sum 2013-10-27 13:51:26 +03:00
aee5e18c8f [addanime] catch 'RegexNotFoundError' 2013-10-27 13:36:43 +03:00
5b11143d05 Add support for http://www.keezmovies.com 2013-10-27 10:10:28 +01:00
7b2212e954 Add support for http://www.spankwire.com 2013-10-27 01:59:26 +02:00
71865091ab [Tube8IE] Fix regex for uploader extraction 2013-10-27 01:08:03 +02:00
125cfd78e8 Add support for http://www.pornhub.com 2013-10-27 01:04:22 +02:00
8cb57d9b91 [Tube8IE] Escape dot in regex 2013-10-27 00:21:27 +02:00
14e10b2b6e [addanime] try to download HQ before normal 2013-10-27 01:19:38 +03:00
6e76104d66 [YouPornIE] Make webpage download more robust 2013-10-26 23:33:32 +02:00
1d45a23b74 Add support for http://www.tube8.com 2013-10-26 23:27:30 +02:00
7df286540f [YouPornIE] Extract all encrypted links and remove doubles at the end 2013-10-26 21:57:10 +02:00
5d0c97541a [XHamsterIE] Extract SD and HD video 2013-10-26 20:38:54 +02:00
49a25557b0 [8tracks] Use track count instead of looking at at_last_track property
This fixes the error:

$ youtube-dl http://8tracks.com/vladmc/counting-stars
[8tracks] counting-stars: Downloading webpage
[8tracks] counting-stars: Downloading song information 1/4
[8tracks] counting-stars: Downloading song information 2/4
[8tracks] counting-stars: Downloading song information 3/4
[8tracks] counting-stars: Downloading song information 4/4
[8tracks] counting-stars: Downloading song information 5/4
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/phihag/projects/youtube-dl/youtube_dl/__main__.py", line 18, in <module>
    youtube_dl.main()
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 761, in main
    _real_main(argv)
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 714, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 701, in download
    videos = self.extract_info(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 342, in extract_info
    ie_result = ie.extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/common.py", line 121, in extract
    return self._real_extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/eighttracks.py", line 111, in _real_extract
    'id': track_data['id'],
KeyError: 'id'
2013-10-25 23:46:19 +02:00
b5936c0059 Document the %(format_id)s field for the output template 2013-10-25 17:18:06 +02:00
600cc1a4f0 [youtube] Set the format_id field to the itag of the format (closes #1624) 2013-10-25 17:17:46 +02:00
ea32fbacc8 Fix the extensions of two tests with youtube videos
The best quality is now a mp4 video.
2013-10-25 16:55:37 +02:00
00fe14fc75 [youtube] Also use the 'adaptative_fmts' field from the /get_video_info page (fixes #1649)
The 'adaptative_fmts' field from the video page is not added to the 'url_encoded_fmt_stream_map'
2013-10-25 16:52:58 +02:00
fcc28edb2f [cinemassacre] Simplify
* Remove some rtmp parameters that are not needed.
* Remove the md5 checksums, the video is not downloaded.
* Remove the code used before the current format system.
2013-10-23 20:21:41 +02:00
fac6be2dd5 Merge pull request #1632 from rzhxeo/cinemassacre
[Cinemassacre] Download video that is shown in flash player
2013-10-23 20:15:39 +02:00
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
0a89b2852e release 2013.10.23.1 2013-10-23 15:12:33 +02:00
55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
365bcf6d97 Merge remote-tracking branch 'origin/master' 2013-10-23 11:40:46 +02:00
71907db3ba [vimeo] Fix normal videos (Fixes #1642)
Vimeo Pro Videos are still broken
2013-10-23 11:38:53 +02:00
6803655ced Merge pull request #1622 from rbrito/fix-extension
extractor: youtube: Set extension of AAC audio formats to m4a.
2013-10-22 15:16:26 -07:00
df1c39ec5c release 2013.10.23 2013-10-23 00:07:27 +02:00
80f55a9511 release 2013.10.22 2013-10-22 22:35:13 +02:00
7853cc5ae1 Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/YoutubeDL.py
2013-10-22 22:30:06 +02:00
586a91b67f Expand tilde in template (Fixes #1639) 2013-10-22 22:28:26 +02:00
b028e96144 [arte.tv:creative] Update the title of the test 2013-10-22 21:06:06 +02:00
ce68b5907c [nhl:videocenter] Fix playlist title extraction 2013-10-22 21:01:16 +02:00
fe7e0c9825 Style fixes in YoutubeDL.py
Fixed some of the problems reported by pep8
2013-10-22 14:49:34 +02:00
12893efe01 Respect the download parameter in YoutubeDL.process_video_result if the extractor handle the format selection 2013-10-22 00:01:59 +02:00
a6387bfd3c [vimeo] Implement the new format selection system (closes PR #996)
Rebased and deleted some parts to use the new system instead of copying the one from YoutubeIE
2013-10-21 23:16:11 +02:00
f6a54188c2 [youtube] Use 'node is None' when checking if the video has automatic captions
It had stopped working and it reports a FutureWarning
2013-10-21 16:28:55 +02:00
cbbd9a9c69 Fix the duration field for the VideoDetective and InternetVideoArchive tests
Also remove the use of the old format system and the comment
2013-10-21 15:07:33 +02:00
685a9cd2f1 [googleplus] Fix upload_date extraction 2013-10-21 15:00:21 +02:00
182a107877 [arte] Set the format_note and the format_id fields (closes #1628) 2013-10-21 14:42:30 +02:00
8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
3fd39e37f2 YoutubeDL: remove method that came from FileDownloader 2013-10-21 13:52:24 +02:00
49e86983e7 Allow to use the extension for the format selection
The best format with the extension is downloaded.
2013-10-21 13:31:55 +02:00
a9c58ad945 Accept requested formats to be in the format 35/best (closes #1552)
The format selection code is now an independent function.
2013-10-21 13:19:58 +02:00
f8b45beacc Merge remote-tracking branch 'rbrito/set-age'
Conflicts:
	youtube_dl/extractor/xhamster.py
2013-10-19 21:16:14 +02:00
9d92015d43 [xhamster] Add support for age_limit (Instead of #1627) 2013-10-19 21:09:48 +02:00
50a6150ed9 extractor: Set age limit on some adult-related extractors.
More age limit of videos for adult-related sites.

Note that, for redtube, I explicitly left the variable containing the age
limit, since the comment justifying the age limit is a good thing to have.

That being said, I included the age limit field on the test, to better
reflect what the information extractor does (even if it may not break the
automated tests).

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:19:25 -03:00
b0505eb611 [CinemassacreIE] Fix information extraction 2013-10-19 16:46:17 +02:00
284acd57d6 Add an author email 2013-10-19 11:14:20 +02:00
8ed6b34477 extractor: Set age limit on some adult-related extractors.
This is similar in spirit to what was done in commit 8e590a117f.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 19:32:37 -03:00
f6f1fc9286 extractor: youtube: Fix extension of dash formats.
While we are at it, separate the audio formats from the video formats.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 18:53:00 -03:00
8e590a117f [xnxx] Add age_limit 2013-10-18 23:35:17 +02:00
d5594202aa Simplify release process 2013-10-18 23:34:55 +02:00
b186d949cf release 2013.10.18.2 2013-10-18 23:22:54 +02:00
3d2986063c [bash-completion] Do not use dash in function name (Fixes #1623) 2013-10-18 23:13:46 +02:00
41fd7c7e60 Add new option --abort-on-error 2013-10-18 23:09:32 +02:00
fdefe96bf2 Document %(format)s (#1612) 2013-10-18 23:09:08 +02:00
16f36a6fc9 extractor: youtube: Set extension of AAC audio formats to m4a.
This, in particular, eases downloading both audio and videos in DASH formats
before muxing them, which alleviates the problem that I exposed on issue

Furthermore, one may argue that this is, indeed, the case for correctness's
sake.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 17:50:55 -03:00
f44415360e Use the console_scripts entry point if setuptools is available 2013-10-18 13:49:25 +02:00
cce722b79c Add metavar to --cache-dir 2013-10-18 11:50:48 +02:00
82697fb2ab release 2013.10.18.1 2013-10-18 11:45:30 +02:00
53c1d3ef49 Check for embedded YouTube player (Fixes #1616) 2013-10-18 11:44:57 +02:00
8e55e9abfc release 2013.10.18 2013-10-18 11:17:21 +02:00
7c58ef3275 [tudou] Fix title regex (Fixes #1614) 2013-10-18 11:16:20 +02:00
416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
f4d96df0f1 Extend #980 with --max-quality support 2013-10-18 00:46:35 +02:00
5d254f776a Fix test 2013-10-18 00:27:51 +02:00
1c1218fefc Merge remote-tracking branch 'jaimeMF/format_selection' 2013-10-18 00:17:03 +02:00
d21ab29200 Add an extractor for techtalks.tv (closes #1606) 2013-10-17 08:20:58 +02:00
54ed626cf8 release 2013.10.17 2013-10-17 02:20:26 +02:00
a733eb6c53 [youtube] Do not crash if caption info is missing altogether (Fixes #1610) 2013-10-17 02:19:19 +02:00
591454798d [brightcove] Raise error if playlist is empty (#1608) 2013-10-17 01:02:17 +02:00
38604f1a4f Merge remote-tracking branch 'origin/master' 2013-10-17 00:55:06 +02:00
2d0efe70a6 [brightcove] Fix more broken XML (#1608) 2013-10-17 00:46:11 +02:00
bfd14b1b2f Add an extractor for rutube.ru (closes #1136)
It downloads with a m3u8 manifest, requires ffmpeg.
2013-10-16 16:57:40 +02:00
76965512da Fix the indentation of the Makefile
It uses tabs, no spaces.
2013-10-15 23:15:15 +02:00
996d1c3242 Don't include the test/testdata directory in the youtube-dl.tar.gz
The last releases included big files that increased the size of the compressed file.
2013-10-15 23:08:52 +02:00
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
8381a92120 [websurg] Skipt the test
It needs login information.
2013-10-15 08:12:30 +02:00
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added downloading annotations download support - closes #1321
2013-10-13 22:03:22 -07:00
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
d3f46b9aa5 Add support for single-test tox runs
Use a sintax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
8016c92297 Fix the default values of format_id and format 2013-10-11 16:34:49 +02:00
e028d0d1e3 Implement the prefer_free_formats in YoutubeDL 2013-10-11 16:34:49 +02:00
79819f58f2 Default 'format' field to {width}x{height}
If width is None, use {height}p and if height is None, '???'
2013-10-11 16:34:49 +02:00
6ff000b888 Do not handle format selection for IEs that already handle it 2013-10-11 16:34:48 +02:00
99e206d508 Implement the max quality option in YoutubeDL 2013-10-11 16:34:48 +02:00
dd82ffea0c Implement format selection in YoutubeDL
Now the IEs can set a formats field in the info_dict, with the formats ordered from worst to best quality. It's a list of dicts with the following fields:
* Mandatory: url and ext
* Optional: format and format_id

The format_id is used for choosing which formats have to be downloaded.

Now a video result is processed by the method process_video_result.
2013-10-11 16:34:48 +02:00
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny tpo
2013-10-09 07:44:06 -07:00
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny tpo 2013-10-09 21:56:09 +08:00
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an smil manifest, extract the video urls directly from the json and use the last version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intented audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
c1c9a79c49 Add basic --download-archive option
Often, users want to be able to download only videos they haven't seen before, despite the video files having been deleted or moved in the mean time.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If it is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed, make the video ids extraction more solid.
2013-10-04 14:07:29 +02:00
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user, just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
44d466559e Properly handle stream meap not being present 2013-10-01 14:44:09 +02:00
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
843530568f [appletrailers] Rework extraction (fixes #1387)
The exraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available, other olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often used in correlation with bits rather than bytes.
2013-07-13 16:42:16 -05:00
109 changed files with 4593 additions and 1447 deletions

1
.gitignore vendored
View File

@ -25,3 +25,4 @@ updates_key.pem
*.mp4 *.mp4
*.part *.part
test/testdata test/testdata
.tox

View File

@ -13,13 +13,13 @@ PYTHON=/usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr) ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc SYSCONFDIR=/etc
else else
ifeq ($(PREFIX),/usr/local) ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc SYSCONFDIR=/etc
else else
SYSCONFDIR=$(PREFIX)/etc SYSCONFDIR=$(PREFIX)/etc
endif endif
endif endif
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion install: youtube-dl youtube-dl.1 youtube-dl.bash-completion
@ -71,6 +71,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \ --exclude '*~' \
--exclude '__pycache' \ --exclude '__pycache' \
--exclude '.git' \ --exclude '.git' \
--exclude 'testdata' \
-- \ -- \
bin devscripts test youtube_dl \ bin devscripts test youtube_dl \
CHANGELOG LICENSE README.md README.txt \ CHANGELOG LICENSE README.md README.txt \

View File

@ -21,6 +21,8 @@ which means you can modify it, redistribute it or use it however you like.
sudo if needed) sudo if needed)
-i, --ignore-errors continue on download errors, for example to to -i, --ignore-errors continue on download errors, for example to to
skip unavailable videos in a playlist skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error occurs
--dump-user-agent display the current browser identification --dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent --user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access --referer REF specify a custom referer, use if the video access
@ -30,9 +32,10 @@ which means you can modify it, redistribute it or use it however you like.
--extractor-descriptions Output descriptions of all supported extractors --extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy --proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation. --no-check-certificate Suppress HTTPS certificate validation.
--cache-dir None Location in the filesystem where youtube-dl can --cache-dir DIR Location in the filesystem where youtube-dl can
store downloaded information permanently. store downloaded information permanently. By
~/.youtube-dl/cache by default default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching --no-cache-dir Disable filesystem caching
## Video Selection: ## Video Selection:
@ -50,11 +53,16 @@ which means you can modify it, redistribute it or use it however you like.
--date DATE download only videos uploaded in this date --date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date --datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date --dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options: ## Download Options:
-r, --rate-limit LIMIT maximum download rate (e.g. 50k or 44.6m) -r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10) -R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k) --buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By --no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized default, the buffer size is automatically resized
@ -70,7 +78,10 @@ which means you can modify it, redistribute it or use it however you like.
%(uploader_id)s for the uploader nickname if %(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename incremented number, %(ext)s for the filename
extension, %(upload_date)s for the upload date extension, %(format)s for the format description
(like "22 - 1280x720" or "HD"),%(format_id)s for
the unique id of the format (like Youtube's
itags: "137"),%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider (YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id (youtube, metacafe, etc), %(id)s for the video id
, %(playlist)s for the playlist the video is in, , %(playlist)s for the playlist the video is in,
@ -81,12 +92,14 @@ which means you can modify it, redistribute it or use it however you like.
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' . ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s --autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or when it is present in output filename template or
--autonumber option is given --auto-number option is given
--restrict-filenames Restrict filenames to only ASCII characters, and --restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin) -a, --batch-file FILE file containing URLs to download ('-' for stdin)
-w, --no-overwrites do not overwrite files -w, --no-overwrites do not overwrite files
-c, --continue resume partially downloaded files -c, --continue force resume of partially downloaded files. By
default, youtube-dl will resume downloads if
possible.
--no-continue do not resume partially downloaded files (restart --no-continue do not resume partially downloaded files (restart
from beginning) from beginning)
--cookies FILE file to read cookies from and dump cookie jar in --cookies FILE file to read cookies from and dump cookie jar in
@ -95,6 +108,7 @@ which means you can modify it, redistribute it or use it however you like.
file modification time file modification time
--write-description write video description to a .description file --write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file --write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk --write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options: ## Verbosity / Simulation Options:
@ -115,6 +129,8 @@ which means you can modify it, redistribute it or use it however you like.
-v, --verbose print various debugging information -v, --verbose print various debugging information
--dump-intermediate-pages print downloaded pages to debug problems(very --dump-intermediate-pages print downloaded pages to debug problems(very
verbose) verbose)
--write-pages Write downloaded pages to files in the current
directory
## Video Format Options: ## Video Format Options:
-f, --format FORMAT video format code, specifiy the order of -f, --format FORMAT video format code, specifiy the order of
@ -161,6 +177,7 @@ which means you can modify it, redistribute it or use it however you like.
processed files are overwritten by default processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4 --embed-subs embed subtitles in the video (only for mp4
videos) videos)
--add-metadata add metadata to the files
# CONFIGURATION # CONFIGURATION

View File

@ -1,4 +1,4 @@
__youtube-dl() __youtube_dl()
{ {
local cur prev opts local cur prev opts
COMPREPLY=() COMPREPLY=()
@ -15,4 +15,4 @@ __youtube-dl()
fi fi
} }
complete -F __youtube-dl youtube-dl complete -F __youtube_dl youtube-dl

39
devscripts/check-porn.py Normal file
View File

@ -0,0 +1,39 @@
#!/usr/bin/env python
"""
This script employs a VERY basic heuristic ('porn' in webpage.lower()) to check
if we are not 'age_limit' tagging some porn site
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.utils import compat_urllib_request
for test in get_testcases():
try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()
except:
print('\nFail: {0}'.format(test['name']))
continue
webpage = webpage.decode('utf8', 'replace')
if 'porn' in webpage.lower() and ('info_dict' not in test
or 'age_limit' not in test['info_dict']
or test['info_dict']['age_limit'] != 18):
print('\nPotential missing age_limit check: {0}'.format(test['name']))
elif 'porn' not in webpage.lower() and ('info_dict' in test and
'age_limit' in test['info_dict'] and
test['info_dict']['age_limit'] == 18):
print('\nPotential false negative: {0}'.format(test['name']))
else:
sys.stdout.write('.')
sys.stdout.flush()
print()

View File

@ -16,10 +16,11 @@ def main():
ie_htmls = [] ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()): for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME) ie_html = '<b>{}</b>'.format(ie.IE_NAME)
try: ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC) ie_html += ': {}'.format(ie.IE_DESC)
except AttributeError:
pass
if ie.working() == False: if ie.working() == False:
ie_html += ' (Currently broken)' ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html)) ie_htmls.append('<li>{}</li>'.format(ie_html))

View File

@ -88,10 +88,6 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/update-sites.py" "$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update git add *.html *.html.in update
git commit -m "release $version" git commit -m "release $version"
git show HEAD
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
git push "$ROOT" gh-pages git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages git push "$ORIGIN_URL" gh-pages
) )

View File

@ -1,116 +0,0 @@
#!/usr/bin/env python
# encoding: utf-8
# Generate youtube signature algorithm from test cases
import sys
tests = [
# 93 - vfl79wBKW 2013/07/20
(u"qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~\"",
u".>/?;:|}][{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWQ098765'321mnbvcxzasdfghjklpoiu"),
# 92 - vflQw-fB4 2013/07/17
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~\"",
"mrtyuioplkjhgfdsazxcvbnq1234567890QWERTY}IOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]\"|:;"),
# 91 - vfl79wBKW 2013/07/20 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~",
"/?;:|}][{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWQ09876543.1mnbvcxzasdfghjklpoiu"),
# 90
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`",
"mrtyuioplkjhgfdsazxcvbne1234567890QWER[YUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={`]}|"),
# 89
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'",
"/?;:|}<[{=+-_)(*&^%$#@!MqBVCXZASDFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuyt"),
# 88 - vflapUV9V 2013/08/28
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<",
"ioplkjhgfdsazxcvbnm12<4567890QWERTYUIOZLKJHGFDSAeXCVBNM!@#$%^&*()_-+={[]}|:;?/>.3"),
# 87
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<",
"uioplkjhgfdsazxcvbnm1t34567890QWE2TYUIOPLKJHGFDSAZXCVeNM!@#$^&*()_-+={[]}|:;?/>.<"),
# 86 - vflHql6Pr 2013/09/24
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
";}|[{=+-d)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYT_EWQ0987654321mnbvcxzas/fghjklpoiuytrewq"),
# 85 - vflkuzxcs 2013/09/11
('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[',
'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@'),
# 84 - vflHql6Pr 2013/09/24 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
"}[{=+-_)g*&^%$#@!MNBVCXZASDFGHJKLPOIUYTRE(Q0987654321mnbvcxzasdf?hjklpoiuytrewq"),
# 83
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<",
".>/?;}[{=+_)(*&^%<#!MNBVCXZASPFGHJKLwOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytreq"),
# 82 - vflGNjMhJ 2013/09/12
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<",
".>/?;}[<=+-(*&^%$#@!MNBVCXeASDFGHKLPOqUYTREWQ0987654321mnbvcxzasdfghjklpoiuytrIwZ"),
# 81 - vflLC8JvQ 2013/07/25
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.",
"C>/?;}[{=+-(*&^%$#@!MNBVYXZASDFGHKLPOIU.TREWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
# 80 - vflZK4ZYR 2013/08/23 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>",
"wertyuioplkjhgfdsaqxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&z(-+={[};?/>"),
# 79 - vflLC8JvQ 2013/07/25 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/",
"Z?;}[{=+-(*&^%$#@!MNBVCXRASDFGHKLPOIUYT/EWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
]
tests_age_gate = [
# 86 - vflqinMWD
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"),
]
def find_matching(wrong, right):
idxs = [wrong.index(c) for c in right]
return compress(idxs)
return ('s[%d]' % i for i in idxs)
def compress(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = ':%d' % (end+step)
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
step = None
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield 's[%d]' % prev
if step is None:
yield 's[%d]' % i
else:
yield _genslice(start, i, step)
def _assert_compress(inp, exp):
res = list(compress(inp))
if res != exp:
print('Got %r, expected %r' % (res, exp))
assert res == exp
_assert_compress([0,2,4,6], ['s[0]', 's[2]', 's[4]', 's[6]'])
_assert_compress([0,1,2,4,6,7], ['s[:3]', 's[4]', 's[6:8]'])
_assert_compress([8,0,1,2,4,7,6,9], ['s[8]', 's[:3]', 's[4]', 's[7:5:-1]', 's[9]'])
def gen(wrong, right, indent):
code = ' + '.join(find_matching(wrong, right))
return 'if len(s) == %d:\n%s return %s\n' % (len(wrong), indent, code)
def genall(tests):
indent = ' ' * 8
return indent + (indent + 'el').join(gen(wrong, right, indent) for wrong,right in tests)
def main():
print(genall(tests))
print(u' Age gate:')
print(genall(tests_age_gate))
if __name__ == '__main__':
main()

View File

@ -8,8 +8,10 @@ import sys
try: try:
from setuptools import setup from setuptools import setup
setuptools_available = True
except ImportError: except ImportError:
from distutils.core import setup from distutils.core import setup
setuptools_available = False
try: try:
# This will create an exe that needs Microsoft Visual C++ 2008 # This will create an exe that needs Microsoft Visual C++ 2008
@ -43,13 +45,16 @@ if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params params = py2exe_params
else: else:
params = { params = {
'scripts': ['bin/youtube-dl'],
'data_files': [ # Installing system-wide would require sudo... 'data_files': [ # Installing system-wide would require sudo...
('etc/bash_completion.d', ['youtube-dl.bash-completion']), ('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']), ('share/doc/youtube_dl', ['README.txt']),
('share/man/man1/', ['youtube-dl.1']) ('share/man/man1/', ['youtube-dl.1'])
] ]
} }
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}
else:
params['scripts'] = ['bin/youtube-dl']
# Get the version from youtube_dl/version.py without importing the package # Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(), exec(compile(open('youtube_dl/version.py').read(),
@ -63,6 +68,7 @@ setup(
' YouTube.com and other video sites.', ' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl', url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia', author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister', maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de', maintainer_email='phihag@phihag.de',
packages=['youtube_dl', 'youtube_dl.extractor'], packages=['youtube_dl', 'youtube_dl.extractor'],

0
test/__init__.py Normal file
View File

View File

@ -1,38 +1,80 @@
import errno
import io import io
import hashlib
import json import json
import os.path import os.path
import re
import types
import sys
import youtube_dl.extractor import youtube_dl.extractor
from youtube_dl import YoutubeDL, YoutubeDLHandler from youtube_dl import YoutubeDL
from youtube_dl.utils import ( from youtube_dl.utils import preferredencoding
compat_cookiejar,
compat_urllib_request,
)
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json") def global_setup():
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf: youtube_dl._setup_opener(timeout=10)
parameters = json.load(pf)
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
def report_warning(message):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header = u'WARNING:'
output = u'%s %s\n' % (_msg_header, message)
if 'b' in getattr(sys.stderr, 'mode', '') or sys.version_info[0] < 3:
output = output.encode(preferredencoding())
sys.stderr.write(output)
class FakeYDL(YoutubeDL): class FakeYDL(YoutubeDL):
def __init__(self): def __init__(self, override=None):
self.result = []
# Different instances of the downloader can't share the same dictionary # Different instances of the downloader can't share the same dictionary
# some test set the "sublang" parameter, which would break the md5 checks. # some test set the "sublang" parameter, which would break the md5 checks.
self.params = dict(parameters) params = get_params(override=override)
def to_screen(self, s): super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s) print(s)
def trouble(self, s, tb=None): def trouble(self, s, tb=None):
raise Exception(s) raise Exception(s)
def download(self, x): def download(self, x):
self.result.append(x) self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases(): def get_testcases():
for ie in youtube_dl.extractor.gen_extractors(): for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None) t = getattr(ie, '_TEST', None)
@ -42,3 +84,6 @@ def get_testcases():
for t in getattr(ie, '_TESTS', []): for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')] t['name'] = type(ie).__name__[:-len('IE')]
yield t yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()

145
test/test_YoutubeDL.py Normal file
View File

@ -0,0 +1,145 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
class YDL(FakeYDL):
def __init__(self, *args, **kwargs):
super(YDL, self).__init__(*args, **kwargs)
self.downloaded_info_dicts = []
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
def to_screen(self, msg):
self.msgs.append(msg)
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 460},
{u'ext': u'mp4', u'height': 460},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'webm')
# Different resolution => download best quality (mp4)
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'mp4', u'height': 1080},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'mp4')
# No prefer_free_formats => keep original formats order
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'flv', u'height': 720},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'flv')
def test_format_limit(self):
formats = [
{u'format_id': u'meh', u'url': u'http://example.com/meh'},
{u'format_id': u'good', u'url': u'http://example.com/good'},
{u'format_id': u'great', u'url': u'http://example.com/great'},
{u'format_id': u'excellent', u'url': u'http://example.com/exc'},
]
info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'}
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
ydl = YDL({'format_limit': 'good'})
assert ydl.params['format_limit'] == 'good'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'good')
ydl = YDL({'format_limit': 'great', 'format': 'all'})
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts[0][u'format_id'], u'meh')
self.assertEqual(ydl.downloaded_info_dicts[1][u'format_id'], u'good')
self.assertEqual(ydl.downloaded_info_dicts[2][u'format_id'], u'great')
self.assertTrue('3' in ydl.msgs[0])
ydl = YDL()
ydl.params['format_limit'] = 'excellent'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
def test_format_selection(self):
formats = [
{u'format_id': u'35', u'ext': u'mp4'},
{u'format_id': u'45', u'ext': u'webm'},
{u'format_id': u'47', u'ext': u'webm'},
{u'format_id': u'2', u'ext': u'flv'},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl = YDL({'format': u'20/47'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'20/71/worst'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'2')
ydl = YDL({'format': u'webm/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'3gp/40/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
def test_add_extra_info(self):
test_dict = {
'extractor': 'Foo',
}
extra_info = {
'extractor': 'Bar',
'playlist': 'funny videos',
}
YDL.add_extra_info(test_dict, extra_info)
self.assertEqual(test_dict['extractor'], 'Foo')
self.assertEqual(test_dict['playlist'], 'funny videos')
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,55 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup, try_rm
global_setup()
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = filename + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()

View File

@ -1,14 +1,20 @@
#!/usr/bin/env python #!/usr/bin/env python
import sys
import unittest
# Allow direct execution # Allow direct execution
import os import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.extractor import YoutubeIE, YoutubePlaylistIE, YoutubeChannelIE, JustinTVIE, gen_extractors
from helper import get_testcases
class TestAllURLsMatching(unittest.TestCase): class TestAllURLsMatching(unittest.TestCase):
def setUp(self): def setUp(self):

View File

@ -1,71 +0,0 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import DailymotionIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestDailymotionSubtitles(unittest.TestCase):
def setUp(self):
self.DL = FakeYDL()
self.url = 'http://www.dailymotion.com/video/xczg00'
def getInfoDict(self):
IE = DailymotionIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

View File

@ -1,43 +1,40 @@
#!/usr/bin/env python #!/usr/bin/env python
import errno # Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
get_testcases,
global_setup,
try_rm,
md5,
report_warning
)
global_setup()
import hashlib import hashlib
import io import io
import os
import json import json
import unittest
import sys
import socket import socket
import binascii
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL import youtube_dl.YoutubeDL
from youtube_dl.utils import * from youtube_dl.utils import (
compat_str,
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json") compat_urllib_error,
compat_HTTPError,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
from youtube_dl.extractor import get_info_extractor
RETRIES = 3 RETRIES = 3
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class YoutubeDL(youtube_dl.YoutubeDL): class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen self.to_stderr = self.to_screen
@ -54,17 +51,12 @@ def _file_md5(fn):
with open(fn, 'rb') as f: with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest() return hashlib.md5(f.read()).hexdigest()
from helper import get_testcases
defs = get_testcases() defs = get_testcases()
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
class TestDownload(unittest.TestCase): class TestDownload(unittest.TestCase):
maxDiff = None maxDiff = None
def setUp(self): def setUp(self):
self.parameters = parameters
self.defs = defs self.defs = defs
### Dynamically generate tests ### Dynamically generate tests
@ -72,20 +64,27 @@ def generator(test_case):
def test_template(self): def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name']) ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
def print_skipping(reason): def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason)) print('Skipping %s: %s' % (test_case['name'], reason))
if not ie._WORKING: if not ie.working():
print_skipping('IE marked as not _WORKING') print_skipping('IE marked as not _WORKING')
return return
if 'playlist' not in test_case and not test_case['file']: if 'playlist' not in test_case:
print_skipping('No output file specified') info_dict = test_case.get('info_dict', {})
return if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
print_skipping('The output file cannot be know, the "file" '
'key is missing or the info_dict is incomplete')
return
if 'skip' in test_case: if 'skip' in test_case:
print_skipping(test_case['skip']) print_skipping(test_case['skip'])
return return
for other_ie in other_ies:
if not other_ie.working():
print_skipping(u'test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = self.parameters.copy() params = get_params(test_case.get('params', {}))
params.update(test_case.get('params', {}))
ydl = YoutubeDL(params) ydl = YoutubeDL(params)
ydl.add_default_info_extractors() ydl.add_default_info_extractors()
@ -95,35 +94,47 @@ def generator(test_case):
finished_hook_called.add(status['filename']) finished_hook_called.add(status['filename'])
ydl.fd.add_progress_hook(_hook) ydl.fd.add_progress_hook(_hook)
def get_tc_filename(tc):
return tc.get('file') or ydl.prepare_filename(tc.get('info_dict', {}))
test_cases = test_case.get('playlist', [test_case]) test_cases = test_case.get('playlist', [test_case])
for tc in test_cases: def try_rm_tcs_files():
_try_rm(tc['file']) for tc in test_cases:
_try_rm(tc['file'] + '.part') tc_filename = get_tc_filename(tc)
_try_rm(tc['file'] + '.info.json') try_rm(tc_filename)
try_rm(tc_filename + '.part')
try_rm(tc_filename + '.info.json')
try_rm_tcs_files()
try: try:
for retry in range(1, RETRIES + 1): try_num = 1
while True:
try: try:
ydl.download([test_case['url']]) ydl.download([test_case['url']])
except (DownloadError, ExtractorError) as err: except (DownloadError, ExtractorError) as err:
if retry == RETRIES: raise
# Check if the exception is not a network related one # Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError): if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise raise
print('Retrying: {0} failed tries\n\n##########\n\n'.format(retry)) if try_num == RETRIES:
report_warning(u'Failed due to network errors, skipping...')
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
try_num += 1
else: else:
break break
for tc in test_cases: for tc in test_cases:
tc_filename = get_tc_filename(tc)
if not test_case.get('params', {}).get('skip_download', False): if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc['file']), msg='Missing file ' + tc['file']) self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc['file'] in finished_hook_called) self.assertTrue(tc_filename in finished_hook_called)
self.assertTrue(os.path.exists(tc['file'] + '.info.json')) self.assertTrue(os.path.exists(tc_filename + '.info.json'))
if 'md5' in tc: if 'md5' in tc:
md5_for_file = _file_md5(tc['file']) md5_for_file = _file_md5(tc_filename)
self.assertEqual(md5_for_file, tc['md5']) self.assertEqual(md5_for_file, tc['md5'])
with io.open(tc['file'] + '.info.json', encoding='utf-8') as infof: with io.open(tc_filename + '.info.json', encoding='utf-8') as infof:
info_dict = json.load(infof) info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items(): for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'): if isinstance(expected, compat_str) and expected.startswith('md5:'):
@ -143,11 +154,11 @@ def generator(test_case):
# Check for the presence of mandatory fields # Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'): for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key]) self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
finally: finally:
for tc in test_cases: try_rm_tcs_files()
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
return test_template return test_template

View File

@ -1,25 +1,29 @@
#!/usr/bin/env python #!/usr/bin/env python
# encoding: utf-8 # encoding: utf-8
import sys
import unittest
import json
# Allow direct execution # Allow direct execution
import os import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import ( from youtube_dl.extractor import (
DailymotionPlaylistIE, DailymotionPlaylistIE,
DailymotionUserIE, DailymotionUserIE,
VimeoChannelIE, VimeoChannelIE,
UstreamChannelIE, UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE, SoundcloudUserIE,
LivestreamIE, LivestreamIE,
NHLVideocenterIE,
BambuserChannelIE,
) )
from youtube_dl.utils import *
from helper import FakeYDL
class TestPlaylists(unittest.TestCase): class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info): def assertIsPlaylist(self, info):
@ -58,6 +62,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], u'5124905') self.assertEqual(result['id'], u'5124905')
self.assertTrue(len(result['entries']) >= 11) self.assertTrue(len(result['entries']) >= 11)
def test_soundcloud_set(self):
dl = FakeYDL()
ie = SoundcloudSetIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'The Royal Concept EP')
self.assertTrue(len(result['entries']) >= 6)
def test_soundcloud_user(self): def test_soundcloud_user(self):
dl = FakeYDL() dl = FakeYDL()
ie = SoundcloudUserIE(dl) ie = SoundcloudUserIE(dl)
@ -74,5 +86,22 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], u'TEDCity2.0 (English)') self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4) self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
def test_bambuser_channel(self):
dl = FakeYDL()
ie = BambuserChannelIE(dl)
result = ie.extract('http://bambuser.com/channel/pixelversity')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'pixelversity')
self.assertTrue(len(result['entries']) >= 66)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

211
test/test_subtitles.py Normal file
View File

@ -0,0 +1,211 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import (
YoutubeIE,
DailymotionIE,
TEDIE,
)
class BaseTestSubtitles(unittest.TestCase):
url = None
IE = None
def setUp(self):
self.DL = FakeYDL()
self.ie = self.IE(self.DL)
def getInfoDict(self):
info_dict = self.ie.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict['subtitles']
class TestYoutubeSubtitles(BaseTestSubtitles):
url = 'QRS8MkLhQmM'
IE = YoutubeIE
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestDailymotionSubtitles(BaseTestSubtitles):
url = 'http://www.dailymotion.com/video/xczg00'
IE = DailymotionIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '2154f31ff9b9f89a0aa671537559c21d')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '7616cbc6df20ec2c1204083c83871cf6')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 28)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

View File

@ -1,14 +1,15 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8
# Various small unit tests
import sys
import unittest
import xml.etree.ElementTree
# Allow direct execution # Allow direct execution
import os import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform #from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import ( from youtube_dl.utils import (
@ -20,6 +21,9 @@ from youtube_dl.utils import (
unified_strdate, unified_strdate,
find_xpath_attr, find_xpath_attr,
get_meta_content, get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
) )
if sys.version_info < (3, 0): if sys.version_info < (3, 0):
@ -141,5 +145,31 @@ class TestUtil(unittest.TestCase):
self.assertEqual(get_meta('description'), u'foo & bar') self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato') self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -0,0 +1,80 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup, try_rm
global_setup()
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) #assertIn only added in python 2.7
#remove the first occurance, there could be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()

View File

@ -1,37 +1,34 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8 # coding: utf-8
import json # Allow direct execution
import os import os
import sys import sys
import unittest import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution from test.helper import get_params, global_setup
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) global_setup()
import io
import json
import youtube_dl.YoutubeDL import youtube_dl.YoutubeDL
import youtube_dl.extractor import youtube_dl.extractor
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class YoutubeDL(youtube_dl.YoutubeDL): class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs) super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf: params = get_params({
params = json.load(pf) 'writeinfojson': True,
params['writeinfojson'] = True 'skip_download': True,
params['skip_download'] = True 'writedescription': True,
params['writedescription'] = True })
TEST_ID = 'BaW_jenozKc' TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json' INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
@ -42,6 +39,7 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .''' For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase): class TestInfoJSON(unittest.TestCase):
def setUp(self): def setUp(self):
# Clear old files # Clear old files

View File

@ -1,20 +1,26 @@
#!/usr/bin/env python #!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution # Allow direct execution
import os import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeUserIE, YoutubePlaylistIE, YoutubeIE, YoutubeChannelIE, YoutubeShowIE from test.helper import FakeYDL, global_setup
from youtube_dl.utils import * global_setup()
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
from helper import FakeYDL
class TestYoutubeLists(unittest.TestCase): class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self,info): def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'""" """Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist') self.assertEqual(info['_type'], 'playlist')
@ -27,6 +33,14 @@ class TestYoutubeLists(unittest.TestCase):
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']] ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE']) self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self): def test_issue_673(self):
dl = FakeYDL() dl = FakeYDL()
ie = YoutubePlaylistIE(dl) ie = YoutubePlaylistIE(dl)
@ -92,7 +106,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL() dl = FakeYDL()
ie = YoutubeShowIE(dl) ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters') result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 4) self.assertTrue(len(result) >= 3)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -1,14 +1,18 @@
#!/usr/bin/env python #!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup
global_setup()
import io import io
import re import re
import string import string
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve from youtube_dl.utils import compat_str, compat_urlretrieve

View File

@ -1,84 +0,0 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def setUp(self):
self.DL = FakeYDL()
self.url = 'QRS8MkLhQmM'
def getInfoDict(self):
IE = YoutubeIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

8
tox.ini Normal file
View File

@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@ -4,12 +4,19 @@ import re
import subprocess import subprocess
import sys import sys
import time import time
import traceback
if os.name == 'nt': if os.name == 'nt':
import ctypes import ctypes
from .utils import * from .utils import (
compat_urllib_error,
compat_urllib_request,
ContentTooShortError,
determine_ext,
encodeFilename,
sanitize_open,
timeconvert,
)
class FileDownloader(object): class FileDownloader(object):
@ -194,7 +201,7 @@ class FileDownloader(object):
if old_filename == new_filename: if old_filename == new_filename:
return return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename)) os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err: except (IOError, OSError):
self.report_error(u'unable to rename file') self.report_error(u'unable to rename file')
def try_utime(self, filename, last_modified_hdr): def try_utime(self, filename, last_modified_hdr):
@ -227,8 +234,14 @@ class FileDownloader(object):
if self.params.get('noprogress', False): if self.params.get('noprogress', False):
return return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'') clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
eta_str = self.format_eta(eta) if eta is not None:
percent_str = self.format_percent(percent) eta_str = self.format_eta(eta)
else:
eta_str = 'Unknown ETA'
if percent is not None:
percent_str = self.format_percent(percent)
else:
percent_str = 'Unknown %'
speed_str = self.format_speed(speed) speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False): if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' % self.to_screen(u'[download] %s of %s at %s ETA %s' %
@ -251,7 +264,7 @@ class FileDownloader(object):
"""Report file has already been fully downloaded.""" """Report file has already been fully downloaded."""
try: try:
self.to_screen(u'[download] %s has already been downloaded' % file_name) self.to_screen(u'[download] %s has already been downloaded' % file_name)
except (UnicodeEncodeError) as err: except UnicodeEncodeError:
self.to_screen(u'[download] The file has already been downloaded') self.to_screen(u'[download] The file has already been downloaded')
def report_unable_to_resume(self): def report_unable_to_resume(self):
@ -267,9 +280,10 @@ class FileDownloader(object):
self.to_screen(u'\r%s[download] 100%% of %s in %s' % self.to_screen(u'\r%s[download] 100%% of %s in %s' %
(clear_line, data_len_str, self.format_seconds(tot_time))) (clear_line, data_len_str, self.format_seconds(tot_time)))
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url): def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live):
self.report_destination(filename) self.report_destination(filename)
tmpfilename = self.temp_name(filename) tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first # Check for rtmpdump first
try: try:
@ -291,6 +305,10 @@ class FileDownloader(object):
basic_args += ['--playpath', play_path] basic_args += ['--playpath', play_path]
if tc_url is not None: if tc_url is not None:
basic_args += ['--tcUrl', url] basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
if live:
basic_args += ['--live']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)] args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if self.params.get('verbose', False): if self.params.get('verbose', False):
try: try:
@ -300,7 +318,7 @@ class FileDownloader(object):
shell_quote = repr shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args)) self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args) retval = subprocess.call(args)
while retval == 2 or retval == 1: while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename)) prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True) self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
time.sleep(5.0) # This seems to be needed time.sleep(5.0) # This seems to be needed
@ -313,7 +331,7 @@ class FileDownloader(object):
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.') self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0 retval = 0
break break
if retval == 0: if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename)) fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize) self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename) self.try_rename(tmpfilename, filename)
@ -363,15 +381,20 @@ class FileDownloader(object):
self.report_destination(filename) self.report_destination(filename)
tmpfilename = self.temp_name(filename) tmpfilename = self.temp_name(filename)
args = ['ffmpeg', '-y', '-i', url, '-f', 'mp4', tmpfilename] args = ['-y', '-i', url, '-f', 'mp4', '-c', 'copy',
# Check for ffmpeg first '-bsf:a', 'aac_adtstoasc', tmpfilename]
try:
subprocess.call(['ffmpeg', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error(u'm3u8 download detected but "%s" could not be run' % args[0] )
return False
retval = subprocess.call(args) for program in ['avconv', 'ffmpeg']:
try:
subprocess.call([program, '-version'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
break
except (OSError, IOError):
pass
else:
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found')
cmd = [program] + args
retval = subprocess.call(cmd)
if retval == 0: if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename)) fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize)) self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize))
@ -408,7 +431,8 @@ class FileDownloader(object):
info_dict.get('player_url', None), info_dict.get('player_url', None),
info_dict.get('page_url', None), info_dict.get('page_url', None),
info_dict.get('play_path', None), info_dict.get('play_path', None),
info_dict.get('tc_url', None)) info_dict.get('tc_url', None),
info_dict.get('rtmp_live', False))
# Attempt to download using mplayer # Attempt to download using mplayer
if url.startswith('mms') or url.startswith('rtsp'): if url.startswith('mms') or url.startswith('rtsp'):
@ -547,12 +571,11 @@ class FileDownloader(object):
# Progress message # Progress message
speed = self.calc_speed(start, time.time(), byte_counter - resume_len) speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None: if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA') eta = percent = None
eta = None
else: else:
percent = self.calc_percent(byte_counter, data_len) percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len) eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta) self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({ self._hook_progress({
'downloaded_bytes': byte_counter, 'downloaded_bytes': byte_counter,

View File

@ -3,7 +3,14 @@ import subprocess
import sys import sys
import time import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object): class PostProcessor(object):
@ -82,6 +89,8 @@ class FFmpegPostProcessor(PostProcessor):
+ opts + + opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))]) [encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate() stdout,stderr = p.communicate()
if p.returncode != 0: if p.returncode != 0:
@ -177,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec extension = self._preferredcodec
more_opts = [] more_opts = []
if self._preferredquality is not None: if self._preferredquality is not None:
if int(self._preferredquality) < 10: # The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality] more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else: else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k'] more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@ -444,8 +454,11 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
if information['ext'] != u'mp4': if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files') self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information return True, information
sub_langs = [key for key in information['subtitles']] if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath'] filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs] input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
@ -464,3 +477,35 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
os.rename(encodeFilename(temp_filename), encodeFilename(filename)) os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s=%s' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info

View File

@ -3,6 +3,7 @@
from __future__ import absolute_import from __future__ import absolute_import
import errno
import io import io
import os import os
import re import re
@ -70,6 +71,7 @@ class YoutubeDL(object):
logtostderr: Log messages to stderr instead of stdout. logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file writethumbnail: Write the thumbnail image to a file
writesubtitles: Write the video subtitles to a file writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file writeautomaticsub: Write the automatic subtitles to a file
@ -83,7 +85,13 @@ class YoutubeDL(object):
skip_download: Skip the actual download of the video file skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem. cachedir: Location of the cache files in the filesystem.
None to disable filesystem cache. None to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Unsuitable videos for the given age are skipped.
downloadarchive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
The following parameters are not used by YoutubeDL itself, they are used by The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader: the FileDownloader:
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test, nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
@ -112,7 +120,7 @@ class YoutubeDL(object):
and not params['restrictfilenames']): and not params['restrictfilenames']):
# On Python 3, the Unicode filesystem API will throw errors (#1474) # On Python 3, the Unicode filesystem API will throw errors (#1474)
self.report_warning( self.report_warning(
u'Assuming --restrict-filenames isnce file system encoding ' u'Assuming --restrict-filenames since file system encoding '
u'cannot encode all charactes. ' u'cannot encode all charactes. '
u'Set the LC_ALL environment variable to fix this.') u'Set the LC_ALL environment variable to fix this.')
params['restrictfilenames'] = True params['restrictfilenames'] = True
@ -208,10 +216,10 @@ class YoutubeDL(object):
If stderr is a tty file the 'WARNING:' will be colored If stderr is a tty file the 'WARNING:' will be colored
''' '''
if sys.stderr.isatty() and os.name != 'nt': if sys.stderr.isatty() and os.name != 'nt':
_msg_header=u'\033[0;33mWARNING:\033[0m' _msg_header = u'\033[0;33mWARNING:\033[0m'
else: else:
_msg_header=u'WARNING:' _msg_header = u'WARNING:'
warning_message=u'%s %s' % (_msg_header,message) warning_message = u'%s %s' % (_msg_header, message)
self.to_stderr(warning_message) self.to_stderr(warning_message)
def report_error(self, message, tb=None): def report_error(self, message, tb=None):
@ -226,19 +234,6 @@ class YoutubeDL(object):
error_message = u'%s %s' % (_msg_header, message) error_message = u'%s %s' % (_msg_header, message)
self.trouble(error_message, tb) self.trouble(error_message, tb)
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
if rate_limit is None or byte_counter == 0:
return
now = time.time()
elapsed = now - start_time
if elapsed <= 0.0:
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep((byte_counter - rate_limit * (now - start_time)) / rate_limit)
def report_writedescription(self, descfn): def report_writedescription(self, descfn):
""" Report that the description file is being written """ """ Report that the description file is being written """
self.to_screen(u'[info] Writing video description to: ' + descfn) self.to_screen(u'[info] Writing video description to: ' + descfn)
@ -251,6 +246,10 @@ class YoutubeDL(object):
""" Report that the metadata file has been written """ """ Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn) self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_writeannotations(self, annofn):
""" Report that the annotations file has been written. """
self.to_screen(u'[info] Writing video annotations to: ' + annofn)
def report_file_already_downloaded(self, file_name): def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded.""" """Report file has already been fully downloaded."""
try: try:
@ -273,16 +272,18 @@ class YoutubeDL(object):
autonumber_size = 5 autonumber_size = 5
autonumber_templ = u'%0' + str(autonumber_size) + u'd' autonumber_templ = u'%0' + str(autonumber_size) + u'd'
template_dict['autonumber'] = autonumber_templ % self._num_downloads template_dict['autonumber'] = autonumber_templ % self._num_downloads
if template_dict['playlist_index'] is not None: if template_dict.get('playlist_index') is not None:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index'] template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k,v: sanitize_filename( sanitize = lambda k, v: sanitize_filename(
u'NA' if v is None else compat_str(v), u'NA' if v is None else compat_str(v),
restricted=self.params.get('restrictfilenames'), restricted=self.params.get('restrictfilenames'),
is_id=(k==u'id')) is_id=(k == u'id'))
template_dict = dict((k, sanitize(k, v)) for k,v in template_dict.items()) template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items())
filename = self.params['outtmpl'] % template_dict tmpl = os.path.expanduser(self.params['outtmpl'])
filename = tmpl % template_dict
return filename return filename
except KeyError as err: except KeyError as err:
self.report_error(u'Erroneous output template') self.report_error(u'Erroneous output template')
@ -308,15 +309,28 @@ class YoutubeDL(object):
dateRange = self.params.get('daterange', DateRange()) dateRange = self.params.get('daterange', DateRange())
if date not in dateRange: if date not in dateRange:
return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange) return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return (u'%(title)s has already been recorded in archive'
% info_dict)
return None return None
@staticmethod
def add_extra_info(info_dict, extra_info):
'''Set the keys from extra_info in info dict if they are missing'''
for key, value in extra_info.items():
info_dict.setdefault(key, value)
def extract_info(self, url, download=True, ie_key=None, extra_info={}): def extract_info(self, url, download=True, ie_key=None, extra_info={}):
''' '''
Returns a list with a dictionary for each video we find. Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos. If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result extra_info is a dict containing the extra values to add to each result
''' '''
if ie_key: if ie_key:
ies = [self.get_info_extractor(ie_key)] ies = [self.get_info_extractor(ie_key)]
else: else:
@ -336,17 +350,17 @@ class YoutubeDL(object):
break break
if isinstance(ie_result, list): if isinstance(ie_result, list):
# Backwards compatibility: old IE result format # Backwards compatibility: old IE result format
for result in ie_result:
result.update(extra_info)
ie_result = { ie_result = {
'_type': 'compat_list', '_type': 'compat_list',
'entries': ie_result, 'entries': ie_result,
} }
else: self.add_extra_info(ie_result,
ie_result.update(extra_info) {
if 'extractor' not in ie_result: 'extractor': ie.IE_NAME,
ie_result['extractor'] = ie.IE_NAME 'webpage_url': url,
return self.process_ie_result(ie_result, download=download) 'extractor_key': ie.ie_key(),
})
return self.process_ie_result(ie_result, download, extra_info)
except ExtractorError as de: # An error we somewhat expected except ExtractorError as de: # An error we somewhat expected
self.report_error(compat_str(de), de.format_traceback()) self.report_error(compat_str(de), de.format_traceback())
break break
@ -358,7 +372,7 @@ class YoutubeDL(object):
raise raise
else: else:
self.report_error(u'no suitable InfoExtractor: %s' % url) self.report_error(u'no suitable InfoExtractor: %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}): def process_ie_result(self, ie_result, download=True, extra_info={}):
""" """
Take the result of the ie(may be modified) and resolve all unresolved Take the result of the ie(may be modified) and resolve all unresolved
@ -370,14 +384,8 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system
if result_type == 'video': if result_type == 'video':
ie_result.update(extra_info) self.add_extra_info(ie_result, extra_info)
if 'playlist' not in ie_result: return self.process_video_result(ie_result)
# It isn't part of a playlist
ie_result['playlist'] = None
ie_result['playlist_index'] = None
if download:
self.process_info(ie_result)
return ie_result
elif result_type == 'url': elif result_type == 'url':
# We have to add extra_info to the results because it may be # We have to add extra_info to the results because it may be
# contained in a playlist # contained in a playlist
@ -386,9 +394,10 @@ class YoutubeDL(object):
ie_key=ie_result.get('ie_key'), ie_key=ie_result.get('ie_key'),
extra_info=extra_info) extra_info=extra_info)
elif result_type == 'playlist': elif result_type == 'playlist':
self.add_extra_info(ie_result, extra_info)
# We process each entry in the playlist # We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None) playlist = ie_result.get('title', None) or ie_result.get('id', None)
self.to_screen(u'[download] Downloading playlist: %s' % playlist) self.to_screen(u'[download] Downloading playlist: %s' % playlist)
playlist_results = [] playlist_results = []
@ -406,17 +415,15 @@ class YoutubeDL(object):
self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" % self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
(ie_result['extractor'], playlist, n_all_entries, n_entries)) (ie_result['extractor'], playlist, n_all_entries, n_entries))
for i,entry in enumerate(entries,1): for i, entry in enumerate(entries, 1):
self.to_screen(u'[download] Downloading video #%s of %s' %(i, n_entries)) self.to_screen(u'[download] Downloading video #%s of %s' % (i, n_entries))
extra = { extra = {
'playlist': playlist, 'playlist': playlist,
'playlist_index': i + playliststart, 'playlist_index': i + playliststart,
} 'extractor': ie_result['extractor'],
if not 'extractor' in entry: 'webpage_url': ie_result['webpage_url'],
# We set the extractor, if it's an url it will be set then to 'extractor_key': ie_result['extractor_key'],
# the new extractor, but if it's already a video we must make }
# sure it's present: see issue #877
entry['extractor'] = ie_result['extractor']
entry_result = self.process_ie_result(entry, entry_result = self.process_ie_result(entry,
download=download, download=download,
extra_info=extra) extra_info=extra)
@ -425,16 +432,122 @@ class YoutubeDL(object):
return ie_result return ie_result
elif result_type == 'compat_list': elif result_type == 'compat_list':
def _fixup(r): def _fixup(r):
r.setdefault('extractor', ie_result['extractor']) self.add_extra_info(r,
{
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'extractor_key': ie_result['extractor_key'],
})
return r return r
ie_result['entries'] = [ ie_result['entries'] = [
self.process_ie_result(_fixup(r), download=download) self.process_ie_result(_fixup(r), download, extra_info)
for r in ie_result['entries'] for r in ie_result['entries']
] ]
return ie_result return ie_result
else: else:
raise Exception('Invalid result type: %s' % result_type) raise Exception('Invalid result type: %s' % result_type)
def select_format(self, format_spec, available_formats):
if format_spec == 'best' or format_spec is None:
return available_formats[-1]
elif format_spec == 'worst':
return available_formats[0]
else:
extensions = [u'mp4', u'flv', u'webm', u'3gp']
if format_spec in extensions:
filter_f = lambda f: f['ext'] == format_spec
else:
filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, available_formats))
if matches:
return matches[-1]
return None
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
info_dict['playlist_index'] = None
# This extractors handle format selection themselves
if info_dict['extractor'] in [u'youtube', u'Youku']:
if download:
self.process_info(info_dict)
return info_dict
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
# There's only one format available
formats = [info_dict]
else:
formats = info_dict['formats']
# We check that all the formats have the format and format_id fields
for (i, format) in enumerate(formats):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
format['format'] = u'{id} - {res}{note}'.format(
id=format['format_id'],
res=self.format_resolution(format),
note=u' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
)
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url'])
if self.params.get('listformats', None):
self.list_formats(info_dict)
return
format_limit = self.params.get('format_limit', None)
if format_limit:
formats = list(takewhile_inclusive(
lambda f: f['format_id'] != format_limit, formats
))
if self.params.get('prefer_free_formats'):
def _free_formats_key(f):
try:
ext_ord = [u'flv', u'mp4', u'webm'].index(f['ext'])
except ValueError:
ext_ord = -1
# We only compare the extension if they have the same height and width
return (f.get('height'), f.get('width'), ext_ord)
formats = sorted(formats, key=_free_formats_key)
req_format = self.params.get('format', 'best')
if req_format is None:
req_format = 'best'
formats_to_download = []
# The -1 is for supporting YoutubeIE
if req_format in ('-1', 'all'):
formats_to_download = formats
else:
# We can accept formats requestd in the format: 34/5/best, we pick
# the first that is available, starting from left
req_formats = req_format.split('/')
for rf in req_formats:
selected_format = self.select_format(rf, formats)
if selected_format is not None:
formats_to_download = [selected_format]
break
if not formats_to_download:
raise ExtractorError(u'requested format not available',
expected=True)
if download:
if len(formats_to_download) > 1:
self.to_screen(u'[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
for format in formats_to_download:
new_info = dict(info_dict)
new_info.update(format)
self.process_info(new_info)
# We update the info dict with the best quality format (backwards compatibility)
info_dict.update(formats_to_download[-1])
return info_dict
def process_info(self, info_dict): def process_info(self, info_dict):
"""Process a single resolved IE result.""" """Process a single resolved IE result."""
@ -472,9 +585,9 @@ class YoutubeDL(object):
if self.params.get('forceurl', False): if self.params.get('forceurl', False):
# For RTMP URLs, also include the playpath # For RTMP URLs, also include the playpath
compat_print(info_dict['url'] + info_dict.get('play_path', u'')) compat_print(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict: if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
compat_print(info_dict['thumbnail']) compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict: if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
compat_print(info_dict['description']) compat_print(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None: if self.params.get('forcefilename', False) and filename is not None:
compat_print(filename) compat_print(filename)
@ -508,10 +621,22 @@ class YoutubeDL(object):
self.report_error(u'Cannot write description file ' + descfn) self.report_error(u'Cannot write description file ' + descfn)
return return
if self.params.get('writeannotations', False):
try:
annofn = filename + u'.annotations.xml'
self.report_writeannotations(annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning(u'There are no annotations to write.')
except (OSError, IOError):
self.report_error(u'Cannot write annotations file: ' + annofn)
return
subtitles_are_requested = any([self.params.get('writesubtitles', False), subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')]) self.params.get('writeautomaticsub')])
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']: if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
# subtitles download errors are already managed as troubles in relevant IE # subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE # that way it will silently go on when used with unsupporting IE
subtitles = info_dict['subtitles'] subtitles = info_dict['subtitles']
@ -533,7 +658,7 @@ class YoutubeDL(object):
infofn = filename + u'.info.json' infofn = filename + u'.info.json'
self.report_writeinfojson(infofn) self.report_writeinfojson(infofn)
try: try:
json_info_dict = dict((k, v) for k,v in info_dict.items() if not k in ['urlhandle']) json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
write_json_file(json_info_dict, encodeFilename(infofn)) write_json_file(json_info_dict, encodeFilename(infofn))
except (OSError, IOError): except (OSError, IOError):
self.report_error(u'Cannot write metadata to JSON file ' + infofn) self.report_error(u'Cannot write metadata to JSON file ' + infofn)
@ -545,11 +670,15 @@ class YoutubeDL(object):
thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
self.to_screen(u'[%s] %s: Downloading thumbnail ...' % self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
(info_dict['extractor'], info_dict['id'])) (info_dict['extractor'], info_dict['id']))
uf = compat_urllib_request.urlopen(info_dict['thumbnail']) try:
with open(thumb_filename, 'wb') as thumbf: uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
shutil.copyfileobj(uf, thumbf) with open(thumb_filename, 'wb') as thumbf:
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' % shutil.copyfileobj(uf, thumbf)
(info_dict['extractor'], info_dict['id'], thumb_filename)) self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning(u'Unable to download thumbnail "%s": %s' %
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False): if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)): if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@ -573,6 +702,8 @@ class YoutubeDL(object):
self.report_error(u'postprocessing: %s' % str(err)) self.report_error(u'postprocessing: %s' % str(err))
return return
self.record_download_archive(info_dict)
def download(self, url_list): def download(self, url_list):
"""Download a given list of URLs.""" """Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template(): if len(url_list) > 1 and self.fixed_template():
@ -597,7 +728,7 @@ class YoutubeDL(object):
keep_video = None keep_video = None
for pp in self._pps: for pp in self._pps:
try: try:
keep_video_wish,new_info = pp.run(info) keep_video_wish, new_info = pp.run(info)
if keep_video_wish is not None: if keep_video_wish is not None:
if keep_video_wish: if keep_video_wish:
keep_video = keep_video_wish keep_video = keep_video_wish
@ -612,3 +743,61 @@ class YoutubeDL(object):
os.remove(encodeFilename(filename)) os.remove(encodeFilename(filename))
except (IOError, OSError): except (IOError, OSError):
self.report_warning(u'Unable to remove downloaded video file') self.report_warning(u'Unable to remove downloaded video file')
def in_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return False
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
try:
with locked_file(fn, 'r', encoding='utf-8') as archive_file:
for line in archive_file:
if line.strip() == vid_id:
return True
except IOError as ioe:
if ioe.errno != errno.ENOENT:
raise
return False
def record_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
with locked_file(fn, 'a', encoding='utf-8') as archive_file:
archive_file.write(vid_id + u'\n')
@staticmethod
def format_resolution(format, default='unknown'):
if format.get('_resolution') is not None:
return format['_resolution']
if format.get('height') is not None:
if format.get('width') is not None:
res = u'%sx%s' % (format['width'], format['height'])
else:
res = u'%sp' % format['height']
else:
res = default
return res
def list_formats(self, info_dict):
def line(format):
return (u'%-20s%-10s%-12s%s' % (
format['format_id'],
format['ext'],
self.format_resolution(format),
format.get('format_note', ''),
)
)
formats = info_dict.get('formats', [info_dict])
formats_s = list(map(line, formats))
if len(formats) > 1:
formats_s[0] += (' ' if formats[0].get('format_note') else '') + '(worst)'
formats_s[-1] += (' ' if formats[-1].get('format_note') else '') + '(best)'
header_line = line({
'format_id': u'format code', 'ext': u'extension',
'_resolution': u'resolution', 'format_note': u'note'})
self.to_screen(u'[info] Available formats for %s:\n%s\n%s' %
(info_dict['id'], header_line, u"\n".join(formats_s)))

View File

@ -31,11 +31,15 @@ __authors__ = (
'Huarong Huo', 'Huarong Huo',
'Ismael Mejía', 'Ismael Mejía',
'Steffan \'Ruirize\' James', 'Steffan \'Ruirize\' James',
'Andras Elso',
'Jelle van der Waa',
'Marcin Cieślak',
) )
__license__ = 'Public Domain' __license__ = 'Public Domain'
import codecs import codecs
import collections
import getpass import getpass
import optparse import optparse
import os import os
@ -45,17 +49,43 @@ import shlex
import socket import socket
import subprocess import subprocess
import sys import sys
import warnings import traceback
import platform import platform
from .utils import * from .utils import (
compat_cookiejar,
compat_print,
compat_str,
compat_urllib_request,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
make_HTTPS_handler,
MaxDownloadsReached,
platform_name,
preferredencoding,
SameFileError,
std_headers,
write_string,
YoutubeDLHandler,
)
from .update import update_self from .update import update_self
from .version import __version__ from .version import __version__
from .FileDownloader import * from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors from .extractor import gen_extractors
from .YoutubeDL import YoutubeDL from .YoutubeDL import YoutubeDL
from .PostProcessor import * from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts(overrideArguments=None): def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes): def _readOptions(filename_bytes):
@ -105,7 +135,7 @@ def parseOpts(overrideArguments=None):
def _hide_login_info(opts): def _hide_login_info(opts):
opts = list(opts) opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username']: for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try: try:
i = opts.index(private_opt) i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>' opts[i+1] = '<PRIVATE>'
@ -151,6 +181,9 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)') action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors', general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False) action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
general.add_option('--dump-user-agent', general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent', action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False) help='display the current browser identification', default=False)
@ -168,8 +201,8 @@ def parseOpts(overrideArguments=None):
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL') general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.') general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option( general.add_option(
'--cache-dir', dest='cachedir', default=u'~/.youtube-dl/cache', '--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store downloaded information permanently. %default by default') help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option( general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir', '--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching') help='Disable filesystem caching')
@ -187,6 +220,13 @@ def parseOpts(overrideArguments=None):
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None) selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None) selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None) selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username', authentication.add_option('-u', '--username',
@ -200,7 +240,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format', video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', action='store', dest='format', metavar='FORMAT', default='best',
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported') help='video format code, specifiy the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats', video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all') action='store_const', dest='format', help='download all available video formats', const='all')
@ -232,11 +272,11 @@ def parseOpts(overrideArguments=None):
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'') help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit', downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate (e.g. 50k or 44.6m)') dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries', downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10) dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size', downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024") dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer', downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer', action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False) help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
@ -278,6 +318,9 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages', verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False, action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems(very verbose)') help='print downloaded pages to debug problems(very verbose)')
verbosity.add_option('--write-pages',
action='store_true', dest='write_pages', default=False,
help='Write downloaded pages to files in the current directory')
verbosity.add_option('--youtube-print-sig-code', verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False, action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP) help=optparse.SUPPRESS_HELP)
@ -297,7 +340,10 @@ def parseOpts(overrideArguments=None):
help=('output filename template. Use %(title)s to get the title, ' help=('output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, ' '%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, ' '%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), ' '%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD"),'
'%(format_id)s for the unique id of the format (like Youtube\'s itags: "137"),'
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), ' '%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id , %(playlist)s for the playlist the video is in, ' '%(id)s for the video id , %(playlist)s for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist and %% for a literal percent. ' '%(playlist_index)s for the position in the playlist and %% for a literal percent. '
@ -305,7 +351,7 @@ def parseOpts(overrideArguments=None):
'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .')) 'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
filesystem.add_option('--autonumber-size', filesystem.add_option('--autonumber-size',
dest='autonumber_size', metavar='NUMBER', dest='autonumber_size', metavar='NUMBER',
help='Specifies the number of digits in %(autonumber)s when it is present in output filename template or --autonumber option is given') help='Specifies the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given')
filesystem.add_option('--restrict-filenames', filesystem.add_option('--restrict-filenames',
action='store_true', dest='restrictfilenames', action='store_true', dest='restrictfilenames',
help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False) help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
@ -314,7 +360,7 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('-w', '--no-overwrites', filesystem.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False) action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
filesystem.add_option('-c', '--continue', filesystem.add_option('-c', '--continue',
action='store_true', dest='continue_dl', help='resume partially downloaded files', default=True) action='store_true', dest='continue_dl', help='force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.', default=True)
filesystem.add_option('--no-continue', filesystem.add_option('--no-continue',
action='store_false', dest='continue_dl', action='store_false', dest='continue_dl',
help='do not resume partially downloaded files (restart from beginning)') help='do not resume partially downloaded files (restart from beginning)')
@ -331,6 +377,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-info-json', filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson', action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False) help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotation file', default=False)
filesystem.add_option('--write-thumbnail', filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail', action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False) help='write thumbnail image to disk', default=False)
@ -350,6 +399,8 @@ def parseOpts(overrideArguments=None):
help='do not overwrite post-processed files; the post-processed files are overwritten by default') help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False, postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)') help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general) parser.add_option_group(general)
@ -369,9 +420,13 @@ def parseOpts(overrideArguments=None):
else: else:
xdg_config_home = os.environ.get('XDG_CONFIG_HOME') xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home: if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf') userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else: else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf') userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf') systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile) userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:] commandLineConf = sys.argv[1:]
@ -436,27 +491,7 @@ def _real_main(argv=None):
all_urls = batchurls + args all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls] all_urls = [url.strip() for url in all_urls]
# General configuration opener = _setup_opener(jar=jar, opts=opts)
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders =[]
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
extractors = gen_extractors() extractors = gen_extractors()
@ -473,6 +508,8 @@ def _real_main(argv=None):
if not ie._WORKING: if not ie._WORKING:
continue continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME) desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'): if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise') _SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all') _COUNTS = (u'', u'5', u'10', u'all')
@ -599,11 +636,13 @@ def _real_main(argv=None):
'progress_with_newline': opts.progress_with_newline, 'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart, 'playliststart': opts.playliststart,
'playlistend': opts.playlistend, 'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-', 'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle, 'consoletitle': opts.consoletitle,
'nopart': opts.nopart, 'nopart': opts.nopart,
'updatetime': opts.updatetime, 'updatetime': opts.updatetime,
'writedescription': opts.writedescription, 'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson, 'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail, 'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles, 'writesubtitles': opts.writesubtitles,
@ -618,6 +657,7 @@ def _real_main(argv=None):
'prefer_free_formats': opts.prefer_free_formats, 'prefer_free_formats': opts.prefer_free_formats,
'verbose': opts.verbose, 'verbose': opts.verbose,
'dump_intermediate_pages': opts.dump_intermediate_pages, 'dump_intermediate_pages': opts.dump_intermediate_pages,
'write_pages': opts.write_pages,
'test': opts.test, 'test': opts.test,
'keepvideo': opts.keepvideo, 'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize, 'min_filesize': opts.min_filesize,
@ -625,6 +665,8 @@ def _real_main(argv=None):
'daterange': date, 'daterange': date,
'cachedir': opts.cachedir, 'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code, 'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
}) })
if opts.verbose: if opts.verbose:
@ -644,11 +686,19 @@ def _real_main(argv=None):
except: except:
pass pass
write_string(u'[debug] Python version %s - %s' %(platform.python_version(), platform_name()) + u'\n') write_string(u'[debug] Python version %s - %s' %(platform.python_version(), platform_name()) + u'\n')
write_string(u'[debug] Proxy map: ' + str(proxy_handler.proxies) + u'\n')
proxy_map = {}
for handler in opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string(u'[debug] Proxy map: ' + compat_str(proxy_map) + u'\n')
ydl.add_default_info_extractors() ydl.add_default_info_extractors()
# PostProcessors # PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio: if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites)) ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo: if opts.recodevideo:
@ -658,7 +708,7 @@ def _real_main(argv=None):
# Update version # Update version
if opts.update_self: if opts.update_self:
update_self(ydl.to_screen, opts.verbose, sys.argv[0]) update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing # Maybe do nothing
if len(all_urls) < 1: if len(all_urls) < 1:
@ -677,11 +727,42 @@ def _real_main(argv=None):
if opts.cookiefile is not None: if opts.cookiefile is not None:
try: try:
jar.save() jar.save()
except (IOError, OSError) as err: except (IOError, OSError):
sys.exit(u'ERROR: unable to save cookie jar') sys.exit(u'ERROR: unable to save cookie jar')
sys.exit(retcode) sys.exit(retcode)
def _setup_opener(jar=None, opts=None, timeout=300):
if opts is None:
FakeOptions = collections.namedtuple(
'FakeOptions', ['proxy', 'no_check_certificate'])
opts = FakeOptions(proxy=None, no_check_certificate=False)
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)
return opener
def main(argv=None): def main(argv=None):
try: try:
_real_main(argv) _real_main(argv)

View File

@ -2,8 +2,14 @@ from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE from .addanime import AddAnimeIE
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .ard import ARDIE from .ard import ARDIE
from .arte import ArteTvIE from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE from .bandcamp import BandcampIE
from .bliptv import BlipTVIE, BlipTVUserIE from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE from .bloomberg import BloombergIE
@ -12,6 +18,7 @@ from .brightcove import BrightcoveIE
from .c56 import C56IE from .c56 import C56IE
from .canalplus import CanalplusIE from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .cnn import CNNIE from .cnn import CNNIE
from .collegehumor import CollegeHumorIE from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE from .comedycentral import ComedyCentralIE
@ -31,9 +38,12 @@ from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE from .ehow import EHowIE
from .eighttracks import EightTracksIE from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .escapist import EscapistIE from .escapist import EscapistIE
from .exfm import ExfmIE from .exfm import ExfmIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE from .facebook import FacebookIE
from .faz import FazIE
from .fktv import ( from .fktv import (
FKTVIE, FKTVIE,
FKTVPosteckeIE, FKTVPosteckeIE,
@ -47,6 +57,7 @@ from .francetv import (
) )
from .freesound import FreesoundIE from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gamespot import GameSpotIE from .gamespot import GameSpotIE
from .gametrailers import GametrailersIE from .gametrailers import GametrailersIE
from .generic import GenericIE from .generic import GenericIE
@ -60,10 +71,12 @@ from .ign import IGNIE, OneUPIE
from .ina import InaIE from .ina import InaIE
from .infoq import InfoQIE from .infoq import InfoQIE
from .instagram import InstagramIE from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE from .jukebox import JukeboxIE
from .justintv import JustinTVIE from .justintv import JustinTVIE
from .kankan import KankanIE from .kankan import KankanIE
from .keezmovies import KeezMoviesIE
from .kickstarter import KickStarterIE from .kickstarter import KickStarterIE
from .keek import KeekIE from .keek import KeekIE
from .liveleak import LiveLeakIE from .liveleak import LiveLeakIE
@ -72,44 +85,57 @@ from .metacafe import MetacafeIE
from .metacritic import MetacriticIE from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE from .mixcloud import MixcloudIE
from .mofosex import MofosexIE
from .mtv import MTVIE from .mtv import MTVIE
from .muzu import MuzuTVIE from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvideo import MyVideoIE from .myvideo import MyVideoIE
from .naver import NaverIE from .naver import NaverIE
from .nba import NBAIE from .nba import NBAIE
from .nbc import NBCNewsIE from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE from .ooyala import OoyalaIE
from .orf import ORFIE from .orf import ORFIE
from .pbs import PBSIE from .pbs import PBSIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
from .pornhub import PornHubIE
from .pornotube import PornotubeIE from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .ringtv import RingTVIE from .ringtv import RingTVIE
from .ro220 import Ro220IE from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .sina import SinaIE from .sina import SinaIE
from .slashdot import SlashdotIE from .slashdot import SlashdotIE
from .slideshare import SlideshareIE from .slideshare import SlideshareIE
from .sohu import SohuIE from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import SouthParkStudiosIE from .southparkstudios import SouthParkStudiosIE
from .space import SpaceIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE from .statigram import StatigramIE
from .steam import SteamIE from .steam import SteamIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE from .ted import TEDIE
from .tf1 import TF1IE from .tf1 import TF1IE
from .thisav import ThisAVIE from .thisav import ThisAVIE
from .traileraddict import TrailerAddictIE from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE from .trilulilu import TriluliluIE
from .tube8 import Tube8IE
from .tudou import TudouIE from .tudou import TudouIE
from .tumblr import TumblrIE from .tumblr import TumblrIE
from .tutv import TutvIE from .tutv import TutvIE
from .tvp import TvpIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE from .vbox7 import Vbox7IE
@ -117,16 +143,22 @@ from .veehd import VeeHDIE
from .veoh import VeohIE from .veoh import VeohIE
from .vevo import VevoIE from .vevo import VevoIE
from .vice import ViceIE from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE from .vine import VineIE
from .vk import VKIE
from .wat import WatIE from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE from .weibo import WeiboIE
from .wimp import WimpIE from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE from .worldstarhiphop import WorldStarHipHopIE
from .xhamster import XHamsterIE from .xhamster import XHamsterIE
from .xnxx import XNXXIE from .xnxx import XNXXIE
from .xvideos import XVideosIE from .xvideos import XVideosIE
from .xtube import XTubeIE
from .yahoo import YahooIE, YahooSearchIE from .yahoo import YahooIE, YahooSearchIE
from .youjizz import YouJizzIE from .youjizz import YouJizzIE
from .youku import YoukuIE from .youku import YoukuIE
@ -135,11 +167,13 @@ from .youtube import (
YoutubeIE, YoutubeIE,
YoutubePlaylistIE, YoutubePlaylistIE,
YoutubeSearchIE, YoutubeSearchIE,
YoutubeSearchDateIE,
YoutubeUserIE, YoutubeUserIE,
YoutubeChannelIE, YoutubeChannelIE,
YoutubeShowIE, YoutubeShowIE,
YoutubeSubscriptionsIE, YoutubeSubscriptionsIE,
YoutubeRecommendedIE, YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE, YoutubeWatchLaterIE,
YoutubeFavouritesIE, YoutubeFavouritesIE,
) )

View File

@ -17,8 +17,8 @@ class AddAnimeIE(InfoExtractor):
IE_NAME = u'AddAnime' IE_NAME = u'AddAnime'
_TEST = { _TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9', u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.flv', u'file': u'24MR3YO5SAS9.mp4',
u'md5': u'1036a0e0cd307b95bd8a8c3a5c8cfaf1', u'md5': u'72954ea10bc979ab5e2eb288b21425a0',
u'info_dict': { u'info_dict': {
u"description": u"One Piece 606", u"description": u"One Piece 606",
u"title": u"One Piece 606" u"title": u"One Piece 606"
@ -31,7 +31,8 @@ class AddAnimeIE(InfoExtractor):
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
except ExtractorError as ee: except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError): if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise raise
redir_webpage = ee.cause.read().decode('utf-8') redir_webpage = ee.cause.read().decode('utf-8')
@ -60,16 +61,26 @@ class AddAnimeIE(InfoExtractor):
note=u'Confirming after redirect') note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r"var normal_video_file = '(.*?)';", formats = []
webpage, u'video file URL') for format_id in ('normal', 'hq'):
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, u'video file URLx',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
})
if not formats:
raise ExtractorError(u'Cannot find any video format!')
video_title = self._og_search_title(webpage) video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage) video_description = self._og_search_description(webpage)
return { return {
'_type': 'video', '_type': 'video',
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'ext': 'flv',
'title': video_title, 'title': video_title,
'description': video_description 'description': video_description
} }

View File

@ -1,8 +1,10 @@
import re import re
import xml.etree.ElementTree import xml.etree.ElementTree
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
compat_urlparse,
determine_ext, determine_ext,
) )
@ -14,10 +16,9 @@ class AppleTrailersIE(InfoExtractor):
u"playlist": [ u"playlist": [
{ {
u"file": u"manofsteel-trailer4.mov", u"file": u"manofsteel-trailer4.mov",
u"md5": u"11874af099d480cc09e103b189805d5f", u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": { u"info_dict": {
u"duration": 111, u"duration": 111,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_11624.jpg",
u"title": u"Trailer 4", u"title": u"Trailer 4",
u"upload_date": u"20130523", u"upload_date": u"20130523",
u"uploader_id": u"wb", u"uploader_id": u"wb",
@ -25,10 +26,9 @@ class AppleTrailersIE(InfoExtractor):
}, },
{ {
u"file": u"manofsteel-trailer3.mov", u"file": u"manofsteel-trailer3.mov",
u"md5": u"07a0a262aae5afe68120eed61137ab34", u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": { u"info_dict": {
u"duration": 182, u"duration": 182,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_10793.jpg",
u"title": u"Trailer 3", u"title": u"Trailer 3",
u"upload_date": u"20130417", u"upload_date": u"20130417",
u"uploader_id": u"wb", u"uploader_id": u"wb",
@ -36,10 +36,9 @@ class AppleTrailersIE(InfoExtractor):
}, },
{ {
u"file": u"manofsteel-trailer.mov", u"file": u"manofsteel-trailer.mov",
u"md5": u"e401fde0813008e3307e54b6f384cff1", u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": { u"info_dict": {
u"duration": 148, u"duration": 148,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_8703.jpg",
u"title": u"Trailer", u"title": u"Trailer",
u"upload_date": u"20121212", u"upload_date": u"20121212",
u"uploader_id": u"wb", u"uploader_id": u"wb",
@ -47,10 +46,9 @@ class AppleTrailersIE(InfoExtractor):
}, },
{ {
u"file": u"manofsteel-teaser.mov", u"file": u"manofsteel-teaser.mov",
u"md5": u"76b392f2ae9e7c98b22913c10a639c97", u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": { u"info_dict": {
u"duration": 93, u"duration": 93,
u"thumbnail": u"http://trailers.apple.com/trailers/wb/manofsteel/images/thumbnail_6899.jpg",
u"title": u"Teaser", u"title": u"Teaser",
u"upload_date": u"20120721", u"upload_date": u"20120721",
u"uploader_id": u"wb", u"uploader_id": u"wb",
@ -59,87 +57,61 @@ class AppleTrailersIE(InfoExtractor):
] ]
} }
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie') movie = mobj.group('movie')
uploader_id = mobj.group('company') uploader_id = mobj.group('company')
playlist_url = url.partition(u'?')[0] + u'/includes/playlists/web.inc' playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie) playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script>.*?</script>', u'', playlist_snippet) playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# with xml.etree.ElementTree.fromstring
# like: http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>' playlist_html = u'<html>' + playlist_cleaned + u'</html>'
size_cache = {}
doc = xml.etree.ElementTree.fromstring(playlist_html) doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = [] playlist = []
for li in doc.findall('./div/ul/li'): for li in doc.findall('./div/ul/li'):
title = li.find('.//h3').text on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower() video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src'] thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
date_el = li.find('.//p') runtime = trailer_info['runtime']
upload_date = None m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
m = re.search(r':\s?(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{2})', date_el.text)
if m:
upload_date = u'20' + m.group('year') + m.group('month') + m.group('day')
runtime_el = date_el.find('./br')
m = re.search(r':\s?(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime_el.tail)
duration = None duration = None
if m: if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds')) duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = [] formats = []
for formats_el in li.findall('.//a'): for format in settings['metadata']['sizes']:
if formats_el.attrib['class'] != 'OverlayPanel': # The src is a file pointing to the real video file
continue format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
target = formats_el.attrib['target'] formats.append({
'url': format_url,
format_code = formats_el.text 'ext': determine_ext(format_url),
if 'Automatic' in format_code: 'format': format['type'],
continue 'width': format['width'],
'height': int(format['height']),
size_q = formats_el.attrib['href'] })
size_id = size_q.rpartition('#videos-')[2] formats = sorted(formats, key=lambda f: (f['height'], f['width']))
if size_id not in size_cache:
size_url = url + size_q
sizepage_html = self._download_webpage(
size_url, movie,
note=u'Downloading size info %s' % size_id,
errnote=u'Error while downloading size info %s' % size_id,
)
_doc = xml.etree.ElementTree.fromstring(sizepage_html)
size_cache[size_id] = _doc
sizepage_doc = size_cache[size_id]
links = sizepage_doc.findall('.//{http://www.w3.org/1999/xhtml}ul/{http://www.w3.org/1999/xhtml}li/{http://www.w3.org/1999/xhtml}a')
for vid_a in links:
href = vid_a.get('href')
if not href.endswith(target):
continue
detail_q = href.partition('#')[0]
detail_url = url + '/' + detail_q
m = re.match(r'includes/(?P<detail_id>[^/]+)/', detail_q)
detail_id = m.group('detail_id')
detail_html = self._download_webpage(
detail_url, movie,
note=u'Downloading detail %s %s' % (detail_id, size_id),
errnote=u'Error while downloading detail %s %s' % (detail_id, size_id)
)
detail_doc = xml.etree.ElementTree.fromstring(detail_html)
movie_link_el = detail_doc.find('.//{http://www.w3.org/1999/xhtml}a')
assert movie_link_el.get('class') == 'movieLink'
movie_link = movie_link_el.get('href').partition('?')[0].replace('_', '_h')
ext = determine_ext(movie_link)
assert ext == 'mov'
formats.append({
'format': format_code,
'ext': ext,
'url': movie_link,
})
info = { info = {
'_type': 'video', '_type': 'video',

View File

@ -1,3 +1,4 @@
# encoding: utf-8
import re import re
import json import json
import xml.etree.ElementTree import xml.etree.ElementTree
@ -7,15 +8,16 @@ from ..utils import (
ExtractorError, ExtractorError,
find_xpath_attr, find_xpath_attr,
unified_strdate, unified_strdate,
determine_ext,
get_element_by_id,
compat_str,
) )
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor): class ArteTvIE(InfoExtractor):
"""
There are two sources of video in arte.tv: videos.arte.tv and
www.arte.tv/guide, the extraction process is different for each one.
The videos expire in 7 days, so we can't add tests.
"""
_EMISSION_URL = r'(?:http://)?www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html' _VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)' _LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$' _LIVE_URL = r'index-[0-9]+\.html$'
@ -24,7 +26,7 @@ class ArteTvIE(InfoExtractor):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._EMISSION_URL, cls._VIDEOS_URL, cls._LIVEWEB_URL)) return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream # TODO implement Live Stream
# from ..utils import compat_urllib_parse # from ..utils import compat_urllib_parse
@ -55,14 +57,6 @@ class ArteTvIE(InfoExtractor):
# video_url = u'%s/%s' % (info.get('url'), info.get('path')) # video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._EMISSION_URL, url)
if mobj is not None:
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return self._extract_emission(url, video_id, lang)
mobj = re.match(self._VIDEOS_URL, url) mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None: if mobj is not None:
id = mobj.group('id') id = mobj.group('id')
@ -80,49 +74,6 @@ class ArteTvIE(InfoExtractor):
# self.extractLiveStream(url) # self.extractLiveStream(url)
# return # return
def _extract_emission(self, url, video_id, lang):
"""Extract from www.arte.tv/guide"""
webpage = self._download_webpage(url, video_id)
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info['VDA'].split(' ')[0]),
'thumbnail': player_info['programImage'],
'ext': 'flv',
}
formats = player_info['VSR'].values()
def _match_lang(f):
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f['height']))
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f['versionCode']) is None)
# Pick the best quality
format_info = formats[-1]
if format_info['mediaType'] == u'rtmp':
info_dict['url'] = format_info['streamer']
info_dict['play_path'] = 'mp4:' + format_info['url']
else:
info_dict['url'] = format_info['url']
return info_dict
def _extract_video(self, url, video_id, lang): def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv""" """Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/') ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
@ -172,3 +123,140 @@ class ArteTvIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
} }
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
all_formats = player_info['VSR'].values()
# Some formats use the m3u8 protocol
all_formats = list(filter(lambda f: f.get('videoFormat') != 'M3U8', all_formats))
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, all_formats)
formats = list(formats) # in python3 filter returns an iterator
if not formats:
# Some videos are only available in the 'Originalversion'
# they aren't tagged as being in French or German
if all(f['versionCode'] == 'VO' for f in all_formats):
formats = all_formats
else:
raise ExtractorError(u'The formats list is empty')
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
def sort_key(f):
return ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
def sort_key(f):
return (
# Sort first by quality
int(f.get('height',-1)),
int(f.get('bitrate',-1)),
# The original version with subtitles has lower relevance
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
)
formats = sorted(formats, key=sort_key)
def _format(format_info):
quality = ''
height = format_info.get('height')
if height is not None:
quality = compat_str(height)
bitrate = format_info.get('bitrate')
if bitrate is not None:
quality += '-%d' % bitrate
if format_info.get('versionCode') is not None:
format_id = u'%s-%s' % (quality, format_info['versionCode'])
else:
format_id = quality
info = {
'format_id': format_id,
'format_note': format_info.get('versionLibelle'),
'width': format_info.get('width'),
'height': height,
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)

View File

@ -0,0 +1,80 @@
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
)
class BambuserIE(InfoExtractor):
IE_NAME = u'bambuser'
_VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
_API_KEY = '005f64509e19a868399060af746a00aa'
_TEST = {
u'url': u'http://bambuser.com/v/4050584',
u'md5': u'fba8f7693e48fd4e8641b3fd5539a641',
u'info_dict': {
u'id': u'4050584',
u'ext': u'flv',
u'title': u'Education engineering days - lightning talks',
u'duration': 3741,
u'uploader': u'pixelversity',
u'uploader_id': u'344706',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_url = ('http://player-c.api.bambuser.com/getVideo.json?'
'&api_key=%s&vid=%s' % (self._API_KEY, video_id))
info_json = self._download_webpage(info_url, video_id)
info = json.loads(info_json)['result']
return {
'id': video_id,
'title': info['title'],
'url': info['url'],
'thumbnail': info.get('preview'),
'duration': int(info['length']),
'view_count': int(info['views_total']),
'uploader': info['username'],
'uploader_id': info['uid'],
}
class BambuserChannelIE(InfoExtractor):
IE_NAME = u'bambuser:channel'
_VALID_URL = r'http://bambuser.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
urls = []
last_id = ''
for i in itertools.count(1):
req_url = ('http://bambuser.com/xhr-api/index.php?username={user}'
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = compat_urllib_request.Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
info_json = self._download_webpage(req, user,
u'Downloading page %d' % i)
results = json.loads(info_json)['result']
if len(results) == 0:
break
last_id = results[-1]['vid']
urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
return {
'_type': 'playlist',
'title': user,
'entries': urls,
}

View File

@ -115,7 +115,7 @@ class BlipTVIE(InfoExtractor):
ext = umobj.group(1) ext = umobj.group(1)
info = { info = {
'id': data['item_id'], 'id': compat_str(data['item_id']),
'url': video_url, 'url': video_url,
'uploader': data['display_name'], 'uploader': data['display_name'],
'upload_date': upload_date, 'upload_date': upload_date,

View File

@ -1,3 +1,5 @@
# encoding: utf-8
import re import re
import json import json
import xml.etree.ElementTree import xml.etree.ElementTree
@ -7,15 +9,53 @@ from ..utils import (
compat_urllib_parse, compat_urllib_parse,
find_xpath_attr, find_xpath_attr,
compat_urlparse, compat_urlparse,
compat_str,
compat_urllib_request,
ExtractorError,
) )
class BrightcoveIE(InfoExtractor): class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)' _VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s' _FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' _PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
# There is a test for Brigtcove in GenericIE, that way we test both the download _TESTS = [
# and the detection of videos, and we don't have to find an URL that is always valid {
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'8eccab865181d29ec2958f32a6a754f5',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
u'url': u'http://c.brightcove.com/services/viewer/federated_f9?&playerID=1265504713001&publisherID=AQ%7E%7E%2CAAABBzUwv1E%7E%2CxP-xFHVUstiMFlNYfvF4G9yFnNaqCw_9&videoID=2750934548001',
u'info_dict': {
u'id': u'2750934548001',
u'ext': u'mp4',
u'title': u'This Bracelet Acts as a Personal Thermostat',
u'description': u'md5:547b78c64f4112766ccf4e151c20b6a0',
u'uploader': u'Mashable',
},
},
]
@classmethod @classmethod
def _build_brighcove_url(cls, object_str): def _build_brighcove_url(cls, object_str):
@ -23,6 +63,13 @@ class BrightcoveIE(InfoExtractor):
Build a Brightcove url from a xml string containing Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object> <object class="BrightcoveExperience">{params}</object>
""" """
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace(u'<--', u'<!--')
object_doc = xml.etree.ElementTree.fromstring(object_str) object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class'] assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'], params = {'flashID': object_doc.attrib['id'],
@ -35,24 +82,48 @@ class BrightcoveIE(InfoExtractor):
videoPlayer = find_xpath_attr(object_doc, './param', 'name', '@videoPlayer') videoPlayer = find_xpath_attr(object_doc, './param', 'name', '@videoPlayer')
if videoPlayer is not None: if videoPlayer is not None:
params['@videoPlayer'] = videoPlayer.attrib['value'] params['@videoPlayer'] = videoPlayer.attrib['value']
linkBase = find_xpath_attr(object_doc, './param', 'name', 'linkBaseURL')
if linkBase is not None:
params['linkBaseURL'] = linkBase.attrib['value']
data = compat_urllib_parse.urlencode(params) data = compat_urllib_parse.urlencode(params)
return cls._FEDERATED_URL_TEMPLATE % data return cls._FEDERATED_URL_TEMPLATE % data
@classmethod
def _extract_brightcove_url(cls, webpage):
"""Try to extract the brightcove url from the wepbage, returns None
if it can't be found
"""
m_brightcove = re.search(
r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>',
webpage, re.DOTALL)
if m_brightcove is not None:
return cls._build_brighcove_url(m_brightcove.group())
else:
return None
def _real_extract(self, url): def _real_extract(self, url):
# Change the 'videoId' and others field to '@videoPlayer'
url = re.sub(r'(?<=[?&])(videoI(d|D)|bctid)', '%40videoPlayer', url)
# Change bckey (used by bcove.me urls) to playerKey
url = re.sub(r'(?<=[?&])bckey', 'playerKey', url)
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
query_str = mobj.group('query') query_str = mobj.group('query')
query = compat_urlparse.parse_qs(query_str) query = compat_urlparse.parse_qs(query_str)
videoPlayer = query.get('@videoPlayer') videoPlayer = query.get('@videoPlayer')
if videoPlayer: if videoPlayer:
return self._get_video_info(videoPlayer[0], query_str) return self._get_video_info(videoPlayer[0], query_str, query)
else: else:
player_key = query['playerKey'] player_key = query['playerKey']
return self._get_playlist_info(player_key[0]) return self._get_playlist_info(player_key[0])
def _get_video_info(self, video_id, query): def _get_video_info(self, video_id, query_str, query):
request_url = self._FEDERATED_URL_TEMPLATE % query request_url = self._FEDERATED_URL_TEMPLATE % query_str
webpage = self._download_webpage(request_url, video_id) req = compat_urllib_request.Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
req.add_header('Referer', linkBase[0])
webpage = self._download_webpage(req, video_id)
self.report_extraction(video_id) self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json') info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json')
@ -65,22 +136,36 @@ class BrightcoveIE(InfoExtractor):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key, playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, u'Downloading playlist information') player_key, u'Downloading playlist information')
playlist_info = json.loads(playlist_info)['videoList'] json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
raise ExtractorError(u'Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']] videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id=playlist_info['id'], return self.playlist_result(videos, playlist_id=playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName']) playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info): def _extract_video_info(self, video_info):
renditions = video_info['renditions'] info = {
renditions = sorted(renditions, key=lambda r: r['size']) 'id': compat_str(video_info['id']),
best_format = renditions[-1] 'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
return {'id': video_info['id'], renditions = video_info.get('renditions')
'title': video_info['displayName'], if renditions:
'url': best_format['defaultURL'], renditions = sorted(renditions, key=lambda r: r['size'])
'ext': 'mp4', info['formats'] = [{
'description': video_info.get('shortDescription'), 'url': rend['defaultURL'],
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'), 'height': rend.get('frameHeight'),
'uploader': video_info.get('publisherName'), 'width': rend.get('frameWidth'),
} } for rend in renditions]
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info

View File

@ -6,7 +6,7 @@ from .common import InfoExtractor
class Canalc2IE(InfoExtractor): class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv' IE_NAME = 'canalc2.tv'
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?idVideo=(\d+)&voir=oui' _VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
_TEST = { _TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui', u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
@ -18,7 +18,9 @@ class Canalc2IE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1) video_id = re.match(self._VALID_URL, url).group('id')
# We need to set the voir field for getting the file name
url = 'http://www.canalc2.tv/video.asp?idVideo=%s&voir=oui' % video_id
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
file_name = self._search_regex( file_name = self._search_regex(
r"so\.addVariable\('file','(.*?)'\);", r"so\.addVariable\('file','(.*?)'\);",

View File

@ -0,0 +1,90 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
},
u'params': {
# rtmp download
u'skip_download': True,
},
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if len(video_description) == 0:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
url = self._html_search_regex(r'\'streamer\': \'(?P<url>[^\']+)\'', playerdata, u'url')
sd_file = self._html_search_regex(r'\'file\': \'(?P<sd_file>[^\']+)\'', playerdata, u'sd_file')
hd_file = self._html_search_regex(r'\'?file\'?: "(?P<hd_file>[^"]+)"', playerdata, u'hd_file')
video_thumbnail = self._html_search_regex(r'\'image\': \'(?P<thumbnail>[^\']+)\'', playerdata, u'thumbnail', fatal=False)
formats = [
{
'url': url,
'play_path': 'mp4:' + sd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': url,
'play_path': 'mp4:' + hd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
return {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}

View File

@ -6,7 +6,7 @@ from ..utils import determine_ext
class CNNIE(InfoExtractor): class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(edition\.)?cnn\.com/video/(data/.+?|\?)/ _VALID_URL = r'''(?x)https?://((edition|www)\.)?cnn\.com/video/(data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))''' (?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))'''
_TESTS = [{ _TESTS = [{

View File

@ -51,12 +51,12 @@ class ComedyCentralIE(InfoExtractor):
'400': 'mp4', '400': 'mp4',
} }
_video_dimensions = { _video_dimensions = {
'3500': '1280x720', '3500': (1280, 720),
'2200': '960x540', '2200': (960, 540),
'1700': '768x432', '1700': (768, 432),
'1200': '640x360', '1200': (640, 360),
'750': '512x288', '750': (512, 288),
'400': '384x216', '400': (384, 216),
} }
@classmethod @classmethod
@ -64,11 +64,13 @@ class ComedyCentralIE(InfoExtractor):
"""Receives a URL and returns True if suitable for this IE.""" """Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def _print_formats(self, formats): @staticmethod
print('Available formats:') def _transform_rtmp_url(rtmp_video_url):
for x in formats: m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
print('%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'mp4'), self._video_dimensions.get(x, '???'))) if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE) mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@ -155,40 +157,31 @@ class ComedyCentralIE(InfoExtractor):
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found') self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue continue
if self._downloader.params.get('listformats', None): formats = []
self._print_formats([i[0] for i in turls]) for format, rtmp_video_url in turls:
return w, h = self._video_dimensions.get(format, (None, None))
formats.append({
# For now, just pick the highest bitrate 'url': self._transform_rtmp_url(rtmp_video_url),
format,rtmp_video_url = turls[-1] 'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
# Get the format arg from the arg stream 'height': h,
req_format = self._downloader.params.get('format', None) 'width': w,
})
# Select format if we can find one
for f,v in turls:
if f == req_format:
format, rtmp_video_url = f, v
break
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
video_url = base + m.group('finalid')
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1) effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = { info = {
'id': shortMediaId, 'id': shortMediaId,
'url': video_url, 'formats': formats,
'uploader': showId, 'uploader': showId,
'upload_date': officialDate, 'upload_date': officialDate,
'title': effTitle, 'title': effTitle,
'ext': 'mp4',
'format': format,
'thumbnail': None, 'thumbnail': None,
'description': compat_str(officialTitle), 'description': compat_str(officialTitle),
} }
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info) results.append(info)
return results return results

View File

@ -14,6 +14,8 @@ from ..utils import (
clean_html, clean_html,
compiled_regex_type, compiled_regex_type,
ExtractorError, ExtractorError,
RegexNotFoundError,
sanitize_filename,
unescapeHTML, unescapeHTML,
) )
@ -35,6 +37,8 @@ class InfoExtractor(object):
title: Video title, unescaped. title: Video title, unescaped.
ext: Video filename extension. ext: Video filename extension.
Instead of url and ext, formats can also specified.
The following fields are optional: The following fields are optional:
format: The video format, defaults to ext (used for --get-format) format: The video format, defaults to ext (used for --get-format)
@ -52,8 +56,26 @@ class InfoExtractor(object):
view_count: How many users have watched the video on the platform. view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file, urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen like returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height.
and format_note fields if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* format_note Additional info about the format
("3D" or "DASH video")
* width Width of the video, if known
* height Height of the video, if known
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
The fields should all be Unicode strings. Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp. _real_extract() methods and define a _VALID_URL regexp.
@ -164,6 +186,17 @@ class InfoExtractor(object):
self.to_screen(u'Dumping request to ' + url) self.to_screen(u'Dumping request to ' + url)
dump = base64.b64encode(webpage_bytes).decode('ascii') dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump) self._downloader.to_screen(dump)
if self._downloader.params.get('write_pages', False):
try:
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace') content = webpage_bytes.decode(encoding, 'replace')
return (content, urlh) return (content, urlh)
@ -214,7 +247,7 @@ class InfoExtractor(object):
Perform a regex search on the given string, using a single or a list of Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group. patterns returning the first matching group.
In case of failure return a default value or raise a WARNING or a In case of failure return a default value or raise a WARNING or a
ExtractorError, depending on fatal, specifying the field name. RegexNotFoundError, depending on fatal, specifying the field name.
""" """
if isinstance(pattern, (str, compat_str, compiled_regex_type)): if isinstance(pattern, (str, compat_str, compiled_regex_type)):
mobj = re.search(pattern, string, flags) mobj = re.search(pattern, string, flags)
@ -234,7 +267,7 @@ class InfoExtractor(object):
elif default is not None: elif default is not None:
return default return default
elif fatal: elif fatal:
raise ExtractorError(u'Unable to extract %s' % _name) raise RegexNotFoundError(u'Unable to extract %s' % _name)
else: else:
self._downloader.report_warning(u'unable to extract %s; ' self._downloader.report_warning(u'unable to extract %s; '
u'please report this issue on http://yt-dl.org/bug' % _name) u'please report this issue on http://yt-dl.org/bug' % _name)
@ -289,6 +322,8 @@ class InfoExtractor(object):
if name is None: if name is None:
name = 'OpenGraph %s' % prop name = 'OpenGraph %s' % prop
escaped = self._search_regex(self._og_regex(prop), html, name, flags=re.DOTALL, **kargs) escaped = self._search_regex(self._og_regex(prop), html, name, flags=re.DOTALL, **kargs)
if escaped is None:
return None
return unescapeHTML(escaped) return unescapeHTML(escaped)
def _og_search_thumbnail(self, html, **kargs): def _og_search_thumbnail(self, html, **kargs):
@ -300,10 +335,19 @@ class InfoExtractor(object):
def _og_search_title(self, html, **kargs): def _og_search_title(self, html, **kargs):
return self._og_search_property('title', html, **kargs) return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', **kargs): def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
return self._html_search_regex([self._og_regex('video:secure_url'), regexes = [self._og_regex('video')]
self._og_regex('video')], if secure: regexes.insert(0, self._og_regex('video:secure_url'))
html, name, **kargs) return self._html_search_regex(regexes, html, name, **kargs)
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """
@ -342,7 +386,7 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n): def _get_n_results(self, query, n):
"""Get a specified number of results for a query""" """Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by sublclasses") raise NotImplementedError("This method must be implemented by subclasses")
@property @property
def SEARCH_KEY(self): def SEARCH_KEY(self):

View File

@ -10,25 +10,71 @@ from ..utils import (
compat_str, compat_str,
get_element_by_attribute, get_element_by_attribute,
get_element_by_id, get_element_by_id,
orderedSet,
ExtractorError, ExtractorError,
) )
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request.add_header('Cookie', 'ff=off')
return request
class DailymotionIE(SubtitlesInfoExtractor): class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion""" """Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)' _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion' IE_NAME = u'dailymotion'
_TEST = {
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech', _FORMATS = [
u'file': u'x33vw9.mp4', (u'stream_h264_ld_url', u'ld'),
u'md5': u'392c4b85a60a90dc4792da41ce3144eb', (u'stream_h264_url', u'standard'),
u'info_dict': { (u'stream_h264_hq_url', u'hq'),
u"uploader": u"Amphora Alex and Van .", (u'stream_h264_hd_url', u'hd'),
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\"" (u'stream_h264_hd1080_url', u'hd180'),
]
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
# age-restricted video
{
u'url': u'http://www.dailymotion.com/video/xyh2zz_leanna-decker-cyber-girl-of-the-year-desires-nude-playboy-plus_redband',
u'file': u'xyh2zz.mp4',
u'md5': u'0d667a7b9cebecc3c89ee93099c4159d',
u'info_dict': {
u'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
u'uploader': 'HotWaves1012',
u'age_limit': 18,
}
} }
} ]
def _real_extract(self, url): def _real_extract(self, url):
# Extract id and simplified title from URL # Extract id and simplified title from URL
@ -36,21 +82,29 @@ class DailymotionIE(SubtitlesInfoExtractor):
video_id = mobj.group(1).split('_')[0].split('?')[0] video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url) request = self._build_request(url)
request.add_header('Cookie', 'family_filter=off')
webpage = self._download_webpage(request, video_id) webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage # Extract URL, uploader and title from webpage
self.report_extraction(video_id) self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>', video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user # Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'], r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
webpage, 'video uploader') webpage, 'video uploader', fatal=False)
age_limit = self._rta_search(webpage)
video_upload_date = None video_upload_date = None
mobj = re.search(r'<div class="[^"]*uploaded_cont[^"]*" title="[^"]*">([0-9]{2})-([0-9]{2})-([0-9]{4})</div>', webpage) mobj = re.search(r'<div class="[^"]*uploaded_cont[^"]*" title="[^"]*">([0-9]{2})-([0-9]{2})-([0-9]{4})</div>', webpage)
@ -67,37 +121,43 @@ class DailymotionIE(SubtitlesInfoExtractor):
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title'] msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True) raise ExtractorError(msg, expected=True)
# TODO: support choosing qualities formats = []
for (key, format_id) in self._FORMATS:
for key in ['stream_h264_hd1080_url','stream_h264_hd_url', video_url = info.get(key)
'stream_h264_hq_url','stream_h264_url', if video_url is not None:
'stream_h264_ld_url']: m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if info.get(key):#key in info and info[key]: if m_size is not None:
max_quality = key width, height = m_size.group(1), m_size.group(2)
self.to_screen(u'Using %s' % key) else:
break width, height = None, None
else: formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL') raise ExtractorError(u'Unable to extract video URL')
video_url = info[max_quality]
# subtitles # subtitles
video_subtitles = self.extract_subtitles(video_id) video_subtitles = self.extract_subtitles(video_id, webpage)
if self._downloader.params.get('listsubtitles', False): if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id) self._list_available_subtitles(video_id, webpage)
return return
return [{ return {
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'ext': video_extension,
'subtitles': video_subtitles, 'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'] 'thumbnail': info['thumbnail_url'],
}] 'age_limit': age_limit,
}
def _get_available_subtitles(self, video_id): def _get_available_subtitles(self, video_id, webpage):
try: try:
sub_list = self._download_webpage( sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id, 'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
@ -113,7 +173,7 @@ class DailymotionIE(SubtitlesInfoExtractor):
return {} return {}
class DailymotionPlaylistIE(InfoExtractor): class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist' IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/' _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>' _MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
@ -122,16 +182,17 @@ class DailymotionPlaylistIE(InfoExtractor):
def _extract_entries(self, id): def _extract_entries(self, id):
video_ids = [] video_ids = []
for pagenum in itertools.count(1): for pagenum in itertools.count(1):
webpage = self._download_webpage(self._PAGE_TEMPLATE % (id, pagenum), request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum) id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'video_list', webpage) playlist_el = get_element_by_attribute(u'class', u'video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)" data-ext-id', playlist_el)) video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None: if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion') return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in video_ids] for video_id in orderedSet(video_ids)]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -25,7 +25,7 @@ class DepositFilesIE(InfoExtractor):
url = 'http://depositfiles.com/en/files/' + file_id url = 'http://depositfiles.com/en/files/' + file_id
# Retrieve file webpage with 'Free download' button pressed # Retrieve file webpage with 'Free download' button pressed
free_download_indication = { 'gateway_result' : '1' } free_download_indication = {'gateway_result' : '1'}
request = compat_urllib_request.Request(url, compat_urllib_parse.urlencode(free_download_indication)) request = compat_urllib_request.Request(url, compat_urllib_parse.urlencode(free_download_indication))
try: try:
self.report_download_webpage(file_id) self.report_download_webpage(file_id)

View File

@ -101,7 +101,7 @@ class EightTracksIE(InfoExtractor):
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id) first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url next_url = first_url
res = [] res = []
for i in itertools.count(): for i in range(track_count):
api_json = self._download_webpage(next_url, playlist_id, api_json = self._download_webpage(next_url, playlist_id,
note=u'Downloading song information %s/%s' % (str(i+1), track_count), note=u'Downloading song information %s/%s' % (str(i+1), track_count),
errnote=u'Failed to download song information') errnote=u'Failed to download song information')
@ -116,7 +116,5 @@ class EightTracksIE(InfoExtractor):
'ext': 'm4a', 'ext': 'm4a',
} }
res.append(info) res.append(info)
if api_data['set']['at_last_track']:
break
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id']) next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id'])
return res return res

View File

@ -0,0 +1,37 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from .brightcove import BrightcoveIE
from ..utils import ExtractorError
class EitbIE(InfoExtractor):
IE_NAME = u'eitb.tv'
_VALID_URL = r'https?://www\.eitb\.tv/(eu/bideoa|es/video)/[^/]+/(?P<playlist_id>\d+)/(?P<chapter_id>\d+)'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/2677100210001/2743577154001/lasa-y-zabala-30-anos/',
u'md5': u'edf4436247185adee3ea18ce64c47998',
u'info_dict': {
u'id': u'2743577154001',
u'ext': u'mp4',
u'title': u'60 minutos (Lasa y Zabala, 30 años)',
# All videos from eitb has this description in the brightcove info
u'description': u'.',
u'uploader': u'Euskal Telebista',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
chapter_id = mobj.group('chapter_id')
webpage = self._download_webpage(url, chapter_id)
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is None:
raise ExtractorError(u'Could not extract the Brightcove url')
# The BrightcoveExperience object doesn't contain the video id, we set
# it manually
bc_url += '&%40videoPlayer={0}'.format(chapter_id)
return self.url_result(bc_url, BrightcoveIE.ie_key())

View File

@ -11,16 +11,17 @@ class ExfmIE(InfoExtractor):
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream' _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
_TESTS = [ _TESTS = [
{ {
u'url': u'http://ex.fm/song/1bgtzg', u'url': u'http://ex.fm/song/eh359',
u'file': u'95223130.mp3', u'file': u'44216187.mp3',
u'md5': u'8a7967a3fef10e59a1d6f86240fd41cf', u'md5': u'e45513df5631e6d760970b14cc0c11e7',
u'info_dict': { u'info_dict': {
u"title": u"We Can't Stop - Miley Cyrus", u"title": u"Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive",
u"uploader": u"Miley Cyrus", u"uploader": u"deadjournalist",
u'upload_date': u'20130603', u'upload_date': u'20120424',
u'description': u'Download "We Can\'t Stop" \r\niTunes: http://smarturl.it/WeCantStop?IQid=SC\r\nAmazon: http://smarturl.it/WeCantStopAMZ?IQid=SC', u'description': u'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
}, },
u'note': u'Soundcloud song', u'note': u'Soundcloud song',
u'skip': u'The site is down too often',
}, },
{ {
u'url': u'http://ex.fm/song/wddt8', u'url': u'http://ex.fm/song/wddt8',
@ -30,6 +31,7 @@ class ExfmIE(InfoExtractor):
u'title': u'Safe and Sound', u'title': u'Safe and Sound',
u'uploader': u'Capital Cities', u'uploader': u'Capital Cities',
}, },
u'skip': u'The site is down too often',
}, },
] ]

View File

@ -0,0 +1,50 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
class ExtremeTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
u'file': u'652431.mp4',
u'md5': u'1fb9228f5e3332ec8c057d6ac36f33e0',
u'info_dict': {
u"title": u"Music Video 14 british euro brit european cumshots swallow",
u"uploader": u"unknown",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, u'title')
uploader = self._html_search_regex(r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, u'uploader', fatal=False)
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
return {
'id': video_id,
'title': video_title,
'uploader': uploader,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

View File

@ -19,7 +19,8 @@ class FacebookIE(InfoExtractor):
"""Information Extractor for Facebook""" """Information Extractor for Facebook"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)' _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
_LOGIN_URL = 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php&' _LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook' _NETRC_MACHINE = 'facebook'
IE_NAME = u'facebook' IE_NAME = u'facebook'
_TEST = { _TEST = {
@ -36,50 +37,56 @@ class FacebookIE(InfoExtractor):
"""Report attempt to log in.""" """Report attempt to log in."""
self.to_screen(u'Logging in') self.to_screen(u'Logging in')
def _real_initialize(self): def _login(self):
if self._downloader is None: (useremail, password) = self._get_login_info()
return
useremail = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
useremail = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
useremail = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(u'parsing .netrc: %s' % compat_str(err))
return
if useremail is None: if useremail is None:
return return
# Log in login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
self.report_login()
login_page = self._download_webpage(login_page_req, None, note=False,
errnote=u'Unable to download login page')
lsd = self._search_regex(r'"lsd":"(\w*?)"', login_page, u'lsd')
lgnrnd = self._search_regex(r'name="lgnrnd" value="([^"]*?)"', login_page, u'lgnrnd')
login_form = { login_form = {
'email': useremail, 'email': useremail,
'pass': password, 'pass': password,
'login': 'Log+In' 'lsd': lsd,
'lgnrnd': lgnrnd,
'next': 'http://facebook.com/home.php',
'default_persistent': '0',
'legacy_return': '1',
'timezone': '-60',
'trynum': '1',
} }
request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form)) request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try: try:
self.report_login()
login_results = compat_urllib_request.urlopen(request).read() login_results = compat_urllib_request.urlopen(request).read()
if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None: if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
self._downloader.report_warning(u'unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.') self._downloader.report_warning(u'unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
return return
check_form = {
'fb_dtsg': self._search_regex(r'"fb_dtsg":"(.*?)"', login_results, u'fb_dtsg'),
'nh': self._search_regex(r'name="nh" value="(\w*?)"', login_results, u'nh'),
'name_action_selected': 'dont_save',
'submit[Continue]': self._search_regex(r'<input value="(.*?)" name="submit\[Continue\]"', login_results, u'continue'),
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, compat_urllib_parse.urlencode(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = compat_urllib_request.urlopen(check_req).read()
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
self._downloader.report_warning(u'Unable to confirm login, you have to login in your brower and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning(u'unable to log in: %s' % compat_str(err)) self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
return return
def _real_initialize(self):
self._login()
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
if mobj is None: if mobj is None:
@ -93,7 +100,13 @@ class FacebookIE(InfoExtractor):
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});' AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage) m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m: if not m:
raise ExtractorError(u'Cannot parse data') m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
u'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
else:
raise ExtractorError(u'Cannot parse data')
data = dict(json.loads(m.group(1))) data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params']) params_raw = compat_urllib_parse.unquote(data['params'])
params = json.loads(params_raw) params = json.loads(params_raw)

View File

@ -0,0 +1,58 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr = self._html_search_regex(r'<p class="Content Copy">(.*?)</p>', webpage, u'description')
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': descr,
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -9,7 +9,7 @@ from ..utils import (
class FlickrIE(InfoExtractor): class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos""" """Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*' _VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = { _TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/', u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4', u'file': u'5645318632.mp4',

View File

@ -70,7 +70,11 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
class France2IE(FranceTVBaseInfoExtractor): class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr' IE_NAME = u'france2.fr'
_VALID_URL = r'https?://www\.france2\.fr/emissions/.*?/videos/(?P<id>\d+)' _VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = { _TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104', u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
@ -86,12 +90,20 @@ class France2IE(FranceTVBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id) return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor): class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'http://generation-quoi.france2.fr' IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)' _VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = { _TEST = {

View File

@ -0,0 +1,40 @@
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class GamekingsIE(InfoExtractor):
_VALID_URL = r'http?://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
_TEST = {
u"url": u"http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/",
u'file': u'20130811.mp4',
u'md5': u'17f6088f7d0149ff2b46f2714bdb1954',
u'info_dict': {
u"title": u"Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review",
u"description": u"Melle en Steven hebben voor de review een week in de rechtbank doorbracht met Phoenix Wright: Ace Attorney - Dual Destinies.",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
video_url = self._og_search_video_url(webpage)
video = re.search(r'[0-9]+', video_url)
video_id = video.group(0)
# Todo: add medium format
video_url = video_url.replace(video_id, 'large/' + video_id)
return {
'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}

View File

@ -1,55 +1,59 @@
import re import re
import xml.etree.ElementTree import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
unified_strdate,
compat_urllib_parse, compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
) )
class GameSpotIE(InfoExtractor): class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?' _VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = { _TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/", u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"6410818.mp4", u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825", u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": { u"info_dict": {
u"title": u"Arma 3 - Community Guide: SITREP I", u"title": u"Arma 3 - Community Guide: SITREP I",
u"upload_date": u"20130627", u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id') page_id = video_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
video_id = self._html_search_regex([r'"og:video" content=".*?\?id=(\d+)"', data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
r'http://www\.gamespot\.com/videoembed/(\d+)'], data_video = json.loads(unescapeHTML(data_video_json))
webpage, 'video id')
data = compat_urllib_parse.urlencode({'id': video_id, 'newplayer': '1'})
info_url = 'http://www.gamespot.com/pages/video_player/xml.php?' + data
info_xml = self._download_webpage(info_url, video_id)
doc = xml.etree.ElementTree.fromstring(info_xml)
clip_el = doc.find('./playList/clip')
http_urls = [{'url': node.find('filePath').text, # Transform the manifest url to a link to the mp4 files
'rate': int(node.find('rate').text)} # they are used in mobile devices.
for node in clip_el.find('./httpURI')] f4m_url = data_video['videoStreams']['f4m_stream']
best_quality = sorted(http_urls, key=lambda f: f['rate'])[-1] f4m_path = compat_urlparse.urlparse(f4m_url).path
video_url = best_quality['url'] QUALITIES_RE = r'((,\d+)+,?)'
title = clip_el.find('./title').text qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
ext = video_url.rpartition('.')[2] http_path = f4m_path[1:].split('/', 1)[1]
thumbnail_url = clip_el.find('./screenGrabURI').text http_template = re.sub(QUALITIES_RE, r'%s', http_path)
view_count = int(clip_el.find('./views').text) http_template = http_template.replace('.csmil/manifest.f4m', '')
upload_date = unified_strdate(clip_el.find('./postDate').text) http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
return [{ info = {
'id' : video_id, 'id': data_video['guid'],
'url' : video_url, 'title': compat_urllib_parse.unquote(data_video['title']),
'ext' : ext, 'formats': formats,
'title' : title, 'description': get_meta_content('description', webpage),
'thumbnail' : thumbnail_url, 'thumbnail': self._og_search_thumbnail(webpage),
'upload_date' : upload_date, }
'view_count' : view_count, # TODO: Remove when #980 has been merged
}] info.update(formats[-1])
return info

View File

@ -11,6 +11,8 @@ from ..utils import (
compat_urlparse, compat_urlparse,
ExtractorError, ExtractorError,
smuggle_url,
unescapeHTML,
) )
from .brightcove import BrightcoveIE from .brightcove import BrightcoveIE
@ -23,23 +25,52 @@ class GenericIE(InfoExtractor):
{ {
u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html', u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
u'file': u'13601338388002.mp4', u'file': u'13601338388002.mp4',
u'md5': u'85b90ccc9d73b4acd9138d3af4c27f89', u'md5': u'6e15c93721d7ec9e9ca3fdbf07982cfd',
u'info_dict': { u'info_dict': {
u"uploader": u"www.hodiho.fr", u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep" u"title": u"R\u00e9gis plante sa Jeep"
} }
}, },
# embedded vimeo video
{ {
u'url': u'http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/', u'add_ie': ['Vimeo'],
u'file': u'2371591881001.mp4', u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'md5': u'9e80619e0a94663f0bdc849b4566af19', u'file': u'22444065.mp4',
u'note': u'Test Brightcove downloads and detection in GenericIE', u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': { u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”', u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u'uploader': u'8TV', u"uploader_id": u"skillsmatter",
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd', u"uploader": u"Skills Matter",
} }
}, },
# bandcamp page with custom domain
{
u'add_ie': ['Bandcamp'],
u'url': u'http://bronyrock.com/track/the-pony-mash',
u'file': u'3235767654.mp3',
u'info_dict': {
u'title': u'The Pony Mash',
u'uploader': u'M_Pallante',
},
u'skip': u'There is a limit of 200 free downloads / month for the test song',
},
# embedded brightcove video
# it also tests brightcove videos that need to set the 'Referer' in the
# http requests
{
u'add_ie': ['Brightcove'],
u'url': u'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
u'info_dict': {
u'id': u'2765128793001',
u'ext': u'mp4',
u'title': u'Le cours de bourse : lanalyse technique',
u'description': u'md5:7e9ad046e968cb2d1114004aba466fd9',
u'uploader': u'BFM BUSINESS',
},
u'params': {
u'skip_download': True,
},
},
] ]
def report_download_webpage(self, video_id): def report_download_webpage(self, video_id):
@ -128,16 +159,36 @@ class GenericIE(InfoExtractor):
except ValueError: except ValueError:
# since this is the last-resort InfoExtractor, if # since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here # this error is thrown, it'll be thrown here
raise ExtractorError(u'Invalid URL: %s' % url) raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id) self.report_extraction(video_id)
# Look for BrightCove: # Look for BrightCove:
m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL) bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if m_brightcove is not None: if bc_url is not None:
self.to_screen(u'Brightcove video detected.') self.to_screen(u'Brightcove video detected.')
bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())
return self.url_result(bc_url, 'Brightcove') return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Look for embedded YouTube player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?youtube.com/embed/.+?)\1', webpage)
if mobj:
surl = unescapeHTML(mobj.group(u'url'))
return self.url_result(surl, 'Youtube')
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
burl = unescapeHTML(mobj.group(1))
return self.url_result(burl, 'Bandcamp')
# Start with something easy: JW Player in SWFObject # Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage) mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None: if mobj is None:
@ -160,12 +211,12 @@ class GenericIE(InfoExtractor):
# HTML5 video # HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL) mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None: if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url) raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes # It's possible that one of the regexes
# matched, but returned an empty group: # matched, but returned an empty group:
if mobj.group(1) is None: if mobj.group(1) is None:
raise ExtractorError(u'Invalid URL: %s' % url) raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = mobj.group(1) video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url) video_url = compat_urlparse.urljoin(url, video_url)

View File

@ -41,8 +41,9 @@ class GooglePlusIE(InfoExtractor):
# Extract update date # Extract update date
upload_date = self._html_search_regex( upload_date = self._html_search_regex(
['title="Timestamp">(.*?)</a>', r'<a.+?class="g-M.+?>(.+?)</a>'], r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
webpage, u'upload date', fatal=False) ([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
if upload_date: if upload_date:
# Convert timestring to a format suitable for filename # Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d") upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")

View File

@ -30,7 +30,7 @@ class HypemIE(InfoExtractor):
raise ExtractorError(u'Invalid URL: %s' % url) raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group(1) track_id = mobj.group(1)
data = { 'ax': 1, 'ts': time.time() } data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data) data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded complete_url = url + "?" + data_encoded
request = compat_urllib_request.Request(complete_url) request = compat_urllib_request.Request(complete_url)
@ -68,4 +68,4 @@ class HypemIE(InfoExtractor):
'ext': "mp3", 'ext': "mp3",
'title': title, 'title': title,
'artist': artist, 'artist': artist,
}] }]

View File

@ -13,7 +13,7 @@ class IGNIE(InfoExtractor):
Some videos of it.ign.com are also supported Some videos of it.ign.com are also supported
""" """
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles)(/.+)?/(?P<name_or_id>.+)' _VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com' IE_NAME = u'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config' _CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
@ -21,15 +21,39 @@ class IGNIE(InfoExtractor):
r'id="my_show_video">.*?<p>(.*?)</p>', r'id="my_show_video">.*?<p>(.*?)</p>',
] ]
_TEST = { _TESTS = [
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review', {
u'file': u'8f862beef863986b2785559b9e1aa599.mp4', u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9', u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'info_dict': { u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'title': u'The Last of Us Review', u'info_dict': {
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c', u'title': u'The Last of Us Review',
} u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
} }
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
},
},
]
def _find_video_id(self, webpage): def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"', res_id = [r'data-video-id="(.+?)"',
@ -46,6 +70,13 @@ class IGNIE(InfoExtractor):
if page_type == 'articles': if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url') video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
return self.url_result(video_url, ie='IGN') return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
return [self.url_result(u, ie='IGN') for u in multiple_urls]
video_id = self._find_video_id(webpage) video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id) result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE, description = self._html_search_regex(self._DESCRIPTION_RE,
@ -87,6 +118,9 @@ class OneUPIE(IGNIE):
} }
} }
# Override IGN tests
_TESTS = []
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id') id = mobj.group('name_or_id')

View File

@ -26,7 +26,7 @@ class InstagramIE(InfoExtractor):
return [{ return [{
'id': video_id, 'id': video_id,
'url': self._og_search_video_url(webpage), 'url': self._og_search_video_url(webpage, secure=False),
'ext': 'mp4', 'ext': 'mp4',
'title': u'Video by %s' % uploader_id, 'title': u'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),

View File

@ -0,0 +1,84 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
xpath_with_ns,
determine_ext,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_TEST = {
u'url': u'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
u'file': u'452693.mp4',
u'info_dict': {
u'title': u'SKYFALL',
u'description': u'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
u'duration': 153,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k,v[0]) for (k,v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration_xml = self._download_webpage(url, video_id,
u'Downloading flash configuration')
flashconfiguration = xml.etree.ElementTree.fromstring(flashconfiguration_xml.encode('utf-8'))
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info_xml = self._download_webpage(file_url, video_id,
u'Downloading video info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(p,
{'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'})
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'width': int(attr['width']),
'bitrate': int(attr['bitrate']),
})
formats = sorted(formats, key=lambda f: f['bitrate'])
return {
'id': video_id,
'title': item.find('title').text,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
}

View File

@ -6,13 +6,14 @@ import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor): class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm' _VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = { _TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm', u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4', u'file': u'5182.mp4',
u'md5': u'e0fdb0cd3ce98713ef9c1e1e025779d0', u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': { u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité', u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n', u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
@ -23,25 +24,29 @@ class JeuxVideoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
title = re.match(self._VALID_URL, url).group(1) title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
m_download = re.search(r'<param name="flashvars" value="config=(.*?)" />', webpage) xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
xml_link = m_download.group(1) webpage, u'config URL')
id = re.search(r'http://www.jeuxvideo.com/config/\w+/0011/(.*?)/\d+_player\.xml', xml_link).group(1) video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_config = self._download_webpage(xml_link, title, xml_config = self._download_webpage(
'Downloading XML config') xml_link, title, u'Downloading XML config')
config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8')) config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8'))
info = re.search(r'<format\.json>(.*?)</format\.json>', info_json = self._search_regex(
xml_config, re.MULTILINE|re.DOTALL).group(1) r'(?sm)<format\.json>(.*?)</format\.json>',
info = json.loads(info)['versions'][0] xml_config, u'JSON information')
info = json.loads(info_json)['versions'][0]
video_url = 'http://video720.jeuxvideo.com/' + info['file'] video_url = 'http://video720.jeuxvideo.com/' + info['file']
return {'id': id, return {
'title' : config.find('titre_video').text, 'id': video_id,
'ext' : 'mp4', 'title': config.find('titre_video').text,
'url' : video_url, 'ext': 'mp4',
'description': self._og_search_description(webpage), 'url': video_url,
'thumbnail': config.find('image').text, 'description': self._og_search_description(webpage),
} 'thumbnail': config.find('image').text,
}

View File

@ -1,8 +1,10 @@
import re import re
import hashlib
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import determine_ext from ..utils import determine_ext
_md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class KankanIE(InfoExtractor): class KankanIE(InfoExtractor):
_VALID_URL = r'https?://(?:.*?\.)?kankan\.com/.+?/(?P<id>\d+)\.shtml' _VALID_URL = r'https?://(?:.*?\.)?kankan\.com/.+?/(?P<id>\d+)\.shtml'
@ -30,7 +32,10 @@ class KankanIE(InfoExtractor):
video_id, u'Downloading video url info') video_id, u'Downloading video url info')
ip = self._search_regex(r'ip:"(.+?)"', video_info_page, u'video url ip') ip = self._search_regex(r'ip:"(.+?)"', video_info_page, u'video url ip')
path = self._search_regex(r'path:"(.+?)"', video_info_page, u'video url path') path = self._search_regex(r'path:"(.+?)"', video_info_page, u'video url path')
video_url = 'http://%s%s' % (ip, path) param1 = self._search_regex(r'param1:(\d+)', video_info_page, u'param1')
param2 = self._search_regex(r'param2:(\d+)', video_info_page, u'param2')
key = _md5('xl_mp43651' + param1 + param2)
video_url = 'http://%s%s?key=%s&key1=%s' % (ip, path, key, param2)
return {'id': video_id, return {'id': video_id,
'title': title, 'title': title,

View File

@ -0,0 +1,61 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
from ..aes import (
aes_decrypt_text
)
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>keezmovies\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
u'file': u'1214711.mp4',
u'md5': u'6e297b7e789329923fcf83abb67c9289',
u'info_dict': {
u"title": u"Petite Asian Lady Mai Playing In Bathtub",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
# embedded video
mobj = re.search(r'href="([^"]+)"></iframe>', webpage)
if mobj:
embedded_url = mobj.group(1)
return self.url_result(embedded_url)
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, u'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
if webpage.find('encrypted=true')!=-1:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join(format)
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'title': video_title,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': age_limit,
}

View File

@ -40,13 +40,9 @@ class LivestreamIE(InfoExtractor):
if video_id is None: if video_id is None:
# This is an event page: # This is an event page:
player = get_meta_content('twitter:player', webpage) config_json = self._search_regex(r'window.config = ({.*?});',
if player is None: webpage, u'window config')
raise ExtractorError('Couldn\'t extract event api url') info = json.loads(config_json)['event']
api_url = player.replace('/player', '')
api_url = re.sub(r'^(https?://)(new\.)', r'\1api.\2', api_url)
info = json.loads(self._download_webpage(api_url, event_name,
u'Downloading event info'))
videos = [self._extract_video_info(video_data['data']) videos = [self._extract_video_info(video_data['data'])
for video_data in info['feed']['data'] if video_data['type'] == u'video'] for video_data in info['feed']['data'] if video_data['type'] == u'video']
return self.playlist_result(videos, info['id'], info['full_name']) return self.playlist_result(videos, info['id'], info['full_name'])

View File

@ -20,10 +20,12 @@ class MetacafeIE(InfoExtractor):
_DISCLAIMER = 'http://www.metacafe.com/family_filter/' _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
_FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user' _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
IE_NAME = u'metacafe' IE_NAME = u'metacafe'
_TESTS = [{ _TESTS = [
# Youtube video
{
u"add_ie": ["Youtube"], u"add_ie": ["Youtube"],
u"url": u"http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/", u"url": u"http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/",
u"file": u"_aUehQsCQtM.flv", u"file": u"_aUehQsCQtM.mp4",
u"info_dict": { u"info_dict": {
u"upload_date": u"20090102", u"upload_date": u"20090102",
u"title": u"The Electric Company | \"Short I\" | PBS KIDS GO!", u"title": u"The Electric Company | \"Short I\" | PBS KIDS GO!",
@ -32,15 +34,42 @@ class MetacafeIE(InfoExtractor):
u"uploader_id": u"PBS" u"uploader_id": u"PBS"
} }
}, },
# Normal metacafe video
{
u'url': u'http://www.metacafe.com/watch/11121940/news_stuff_you_wont_do_with_your_playstation_4/',
u'md5': u'6e0bca200eaad2552e6915ed6fd4d9ad',
u'info_dict': {
u'id': u'11121940',
u'ext': u'mp4',
u'title': u'News: Stuff You Won\'t Do with Your PlayStation 4',
u'uploader': u'ign',
u'description': u'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
},
},
# AnyClip video
{ {
u"url": u"http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/", u"url": u"http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/",
u"file": u"an-dVVXnuY7Jh77J.mp4", u"file": u"an-dVVXnuY7Jh77J.mp4",
u"info_dict": { u"info_dict": {
u"title": u"The Andromeda Strain (1971): Stop the Bomb Part 3", u"title": u"The Andromeda Strain (1971): Stop the Bomb Part 3",
u"uploader": u"anyclip", u"uploader": u"anyclip",
u"description": u"md5:38c711dd98f5bb87acf973d573442e67" u"description": u"md5:38c711dd98f5bb87acf973d573442e67",
} },
}] },
# age-restricted video
{
u'url': u'http://www.metacafe.com/watch/5186653/bbc_internal_christmas_tape_79_uncensored_outtakes_etc/',
u'md5': u'98dde7c1a35d02178e8ab7560fe8bd09',
u'info_dict': {
u'id': u'5186653',
u'ext': u'mp4',
u'title': u'BBC INTERNAL Christmas Tape \'79 - UNCENSORED Outtakes, Etc.',
u'uploader': u'Dwayne Pipe',
u'description': u'md5:950bf4c581e2c059911fa3ffbe377e4b',
u'age_limit': 18,
},
},
]
def report_disclaimer(self): def report_disclaimer(self):
@ -62,6 +91,7 @@ class MetacafeIE(InfoExtractor):
'submit': "Continue - I'm over 18", 'submit': "Continue - I'm over 18",
} }
request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form)) request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try: try:
self.report_age_confirmation() self.report_age_confirmation()
compat_urllib_request.urlopen(request).read() compat_urllib_request.urlopen(request).read()
@ -83,7 +113,12 @@ class MetacafeIE(InfoExtractor):
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id) req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
req.headers['Cookie'] = 'flashVersion=0;'
# AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file
mobj_an = re.match(r'^an-(.*?)$', video_id)
if mobj_an:
req.headers['Cookie'] = 'flashVersion=0;'
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(req, video_id)
# Extract URL, uploader and title from webpage # Extract URL, uploader and title from webpage
@ -125,6 +160,11 @@ class MetacafeIE(InfoExtractor):
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);', r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, u'uploader nickname', fatal=False) webpage, u'uploader nickname', fatal=False)
if re.search(r'"contentRating":"restricted"', webpage) is not None:
age_limit = 18
else:
age_limit = 0
return { return {
'_type': 'video', '_type': 'video',
'id': video_id, 'id': video_id,
@ -134,4 +174,5 @@ class MetacafeIE(InfoExtractor):
'upload_date': None, 'upload_date': None,
'title': video_title, 'title': video_title,
'ext': video_ext, 'ext': video_ext,
'age_limit': age_limit,
} }

View File

@ -0,0 +1,49 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
class MofosexIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>mofosex\.com/videos/(?P<videoid>[0-9]+)/.*?\.html)'
_TEST = {
u'url': u'http://www.mofosex.com/videos/5018/japanese-teen-music-video.html',
u'file': u'5018.mp4',
u'md5': u'1b2eb47ac33cc75d4a80e3026b613c5a',
u'info_dict': {
u"title": u"Japanese Teen Music Video",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, u'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, u'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'title': video_title,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': age_limit,
}

View File

@ -26,6 +26,7 @@ class MTVIE(InfoExtractor):
}, },
}, },
{ {
u'add_ie': ['Vevo'],
u'url': u'http://www.mtv.com/videos/taylor-swift/916187/everything-has-changed-ft-ed-sheeran.jhtml', u'url': u'http://www.mtv.com/videos/taylor-swift/916187/everything-has-changed-ft-ed-sheeran.jhtml',
u'file': u'USCJY1331283.mp4', u'file': u'USCJY1331283.mp4',
u'md5': u'73b4e7fcadd88929292fe52c3ced8caf', u'md5': u'73b4e7fcadd88929292fe52c3ced8caf',
@ -54,46 +55,57 @@ class MTVIE(InfoExtractor):
def _get_thumbnail_url(self, uri, itemdoc): def _get_thumbnail_url(self, uri, itemdoc):
return 'http://mtv.mtvnimages.com/uri/' + uri return 'http://mtv.mtvnimages.com/uri/' + uri
def _extract_video_url(self, metadataXml): def _extract_video_formats(self, metadataXml):
if '/error_country_block.swf' in metadataXml: if '/error_country_block.swf' in metadataXml:
raise ExtractorError(u'This video is not available from your country.', expected=True) raise ExtractorError(u'This video is not available from your country.', expected=True)
mdoc = xml.etree.ElementTree.fromstring(metadataXml.encode('utf-8')) mdoc = xml.etree.ElementTree.fromstring(metadataXml.encode('utf-8'))
renditions = mdoc.findall('.//rendition') renditions = mdoc.findall('.//rendition')
# For now, always pick the highest quality. formats = []
rendition = renditions[-1] for rendition in mdoc.findall('.//rendition'):
try:
try: _, _, ext = rendition.attrib['type'].partition('/')
_,_,ext = rendition.attrib['type'].partition('/') rtmp_video_url = rendition.find('./src').text
format = ext + '-' + rendition.attrib['width'] + 'x' + rendition.attrib['height'] + '_' + rendition.attrib['bitrate'] formats.append({'ext': ext,
rtmp_video_url = rendition.find('./src').text 'url': self._transform_rtmp_url(rtmp_video_url),
except KeyError: 'format_id': rendition.get('bitrate'),
raise ExtractorError('Invalid rendition field.') 'width': int(rendition.get('width')),
video_url = self._transform_rtmp_url(rtmp_video_url) 'height': int(rendition.get('height')),
return {'ext': ext, 'url': video_url, 'format': format} })
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
return formats
def _get_video_info(self, itemdoc): def _get_video_info(self, itemdoc):
uri = itemdoc.find('guid').text uri = itemdoc.find('guid').text
video_id = self._id_from_uri(uri) video_id = self._id_from_uri(uri)
self.report_extraction(video_id) self.report_extraction(video_id)
mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url'] mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url']
# Remove the templates, like &device={device}
mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', u'', mediagen_url)
if 'acceptMethods' not in mediagen_url: if 'acceptMethods' not in mediagen_url:
mediagen_url += '&acceptMethods=fms' mediagen_url += '&acceptMethods=fms'
mediagen_page = self._download_webpage(mediagen_url, video_id, mediagen_page = self._download_webpage(mediagen_url, video_id,
u'Downloading video urls') u'Downloading video urls')
video_info = self._extract_video_url(mediagen_page)
description_node = itemdoc.find('description') description_node = itemdoc.find('description')
if description_node is not None: if description_node is not None:
description = description_node.text description = description_node.text.strip()
else: else:
description = None description = None
video_info.update({'title': itemdoc.find('title').text,
'id': video_id, info = {
'thumbnail': self._get_thumbnail_url(uri, itemdoc), 'title': itemdoc.find('title').text,
'description': description, 'formats': self._extract_video_formats(mediagen_page),
}) 'id': video_id,
return video_info 'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info
def _get_videos_info(self, uri): def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri) video_id = self._id_from_uri(uri)

View File

@ -0,0 +1,48 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_str,
)
class MySpaceIE(InfoExtractor):
_VALID_URL = r'https?://myspace\.com/([^/]+)/video/[^/]+/(?P<id>\d+)'
_TEST = {
u'url': u'https://myspace.com/coldplay/video/viva-la-vida/100008689',
u'info_dict': {
u'id': u'100008689',
u'ext': u'flv',
u'title': u'Viva La Vida',
u'description': u'The official Viva La Vida video, directed by Hype Williams',
u'uploader': u'Coldplay',
u'uploader_id': u'coldplay',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
context = json.loads(self._search_regex(r'context = ({.*?});', webpage,
u'context'))
video = context['video']
rtmp_url, play_path = video['streamUrl'].split(';', 1)
return {
'id': compat_str(video['mediaId']),
'title': video['title'],
'url': rtmp_url,
'play_path': play_path,
'ext': 'flv',
'description': video['description'],
'thumbnail': video['imageUrl'],
'uploader': video['artistName'],
'uploader_id': video['artistUsername'],
}

120
youtube_dl/extractor/nhl.py Normal file
View File

@ -0,0 +1,120 @@
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
determine_ext,
unified_strdate,
)
class NHLBaseInfoExtractor(InfoExtractor):
@staticmethod
def _fix_json(json_string):
return json_string.replace('\\\'', '\'')
def _extract_video(self, info):
video_id = info['id']
self.report_extraction(video_id)
initial_video_url = info['publishPoint']
data = compat_urllib_parse.urlencode({
'type': 'fvod',
'path': initial_video_url.replace('.mp4', '_sd.mp4'),
})
path_url = 'http://video.nhl.com/videocenter/servlets/encryptvideopath?' + data
path_response = self._download_webpage(path_url, video_id,
u'Downloading final video url')
path_doc = xml.etree.ElementTree.fromstring(path_response)
video_url = path_doc.find('path').text
join = compat_urlparse.urljoin
return {
'id': video_id,
'title': info['name'],
'url': video_url,
'ext': determine_ext(video_url),
'description': info['description'],
'duration': int(info['duration']),
'thumbnail': join(join(video_url, '/u/'), info['bigImage']),
'upload_date': unified_strdate(info['releaseDate'].split('.')[0]),
}
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console\?.*?(?<=[?&])id=(?P<id>\d+)'
_TEST = {
u'url': u'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
u'file': u'453614.mp4',
u'info_dict': {
u'title': u'Quick clip: Weise 4-3 goal vs Flames',
u'description': u'Dale Weise scores his first of the season to put the Canucks up 4-3.',
u'duration': 18,
u'upload_date': u'20131006',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = 'http://video.nhl.com/videocenter/servlets/playlist?ids=%s&format=json' % video_id
info_json = self._download_webpage(json_url, video_id,
u'Downloading info json')
info_json = self._fix_json(info_json)
info = json.loads(info_json)[0]
return self._extract_video(info)
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com:videocenter'
IE_DESC = u'Download the first 12 videos from a videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?.*?catid=(?P<catid>[^&]+))?'
@classmethod
def suitable(cls, url):
if NHLIE.suitable(url):
return False
return super(NHLVideocenterIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
team = mobj.group('team')
webpage = self._download_webpage(url, team)
cat_id = self._search_regex(
[r'var defaultCatId = "(.+?)";',
r'{statusIndex:0,index:0,.*?id:(.*?),'],
webpage, u'category id')
playlist_title = self._html_search_regex(
r'tab0"[^>]*?>(.*?)</td>',
webpage, u'playlist title', flags=re.DOTALL).lower().capitalize()
data = compat_urllib_parse.urlencode({
'cid': cat_id,
# This is the default value
'count': 12,
'ptrs': 3,
'format': 'json',
})
path = '/videocenter/servlets/browse?' + data
request_url = compat_urlparse.urljoin(url, path)
response = self._download_webpage(request_url, playlist_title)
response = self._fix_json(response)
if not response.strip():
self._downloader.report_warning(u'Got an empty reponse, trying '
u'adding the "newvideos" parameter')
response = self._download_webpage(request_url + '&newvideos=true',
playlist_title)
response = self._fix_json(response)
videos = json.loads(response)
return {
'_type': 'playlist',
'title': playlist_title,
'id': cat_id,
'entries': [self._extract_video(i) for i in videos],
}

View File

@ -0,0 +1,46 @@
import re
from .common import InfoExtractor
from ..utils import compat_urlparse
class NowVideoIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?nowvideo\.ch/video/(?P<id>\w+)'
_TEST = {
u'url': u'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
u'file': u'0mw0yow7b6dxa.flv',
u'md5': u'f8fbbc8add72bd95b7850c6a02fc8817',
u'info_dict': {
u"title": u"youtubedl test video _BaW_jenozKc.mp4"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id
embed_url = 'http://embed.nowvideo.ch/embed.php?v=' + video_id
webpage = self._download_webpage(webpage_url, video_id)
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h4>(.*)</h4>',
webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";',
embed_page, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id,
u'Downloading API page')
video_url = compat_urlparse.parse_qs(api_response)[u'url'][0]
return [{
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
}]

View File

@ -0,0 +1,69 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class PornHubIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9]+))'
_TEST = {
u'url': u'http://www.pornhub.com/view_video.php?viewkey=648719015',
u'file': u'648719015.mp4',
u'md5': u'882f488fa1f0026f023f33576004a2ed',
u'info_dict': {
u"uploader": u"BABES-COM",
u"title": u"Seductive Indian beauty strips down and fingers her pink pussy",
u"age_limit": 18
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'<b>From: </b>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse.unquote(thumbnail)
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
if webpage.find('"encrypted":true') != -1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password').replace('+', ' ')
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
})
formats.sort(key=lambda format: list(map(lambda s: s.zfill(6), format['format'].split('-'))))
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'formats': formats,
'age_limit': 18,
}

View File

@ -16,7 +16,8 @@ class PornotubeIE(InfoExtractor):
u'md5': u'374dd6dcedd24234453b295209aa69b6', u'md5': u'374dd6dcedd24234453b295209aa69b6',
u'info_dict': { u'info_dict': {
u"upload_date": u"20090708", u"upload_date": u"20090708",
u"title": u"Marilyn-Monroe-Bathing" u"title": u"Marilyn-Monroe-Bathing",
u"age_limit": 18
} }
} }
@ -38,6 +39,7 @@ class PornotubeIE(InfoExtractor):
VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by' VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by'
upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, u'upload date', fatal=False) upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, u'upload date', fatal=False)
if upload_date: upload_date = unified_strdate(upload_date) if upload_date: upload_date = unified_strdate(upload_date)
age_limit = self._rta_search(webpage)
info = {'id': video_id, info = {'id': video_id,
'url': video_url, 'url': video_url,
@ -45,6 +47,7 @@ class PornotubeIE(InfoExtractor):
'upload_date': upload_date, 'upload_date': upload_date,
'title': video_title, 'title': video_title,
'ext': 'flv', 'ext': 'flv',
'format': 'flv'} 'format': 'flv',
'age_limit': age_limit}
return [info] return [info]

View File

@ -10,28 +10,35 @@ class RedTubeIE(InfoExtractor):
u'file': u'66418.mp4', u'file': u'66418.mp4',
u'md5': u'7b8c22b5e7098a3e1c09709df1126d2d', u'md5': u'7b8c22b5e7098a3e1c09709df1126d2d',
u'info_dict': { u'info_dict': {
u"title": u"Sucked on a toilet" u"title": u"Sucked on a toilet",
u"age_limit": 18,
} }
} }
def _real_extract(self,url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
video_extension = 'mp4' video_extension = 'mp4'
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id) self.report_extraction(video_id)
video_url = self._html_search_regex(r'<source src="(.+?)" type="video/mp4">', video_url = self._html_search_regex(
webpage, u'video URL') r'<source src="(.+?)" type="video/mp4">', webpage, u'video URL')
video_title = self._html_search_regex('<h1 class="videoTitle slidePanelMovable">(.+?)</h1>', video_title = self._html_search_regex(
r'<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
webpage, u'title') webpage, u'title')
return [{ # No self-labeling, but they describe themselves as
'id': video_id, # "Home of Videos Porno"
'url': video_url, age_limit = 18
'ext': video_extension,
'title': video_title, return {
}] 'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
'age_limit': age_limit,
}

View File

@ -0,0 +1,16 @@
from .videodetective import VideoDetectiveIE
# It just uses the same method as videodetective.com,
# the internetvideoarchive.com is extracted from the og:video property
class RottenTomatoesIE(VideoDetectiveIE):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
u'file': '613340.mp4',
u'info_dict': {
u'title': u'TOY STORY 3',
u'description': u'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
},
}

View File

@ -8,8 +8,8 @@ from ..utils import (
) )
class RTLnowIE(InfoExtractor): class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, SUPER RTL NOW and VOX NOW""" """Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?superrtlnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)' _VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{ _TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1', u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv', u'file': u'90419.flv',
@ -61,8 +61,34 @@ class RTLnowIE(InfoExtractor):
u'params': { u'params': {
u'skip_download': True, u'skip_download': True,
}, },
},
{
u'url': u'http://www.rtlnitronow.de/recht-ordnung/stadtpolizei-frankfurt-gerichtsvollzieher-leipzig.php?film_id=129679&player=1&season=1',
u'file': u'129679.flv',
u'info_dict': {
u'upload_date': u'20131016',
u'title': u'Recht & Ordnung - Stadtpolizei Frankfurt/ Gerichtsvollzieher...',
u'description': u'Stadtpolizei Frankfurt/ Gerichtsvollzieher Leipzig',
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
}] }]
def _real_extract(self,url): def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -79,20 +105,23 @@ class RTLnowIE(InfoExtractor):
msg = clean_html(note_m.group(1)) msg = clean_html(note_m.group(1))
raise ExtractorError(msg) raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+)</title>', video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title') webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'', playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url') webpage, u'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id) playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr\]\]></title>', playerdata) mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj: if mobj:
video_description = mobj.group(u'description') video_description = mobj.group(u'description')
if mobj.group('upload_date_Y'): if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y') video_upload_date = mobj.group('upload_date_Y')
else: elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y') video_upload_date = u'20' + mobj.group('upload_date_y')
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d') else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
else: else:
video_description = None video_description = None
video_upload_date = None video_upload_date = None

View File

@ -0,0 +1,58 @@
# encoding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_str,
ExtractorError,
)
class RutubeIE(InfoExtractor):
_VALID_URL = r'https?://rutube.ru/video/(?P<long_id>\w+)'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
u'file': u'3eac3b4561676c17df9132a9a1e62e3e.mp4',
u'info_dict': {
u'title': u'Раненный кенгуру забежал в аптеку',
u'uploader': u'NTDRussian',
u'uploader_id': u'29790',
},
u'params': {
# It requires ffmpeg (m3u8 download)
u'skip_download': True,
},
}
def _get_api_response(self, short_id, subpath):
api_url = 'http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id)
response_json = self._download_webpage(api_url, short_id,
u'Downloading %s json' % subpath)
return json.loads(response_json)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
long_id = mobj.group('long_id')
webpage = self._download_webpage(url, long_id)
og_video = self._og_search_video_url(webpage)
short_id = compat_urlparse.urlparse(og_video).path[1:]
options = self._get_api_response(short_id, 'options')
trackinfo = self._get_api_response(short_id, 'trackinfo')
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError(u'Couldn\'t find m3u8 manifest url')
return {
'id': trackinfo['id'],
'title': trackinfo['title'],
'url': m3u8_url,
'ext': 'mp4',
'thumbnail': options['thumbnail_url'],
'uploader': author.get('name'),
'uploader_id': compat_str(author['id']) if author else None,
}

View File

@ -7,6 +7,7 @@ class SlashdotIE(InfoExtractor):
_VALID_URL = r'https?://tv.slashdot.org/video/\?embed=(?P<id>.*?)(&|$)' _VALID_URL = r'https?://tv.slashdot.org/video/\?embed=(?P<id>.*?)(&|$)'
_TEST = { _TEST = {
u'add_ie': ['Ooyala'],
u'url': u'http://tv.slashdot.org/video/?embed=JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz', u'url': u'http://tv.slashdot.org/video/?embed=JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz',
u'file': u'JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz.mp4', u'file': u'JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz.mp4',
u'md5': u'd2222e7a4a4c1541b3e0cf732fb26735', u'md5': u'd2222e7a4a4c1541b3e0cf732fb26735',

View File

@ -29,17 +29,34 @@ class SoundcloudIE(InfoExtractor):
) )
''' '''
IE_NAME = u'soundcloud' IE_NAME = u'soundcloud'
_TEST = { _TESTS = [
u'url': u'http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy', {
u'file': u'62986583.mp3', u'url': u'http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy',
u'md5': u'ebef0a451b909710ed1d7787dddbf0d7', u'file': u'62986583.mp3',
u'info_dict': { u'md5': u'ebef0a451b909710ed1d7787dddbf0d7',
u"upload_date": u"20121011", u'info_dict': {
u"description": u"No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o'd", u"upload_date": u"20121011",
u"uploader": u"E.T. ExTerrestrial Music", u"description": u"No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o'd",
u"title": u"Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1" u"uploader": u"E.T. ExTerrestrial Music",
} u"title": u"Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1"
} }
},
# not streamable song
{
u'url': u'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
u'info_dict': {
u'id': u'47127627',
u'ext': u'mp3',
u'title': u'Goldrushed',
u'uploader': u'The Royal Concept',
u'upload_date': u'20120521',
},
u'params': {
# rtmp
u'skip_download': True,
},
},
]
_CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28' _CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
@ -56,16 +73,16 @@ class SoundcloudIE(InfoExtractor):
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
def _extract_info_dict(self, info, full_title=None, quiet=False): def _extract_info_dict(self, info, full_title=None, quiet=False):
video_id = info['id'] track_id = compat_str(info['id'])
name = full_title or video_id name = full_title or track_id
if quiet == False: if quiet == False:
self.report_extraction(name) self.report_extraction(name)
thumbnail = info['artwork_url'] thumbnail = info['artwork_url']
if thumbnail is not None: if thumbnail is not None:
thumbnail = thumbnail.replace('-large', '-t500x500') thumbnail = thumbnail.replace('-large', '-t500x500')
return { result = {
'id': info['id'], 'id': track_id,
'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID, 'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
'uploader': info['user']['username'], 'uploader': info['user']['username'],
'upload_date': unified_strdate(info['created_at']), 'upload_date': unified_strdate(info['created_at']),
@ -74,6 +91,21 @@ class SoundcloudIE(InfoExtractor):
'description': info['description'], 'description': info['description'],
'thumbnail': thumbnail, 'thumbnail': thumbnail,
} }
if info.get('downloadable', False):
result['url'] = 'https://api.soundcloud.com/tracks/{0}/download?client_id={1}'.format(track_id, self._CLIENT_ID)
if not info.get('streamable', False):
# We have to get the rtmp url
stream_json = self._download_webpage(
'http://api.soundcloud.com/i1/tracks/{0}/streams?client_id={1}'.format(track_id, self._CLIENT_ID),
track_id, u'Downloading track url')
rtmp_url = json.loads(stream_json)['rtmp_mp3_128_url']
# The url doesn't have an rtmp app, we have to extract the playpath
url, path = rtmp_url.split('mp3:', 1)
result.update({
'url': url,
'play_path': 'mp3:' + path,
})
return result
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE) mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
@ -106,70 +138,8 @@ class SoundcloudIE(InfoExtractor):
class SoundcloudSetIE(SoundcloudIE): class SoundcloudSetIE(SoundcloudIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)(?:[?].*)?$' _VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)(?:[?].*)?$'
IE_NAME = u'soundcloud:set' IE_NAME = u'soundcloud:set'
_TEST = { # it's in tests/test_playlists.py
u"url":"https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep", _TESTS = []
u"playlist": [
{
u"file":"30510138.mp3",
u"md5":"f9136bf103901728f29e419d2c70f55d",
u"info_dict": {
u"upload_date": u"20111213",
u"description": u"The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
u"uploader": u"The Royal Concept",
u"title": u"D-D-Dance"
}
},
{
u"file":"47127625.mp3",
u"md5":"09b6758a018470570f8fd423c9453dd8",
u"info_dict": {
u"upload_date": u"20120521",
u"description": u"The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
u"uploader": u"The Royal Concept",
u"title": u"The Royal Concept - Gimme Twice"
}
},
{
u"file":"47127627.mp3",
u"md5":"154abd4e418cea19c3b901f1e1306d9c",
u"info_dict": {
u"upload_date": u"20120521",
u"uploader": u"The Royal Concept",
u"title": u"Goldrushed"
}
},
{
u"file":"47127629.mp3",
u"md5":"2f5471edc79ad3f33a683153e96a79c1",
u"info_dict": {
u"upload_date": u"20120521",
u"description": u"The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
u"uploader": u"The Royal Concept",
u"title": u"In the End"
}
},
{
u"file":"47127631.mp3",
u"md5":"f9ba87aa940af7213f98949254f1c6e2",
u"info_dict": {
u"upload_date": u"20120521",
u"description": u"The Royal Concept from Stockholm\r\nFilip / David / Povel / Magnus\r\nwww.theroyalconceptband.com",
u"uploader": u"The Royal Concept",
u"title": u"Knocked Up"
}
},
{
u"file":"75206121.mp3",
u"md5":"f9d1fe9406717e302980c30de4af9353",
u"info_dict": {
u"upload_date": u"20130116",
u"description": u"The unreleased track World on Fire premiered on the CW's hit show Arrow (8pm/7pm central). \r\nAs a gift to our fans we would like to offer you a free download of the track! ",
u"uploader": u"The Royal Concept",
u"title": u"World On Fire"
}
}
]
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -208,7 +178,7 @@ class SoundcloudUserIE(SoundcloudIE):
IE_NAME = u'soundcloud:user' IE_NAME = u'soundcloud:user'
# it's in tests/test_playlists.py # it's in tests/test_playlists.py
_TEST = None _TESTS = []
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -0,0 +1,35 @@
import re
from .common import InfoExtractor
from .brightcove import BrightcoveIE
from ..utils import RegexNotFoundError, ExtractorError
class SpaceIE(InfoExtractor):
_VALID_URL = r'https?://www\.space\.com/\d+-(?P<title>[^/\.\?]*?)-video.html'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
u'info_dict': {
u'id': u'2780937028001',
u'ext': u'mp4',
u'title': u'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
u'description': u'md5:db81cf7f3122f95ed234b631a6ea1e61',
u'uploader': u'TechMedia Networks',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
try:
# Some videos require the playerKey field, which isn't define in
# the BrightcoveExperience object
brightcove_url = self._og_search_video_url(webpage)
except RegexNotFoundError:
# Other videos works fine with the info from the object
brightcove_url = BrightcoveIE._extract_brightcove_url(webpage)
if brightcove_url is None:
raise ExtractorError(u'The webpage does not contain a video', expected=True)
return self.url_result(brightcove_url, BrightcoveIE.ie_key())

View File

@ -0,0 +1,74 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class SpankwireIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
_TEST = {
u'url': u'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
u'file': u'103545.mp4',
u'md5': u'1b3f55e345500552dbc252a3e9c1af43',
u'info_dict': {
u"uploader": u"oreusz",
u"title": u"Buckcherry`s X Rated Music Video Crazy Bitch",
u"description": u"Crazy Bitch X rated music video.",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'by:\s*<a [^>]*>(.+?)</a>', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'flashvars\.image_url = "([^"]+)', webpage, u'thumbnail', fatal=False)
description = self._html_search_regex(r'>\s*Description:</div>\s*<[^>]*>([^<]+)', webpage, u'description', fatal=False)
if len(description) == 0:
description = None
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'flashvars\.quality_[0-9]{3}p = "([^"]+)', webpage)))
if webpage.find('flashvars\.encrypted = "true"') != -1:
password = self._html_search_regex(r'flashvars\.video_title = "([^"]+)', webpage, u'password').replace('+', ' ')
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join(format)
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
})
formats.sort(key=lambda format: list(map(lambda s: s.zfill(6), format['format'].split('-'))))
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': description,
'formats': formats,
'age_limit': age_limit,
}

View File

@ -12,9 +12,9 @@ class SubtitlesInfoExtractor(InfoExtractor):
return any([self._downloader.params.get('writesubtitles', False), return any([self._downloader.params.get('writesubtitles', False),
self._downloader.params.get('writeautomaticsub')]) self._downloader.params.get('writeautomaticsub')])
def _list_available_subtitles(self, video_id, webpage=None): def _list_available_subtitles(self, video_id, webpage):
""" outputs the available subtitles for the video """ """ outputs the available subtitles for the video """
sub_lang_list = self._get_available_subtitles(video_id) sub_lang_list = self._get_available_subtitles(video_id, webpage)
auto_captions_list = self._get_available_automatic_caption(video_id, webpage) auto_captions_list = self._get_available_automatic_caption(video_id, webpage)
sub_lang = ",".join(list(sub_lang_list.keys())) sub_lang = ",".join(list(sub_lang_list.keys()))
self.to_screen(u'%s: Available subtitles for video: %s' % self.to_screen(u'%s: Available subtitles for video: %s' %
@ -23,7 +23,7 @@ class SubtitlesInfoExtractor(InfoExtractor):
self.to_screen(u'%s: Available automatic captions for video: %s' % self.to_screen(u'%s: Available automatic captions for video: %s' %
(video_id, auto_lang)) (video_id, auto_lang))
def extract_subtitles(self, video_id, video_webpage=None): def extract_subtitles(self, video_id, webpage):
""" """
returns {sub_lang: sub} ,{} if subtitles not found or None if the returns {sub_lang: sub} ,{} if subtitles not found or None if the
subtitles aren't requested. subtitles aren't requested.
@ -32,9 +32,9 @@ class SubtitlesInfoExtractor(InfoExtractor):
return None return None
available_subs_list = {} available_subs_list = {}
if self._downloader.params.get('writeautomaticsub', False): if self._downloader.params.get('writeautomaticsub', False):
available_subs_list.update(self._get_available_automatic_caption(video_id, video_webpage)) available_subs_list.update(self._get_available_automatic_caption(video_id, webpage))
if self._downloader.params.get('writesubtitles', False): if self._downloader.params.get('writesubtitles', False):
available_subs_list.update(self._get_available_subtitles(video_id)) available_subs_list.update(self._get_available_subtitles(video_id, webpage))
if not available_subs_list: # error, it didn't get the available subtitles if not available_subs_list: # error, it didn't get the available subtitles
return {} return {}
@ -74,7 +74,7 @@ class SubtitlesInfoExtractor(InfoExtractor):
return return
return sub return sub
def _get_available_subtitles(self, video_id): def _get_available_subtitles(self, video_id, webpage):
""" """
returns {sub_lang: url} or {} if not available returns {sub_lang: url} or {} if not available
Must be redefined by the subclasses Must be redefined by the subclasses

View File

@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
import re
from .common import InfoExtractor
from ..utils import determine_ext
class SztvHuIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:(?:www\.)?sztv\.hu|www\.tvszombathely\.hu)/(?:[^/]+)/.+-(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://sztv.hu/hirek/cserkeszek-nepszerusitettek-a-kornyezettudatos-eletmodot-a-savaria-teren-20130909',
u'file': u'20130909.mp4',
u'md5': u'a6df607b11fb07d0e9f2ad94613375cb',
u'info_dict': {
u"title": u"Cserkészek népszerűsítették a környezettudatos életmódot a Savaria téren",
u"description": u'A zöld nap játékos ismeretterjesztő programjait a Magyar Cserkész Szövetség szervezte, akik az ország nyolc városában adják át tudásukat az érdeklődőknek. A PET...',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_file = self._search_regex(
r'file: "...:(.*?)",', webpage, 'video file')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*?) - [^-]*? - [^-]*?"',
webpage, 'video title')
description = self._html_search_regex(
r'<meta name="description" content="([^"]*)"/>',
webpage, 'video description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
video_url = 'http://media.sztv.hu/vod/' + video_file
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': determine_ext(video_url),
'description': description,
'thumbnail': thumbnail,
}

View File

@ -1,4 +1,5 @@
import re import re
import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
@ -11,7 +12,7 @@ class TeamcocoIE(InfoExtractor):
_TEST = { _TEST = {
u'url': u'http://teamcoco.com/video/louis-ck-interview-george-w-bush', u'url': u'http://teamcoco.com/video/louis-ck-interview-george-w-bush',
u'file': u'19705.mp4', u'file': u'19705.mp4',
u'md5': u'27b6f7527da5acf534b15f21b032656e', u'md5': u'cde9ba0fa3506f5f017ce11ead928f9a',
u'info_dict': { u'info_dict': {
u"description": u"Louis C.K. got starstruck by George W. Bush, so what? Part one.", u"description": u"Louis C.K. got starstruck by George W. Bush, so what? Part one.",
u"title": u"Louis C.K. Interview Pt. 1 11/3/11" u"title": u"Louis C.K. Interview Pt. 1 11/3/11"
@ -31,16 +32,40 @@ class TeamcocoIE(InfoExtractor):
self.report_extraction(video_id) self.report_extraction(video_id)
data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id
data = self._download_webpage(data_url, video_id, 'Downloading data webpage') data_xml = self._download_webpage(data_url, video_id, 'Downloading data webpage')
data = xml.etree.ElementTree.fromstring(data_xml.encode('utf-8'))
video_url = self._html_search_regex(r'<file [^>]*type="high".*?>(.*?)</file>',
data, u'video URL')
return [{ qualities = ['500k', '480p', '1000k', '720p', '1080p']
formats = []
for file in data.findall('files/file'):
if file.attrib.get('playmode') == 'all':
# it just duplicates one of the entries
break
file_url = file.text
m_format = re.search(r'(\d+(k|p))\.mp4', file_url)
if m_format is not None:
format_id = m_format.group(1)
else:
format_id = file.attrib['bitrate']
formats.append({
'url': file_url,
'ext': 'mp4',
'format_id': format_id,
})
def sort_key(f):
try:
return qualities.index(f['format_id'])
except ValueError:
return -1
formats.sort(key=sort_key)
if not formats:
raise RegexNotFoundError(u'Unable to extract video URL')
return {
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'ext': 'mp4',
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
}] }

View File

@ -0,0 +1,65 @@
import re
from .common import InfoExtractor
from ..utils import (
get_element_by_attribute,
clean_html,
)
class TechTalksIE(InfoExtractor):
_VALID_URL = r'https?://techtalks\.tv/talks/[^/]*/(?P<id>\d+)/'
_TEST = {
u'url': u'http://techtalks.tv/talks/learning-topic-models-going-beyond-svd/57758/',
u'playlist': [
{
u'file': u'57758.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
{
u'file': u'57758-slides.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
],
u'params': {
# rtmp download
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
talk_id = mobj.group('id')
webpage = self._download_webpage(url, talk_id)
rtmp_url = self._search_regex(r'netConnectionUrl: \'(.*?)\'', webpage,
u'rtmp url')
play_path = self._search_regex(r'href=\'(.*?)\' [^>]*id="flowplayer_presenter"',
webpage, u'presenter play path')
title = clean_html(get_element_by_attribute('class', 'title', webpage))
video_info = {
'id': talk_id,
'title': title,
'url': rtmp_url,
'play_path': play_path,
'ext': 'flv',
}
m_slides = re.search(r'<a class="slides" href=\'(.*?)\'', webpage)
if m_slides is None:
return video_info
else:
return [
video_info,
# The slides video
{
'id': talk_id + '-slides',
'title': title,
'url': rtmp_url,
'play_path': m_slides.group(1),
'ext': 'flv',
},
]

View File

@ -1,10 +1,14 @@
import json import json
import re import re
from .common import InfoExtractor from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_str,
RegexNotFoundError,
)
class TEDIE(InfoExtractor): class TEDIE(SubtitlesInfoExtractor):
_VALID_URL=r'''http://www\.ted\.com/ _VALID_URL=r'''http://www\.ted\.com/
( (
((?P<type_playlist>playlists)/(?P<playlist_id>\d+)) # We have a playlist ((?P<type_playlist>playlists)/(?P<playlist_id>\d+)) # We have a playlist
@ -32,7 +36,7 @@ class TEDIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
m=re.match(self._VALID_URL, url, re.VERBOSE) m=re.match(self._VALID_URL, url, re.VERBOSE)
if m.group('type_talk'): if m.group('type_talk'):
return [self._talk_info(url)] return self._talk_info(url)
else : else :
playlist_id=m.group('playlist_id') playlist_id=m.group('playlist_id')
name=m.group('name') name=m.group('name')
@ -77,12 +81,44 @@ class TEDIE(InfoExtractor):
thumbnail = self._search_regex(r'</span>[\s.]*</div>[\s.]*<img src="(.*?)"', thumbnail = self._search_regex(r'</span>[\s.]*</div>[\s.]*<img src="(.*?)"',
webpage, 'thumbnail') webpage, 'thumbnail')
formats = [{
'ext': 'mp4',
'url': stream['file'],
'format': stream['id']
} for stream in info['htmlStreams']]
video_id = info['id']
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, webpage)
return
info = { info = {
'id': info['id'], 'id': video_id,
'url': info['htmlStreams'][-1]['file'], 'title': title,
'ext': 'mp4', 'thumbnail': thumbnail,
'title': title, 'description': desc,
'thumbnail': thumbnail, 'subtitles': video_subtitles,
'description': desc, 'formats': formats,
} }
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info return info
def _get_available_subtitles(self, video_id, webpage):
try:
options = self._search_regex(r'(?:<select name="subtitles_language_select" id="subtitles_language_select">)(.*?)(?:</select>)', webpage, 'subtitles_language_select', flags=re.DOTALL)
languages = re.findall(r'(?:<option value=")(\S+)"', options)
if languages:
sub_lang_list = {}
for l in languages:
url = 'http://www.ted.com/talks/subtitles/id/%s/lang/%s/format/srt' % (video_id, l)
sub_lang_list[l] = url
return sub_lang_list
except RegexNotFoundError as err:
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}

View File

@ -0,0 +1,65 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class Tube8IE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>tube8\.com/[^/]+/[^/]+/(?P<videoid>[0-9]+)/?)'
_TEST = {
u'url': u'http://www.tube8.com/teen/kasia-music-video/229795/',
u'file': u'229795.mp4',
u'md5': u'e9e0b0c86734e5e3766e653509475db0',
u'info_dict': {
u"description": u"hot teen Kasia grinding",
u"uploader": u"unknown",
u"title": u"Kasia music video",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'videotitle ="([^"]+)', webpage, u'title')
video_description = self._html_search_regex(r'>Description:</strong>(.+?)<', webpage, u'description', fatal=False)
video_uploader = self._html_search_regex(r'>Submitted by:</strong>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = thumbnail.replace('\\/', '/')
video_url = self._html_search_regex(r'"video_url":"([^"]+)', webpage, u'video_url')
if webpage.find('"encrypted":true')!=-1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join(format)
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': video_description,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

View File

@ -7,15 +7,25 @@ from .common import InfoExtractor
class TudouIE(InfoExtractor): class TudouIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?' _VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs|albumplay)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?'
_TEST = { _TESTS = [{
u'url': u'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html', u'url': u'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
u'file': u'159448201.f4v', u'file': u'159448201.f4v',
u'md5': u'140a49ed444bd22f93330985d8475fcb', u'md5': u'140a49ed444bd22f93330985d8475fcb',
u'info_dict': { u'info_dict': {
u"title": u"卡马乔国足开大脚长传冲吊集锦" u"title": u"卡马乔国足开大脚长传冲吊集锦"
} }
} },
{
u'url': u'http://www.tudou.com/albumplay/TenTw_JgiPM/PzsAs5usU9A.html',
u'file': u'todo.mp4',
u'md5': u'todo.mp4',
u'info_dict': {
u'title': u'todo.mp4',
},
u'add_ie': [u'Youku'],
u'skip': u'Only works from China'
}]
def _url_for_id(self, id, quality = None): def _url_for_id(self, id, quality = None):
info_url = "http://v2.tudou.com/f?id="+str(id) info_url = "http://v2.tudou.com/f?id="+str(id)
@ -29,14 +39,19 @@ class TudouIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(2) video_id = mobj.group(2)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = re.search(",kw:\"(.+)\"",webpage)
if title is None: m = re.search(r'vcode:\s*[\'"](.+?)[\'"]', webpage)
title = re.search(",kw: \'(.+)\'",webpage) if m and m.group(1):
title = title.group(1) return {
thumbnail_url = re.search(",pic: \'(.+?)\'",webpage) '_type': 'url',
if thumbnail_url is None: 'url': u'youku:' + m.group(1),
thumbnail_url = re.search(",pic:\"(.+?)\"",webpage) 'ie_key': 'Youku'
thumbnail_url = thumbnail_url.group(1) }
title = self._search_regex(
r",kw:\s*['\"](.+?)[\"']", webpage, u'title')
thumbnail_url = self._search_regex(
r",pic:\s*[\"'](.+?)[\"']", webpage, u'thumbnail URL', fatal=False)
segs_json = self._search_regex(r'segs: \'(.*)\'', webpage, 'segments') segs_json = self._search_regex(r'segs: \'(.*)\'', webpage, 'segments')
segments = json.loads(segs_json) segments = json.loads(segs_json)

View File

@ -0,0 +1,41 @@
import json
import re
from .common import InfoExtractor
class TvpIE(InfoExtractor):
IE_NAME = u'tvp.pl'
_VALID_URL = r'https?://www\.tvp\.pl/.*?wideo/(?P<date>\d+)/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.tvp.pl/warszawa/magazyny/campusnews/wideo/31102013/12878238',
u'md5': u'148408967a6a468953c0a75cbdaf0d7a',
u'file': u'12878238.wmv',
u'info_dict': {
u'title': u'31.10.2013',
u'description': u'31.10.2013',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
json_url = 'http://www.tvp.pl/pub/stat/videofileinfo?video_id=%s' % video_id
json_params = self._download_webpage(
json_url, video_id, u"Downloading video metadata")
params = json.loads(json_params)
self.report_extraction(video_id)
video_url = params['video_url']
title = self._og_search_title(webpage, fatal=True)
return {
'id': video_id,
'title': title,
'ext': 'wmv',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -1,53 +1,127 @@
import re import re
import json import json
import xml.etree.ElementTree
import datetime
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
compat_HTTPError,
ExtractorError, ExtractorError,
) )
class VevoIE(InfoExtractor): class VevoIE(InfoExtractor):
""" """
Accepts urls from vevo.com or in the format 'vevo:{id}' Accepts urls from vevo.com or in the format 'vevo:{id}'
(currently used by MTVIE) (currently used by MTVIE)
""" """
_VALID_URL = r'((http://www.vevo.com/watch/.*?/.*?/)|(vevo:))(?P<id>.*?)(\?|$)' _VALID_URL = r'((http://www.vevo.com/watch/.*?/.*?/)|(vevo:))(?P<id>.*?)(\?|$)'
_TEST = { _TESTS = [{
u'url': u'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280', u'url': u'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
u'file': u'GB1101300280.mp4', u'file': u'GB1101300280.mp4',
u'md5': u'06bea460acb744eab74a9d7dcb4bfd61', u"md5": u"06bea460acb744eab74a9d7dcb4bfd61",
u'info_dict': { u'info_dict': {
u"upload_date": u"20130624", u"upload_date": u"20130624",
u"uploader": u"Hurts", u"uploader": u"Hurts",
u"title": u"Somebody to Die For" u"title": u"Somebody to Die For",
u"duration": 230,
u"width": 1920,
u"height": 1080,
} }
} }]
_SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/'
def _formats_from_json(self, video_info):
last_version = {'version': -1}
for version in video_info['videoVersions']:
# These are the HTTP downloads, other types are for different manifests
if version['sourceType'] == 2:
if version['version'] > last_version['version']:
last_version = version
if last_version['version'] == -1:
raise ExtractorError(u'Unable to extract last version of the video')
renditions = xml.etree.ElementTree.fromstring(last_version['data'])
formats = []
# Already sorted from worst to best quality
for rend in renditions.findall('rendition'):
attr = rend.attrib
format_note = '%(videoCodec)s@%(videoBitrate)4sk, %(audioCodec)s@%(audioBitrate)3sk' % attr
formats.append({
'url': attr['url'],
'format_id': attr['name'],
'format_note': format_note,
'height': int(attr['frameheight']),
'width': int(attr['frameWidth']),
})
return formats
def _formats_from_smil(self, smil_xml):
formats = []
smil_doc = xml.etree.ElementTree.fromstring(smil_xml.encode('utf-8'))
els = smil_doc.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
for el in els:
src = el.attrib['src']
m = re.match(r'''(?xi)
(?P<ext>[a-z0-9]+):
(?P<path>
[/a-z0-9]+ # The directory and main part of the URL
_(?P<cbr>[0-9]+)k
_(?P<width>[0-9]+)x(?P<height>[0-9]+)
_(?P<vcodec>[a-z0-9]+)
_(?P<vbr>[0-9]+)
_(?P<acodec>[a-z0-9]+)
_(?P<abr>[0-9]+)
\.[a-z0-9]+ # File extension
)''', src)
if not m:
continue
format_url = self._SMIL_BASE_URL + m.group('path')
format_note = ('%(vcodec)s@%(vbr)4sk, %(acodec)s@%(abr)3sk' %
m.groupdict())
formats.append({
'url': format_url,
'format_id': u'SMIL_' + m.group('cbr'),
'format_note': format_note,
'ext': m.group('ext'),
'width': int(m.group('width')),
'height': int(m.group('height')),
})
return formats
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
json_url = 'http://www.vevo.com/data/video/%s' % video_id json_url = 'http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
base_url = 'http://smil.lvl3.vevo.com'
videos_url = '%s/Video/V2/VFILE/%s/%sr.smil' % (base_url, video_id, video_id.lower())
info_json = self._download_webpage(json_url, video_id, u'Downloading json info') info_json = self._download_webpage(json_url, video_id, u'Downloading json info')
links_webpage = self._download_webpage(videos_url, video_id, u'Downloading videos urls') video_info = json.loads(info_json)['video']
self.report_extraction(video_id) formats = self._formats_from_json(video_info)
video_info = json.loads(info_json) try:
m_urls = list(re.finditer(r'<video src="(?P<ext>.*?):/?(?P<url>.*?)"', links_webpage)) smil_url = '%s/Video/V2/VFILE/%s/%sr.smil' % (
if m_urls is None or len(m_urls) == 0: self._SMIL_BASE_URL, video_id, video_id.lower())
raise ExtractorError(u'Unable to extract video url') smil_xml = self._download_webpage(smil_url, video_id,
# They are sorted from worst to best quality u'Downloading SMIL info')
m_url = m_urls[-1] formats.extend(self._formats_from_smil(smil_xml))
video_url = base_url + '/' + m_url.group('url') except ExtractorError as ee:
ext = m_url.group('ext') if not isinstance(ee.cause, compat_HTTPError):
raise
self._downloader.report_warning(
u'Cannot download SMIL information, falling back to JSON ..')
return {'url': video_url, timestamp_ms = int(self._search_regex(
'ext': ext, r'/Date\((\d+)\)/', video_info['launchDate'], u'launch date'))
'id': video_id, upload_date = datetime.datetime.fromtimestamp(timestamp_ms // 1000)
'title': video_info['title'], info = {
'thumbnail': video_info['img'], 'id': video_id,
'upload_date': video_info['launchDate'].replace('/',''), 'title': video_info['title'],
'uploader': video_info['Artists'][0]['title'], 'formats': formats,
} 'thumbnail': video_info['imageUrl'],
'upload_date': upload_date.strftime('%Y%m%d'),
'uploader': video_info['mainArtists'][0]['artistName'],
'duration': video_info['duration'],
}
return info

View File

@ -0,0 +1,64 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class ViddlerIE(InfoExtractor):
_VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler.com)/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
_TEST = {
u"url": u"http://www.viddler.com/v/43903784",
u'file': u'43903784.mp4',
u'md5': u'fbbaedf7813e514eb7ca30410f439ac9',
u'info_dict': {
u"title": u"Video Made Easy",
u"uploader": u"viddler",
u"duration": 100.89,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
embed_url = mobj.group('domain') + u'/embed/' + video_id
webpage = self._download_webpage(embed_url, video_id)
video_sources_code = self._search_regex(
r"(?ms)sources\s*:\s*(\{.*?\})", webpage, u'video URLs')
video_sources = json.loads(video_sources_code.replace("'", '"'))
formats = [{
'url': video_url,
'format': format_id,
} for video_url, format_id in video_sources.items()]
title = self._html_search_regex(
r"title\s*:\s*'([^']*)'", webpage, u'title')
uploader = self._html_search_regex(
r"authorName\s*:\s*'([^']*)'", webpage, u'uploader', fatal=False)
duration_s = self._html_search_regex(
r"duration\s*:\s*([0-9.]*)", webpage, u'duration', fatal=False)
duration = float(duration_s) if duration_s else None
thumbnail = self._html_search_regex(
r"thumbnail\s*:\s*'([^']*)'",
webpage, u'thumbnail', fatal=False)
info = {
'_type': 'video',
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'formats': formats,
}
# TODO: Remove when #980 has been merged
info['formats'][-1]['ext'] = determine_ext(info['formats'][-1]['url'])
info.update(info['formats'][-1])
return info

View File

@ -0,0 +1,30 @@
import re
from .common import InfoExtractor
from .internetvideoarchive import InternetVideoArchiveIE
from ..utils import (
compat_urlparse,
)
class VideoDetectiveIE(InfoExtractor):
_VALID_URL = r'https?://www\.videodetective\.com/[^/]+/[^/]+/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.videodetective.com/movies/kick-ass-2/194487',
u'file': u'194487.mp4',
u'info_dict': {
u'title': u'KICK-ASS 2',
u'description': u'md5:65ba37ad619165afac7d432eaded6013',
u'duration': 135,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage)
query = compat_urlparse.urlparse(og_video).query
return self.url_result(InternetVideoArchiveIE._build_url(query),
ie=InternetVideoArchiveIE.ie_key())

View File

@ -0,0 +1,40 @@
import re
import random
from .common import InfoExtractor
class VideoPremiumIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?videopremium\.tv/(?P<id>\w+)(?:/.*)?'
_TEST = {
u'url': u'http://videopremium.tv/4w7oadjsf156',
u'file': u'4w7oadjsf156.f4v',
u'info_dict': {
u"title": u"youtube-dl_test_video____a_________-BaW_jenozKc.mp4.mp4"
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://videopremium.tv/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h2(?:.*?)>\s*(.+?)\s*<',
webpage, u'video title')
return [{
'id': video_id,
'url': "rtmp://e%d.md.iplay.md/play" % random.randint(1, 16),
'play_path': "mp4:%s.f4v" % video_id,
'page_url': "http://videopremium.tv/" + video_id,
'player_url': "http://videopremium.tv/uplayer/uppod.swf",
'ext': 'f4v',
'title': video_title,
}]

View File

@ -1,3 +1,4 @@
# encoding: utf-8
import json import json
import re import re
import itertools import itertools
@ -10,19 +11,21 @@ from ..utils import (
clean_html, clean_html,
get_element_by_attribute, get_element_by_attribute,
ExtractorError, ExtractorError,
RegexNotFoundError,
std_headers, std_headers,
unsmuggle_url,
) )
class VimeoIE(InfoExtractor): class VimeoIE(InfoExtractor):
"""Information extractor for vimeo.com.""" """Information extractor for vimeo.com."""
# _VALID_URL matches Vimeo URLs # _VALID_URL matches Vimeo URLs
_VALID_URL = r'(?P<proto>https?://)?(?:(?:www|player)\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)(?:[?].*)?$' _VALID_URL = r'(?P<proto>https?://)?(?:(?:www|(?P<player>player))\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?(?:#.*)?$'
_NETRC_MACHINE = 'vimeo' _NETRC_MACHINE = 'vimeo'
IE_NAME = u'vimeo' IE_NAME = u'vimeo'
_TESTS = [ _TESTS = [
{ {
u'url': u'http://vimeo.com/56015672', u'url': u'http://vimeo.com/56015672#at=0',
u'file': u'56015672.mp4', u'file': u'56015672.mp4',
u'md5': u'8879b6cc097e987f02484baf890129e5', u'md5': u'8879b6cc097e987f02484baf890129e5',
u'info_dict': { u'info_dict': {
@ -54,6 +57,21 @@ class VimeoIE(InfoExtractor):
u'uploader': u'The BLN & Business of Software', u'uploader': u'The BLN & Business of Software',
}, },
}, },
{
u'url': u'http://vimeo.com/68375962',
u'file': u'68375962.mp4',
u'md5': u'aaf896bdb7ddd6476df50007a0ac0ae7',
u'note': u'Video protected with password',
u'info_dict': {
u'title': u'youtube-dl password protected test video',
u'upload_date': u'20130614',
u'uploader_id': u'user18948128',
u'uploader': u'Jaime Marquínez Ferrándiz',
},
u'params': {
u'videopassword': u'youtube-dl',
},
},
] ]
def _login(self): def _login(self):
@ -98,21 +116,25 @@ class VimeoIE(InfoExtractor):
self._login() self._login()
def _real_extract(self, url, new_video=True): def _real_extract(self, url, new_video=True):
url, data = unsmuggle_url(url)
headers = std_headers
if data is not None:
headers = headers.copy()
headers.update(data)
# Extract ID from URL # Extract ID from URL
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
if mobj is None: if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url) raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('id') video_id = mobj.group('id')
if not mobj.group('proto'): if mobj.group('pro') or mobj.group('player'):
url = 'https://' + url
elif mobj.group('pro'):
url = 'http://player.vimeo.com/video/' + video_id url = 'http://player.vimeo.com/video/' + video_id
elif mobj.group('direct_link'): else:
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url, None, std_headers) request = compat_urllib_request.Request(url, None, headers)
webpage = self._download_webpage(request, video_id) webpage = self._download_webpage(request, video_id)
# Now we begin extracting as much information as we can from what we # Now we begin extracting as much information as we can from what we
@ -122,18 +144,26 @@ class VimeoIE(InfoExtractor):
# Extract the config JSON # Extract the config JSON
try: try:
config = self._search_regex([r' = {config:({.+?}),assets:', r'c=({.+?);'], try:
webpage, u'info section', flags=re.DOTALL) config_url = self._html_search_regex(
config = json.loads(config) r' data-config-url="(.+?)"', webpage, u'config URL')
except: config_json = self._download_webpage(config_url, video_id)
config = json.loads(config_json)
except RegexNotFoundError:
# For pro videos or player.vimeo.com urls
config = self._search_regex([r' = {config:({.+?}),assets:', r'c=({.+?);'],
webpage, u'info section', flags=re.DOTALL)
config = json.loads(config)
except Exception as e:
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage): if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError(u'The author has restricted the access to this video, try with the "--referer" option') raise ExtractorError(u'The author has restricted the access to this video, try with the "--referer" option')
if re.search('If so please provide the correct password.', webpage): if re.search('<form[^>]+?id="pw_form"', webpage) is not None:
self._verify_video_password(url, video_id, webpage) self._verify_video_password(url, video_id, webpage)
return self._real_extract(url) return self._real_extract(url)
else: else:
raise ExtractorError(u'Unable to extract info section') raise ExtractorError(u'Unable to extract info section',
cause=e)
# Extract title # Extract title
video_title = config["video"]["title"] video_title = config["video"]["title"]
@ -172,47 +202,47 @@ class VimeoIE(InfoExtractor):
# Vimeo specific: extract video codec and quality information # Vimeo specific: extract video codec and quality information
# First consider quality, then codecs, then take everything # First consider quality, then codecs, then take everything
# TODO bind to format param codecs = [('vp6', 'flv'), ('vp8', 'flv'), ('h264', 'mp4')]
codecs = [('h264', 'mp4'), ('vp8', 'flv'), ('vp6', 'flv')] files = {'hd': [], 'sd': [], 'other': []}
files = { 'hd': [], 'sd': [], 'other': []}
config_files = config["video"].get("files") or config["request"].get("files") config_files = config["video"].get("files") or config["request"].get("files")
for codec_name, codec_extension in codecs: for codec_name, codec_extension in codecs:
if codec_name in config_files: for quality in config_files.get(codec_name, []):
if 'hd' in config_files[codec_name]: format_id = '-'.join((codec_name, quality)).lower()
files['hd'].append((codec_name, codec_extension, 'hd')) key = quality if quality in files else 'other'
elif 'sd' in config_files[codec_name]: video_url = None
files['sd'].append((codec_name, codec_extension, 'sd')) if isinstance(config_files[codec_name], dict):
file_info = config_files[codec_name][quality]
video_url = file_info.get('url')
else: else:
files['other'].append((codec_name, codec_extension, config_files[codec_name][0])) file_info = {}
if video_url is None:
video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
%(video_id, sig, timestamp, quality, codec_name.upper())
for quality in ('hd', 'sd', 'other'): files[key].append({
if len(files[quality]) > 0: 'ext': codec_extension,
video_quality = files[quality][0][2] 'url': video_url,
video_codec = files[quality][0][0] 'format_id': format_id,
video_extension = files[quality][0][1] 'width': file_info.get('width'),
self.to_screen(u'%s: Downloading %s file at %s quality' % (video_id, video_codec.upper(), video_quality)) 'height': file_info.get('height'),
break })
else: formats = []
for key in ('other', 'sd', 'hd'):
formats += files[key]
if len(formats) == 0:
raise ExtractorError(u'No known codec found') raise ExtractorError(u'No known codec found')
video_url = None return {
if isinstance(config_files[video_codec], dict):
video_url = config_files[video_codec][video_quality].get("url")
if video_url is None:
video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
%(video_id, sig, timestamp, video_quality, video_codec.upper())
return [{
'id': video_id, 'id': video_id,
'url': video_url,
'uploader': video_uploader, 'uploader': video_uploader,
'uploader_id': video_uploader_id, 'uploader_id': video_uploader_id,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'title': video_title, 'title': video_title,
'ext': video_extension,
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
'description': video_description, 'description': video_description,
}] 'formats': formats,
'webpage_url': url,
}
class VimeoChannelIE(InfoExtractor): class VimeoChannelIE(InfoExtractor):

View File

@ -27,7 +27,7 @@ class VineIE(InfoExtractor):
video_url = self._html_search_regex(r'<meta property="twitter:player:stream" content="(.+?)"', video_url = self._html_search_regex(r'<meta property="twitter:player:stream" content="(.+?)"',
webpage, u'video URL') webpage, u'video URL')
uploader = self._html_search_regex(r'<div class="user">.*?<h2>(.+?)</h2>', uploader = self._html_search_regex(r'<p class="username">(.*?)</p>',
webpage, u'uploader', fatal=False, flags=re.DOTALL) webpage, u'uploader', fatal=False, flags=re.DOTALL)
return [{ return [{

View File

@ -0,0 +1,45 @@
# encoding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_str,
unescapeHTML,
)
class VKIE(InfoExtractor):
IE_NAME = u'vk.com'
_VALID_URL = r'https?://vk\.com/(?:videos.*?\?.*?z=)?video(?P<id>.*?)(?:\?|%2F|$)'
_TEST = {
u'url': u'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
u'md5': u'0deae91935c54e00003c2a00646315f0',
u'info_dict': {
u'id': u'162222515',
u'ext': u'flv',
u'title': u'ProtivoGunz - Хуёвая песня',
u'uploader': u'Noize MC',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_url = 'http://vk.com/al_video.php?act=show&al=1&video=%s' % video_id
info_page = self._download_webpage(info_url, video_id)
m_yt = re.search(r'src="(http://www.youtube.com/.*?)"', info_page)
if m_yt is not None:
self.to_screen(u'Youtube video detected')
return self.url_result(m_yt.group(1), 'Youtube')
vars_json = self._search_regex(r'var vars = ({.*?});', info_page, u'vars')
vars = json.loads(vars_json)
return {
'id': compat_str(vars['vid']),
'url': vars['url240'],
'title': unescapeHTML(vars['md_title']),
'thumbnail': vars['jpg'],
'uploader': vars['md_author'],
}

View File

@ -0,0 +1,59 @@
# coding: utf-8
import re
from ..utils import (
compat_urllib_request,
compat_urllib_parse
)
from .common import InfoExtractor
class WeBSurgIE(InfoExtractor):
IE_NAME = u'websurg.com'
_VALID_URL = r'http://.*?\.websurg\.com/MEDIA/\?noheader=1&doi=(.*)'
_TEST = {
u'url': u'http://www.websurg.com/MEDIA/?noheader=1&doi=vd01en4012',
u'file': u'vd01en4012.mp4',
u'params': {
u'skip_download': True,
},
u'skip': u'Requires login information',
}
_LOGIN_URL = 'http://www.websurg.com/inc/login/login_div.ajax.php?login=1'
def _real_initialize(self):
login_form = {
'username': self._downloader.params['username'],
'password': self._downloader.params['password'],
'Submit': 1
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header(
'Content-Type', 'application/x-www-form-urlencoded;charset=utf-8')
compat_urllib_request.urlopen(request).info()
webpage = self._download_webpage(self._LOGIN_URL, '', 'Logging in')
if webpage != 'OK':
self._downloader.report_error(
u'Unable to log in: bad username/password')
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, video_id)
url_info = re.search(r'streamer="(.*?)" src="(.*?)"', webpage)
return {'id': video_id,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'ext' : 'mp4',
'url' : url_info.group(1) + '/' + url_info.group(2),
'thumbnail': self._og_search_thumbnail(webpage)
}

View File

@ -13,6 +13,7 @@ class WeiboIE(InfoExtractor):
_VALID_URL = r'https?://video\.weibo\.com/v/weishipin/t_(?P<id>.+?)\.htm' _VALID_URL = r'https?://video\.weibo\.com/v/weishipin/t_(?P<id>.+?)\.htm'
_TEST = { _TEST = {
u'add_ie': ['Sina'],
u'url': u'http://video.weibo.com/v/weishipin/t_zjUw2kZ.htm', u'url': u'http://video.weibo.com/v/weishipin/t_zjUw2kZ.htm',
u'file': u'98322879.flv', u'file': u'98322879.flv',
u'info_dict': { u'info_dict': {

View File

@ -19,7 +19,8 @@ class XHamsterIE(InfoExtractor):
u'info_dict': { u'info_dict': {
u"upload_date": u"20121014", u"upload_date": u"20121014",
u"uploader_id": u"Ruseful2011", u"uploader_id": u"Ruseful2011",
u"title": u"FemaleAgent Shy beauty takes the bait" u"title": u"FemaleAgent Shy beauty takes the bait",
u"age_limit": 18,
} }
}, },
{ {
@ -27,28 +28,33 @@ class XHamsterIE(InfoExtractor):
u'file': u'2221348.flv', u'file': u'2221348.flv',
u'md5': u'e767b9475de189320f691f49c679c4c7', u'md5': u'e767b9475de189320f691f49c679c4c7',
u'info_dict': { u'info_dict': {
u"upload_date": u"20130914", u"upload_date": u"20130914",
u"uploader_id": u"jojo747400", u"uploader_id": u"jojo747400",
u"title": u"Britney Spears Sexy Booty" u"title": u"Britney Spears Sexy Booty",
u"age_limit": 18,
} }
}] }]
def _real_extract(self,url): def _real_extract(self,url):
def extract_video_url(webpage):
mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
if len(mobj.group('server')) == 0:
return compat_urllib_parse.unquote(mobj.group('file'))
else:
return mobj.group('server')+'/key='+mobj.group('file')
def is_hd(webpage):
return webpage.find('<div class=\'icon iconHD\'>') != -1
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
seo = mobj.group('seo') seo = mobj.group('seo')
mrss_url = 'http://xhamster.com/movies/%s/%s.html?hd' % (video_id, seo) mrss_url = 'http://xhamster.com/movies/%s/%s.html' % (video_id, seo)
webpage = self._download_webpage(mrss_url, video_id) webpage = self._download_webpage(mrss_url, video_id)
mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
if len(mobj.group('server')) == 0:
video_url = compat_urllib_parse.unquote(mobj.group('file'))
else:
video_url = mobj.group('server')+'/key='+mobj.group('file')
video_title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>', video_title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>',
webpage, u'title') webpage, u'title')
@ -72,13 +78,34 @@ class XHamsterIE(InfoExtractor):
video_thumbnail = self._search_regex(r'\'image\':\'(?P<thumbnail>[^\']+)\'', video_thumbnail = self._search_regex(r'\'image\':\'(?P<thumbnail>[^\']+)\'',
webpage, u'thumbnail', fatal=False) webpage, u'thumbnail', fatal=False)
return [{ age_limit = self._rta_search(webpage)
'id': video_id,
'url': video_url, video_url = extract_video_url(webpage)
'ext': determine_ext(video_url), hd = is_hd(webpage)
'title': video_title, formats = [{
'url': video_url,
'ext': determine_ext(video_url),
'format': 'hd' if hd else 'sd',
'format_id': 'hd' if hd else 'sd',
}]
if not hd:
webpage = self._download_webpage(mrss_url+'?hd', video_id)
if is_hd(webpage):
video_url = extract_video_url(webpage)
formats.append({
'url': video_url,
'ext': determine_ext(video_url),
'format': 'hd',
'format_id': 'hd',
})
return {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description, 'description': video_description,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'uploader_id': video_uploader_id, 'uploader_id': video_uploader_id,
'thumbnail': video_thumbnail 'thumbnail': video_thumbnail,
}] 'age_limit': age_limit,
}

View File

@ -9,7 +9,7 @@ from ..utils import (
class XNXXIE(InfoExtractor): class XNXXIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?video\.xnxx\.com/video([0-9]+)/(.*)' _VALID_URL = r'^(?:https?://)?(?:video|www)\.xnxx\.com/video([0-9]+)/(.*)'
VIDEO_URL_RE = r'flv_url=(.*?)&amp;' VIDEO_URL_RE = r'flv_url=(.*?)&amp;'
VIDEO_TITLE_RE = r'<title>(.*?)\s+-\s+XNXX.COM' VIDEO_TITLE_RE = r'<title>(.*?)\s+-\s+XNXX.COM'
VIDEO_THUMB_RE = r'url_bigthumb=(.*?)&amp;' VIDEO_THUMB_RE = r'url_bigthumb=(.*?)&amp;'
@ -18,7 +18,8 @@ class XNXXIE(InfoExtractor):
u'file': u'1135332.flv', u'file': u'1135332.flv',
u'md5': u'0831677e2b4761795f68d417e0b7b445', u'md5': u'0831677e2b4761795f68d417e0b7b445',
u'info_dict': { u'info_dict': {
u"title": u"lida \u00bb Naked Funny Actress (5)" u"title": u"lida \u00bb Naked Funny Actress (5)",
u"age_limit": 18,
} }
} }
@ -50,4 +51,5 @@ class XNXXIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
'description': None, 'description': None,
'age_limit': 18,
}] }]

View File

@ -0,0 +1,55 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
class XTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>xtube\.com/watch\.php\?v=(?P<videoid>[^/?&]+))'
_TEST = {
u'url': u'http://www.xtube.com/watch.php?v=kVTUy_G222_',
u'file': u'kVTUy_G222_.mp4',
u'md5': u'092fbdd3cbe292c920ef6fc6a8a9cdab',
u'info_dict': {
u"title": u"strange erotica",
u"description": u"surreal gay themed erotica...almost an ET kind of thing",
u"uploader": u"greenshowers",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<div class="p_5px[^>]*>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'so_s\.addVariable\("owner_u", "([^"]+)', webpage, u'uploader', fatal=False)
video_description = self._html_search_regex(r'<p class="video_description">([^<]+)', webpage, u'description', default=None)
video_url= self._html_search_regex(r'var videoMp4 = "([^"]+)', webpage, u'video_url').replace('\\/', '/')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format[0] += 'p'
format[1] += 'k'
format = "-".join(format)
return {
'id': video_id,
'title': video_title,
'uploader': video_uploader,
'description': video_description,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

View File

@ -13,7 +13,8 @@ class XVideosIE(InfoExtractor):
u'file': u'939581.flv', u'file': u'939581.flv',
u'md5': u'1d0c835822f0a71a7bf011855db929d0', u'md5': u'1d0c835822f0a71a7bf011855db929d0',
u'info_dict': { u'info_dict': {
u"title": u"Funny Porns By >>>>S<<<<<< -1" u"title": u"Funny Porns By >>>>S<<<<<< -1",
u"age_limit": 18,
} }
} }
@ -46,6 +47,7 @@ class XVideosIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
'description': None, 'description': None,
'age_limit': 18,
} }
return [info] return [info]

Some files were not shown because too many files have changed in this diff Show More