Compare commits

..

1218 Commits

Author SHA1 Message Date
00b350d209 [test] tell Travis to install rtmpdump and add initial support to rtmp testing 2013-11-25 17:46:33 -05:00
d8ec4959c8 Merge pull request #1830 from jaimeMF/download-archive
Use the 'extractor_key' field for the download archive file
2013-11-25 14:14:25 -08:00
d31209a144 Use the 'extractor_key' field for the download archive file
It has the same value as the ie_key.
2013-11-25 22:57:15 +01:00
529a2e2cc3 Fix typo in the documentation of the 'download_archive' param 2013-11-25 22:52:09 +01:00
781a7d0546 release 2013.11.25.3 2013-11-25 22:36:18 +01:00
fb04e40396 [soundcloud] Support for listing of audio-only files 2013-11-25 22:34:56 +01:00
d9b011f201 Fix rtmpdump with non-ASCII filenames on Windows on 2.x
Reported in #1798
2013-11-25 22:31:38 +01:00
b0b9eaa196 Merge pull request #1829 from jaimeMF/ydl-empty-params
Allow to initialize a YoutubeDL object without parameters
2013-11-25 13:19:59 -08:00
8b134b1062 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-25 22:16:07 +01:00
0c75c3fa7a Do not warn about fixed output template if --max-downloads is 1
Fixes #1828
2013-11-25 22:15:33 +01:00
a3927cf7ee Allow to initialize a YoutubeDL object without parameters
Having to pass the 'outtmpl' parameter feels really strange when you just want to extract the info of a video.
2013-11-25 22:03:39 +01:00
1a62c18f65 [bambuser] Skip the download in the test
It doesn't respect the 'Range' header.
2013-11-25 22:03:20 +01:00
2a15e7063b [soundcloud] Prefer HTTP over RTMP (#1798) 2013-11-25 20:30:41 +01:00
d46cc192d7 Reduce socket timeout 2013-11-25 19:11:01 +01:00
bb2bebdbe1 release 2013.11.25.2 2013-11-25 15:47:14 +01:00
5db07df634 Fix --download-archive (Fixes #1826) 2013-11-25 15:46:54 +01:00
ea36cbac5e Merge remote-tracking branch 'rbrito/swap-dimensions' 2013-11-25 06:19:15 +01:00
d0d2b49ab7 [FileDownloader] use moved format_bytes method 2013-11-25 06:17:41 +01:00
31cb6d8fef Merge remote-tracking branch 'rzhxeo/rtmpdump' 2013-11-25 06:16:18 +01:00
daa0dd2973 release 2013.11.25.1 2013-11-25 06:06:39 +01:00
de79c46c8f [viki] Fix subtitle extraction 2013-11-25 06:06:18 +01:00
94ccb6fa2e [viki] Fix subtitles extraction 2013-11-25 05:58:04 +01:00
07e4035879 [viki] Fix uploader extraction 2013-11-25 05:57:55 +01:00
d0efb9ec9a [tests] Remove global_setup function 2013-11-25 03:47:32 +01:00
ac05067d3d release 2013.11.25 2013-11-25 03:37:49 +01:00
113577e155 [generic] Improve detection
Allow download of http://goo.gl/7X5tOk
Fixes #1818
2013-11-25 03:35:53 +01:00
79d09f47c2 Merge branch 'opener-to-ydl' 2013-11-25 03:30:37 +01:00
c059bdd432 Remove quality_name field and improve zdf extractor 2013-11-25 03:28:55 +01:00
02dbf93f0e [zdf/common] Use API in ZDF extractor.
This also comes with a lot of extra format fields
Fixes #1518
2013-11-25 03:13:22 +01:00
1fb2bcbbf7 [viki] Make uploader field optional (#1813) 2013-11-25 02:02:34 +01:00
16e055849e Update the keywords tests for the rename of the old ComedyCentralIE 2013-11-24 22:13:20 +01:00
66cfab4226 [comedycentral] Add support for comedycentral.com videos (closes #1824)
It's a subclass of MTVIE

The extractor for colbertnation.com and thedailyshow.com is called now ComedyCentralShowsIE
2013-11-24 21:18:35 +01:00
6d88bc37a3 [viki] Skip travis test
Also provide a better error message for geoblocked videos.
2013-11-24 15:28:50 +01:00
b7553b2554 [vik] Clarify output 2013-11-24 15:20:16 +01:00
e03db0a077 Merge branch 'master' into opener-to-ydl 2013-11-24 15:18:44 +01:00
a1ee09e815 Document proxy 2013-11-24 15:03:25 +01:00
267ed0c5d3 [collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring (fixes #1822)
Uses a new helper method in InfoExtractor: _download_xml
2013-11-24 14:59:19 +01:00
f459d17018 [youtube] Add an extractor for downloading the watch history (closes #1821) 2013-11-24 14:33:50 +01:00
dc65dcbb6d [mixcloud] The description field may be missing (fixes #1819) 2013-11-24 11:28:44 +01:00
d214fdb8fe [brightcove] Don't use 'or' with the xml nodes, use the 'value' attribute instead 2013-11-24 11:02:34 +01:00
138df537ff release 2013.11.24.1 2013-11-24 07:51:56 +01:00
0c7c19d6bc [clipfish] Add extractor (Fixes #1760) 2013-11-24 07:51:44 +01:00
eaaafc59c2 release 2013.11.24 2013-11-24 07:30:34 +01:00
382ed50e0e [viki] Add extractor (fixes #1813) 2013-11-24 07:30:05 +01:00
66ec019240 [youtube] do not use variable name twice 2013-11-24 06:54:26 +01:00
bd49928f7a [niconico] Clarify download 2013-11-24 06:53:50 +01:00
23e6d50d73 [bandcamp] Remove unused variable 2013-11-24 06:52:53 +01:00
2e767313e4 [update] fix error 2013-11-24 06:52:21 +01:00
38b2db6a66 Credit @takuya0301 for niconico 2013-11-24 06:39:49 +01:00
13ebea791f [niconico] Simplify and make work with old Python versions
The website requires SSLv3, otherwise it just times out during SSL negotiation.
2013-11-24 06:39:10 +01:00
4c9c57428f Merge remote-tracking branch 'takuya0301/niconico' 2013-11-24 06:09:11 +01:00
8bf9319e9c Simplify logger code(#1811) 2013-11-24 06:08:11 +01:00
4914120727 Merge remote-tracking branch 'iTaybb/master' 2013-11-24 06:07:12 +01:00
36de0a0e1a [brightcove] Set the 'videoPlayer' value to the 'videoId' if it's missing in the parameters (fixes #1815) 2013-11-23 23:27:15 +01:00
e5c146d586 [streamcloud] skip test on travis 2013-11-23 15:57:42 +01:00
52ad14aeb0 Add support for niconico 2013-11-23 18:19:44 +09:00
43afe28588 Log to an external logger (fixes #1810)
Sadly applications using youtube-dl's python sources can't directly
access it's log stream. It's pretty much limited to stdout and stderr
only.

It should log to logging.Logger instance passed to YoutubeDL's params
dictionary.
2013-11-23 10:22:18 +02:00
a87b0615aa release 2013.11.22.2 2013-11-22 23:08:15 +01:00
d7386f6276 [update] Check if version from repository is newer before updating
Closes #1704
2013-11-22 23:05:58 +01:00
081640940e Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-22 22:46:57 +01:00
7012b23c94 Match --download-archive during playlist processing (Fixes #1745) 2013-11-22 22:46:46 +01:00
d3b30148ed [bambuser:channel] Update test 2013-11-22 21:26:31 +01:00
9f79463803 [howcast] update test's checksum 2013-11-22 21:25:12 +01:00
d35dc6d3b5 [bandcamp] move the album test to the album extractor and return a single track instead of a playlist 2013-11-22 21:19:31 +01:00
50123be421 release 2013.11.22.1 2013-11-22 20:23:55 +01:00
3f8ced5144 Merge remote-tracking branch 'jaimeMF/yt-playlists' 2013-11-22 20:11:54 +01:00
00ea0f11eb Print full title in --get-title output (#1806) 2013-11-22 20:00:35 +01:00
dca0872056 Move the opener to the YoutubeDL object.
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like #1805.
2013-11-22 19:57:52 +01:00
0b63aed8df [update] do not assign to unused variables 2013-11-22 19:15:36 +01:00
15c3adbb16 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-22 19:08:33 +01:00
f143a42fe6 [bandcamp] Skip album test 2013-11-22 19:08:25 +01:00
241650c7ff [vimeo] Fix the extraction of vimeo pro and player.vimeo.com videos 2013-11-22 18:20:31 +01:00
bfe7439a20 release 2013.11.22 2013-11-22 17:46:26 +01:00
cffa6aa107 [bandcamp] Support trackinfo-style songs (Fixes #1270) 2013-11-22 17:44:55 +01:00
02e4ebbbad [streamcloud] Add IE (Fixes #1801) 2013-11-22 17:19:22 +01:00
ab009f59ef [toutv] Fix a typo 2013-11-22 17:18:03 +01:00
0980426559 [bandcamp] add support for albums (reported in #1270) 2013-11-22 16:05:14 +01:00
b1c9c66936 Remove unnecessary slash in setup.py (Fixes #1778) 2013-11-21 23:26:28 +01:00
a6a173c2fd utils.shell_quote: Convert the args to unicode strings
The youtube test video failed with `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 34: ordinal not in range(128)`, the problem was with the filenames being encoded.
2013-11-21 14:09:28 +01:00
2bb683c201 release 2013.11.21 2013-11-21 13:59:33 +01:00
64bb5187f5 [soundcloud] Retrieve the file url using the client_id for the iPhone (fixes #1798)
The desktop's client_id always give the rtmp url, but with the iPhone one it returns the http url if it's available.
2013-11-21 13:16:19 +01:00
9e4f50a8ae [sztv] skip test, site is undergoing mid-term maintenance 2013-11-20 09:59:03 +01:00
0190eecc00 [nhl] Make NHLVideocenter IE_DESC fit with other descriptions 2013-11-20 09:45:29 +01:00
ca872a4c0b [spankwire] Fix description search 2013-11-20 09:23:53 +01:00
f2e87ef4fa [anitube] Skip test (on travis) 2013-11-20 07:46:44 +01:00
0ad97bbc05 [spankwire] fix check for description 2013-11-20 07:45:32 +01:00
c4864091a1 [videopremium] Support new crazy redirect scheme 2013-11-20 07:43:21 +01:00
9a98a466b3 [toutv] really skip test 2013-11-20 07:37:22 +01:00
f99e0f1ed6 Adapt age restriction tests to new .info.json filenames 2013-11-20 07:37:07 +01:00
d323bcb152 release 2013.11.20 2013-11-20 07:25:17 +01:00
da6a795fdb [escapist] Fix title search 2013-11-20 07:23:23 +01:00
c5edcde21f [escapist] upper-case URL 2013-11-20 06:56:59 +01:00
15ff3c831e [escapist] Fix syntax error 2013-11-20 06:55:07 +01:00
100959a6d9 [escapist] Add support for HD format (Closes #1755) 2013-11-20 06:52:08 +01:00
0a120f74b2 Credit @diffycat for anitube 2013-11-20 06:36:00 +01:00
8f05351984 [anitube] Minor fixes (#1776) 2013-11-20 06:35:02 +01:00
4eb92208a3 Adapt test to changed .info.json name 2013-11-20 06:34:48 +01:00
71791f414c Merge remote-tracking branch 'diffycat/master' 2013-11-20 06:28:13 +01:00
f3682997d7 Clean up unused imports and other minor mistakes 2013-11-20 06:27:48 +01:00
cc13cc0251 [teamcoco] Correct error 2013-11-20 06:25:33 +01:00
86bd5f2ca9 Merge remote-tracking branch 'dz0ny/patch-1' 2013-11-20 06:21:05 +01:00
8694c60000 import json for --dump-json 2013-11-20 06:18:24 +01:00
9d1538182f Add an option to dump json information 2013-11-20 06:14:57 +01:00
5904088811 Add support for tou.tv (Fixes #1792) 2013-11-20 06:13:19 +01:00
69545c2aff [d8] inherit from CanalplusIE
it reuses the same extraction process
2013-11-19 20:44:20 +01:00
495da337ae Merge pull request #1758 from migbac/master
Add support for d8.tv
2013-11-19 20:43:14 +01:00
34b3afc7be release 2013.11.19 2013-11-19 12:41:01 +01:00
00373a4c5d Merge pull request #1790 from rg3/console-title
Correctly write and restore the console title on the stack (fixes #1782)
2013-11-18 07:50:10 -08:00
cb7dfeeac4 [youtube] only allow domain name to be upper-case (#1786) 2013-11-18 16:42:35 +01:00
efd6c574a2 Correctly write and restore the console title on the stack (fixes #1782) 2013-11-18 16:35:41 +01:00
4113e6ab56 [auengine] Do not return unnecessary ext 2013-11-18 14:36:01 +01:00
9a942a4671 release 2013.11.18.1 2013-11-18 13:56:53 +01:00
9906d397a0 [auengine] Simplify 2013-11-18 13:56:45 +01:00
ae8f787141 Remove iPhone from user agent. This breaks a lot of extractors
In the future, it might be worth investigating whether we get better content when we claime to be an iPhone.
2013-11-18 13:52:26 +01:00
a81b4d5c8f release 2013.11.18 2013-11-18 13:30:43 +01:00
887c6acdf2 Support multiple embedded YouTube URLs (Fixes #1787) 2013-11-18 13:28:26 +01:00
83aa529330 Support protocol-independent URLs (#1787) 2013-11-18 13:18:17 +01:00
96b31b6533 Add iPhone to UA (#1746) 2013-11-18 13:05:58 +01:00
fccd377198 Suppor embed-only videos (Fixes #1746) 2013-11-18 13:05:18 +01:00
2b35c9ef74 Merge branch 'master' into rtmpdump
Conflicts:
	youtube_dl/FileDownloader.py

Merge
2013-11-18 00:27:06 +01:00
73c566695f release 2013.11.17 2013-11-17 22:14:13 +01:00
63b7b7224a [MTVIE] Try with RTMP URL if download fails
This fixes youtube-dl http://www.southpark.de/clips/155251/cartman-vs-the-dog-whisperer
2013-11-17 22:11:40 +01:00
ce80c8b8ee Merge pull request #1784 from rzhxeo/southpark
Add support for southpark.de
2013-11-17 12:15:13 -08:00
749febf4d1 Allow --console-title when --quiet is given (Fixes #1783) 2013-11-17 21:12:50 +01:00
bdde425cbe Save and restore console title (Fixes #1782) 2013-11-17 21:10:11 +01:00
746f491f82 Add support for southpark.de 2013-11-17 17:54:47 +01:00
1672647ade [SouthParkStudiosIE] Move from _TEST to _TESTS 2013-11-17 17:43:58 +01:00
90b6bbc38c [SouthParkStudiosIE] Also detect urls without http:// or www 2013-11-17 17:42:24 +01:00
ce02ed60f2 Remove * imports 2013-11-17 16:47:52 +01:00
1e5b9a95fd Move console_title to YoutubeDL 2013-11-17 11:39:52 +01:00
1d699755e0 [youtube] Add view_count (Fixes #1781) 2013-11-17 11:06:16 +01:00
ddf49c6344 [arte] remove two typos 2013-11-17 11:05:49 +01:00
ba3881dffd Add support for anitube.se (#1417) 2013-11-16 18:26:34 +04:00
d1c252048b [redtube] Do not test md5, seems to vary 2013-11-16 10:30:09 +01:00
eab2724138 [gamekings] Do not test md5 sum, precise file changes regularly 2013-11-16 02:32:23 +01:00
21ea3e06c9 [gamekings] remove unnecessary import 2013-11-16 02:31:02 +01:00
52d703d3d1 [tvp] Skip tests 2013-11-16 02:09:30 +01:00
ce152341a1 [bambuser] Do not test for MD5, seems to be flaky 2013-11-16 01:59:28 +01:00
f058e34011 [dailymotion] Fix playlists 2013-11-16 01:56:23 +01:00
b5349e8721 Fix indentation of (best) and (worst) in --list-formats 2013-11-16 01:39:45 +01:00
7150858d49 [spiegel] Implement format selection 2013-11-16 01:33:12 +01:00
91c7271aab Add automatic generation of format note based on bitrate and codecs 2013-11-16 01:08:43 +01:00
aa13b2dffd release 2013.11.15.1 2013-11-15 14:35:00 +01:00
fc2ef392be [ted] Fix playlists (Fixes #1770) 2013-11-15 14:33:51 +01:00
463a908705 [ted] simplify 2013-11-15 14:06:38 +01:00
d24ffe1cfa [rtlnow] Remove the test for nitro
The videos expire.
2013-11-15 12:57:59 +01:00
78fb87b283 Don't accept '>' inside the content attribute in OpenGraph regexes 2013-11-15 12:54:13 +01:00
ab2d524780 Improve the OpenGraph regex
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
2013-11-15 12:24:54 +01:00
85d61685f1 [tvp] Update the title and the description of the test video 2013-11-15 12:10:22 +01:00
b9643eed7c [youtube:channel] Fix the extraction of autogenerated channels
The ajax pages are empty, now it looks directly in the channel's /videos page
2013-11-15 11:51:45 +01:00
feee2ecfa9 Pass the 'download' argument to 'process_video_result' (fixes #1769) 2013-11-15 11:04:26 +01:00
a25a5cfeec release 2013.11.15 2013-11-15 01:47:15 +01:00
0e145dd541 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-15 01:46:50 +01:00
9f9be844fc [youtube] Fix protocol-independent URLs (Fixes #1768) 2013-11-15 01:45:39 +01:00
e3b9ab5e18 [soundlcoud] Set the correct extension for the tracks (fixes #1766)
Some tracks are not in mp3 format, they can be wav files.
2013-11-14 19:45:39 +01:00
c66d2baa9c [livestream] Add an extractor for the original version of livestream (closes #1764)
The two versions use different systems.
2013-11-14 13:16:32 +01:00
08bc37cdd0 Update test_write_info_json.py 2013-11-13 18:55:49 +01:00
9771cceb2c Fix filename extension leaking to json filename
Makes writeinfojson behaving exactly as writethumbnail in case where filename contains mediafile extension.

Case:

video.mp4 converted to music.mp3 would yield music.mp4.info.json instead music.mp3.info.json or music.info.json
2013-11-13 18:34:03 +01:00
ca715127a2 Don't assume the 'subtitlesformat' is set in the params dict (fixes #1750) 2013-11-13 17:14:10 +01:00
ea7a7af1d4 [gamekings] Fix the test video checksum 2013-11-13 17:13:06 +01:00
880e1c529d [youtube:playlist] Login into youtube if requested (fixes #1757)
Allows to download private playlists
2013-11-13 16:39:11 +01:00
dcbb45803f [youtube:playlist] Don't use the gdata api (closes #1508)
Parse the playlist pages instead
2013-11-13 16:26:50 +01:00
80b9bbce86 release 2013.11.13 2013-11-13 11:09:04 +01:00
d37936386f Credit @saper for tvp IE (#1730) 2013-11-13 11:08:07 +01:00
c3a3028f9f [tvp] Minor improvements (#1730) 2013-11-13 11:06:53 +01:00
6c5ad80cdc Merge remote-tracking branch 'saper/tvp' 2013-11-13 11:03:49 +01:00
b5bdc2699a Credit @jelly for gamekings extractor (#1759) 2013-11-13 10:52:22 +01:00
384b98cd8f [gamekings] Minor fixes (#1759) 2013-11-13 10:51:00 +01:00
eb9b5bffef Add extractor for gamekings.tv 2013-11-13 10:38:47 +01:00
0bd59f3723 Add support for d8.tv 2013-11-12 23:32:03 +01:00
8b8cbd8f6d [vine] Fix uploader extraction 2013-11-12 20:50:52 +01:00
72b18c5d34 FFmpegMetadataPP: don't enclose the values with " (fixes #1756) 2013-11-12 20:38:13 +01:00
eb0a839866 [common] Simplify og_search_property 2013-11-12 10:36:23 +01:00
1777d5a952 release 2013.11.11 2013-11-11 18:28:17 +01:00
d4b7da84c3 Clarify -c. Do not pass it in if you don't know what you're doing
Suggested in #1743
2013-11-11 14:21:14 +01:00
801dbbdffd Use avconv for downloading with m3u8 manifests if it's available (fixes #1735) 2013-11-10 16:47:03 +01:00
0ed05a1d2d Use the 'rtmp_live' field for the live parameter of rtmpdump 2013-11-10 12:45:17 +01:00
1008bebade Merge remote-tracking branch 'rzhxeo/rtmpdump_live' 2013-11-10 12:38:40 +01:00
ae84f879d7 Merge all the subtitles test into a single file
They reuse a base class
2013-11-10 12:28:21 +01:00
be6dfd1b49 [ted] Return a single info_dict for talks urls
It failed with the --list-subs option
2013-11-10 12:09:12 +01:00
231516b6c9 Merge pull request #1705 from iemejia/master
[ted] support for subtitles
2013-11-10 11:54:18 +01:00
fb53d58dcf Merge pull request #1726 from saper/escaped
Fix AssertionError when og property not found
2013-11-10 02:51:52 -08:00
2a9e9b210b Fix the documentation of '--autonumber-size' (#1743)
it's '--auto-number' not '--autonumber'
2013-11-09 19:21:30 +01:00
897d6cc43a Improve format listing for long format ids
Now arte.tv videos have quite long ids.
2013-11-09 19:07:34 +01:00
f470c6c812 [arte] Improve the format sorting
Also use the bitrate.
Prefer normal version and sourds/mal version over original version with subtitles.
2013-11-09 19:05:19 +01:00
566d4e0425 [arte] Make sure the format_id is unique (closes #1739)
Include the bitrate and use the height instead of the quality field.
2013-11-09 19:01:23 +01:00
81be02d2f9 [cnn] Accept www.cnn.com urls (fixes #1740) 2013-11-09 18:16:32 +01:00
c2b6a482d5 [brightcove] the format function requires to specify the index in python2.6 2013-11-09 18:10:11 +01:00
12c167c881 [soundcloud] Allow to download tracks marked as not 'streamable'
They use the rtmp protocol but if the are marked as 'downloadable' it can use the direct download link.
2013-11-09 18:08:03 +01:00
20aafee7fa [kankan] Fix the video url
It now requires two additional parameters, one is a timestamp we get from the getCdnresource_flv page and the other is a key we have to build.
2013-11-09 16:51:11 +01:00
be07375b66 Don't recode the video with m3u8 downloads (fixes #1741) 2013-11-09 16:40:00 +01:00
4894fe8c5b Report download progress of rtmpdump 2013-11-09 11:14:40 +01:00
dd5bcdc4c9 [brightcove] Set the 'Referer' header if the url has the 'linkBaseUrl' parameter (fixes #1553) 2013-11-07 21:06:48 +01:00
6161d17579 release 2013.11.07 2013-11-07 11:06:34 +01:00
4ac5306ae7 Fix the report progress when file_size is unknown (#1731)
The report_progress function will accept eta and percent with None value and will set the message to 'Unknow ETA' or 'Unknown %'.
Otherwise the values must be numbers.
2013-11-07 08:03:35 +01:00
b1a80ec1a9 [xnxx] Accept urls that start with 'www' (fixes #1734) 2013-11-06 23:45:01 +01:00
672fe94dcb release 2013.11.06.1 2013-11-06 22:11:46 +01:00
51040b72ed [brightcove] Support redirected urls from bcove.me (fixes #1732)
'bctid' needs to be changed to '@videoPlayer', and 'bckey' to 'playerKey'.
2013-11-06 22:03:00 +01:00
4f045eef8f [youtube:channel] Fix the extraction
The page don't include the 'load more' button anymore, now we directly get the 'c4_browse_ajax' pages.
2013-11-06 21:42:33 +01:00
5d7b253ea0 Add an extractor for eitb.tv (fixes #1608)
The BrighcoveExperience object doesn't contain the video id, the extractor adds it and passes the url to BrightcoveIE.
2013-11-06 20:06:14 +01:00
b0759f0c19 [brightcove] Extract all the available formats 2013-11-06 19:05:41 +01:00
065472936a Add an extractor for space.com (fixes #1718)
It uses Brightcove, but requires some special process for getting a url with the playerKey field in some videos
2013-11-06 17:37:39 +01:00
fc4a0c2aec [brightcove] Change the 'videoId' or 'videoID' field to '@videoPlayer' (fixes #1697)
It seems to be needed when using the htmlFederated page
2013-11-06 17:31:47 +01:00
eeb165e674 [brightcove] Add the extraction of the url from generic 2013-11-06 16:58:03 +01:00
9ee2b5f6f2 tests: don't run the test if any of the extractors listed in the 'add_ie' field is marked as not working 2013-11-06 16:43:26 +01:00
da54be877a release 2013.11.06 2013-11-06 14:02:52 +01:00
50a886b7ab Fix reporting when file size is unkown (Fixes #1731) 2013-11-06 14:02:33 +01:00
76e67c2cb6 Clean up imports 2013-11-06 14:01:43 +01:00
5137ebac0b [tvp] Telewizja Polska: new extractor for tvp.pl, fixes #1719
Thanks-To: mplonski

https://github.com/mplonski/linux/blob/master/tvp-dl.py
2013-11-05 23:47:40 +01:00
a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
2013-11-05 23:19:29 +01:00
4ed3e51080 [ted] fixed error in case of no subtitles present
I created a test, but I leave it commented since TED videos get
new subtitles frequently.
2013-11-05 12:00:13 +01:00
7f34001d57 Merge pull request #1724 from rzhxeo/generic_youtube
[GenericIE] Also detect youtube if src url of iframe is embedded in ' instead of "
2013-11-04 23:00:46 -08:00
2dcf7d8f99 [GenericIE] Also detect youtube if src url of iframe is embedded in ' instaed of " 2013-11-05 02:08:02 +01:00
19b0668251 [canal2c] Accept more urls (fixes #1723)
The url only needs to have the 'idVideo' field in the query, in any position.
We have to set the 'void=oui' in the webpage url, so that we get the file name.
2013-11-04 22:26:19 +01:00
e7e6b54d8a [teamcoco] Parse the xml file and extract all the formats 2013-11-03 17:48:12 +01:00
2a1a8ffe41 Merge pull request #1693 from alexvh/teamcoco_fix
[teamcoco] Fix video url extraction for some videos
2013-11-03 17:19:51 +01:00
08fb86c49b [youtube] Add description for YoutubeSearchDateIE (#1710) 2013-11-03 15:59:10 +01:00
3633d77c0f Merge remote-tracking branch 'CBGoodBuddy/ytsearchtime' 2013-11-03 15:56:55 +01:00
165e179764 release 2013.11.03 2013-11-03 15:50:36 +01:00
12ebdd1506 [viddler] Support non-digit IDs (Fixes #1714) 2013-11-03 15:49:59 +01:00
1baf9a5938 Merge pull request #1698 from rzhxeo/cinemassacre
[CinemassacreIE] Support more embed urls
2013-11-03 05:17:12 -08:00
a56f9de156 Style fixes for extractors: remove spaces around (,),{ and } 2013-11-03 14:06:47 +01:00
fa5d47af4b Merge pull request #1679 from rzhxeo/mofosex
Add support for http://www.mofosex.com
2013-11-03 05:04:14 -08:00
d607038753 Merge pull request #1677 from rzhxeo/xtube
Add support for http://www.xtube.com
2013-11-03 03:28:02 -08:00
9ac6a01aaf Merge pull request #1676 from rzhxeo/extremetube
Add support for http://www.extremetube.com
2013-11-03 03:25:46 -08:00
be97abc247 Set the 'extractor_key' field in the info_dict
It's the string returned by the class method 'ie_key', which allows to retrieve the extractor with 'get_info_extractor'
2013-11-03 12:14:44 +01:00
9103bbc5cd Add the 'webpage_url' field to info_dict
The url for the video page, it must allow to reproduce the result.
It's automatically set by YoutubeDL if it's missing.
2013-11-03 12:11:13 +01:00
b6c45014ae Set the extra_info inside YoutubeDL.process_ie_result and set only if the keys are missing 2013-11-03 11:57:04 +01:00
a3dd924871 Add YoutubeSearchDateIE extractor to youtube.py & __init__.py, which searches by publication date. 2013-11-02 22:40:48 -04:00
137bbb3e37 [XTubeIE] Add description to TEST 2013-11-02 22:45:48 +01:00
86ad94bb2e [ExtremeTubeIE] Set age_limit to 18 and fix uploader extraction 2013-11-02 22:33:49 +01:00
3e56add7c9 Merge pull request #1678 from rzhxeo/keezmovies
[KeezMoviesIE] Detect URLs with numbers in the SEO part correct
2013-11-02 14:15:52 -07:00
f52f01b5d2 [brightcove] Don't set the extension
If the video only has the 'FLVFullLengthURL' key, it can still be an mp4 file.
2013-11-02 21:20:46 +01:00
98d7efb537 [exfm] skip tests
The site is down too often.
2013-11-02 20:51:09 +01:00
cf51923545 [youtube] Remove vevo test
The video is no longer available and it seems that vevo video don't use encrypted signatures anymore.
2013-11-02 20:46:26 +01:00
38fcd4597a Merge remote-tracking branch 'iemejia/master' 2013-11-02 19:56:06 +01:00
165e3bb67a [bambuser] Add an extractor for channels (closes #1702) 2013-11-02 19:50:57 +01:00
38db46794f Merge branch 'ted_subtitles' 2013-11-02 19:50:45 +01:00
a9a3876d55 [ted] Added support for subtitle download 2013-11-02 19:48:39 +01:00
1f343eaabb [subtitles] refactor to support websites with subtitle information the
webpage.

I added the parameter webpage, so now it's similar to the way automatic
captions are handled. This is an improvement needed for websites like
TED.
2013-11-02 19:29:25 +01:00
72a5b4f702 Add an extractor for bambuser.com (#1702) 2013-11-02 19:01:01 +01:00
0a43ddf320 [CinemassacreIE] Add live paramter to extracted info as a workaround 2013-11-02 18:08:35 +01:00
31366066bd Add support for live parameter to rtmpdump 2013-11-02 18:08:16 +01:00
aa2484e390 release 2013.11.02 2013-11-02 11:21:36 +01:00
8eddf3e91d [youtube] Encode subtitle track name in request (Fixes #1700) 2013-11-02 11:21:05 +01:00
60d142aa8d Add an extractor for vk.com (closes #1635) 2013-11-01 22:34:18 +01:00
66cf3ac342 [metacafe] Fix support for age-restricted videos (fixes #1696)
The 'Content-Type' header must be set for disabling the family filter.
The 'flashversion' cookie  is only needed for AnyClip videos.
Added tests for standard metacafe videos and for age-restricted videos.
Also set the 'age_limit' field.
2013-11-01 11:56:15 +01:00
ab4e151347 [CinemassacreIE] Support more embed urls 2013-11-01 01:24:23 +01:00
ac2547f5ff [teamcoco] Fix video url extraction for some videos
Video url extraction failed for some videos,
e.g. http://teamcoco.com/video/old-time-baseball

The url extracted was also occasionally suboptimal quality,
e.g. http://teamcoco.com/video/louis-ck-interview-george-w-bush
2013-10-31 15:41:14 -04:00
5f1ea943ab [livestream] fix the extraction of events
It now uses a json dictionary from the webpage.
2013-10-31 08:07:26 +01:00
0ef7ad5cd4 Fix the test for dailymotion subtitles
The extractor returns a single info_dict now.
2013-10-31 07:55:03 +01:00
9f1109a564 [dailymotion] Fix support for age-restricted videos (Fixes #1688) 2013-10-31 00:20:49 +01:00
33b1d9595d release 2013.10.30 2013-10-30 01:17:20 +01:00
7193498811 Use index in formt string (Fixes vevo test on Python 2.6) 2013-10-30 01:17:00 +01:00
72321ead7b [vevo] Readd support for SMIL (Fixes #1683) 2013-10-30 01:14:17 +01:00
b5d0d817bc Remove superfluous space 2013-10-30 01:09:44 +01:00
94badb2599 Fix output indenting for --list-formats 2013-10-30 01:09:26 +01:00
b9a836515f Update the Vimeo test vector md5
confirmed that this is indeed the first 10241 (we went off by one with
byte range 0-10240) of the full, playing mp4, so they probably
reencoded or something
2013-10-29 16:44:35 -04:00
21c924f406 [arte] Download the 'Originalversion' version if it's the only one available (fixes #1682) 2013-10-29 20:58:49 +01:00
e54fd4b23b [vevo] Add more format details 2013-10-29 15:10:09 +01:00
57dd9a8f2f Nicer --list-formats output 2013-10-29 15:09:45 +01:00
912cbf5d4e [vevo] Fix timestamp handling
( / 1000 is implicit float division )
2013-10-29 14:00:23 +01:00
43d7895ea0 release 2013.10.29 2013-10-29 06:48:39 +01:00
f7ff55aa78 Merge remote-tracking branch 'origin/master' 2013-10-29 06:48:18 +01:00
795f28f871 [youtube] Fix login (Fixes #1681) 2013-10-29 06:45:54 +01:00
f6cc16f5d8 [tests] a HTTP 503 is a transient issue 2013-10-28 19:07:16 -04:00
321a01f971 [mtv] Remove the templates from the mediagen url 2013-10-28 23:37:01 +01:00
646e17a53d Fix YouTubeDL test 2013-10-28 23:18:13 +01:00
dd508b7c4f [tests] don't fail on network errors
This is suboptimal, but at least this way we will need to look at the logs
only to check for network errors that happen too often, instead of
parsing a ton of lines each time to see if there is some true test failing
2013-10-28 18:03:26 -04:00
2563bcc85c Add an extractor for MySpace (closes #1666) 2013-10-28 22:02:17 +01:00
702665c085 tests: build the filename from the info_dict if the 'file' key is missing
It will need to have the 'id' and 'ext' keys to work.
2013-10-28 22:01:37 +01:00
dcc2a706ef Add support for http://www.xtube.com 2013-10-28 19:23:48 +01:00
2bc67c35ac [KeezMoviesIE] Detect URLs with numbers in the SEO part correct 2013-10-28 18:22:55 +01:00
77ae65877e Add support for http://www.mofosex.com 2013-10-28 18:18:58 +01:00
32a35e4418 Add support for http://www.extremetube.com 2013-10-28 17:35:01 +01:00
369a759acc setup.py: Make sure the setuptools_available variable is set
Otherwise it would crash if it can't import setuptools.
2013-10-28 16:54:48 +01:00
79b3f61228 Merge pull request #1675 from rzhxeo/fix
Check if description and thumbnail are None to prevent crash
2013-10-28 08:35:40 -07:00
216d71d001 Check if description and thumbnail are None to prevent crash 2013-10-28 16:28:35 +01:00
78a3a9f89e Make "requested format not available" expected (#1655) 2013-10-28 11:41:59 +01:00
a7685f3bf4 mixcloud does not do any format selection 2013-10-28 11:41:32 +01:00
f088ea5486 release 2013.10.28 2013-10-28 11:34:21 +01:00
1003d108d5 [vimeo] Support hash in URL (Fixes #1669) 2013-10-28 11:32:22 +01:00
8abeeb9449 Nicer --list-formats output 2013-10-28 11:31:12 +01:00
c1002e96e9 Let extractors omit ext in formats 2013-10-28 11:28:02 +01:00
77d0a82fef [addanime] Use new formats system 2013-10-28 11:24:47 +01:00
ebc14f251c Merge remote-tracking branch 'origin/master' 2013-10-28 10:44:13 +01:00
d41e6efc85 New debug option --write-pages 2013-10-28 10:44:02 +01:00
8ffa13e03e [Instagram] get the non-https link, as they are serving Akamai cert from a instagram.com domain 2013-10-28 02:34:29 -04:00
db477d3a37 Merge pull request #1620 from jaimeMF/console_script
Use the console_scripts entry point if setuptools is available
2013-10-27 23:08:59 -07:00
750e9833b8 Add the missing age_limit tags; added a devscript to do a superficial check for porn sites without the age_limit tag in the test 2013-10-28 01:50:17 -04:00
82f0ac657c Merge pull request #1657 by @rzhxeo
[YouPornIE] Extract all encrypted links and remove doubles at the end
2013-10-28 01:45:52 -04:00
eb6a2277a2 Merge pull request #1659 by @rzhxeo
Add support for http://www.tube8.com
2013-10-28 01:38:28 -04:00
f8778fb0fa Merge pull request #1663 by @rzhxeo
Add support for http://www.spankwire.com
2013-10-28 01:35:11 -04:00
e2f9de207c Merge pull request #1664 by @rzhxeo
Add support for http://www.keezmovies.com
2013-10-28 01:25:46 -04:00
a93cc0d943 Merge pull request #1661 by @rzhxeo
Add support for http://www.pornhub.com
2013-10-28 00:50:39 -04:00
7d8c2e07f2 [Exfm] replace the failing Soundcloud test vector (broken also in browser) 2013-10-28 00:33:43 -04:00
efb4c36b18 Merge pull request #1660 from pyed/master
[addanime] try to download HQ before normal
2013-10-27 21:14:19 -07:00
29526d0d2b Merge pull request #1656 from rzhxeo/xhamster
[XHamsterIE] Extract SD and HD video
2013-10-27 10:12:59 -07:00
198e370f23 [addanime] better regex. 2013-10-27 19:48:02 +03:00
c19f7764a5 [generic] Detect bandcamp pages that use custom domains (closes #1662)
They embed the original url in the 'og:url' property.
2013-10-27 14:40:25 +01:00
bc63d9d329 [rtlnow] Change the test for rtlnitronow 2013-10-27 14:26:19 +01:00
aa929c37d5 [generic] Fix test video's checksum 2013-10-27 14:21:37 +01:00
af4d506eb3 [faz] Use a regex for getting the description
The page cannot be parsed in python2.6 with the html parser.
2013-10-27 14:18:55 +01:00
5da0549581 [KeezMoviesIE] Correct return value for embedded videos 2013-10-27 12:48:09 +01:00
749a4fd2fd [facebook] Don't recommend to report the issue if the video is private. 2013-10-27 12:13:55 +01:00
6f71ef580c [facebook] Report a more meaningful message if the video cannot be accessed (closes #1658) 2013-10-27 12:09:46 +01:00
67874aeffa [facebook] Fix the login process (fixes #1244) 2013-10-27 12:07:58 +01:00
3e6a330d38 [addanime] fix md5sum 2013-10-27 13:51:26 +03:00
aee5e18c8f [addanime] catch 'RegexNotFoundError' 2013-10-27 13:36:43 +03:00
5b11143d05 Add support for http://www.keezmovies.com 2013-10-27 10:10:28 +01:00
7b2212e954 Add support for http://www.spankwire.com 2013-10-27 01:59:26 +02:00
71865091ab [Tube8IE] Fix regex for uploader extraction 2013-10-27 01:08:03 +02:00
125cfd78e8 Add support for http://www.pornhub.com 2013-10-27 01:04:22 +02:00
8cb57d9b91 [Tube8IE] Escape dot in regex 2013-10-27 00:21:27 +02:00
14e10b2b6e [addanime] try to download HQ before normal 2013-10-27 01:19:38 +03:00
6e76104d66 [YouPornIE] Make webpage download more robust 2013-10-26 23:33:32 +02:00
1d45a23b74 Add support for http://www.tube8.com 2013-10-26 23:27:30 +02:00
7df286540f [YouPornIE] Extract all encrypted links and remove doubles at the end 2013-10-26 21:57:10 +02:00
5d0c97541a [XHamsterIE] Extract SD and HD video 2013-10-26 20:38:54 +02:00
49a25557b0 [8tracks] Use track count instead of looking at at_last_track property
This fixes the error:

$ youtube-dl http://8tracks.com/vladmc/counting-stars
[8tracks] counting-stars: Downloading webpage
[8tracks] counting-stars: Downloading song information 1/4
[8tracks] counting-stars: Downloading song information 2/4
[8tracks] counting-stars: Downloading song information 3/4
[8tracks] counting-stars: Downloading song information 4/4
[8tracks] counting-stars: Downloading song information 5/4
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/phihag/projects/youtube-dl/youtube_dl/__main__.py", line 18, in <module>
    youtube_dl.main()
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 761, in main
    _real_main(argv)
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 714, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 701, in download
    videos = self.extract_info(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 342, in extract_info
    ie_result = ie.extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/common.py", line 121, in extract
    return self._real_extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/eighttracks.py", line 111, in _real_extract
    'id': track_data['id'],
KeyError: 'id'
2013-10-25 23:46:19 +02:00
b5936c0059 Document the %(format_id)s field for the output template 2013-10-25 17:18:06 +02:00
600cc1a4f0 [youtube] Set the format_id field to the itag of the format (closes #1624) 2013-10-25 17:17:46 +02:00
ea32fbacc8 Fix the extensions of two tests with youtube videos
The best quality is now a mp4 video.
2013-10-25 16:55:37 +02:00
00fe14fc75 [youtube] Also use the 'adaptative_fmts' field from the /get_video_info page (fixes #1649)
The 'adaptative_fmts' field from the video page is not added to the 'url_encoded_fmt_stream_map'
2013-10-25 16:52:58 +02:00
fcc28edb2f [cinemassacre] Simplify
* Remove some rtmp parameters that are not needed.
* Remove the md5 checksums, the video is not downloaded.
* Remove the code used before the current format system.
2013-10-23 20:21:41 +02:00
fac6be2dd5 Merge pull request #1632 from rzhxeo/cinemassacre
[Cinemassacre] Download video that is shown in flash player
2013-10-23 20:15:39 +02:00
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
0a89b2852e release 2013.10.23.1 2013-10-23 15:12:33 +02:00
55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
365bcf6d97 Merge remote-tracking branch 'origin/master' 2013-10-23 11:40:46 +02:00
71907db3ba [vimeo] Fix normal videos (Fixes #1642)
Vimeo Pro Videos are still broken
2013-10-23 11:38:53 +02:00
6803655ced Merge pull request #1622 from rbrito/fix-extension
extractor: youtube: Set extension of AAC audio formats to m4a.
2013-10-22 15:16:26 -07:00
df1c39ec5c release 2013.10.23 2013-10-23 00:07:27 +02:00
80f55a9511 release 2013.10.22 2013-10-22 22:35:13 +02:00
7853cc5ae1 Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/YoutubeDL.py
2013-10-22 22:30:06 +02:00
586a91b67f Expand tilde in template (Fixes #1639) 2013-10-22 22:28:26 +02:00
b028e96144 [arte.tv:creative] Update the title of the test 2013-10-22 21:06:06 +02:00
ce68b5907c [nhl:videocenter] Fix playlist title extraction 2013-10-22 21:01:16 +02:00
fe7e0c9825 Style fixes in YoutubeDL.py
Fixed some of the problems reported by pep8
2013-10-22 14:49:34 +02:00
12893efe01 Respect the download parameter in YoutubeDL.process_video_result if the extractor handle the format selection 2013-10-22 00:01:59 +02:00
a6387bfd3c [vimeo] Implement the new format selection system (closes PR #996)
Rebased and deleted some parts to use the new system instead of copying the one from YoutubeIE
2013-10-21 23:16:11 +02:00
f6a54188c2 [youtube] Use 'node is None' when checking if the video has automatic captions
It had stopped working and it reports a FutureWarning
2013-10-21 16:28:55 +02:00
cbbd9a9c69 Fix the duration field for the VideoDetective and InternetVideoArchive tests
Also remove the use of the old format system and the comment
2013-10-21 15:07:33 +02:00
685a9cd2f1 [googleplus] Fix upload_date extraction 2013-10-21 15:00:21 +02:00
182a107877 [arte] Set the format_note and the format_id fields (closes #1628) 2013-10-21 14:42:30 +02:00
8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
3fd39e37f2 YoutubeDL: remove method that came from FileDownloader 2013-10-21 13:52:24 +02:00
49e86983e7 Allow to use the extension for the format selection
The best format with the extension is downloaded.
2013-10-21 13:31:55 +02:00
a9c58ad945 Accept requested formats to be in the format 35/best (closes #1552)
The format selection code is now an independent function.
2013-10-21 13:19:58 +02:00
f8b45beacc Merge remote-tracking branch 'rbrito/set-age'
Conflicts:
	youtube_dl/extractor/xhamster.py
2013-10-19 21:16:14 +02:00
9d92015d43 [xhamster] Add support for age_limit (Instead of #1627) 2013-10-19 21:09:48 +02:00
50a6150ed9 extractor: Set age limit on some adult-related extractors.
More age limit of videos for adult-related sites.

Note that, for redtube, I explicitly left the variable containing the age
limit, since the comment justifying the age limit is a good thing to have.

That being said, I included the age limit field on the test, to better
reflect what the information extractor does (even if it may not break the
automated tests).

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:19:25 -03:00
d5a9bb4ea9 extractor: youtube: Swap video dimensions to match standard practice.
While working on this, I thought about simplifying things like changing
480x854 to 480p, and that seemed like a good option, until I realized that
people (me included) usually link the concept of some number followed by a p
with the video being 16:9.

So, we would be losing some information and, as we all know,
[explicit is better than implicit][*].

[*]: http://www.python.org/dev/peps/pep-0020/

This closes #1446.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:04:44 -03:00
b0505eb611 [CinemassacreIE] Fix information extraction 2013-10-19 16:46:17 +02:00
284acd57d6 Add an author email 2013-10-19 11:14:20 +02:00
8ed6b34477 extractor: Set age limit on some adult-related extractors.
This is similar in spirit to what was done in commit 8e590a117f.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 19:32:37 -03:00
f6f1fc9286 extractor: youtube: Fix extension of dash formats.
While we are at it, separate the audio formats from the video formats.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 18:53:00 -03:00
8e590a117f [xnxx] Add age_limit 2013-10-18 23:35:17 +02:00
d5594202aa Simplify release process 2013-10-18 23:34:55 +02:00
b186d949cf release 2013.10.18.2 2013-10-18 23:22:54 +02:00
3d2986063c [bash-completion] Do not use dash in function name (Fixes #1623) 2013-10-18 23:13:46 +02:00
41fd7c7e60 Add new option --abort-on-error 2013-10-18 23:09:32 +02:00
fdefe96bf2 Document %(format)s (#1612) 2013-10-18 23:09:08 +02:00
16f36a6fc9 extractor: youtube: Set extension of AAC audio formats to m4a.
This, in particular, eases downloading both audio and videos in DASH formats
before muxing them, which alleviates the problem that I exposed on issue

Furthermore, one may argue that this is, indeed, the case for correctness's
sake.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 17:50:55 -03:00
f44415360e Use the console_scripts entry point if setuptools is available 2013-10-18 13:49:25 +02:00
cce722b79c Add metavar to --cache-dir 2013-10-18 11:50:48 +02:00
82697fb2ab release 2013.10.18.1 2013-10-18 11:45:30 +02:00
53c1d3ef49 Check for embedded YouTube player (Fixes #1616) 2013-10-18 11:44:57 +02:00
8e55e9abfc release 2013.10.18 2013-10-18 11:17:21 +02:00
7c58ef3275 [tudou] Fix title regex (Fixes #1614) 2013-10-18 11:16:20 +02:00
416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
f4d96df0f1 Extend #980 with --max-quality support 2013-10-18 00:46:35 +02:00
5d254f776a Fix test 2013-10-18 00:27:51 +02:00
1c1218fefc Merge remote-tracking branch 'jaimeMF/format_selection' 2013-10-18 00:17:03 +02:00
d21ab29200 Add an extractor for techtalks.tv (closes #1606) 2013-10-17 08:20:58 +02:00
54ed626cf8 release 2013.10.17 2013-10-17 02:20:26 +02:00
a733eb6c53 [youtube] Do not crash if caption info is missing altogether (Fixes #1610) 2013-10-17 02:19:19 +02:00
591454798d [brightcove] Raise error if playlist is empty (#1608) 2013-10-17 01:02:17 +02:00
38604f1a4f Merge remote-tracking branch 'origin/master' 2013-10-17 00:55:06 +02:00
2d0efe70a6 [brightcove] Fix more broken XML (#1608) 2013-10-17 00:46:11 +02:00
bfd14b1b2f Add an extractor for rutube.ru (closes #1136)
It downloads with a m3u8 manifest, requires ffmpeg.
2013-10-16 16:57:40 +02:00
76965512da Fix the indentation of the Makefile
It uses tabs, no spaces.
2013-10-15 23:15:15 +02:00
996d1c3242 Don't include the test/testdata directory in the youtube-dl.tar.gz
The last releases included big files that increased the size of the compressed file.
2013-10-15 23:08:52 +02:00
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
8381a92120 [websurg] Skipt the test
It needs login information.
2013-10-15 08:12:30 +02:00
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added downloading annotations download support - closes #1321
2013-10-13 22:03:22 -07:00
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
d3f46b9aa5 Add support for single-test tox runs
Use a sintax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
8016c92297 Fix the default values of format_id and format 2013-10-11 16:34:49 +02:00
e028d0d1e3 Implement the prefer_free_formats in YoutubeDL 2013-10-11 16:34:49 +02:00
79819f58f2 Default 'format' field to {width}x{height}
If width is None, use {height}p and if height is None, '???'
2013-10-11 16:34:49 +02:00
6ff000b888 Do not handle format selection for IEs that already handle it 2013-10-11 16:34:48 +02:00
99e206d508 Implement the max quality option in YoutubeDL 2013-10-11 16:34:48 +02:00
dd82ffea0c Implement format selection in YoutubeDL
Now the IEs can set a formats field in the info_dict, with the formats ordered from worst to best quality. It's a list of dicts with the following fields:
* Mandatory: url and ext
* Optional: format and format_id

The format_id is used for choosing which formats have to be downloaded.

Now a video result is processed by the method process_video_result.
2013-10-11 16:34:48 +02:00
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny tpo
2013-10-09 07:44:06 -07:00
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny tpo 2013-10-09 21:56:09 +08:00
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an smil manifest, extract the video urls directly from the json and use the last version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intented audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
c1c9a79c49 Add basic --download-archive option
Often, users want to be able to download only videos they haven't seen before, despite the video files having been deleted or moved in the mean time.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If it is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed, make the video ids extraction more solid.
2013-10-04 14:07:29 +02:00
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user, just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
44d466559e Properly handle stream meap not being present 2013-10-01 14:44:09 +02:00
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
843530568f [appletrailers] Rework extraction (fixes #1387)
The exraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available, other olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
b98d6a1e19 release 2013.09.24.2 2013-09-24 21:55:34 +02:00
29c7a63df8 Remove debugging code 2013-09-24 21:55:25 +02:00
8b25323ae2 release 2013.09.24.1 2013-09-24 21:40:47 +02:00
f426de8460 Merge remote-tracking branch 'origin/master' 2013-09-24 21:40:30 +02:00
695dc094ab Merge branch 'automatic-signatures' 2013-09-24 21:40:08 +02:00
e80d861064 Revert "[southparkstudios] Fix mgid extraction"
This reverts commit 0fd49457f5.

It seems that the redesign was temporary.
2013-09-24 21:39:38 +02:00
2cdeb20135 release 2013.09.24 2013-09-24 21:28:06 +02:00
7f74773254 Add option --no-cache-dir 2013-09-24 21:26:10 +02:00
f2c327fd39 Fix 86 signature (#1494) 2013-09-24 21:20:42 +02:00
e35e4ddc9a Fix output of --youtube-print-sig-code when counting down to 0 2013-09-24 21:18:03 +02:00
c3c88a2664 Allow opts.cachedir == None to disable cache 2013-09-24 21:04:43 +02:00
bb0eee71e7 [youtube] Update one of the test's description 2013-09-24 21:04:13 +02:00
6f56389b88 [youtube] update algos for length 86 and 84 (fixes #1494) 2013-09-24 21:02:00 +02:00
5b333c1ce6 [francetv] Add an extractor for Generation Quoi (closes #1475) 2013-09-23 21:41:54 +02:00
a825f33030 [francetv] Add an extractor for France2 2013-09-23 21:28:33 +02:00
92f618f2e2 Merge remote-tracking branch 'origin/master' 2013-09-23 11:24:49 +02:00
81ec7c7901 [facebook] Allow untitled videos (Fixes #1484) 2013-09-23 11:24:33 +02:00
dd5d2eb03c If the file is already downloaded include the size in the progress hook 2013-09-22 23:39:30 +02:00
4ae720042c Include the eta and the speed in the progress hooks
Useful when listening to the progress hook, for example in a GUI.
2013-09-22 23:31:39 +02:00
c705320f48 Correct test strings 2013-09-22 12:18:16 +02:00
d2d8f89531 Do not warn if fallback is without alternatives (because we did not get the flash player URL) 2013-09-22 12:18:10 +02:00
bdde940e90 [youtube] Improve flash player URL handling 2013-09-22 12:17:42 +02:00
45f4a76dbc Work around nosetests nosiness 2013-09-22 11:45:29 +02:00
13dc64ce74 [youtube] Remove _decrypt_signature_age_gate 2013-09-22 11:17:21 +02:00
c35f9e72ce Move cachedir doc 2013-09-22 11:09:25 +02:00
f8061589e6 [youtube] Actually pass in cachedir option 2013-09-22 10:51:33 +02:00
0ca96d48c7 [youtube] Improve source code quality 2013-09-22 10:37:23 +02:00
4ba146f35d Update static signatures 2013-09-22 10:31:25 +02:00
edf3e38ebd [youtube] Improve cache and add an option to print the extracted signatures 2013-09-22 10:30:02 +02:00
c4417ddb61 [youtube] Add filesystem signature cache 2013-09-22 00:35:03 +02:00
4a2080e407 [youku] better error handling
blocked videos used to cause death by TypeError, now we report what the
server says
2013-09-21 20:50:31 +02:00
2f2ffea9ca Clarify a couple of calls 2013-09-21 15:34:29 +02:00
ba552f542f Use reader instead of indexing 2013-09-21 15:32:37 +02:00
8379969834 Prepare signature function caching 2013-09-21 15:19:48 +02:00
95dbd2f990 Change test target (Verified with node.js) 2013-09-21 15:10:38 +02:00
a7177865b1 Implement more opcodes 2013-09-21 14:48:12 +02:00
e0df6211cc Restore accidentally deleted commits
That's what happens if you let Windows machines write :(
2013-09-21 14:40:35 +02:00
b00ca882a4 [livestream] Fix events extraction (fixes #1467) 2013-09-21 13:50:52 +02:00
39baacc49f [dailymotion] Add an extractor for users (closes #1476) 2013-09-21 12:45:53 +02:00
3a1d48d6de [dailymotion] Raise ExtractorError if the dailymotion response reports an error 2013-09-21 12:15:54 +02:00
34308b30d6 Warn if no locale is set (#1474) 2013-09-21 11:48:07 +02:00
bc1506f8c0 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-21 11:10:30 +02:00
b61067fa4f Abort if extractaudio is given without a variable extension (#1470) 2013-09-21 11:10:22 +02:00
69b227a9bc [southparkstudios] add support for http://www.southparkstudios.com/full-episodes/* urls (closes #1469) 2013-09-21 10:58:43 +02:00
0fd49457f5 [southparkstudios] Fix mgid extraction 2013-09-21 10:51:25 +02:00
58f289d013 release 2013.09.20.1 2013-09-20 22:59:14 +02:00
3d60bb96e1 Add an extractor for ebaumsworld.com (closes #1462) 2013-09-20 16:55:50 +02:00
38d025b3f0 [youtube] add algo for length 91 2013-09-20 14:43:16 +02:00
c40c6aaaaa Catch socket.error before IOError
Since python 2.6 it's a child class.
2013-09-20 13:26:03 +02:00
1a810f0d4e [funnyordie] Fix video url extraction 2013-09-20 13:05:34 +02:00
63037593c0 release 2013.09.20 2013-09-20 10:24:48 +02:00
7a878d47fa Merge pull request #1464 from patrickslin/patch-7
Unable to decrypt signature length 93 (fixes #1461)
2013-09-20 08:25:10 +02:00
bc4b900898 Unable to decrypt signature length 93 (fixes #1461) 2013-09-19 21:49:06 -07:00
c5e743f66f [fktv] support videos splitted in any number of parts and some style changes 2013-09-18 23:32:37 +02:00
6c36d8d6fb Merge pull request #1438 from rzhxeo/fktv
Add support for http://fernsehkritik.tv
2013-09-18 23:05:56 +02:00
71c82637e7 [youtube] apply the fix for lists with number of videos multiple of _MAX_RESULTS to user extraction
Copied from the playlist extractor.
2013-09-18 23:00:32 +02:00
2dad310e2c Credit @Ruirize for newgrounds 2013-09-18 22:30:22 +02:00
d0ae9e3a8d [newgrounds] simplify 2013-09-18 22:14:43 +02:00
a19413c311 Changed file hash. 2013-09-18 17:17:12 +01:00
1ef80b55dd Fixes test fail
Was unaware of --id being passed to test.
2013-09-18 16:23:38 +01:00
eb03f4dad3 Added Newgrounds support 2013-09-18 15:54:45 +01:00
830dd1944a Clarify -i help (#1453) 2013-09-18 13:23:04 +02:00
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
1237c9a3a5 XHamsterIE: Fix support for new HD video url format and add test (closes PR #1443) 2013-09-17 23:08:01 +02:00
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
5d13df79a5 [francetv] Remove Pluzz test
Videos expire in 7 days
2013-09-17 22:49:43 +02:00
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
6523223a4c [hotnewhiphop] Fix test case title 2013-09-17 21:10:57 +02:00
4a67aafb7e [youtube] Don't search the flash player version for videos with age gate activated 2013-09-17 20:59:55 +02:00
f3f34c5b0f release 2013.09.17 2013-09-17 17:00:20 +02:00
6ae8ee3f54 Update 85 signature (Fixes #1449)
This is the first signature algorithm to have been parsed automatically, although that only works for HTML5 players for now, and is not yet integrated into master.
2013-09-17 16:59:13 +02:00
e8f8e80097 Add an extractor for vice.com (closes #1051) 2013-09-16 20:58:36 +02:00
4dc0ff3ecf [ooyala] prefer ipad url
It has better quality with m3u8 manifests
2013-09-16 20:38:54 +02:00
4b6462fc1e Add an extractor for Bloomberg (closes #1436) 2013-09-16 20:38:48 +02:00
c4ece78564 [ooyala] add support for more type of video urls, like m3u8 manifests. 2013-09-16 19:34:10 +02:00
0761d02b0b Add FKTV extractor 2013-09-16 14:46:19 +02:00
71c107fc57 Add FKTV extractor
Support for Fernsehkritik-TV (incl. Postecke)
2013-09-16 14:45:14 +02:00
7459e3a290 Always correct encoding when writing to sys.stderr (Fixes #1435) 2013-09-16 06:55:41 +02:00
f9e66fb993 release 2013.09.16 2013-09-16 04:12:57 +02:00
6c603ccce3 [devscripts/release] temporary workarounds 2013-09-16 04:12:43 +02:00
ef66b0c6ef Merge remote-tracking branch 'origin/master' 2013-09-16 03:32:53 +02:00
22b50ecb2f Starts of a Windows service 2013-09-16 03:32:45 +02:00
5a6fecc3de Add an extractor for southparkstudios.com (closes #1434)
It uses the MTV system
2013-09-15 23:30:58 +02:00
cdbccafed9 Merge pull request #1422 from rzhxeo/xhamster
XHamsterIE: Add support for new URL format (download in hd by default)
2013-09-15 12:18:39 +02:00
e69ae5b9e7 [youtube] support youtube.googleapis.com/v/* urls (fixes #1425) 2013-09-15 12:14:59 +02:00
92790f4e54 [soundcloud] Add an extractor for users (closes #1426) 2013-09-14 21:41:49 +02:00
471a5ee908 Set the ext field for each format 2013-09-14 14:45:04 +02:00
19e1d35989 [mixcloud] Rewrite extractor (fixes #278) 2013-09-14 14:26:42 +02:00
0b7f31184d Now --all-sub is a modifier to --write-sub and --write-auto-sub (closes #1412)
For keeping backwards compatibility --all-sub sets --write-sub if --write-auto-sub is not given
2013-09-14 11:14:40 +02:00
fad84d50fe [googleplus] Fix upload date extraction 2013-09-14 11:10:01 +02:00
9a1c32dc54 XHamsterIE: Add support for new URL format 2013-09-14 05:42:00 +02:00
a921f40799 [ustream] Simplify channel extraction
the ChannelParser has been moved to a new function in utils get_meta_content
Instead of the SocialStreamParser now it uses a regex
2013-09-13 22:05:29 +02:00
74ac9bdd82 Merge pull request #1413 from tewe/master
Add Ustream channel support
2013-09-13 21:34:31 +02:00
94518f2087 Merge pull request #1409 from JohnyMoSwag/master (closes #1404)
added kickstarter IE
2013-09-13 19:52:56 +02:00
535f59bbcf Merge pull request #1350 from Jaiz909/description-keyerror-fix
Fixed issue #1277 KeyError when no description.
2013-09-13 18:20:42 +02:00
71cedb3c0c [buildserver] Service installation and uninstallation 2013-09-13 02:25:12 +02:00
dd01d6558a [gamespot] Update test video title 2013-09-12 22:18:39 +02:00
ce85f022d2 [youtube] update algo for length 82 (fixes #1416) 2013-09-12 22:04:09 +02:00
ad94a6fe44 [canalplust] accept urls that don't include the video id (fixes #1415), extract more info and update test 2013-09-12 21:56:36 +02:00
353ba14060 [buildserver] Rely on repository license 2013-09-12 16:34:24 +02:00
83de794223 Add original buildserver from @fraca7 2013-09-12 16:30:43 +02:00
bfd5c93af9 Add Ustream channel support 2013-09-12 12:30:14 +02:00
c247d87ef3 [funnyordie] fix video url extraction 2013-09-12 11:31:27 +02:00
07ac9e2cc2 release 2013.09.12 2013-09-12 11:26:44 +02:00
6bc520c207 Check for both automatic captions and subtitles with options --write-sub and --write-auto-sub (fixes #1224) 2013-09-12 11:15:25 +02:00
f1d20fa39f added kickstarter IE 2013-09-11 14:50:38 -07:00
e3dc22ca3a [youtube] Fix detection of videos with automatic captions 2013-09-11 19:24:56 +02:00
d665f8d3cb [subtitles] Also list the available automatic captions languages with '--list-sub' 2013-09-11 19:17:30 +02:00
055e6f3657 [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages. 2013-09-11 19:08:43 +02:00
ac4f319ba1 Credit @iemejia 2013-09-11 17:58:51 +02:00
542cca0e8c Merge branch 'subtitles_rework' (closes PR #1326) 2013-09-11 17:41:24 +02:00
6a2449df3b [howcast] Do not download from http://www.howcast.com/videos/{video_id}
It takes too much to follow the redirection.
2013-09-11 17:36:23 +02:00
7fad1c6328 [subtitles] Use self._download_webpage for extracting the subtitles
It raises ExtractorError for the same exceptions we have to catch.
2013-09-11 16:24:47 +02:00
d82134c339 [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption
Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 16:05:49 +02:00
54d39d8b2f [subtitles] rename SubitlesIE to SubtitlesInfoExtractor
Otherwise it can be automatically detected as a IE ready for use.
2013-09-11 15:51:04 +02:00
de7f3446e0 [youtube] move subtitles methods from the base extractor to YoutubeIE 2013-09-11 15:48:23 +02:00
f8e52269c1 [subtitles] made inheritance hierarchy flat as requested 2013-09-11 15:21:09 +02:00
cf1dd0c59e Merge branch 'master' into subtitles_rework 2013-09-11 14:26:48 +02:00
22c8b52545 In the supported sites page, sort the extractors in case insensitive 2013-09-11 12:04:27 +02:00
1f7dc42cd0 release 2013.11.09 2013-09-11 11:30:10 +02:00
aa8f2641da [youtube] update algo for length 85 (fixes #1408 and fixes #1406) 2013-09-11 11:24:58 +02:00
648d25d43d [francetv] Add an extractor for francetvinfo.fr (closes #1317)
It uses the same system as Pluzz, create a base class for both extractors.
2013-09-10 15:50:34 +02:00
df3e61003a Merge pull request #1402 from Rudloff/canalc2
Wrong property name
2013-09-10 03:19:37 -07:00
6b361ad5ee Wrong property name 2013-09-10 12:13:22 +02:00
5d8afe69f7 Add an extractor for pluzz.francetv.fr (closes PR #1399) 2013-09-10 12:00:00 +02:00
a1ab553858 release 2013.09.10 2013-09-10 11:25:11 +02:00
07463ea162 Add an extractor for Slideshare (closes #1400) 2013-09-10 11:19:58 +02:00
6d2d21f713 [sohu] add support for my.tv.sohu.com urls (fixes #1398) 2013-09-09 19:56:16 +02:00
061b2889a9 Fix the minutes part in FileDownloader.format_seconds (fixed #1397)
It printed for the minutes the result of (seconds // 60)
2013-09-09 10:38:54 +02:00
8963d9c266 [youtube] Modify the regex to match ids of length 11 (fixes #1396)
In urls like http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930 you can't split the query string and ids always have that length.
2013-09-09 10:33:12 +02:00
890f62e868 Revert "[youtube] Fix detection of tags from HLS videos."
They have undo the change

This reverts commit 0638ad9999.
2013-09-08 18:50:07 +02:00
8f362589a5 release 2013.09.07 2013-09-07 22:29:15 +02:00
a27a2470cd Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-07 22:28:54 +02:00
72836fcee4 Merge branch 'master' into subtitles_rework 2013-09-06 23:24:41 +02:00
a7130543fa [generic] If the url doesn't specify the protocol, then try to extract prepending 'http://' 2013-09-06 18:39:35 +02:00
a490fda746 [daylimotion] accept embed urls (fixes #1386) 2013-09-06 18:36:07 +02:00
7e77275293 Add an extractor for Metacritic 2013-09-06 18:08:07 +02:00
d6e203b3dc [subtitles] fixed multiple subtitles language separated by comma after merge
As mentioned in the pull request, I forgot to include this changes.
aa6a10c44a
2013-09-06 16:30:13 +02:00
e3ea479087 [youtube] Fix some issues with the detection of playlist/channel urls (reported in #1374)
They were being caught by YoutubeUserIE, now it only extracts a url if the rest of extractors aren't suitable.
Now the url tests check that the urls can only be extracted with an specific extractor.
2013-09-06 16:24:24 +02:00
faab1d3836 [youtube] Fix detection of feeds urls (fixes #1294)
Urls like https://www.youtube.com/feed/watch_later were being as users (before the last changes to YoutubeUserIE, as videos)
2013-09-06 14:45:49 +02:00
8851a574a3 Fix add-versions 2013-09-06 11:07:34 +02:00
59282080c8 release 2013.09.06.1 2013-09-06 10:53:35 +02:00
98f3da4040 Merge remote-tracking branch 'origin/master' 2013-09-06 10:53:24 +02:00
1d213233cd Do not re-download files for hashsum generation (Fixes #1383) 2013-09-06 10:51:53 +02:00
fd9cf73836 [youtube] Users: download from the api in json to simplify extraction (fixes #1358)
There could be duplicate videos or other videos if the description have links.
2013-09-06 10:43:02 +02:00
0638ad9999 [youtube] Fix detection of tags from HLS videos. 2013-09-06 10:25:31 +02:00
1eb527692a release 2013.09.06 2013-09-06 10:13:33 +02:00
09bb17e108 Merge pull request #1378 from patrickslin/patch-6
Vevo sig changed again, please update for us! Thanks very much! (fixes #...
2013-09-06 09:53:23 +02:00
1cf911bc82 Vevo sig changed again, please update for us! Thanks very much! (fixes #1375) 2013-09-05 17:38:03 -07:00
f4b052321b [youtube] Urls like youtube.com/NASA are now interpreted as users (fixes #1069)
Video urls like http://youtube.com/BaW_jenozKc are not valid, but http://youtu.be/BaW_jenozKc is correct.
2013-09-05 22:39:15 +02:00
a636203ea5 release 2013.09.05 2013-09-05 22:30:50 +02:00
c215217e39 [youtube] Playlists: extract the videos id from ['media$group']['yt$videoid'] (fixes #1374)
'media$player' is not defined for private videos.
2013-09-05 21:40:04 +02:00
08e291b54d [generic] Recognize html5 video in the format '<video src=".+?"' and only unquote the url when extracting the id (fixes #1372) 2013-09-05 18:02:17 +02:00
6b95b065be Add extractor for tvcast.naver.com (closes #1331) 2013-09-05 10:53:40 +02:00
9363169b67 [daum] Get the video page from a canonical url to extract the full id (fixes #1373) and extract description. 2013-09-05 10:08:17 +02:00
085bea4513 Credit @Huarong for tv.sohu.com 2013-09-04 22:09:22 +02:00
150f20828b Add extractor for daum.net (closes #1330) 2013-09-04 22:06:50 +02:00
08523ee20a release 2013.09.04 2013-09-04 14:33:32 +02:00
5d5171d26a Merge pull request #1341 from xanadu/master
add support for "-f mp4" for YouTube
2013-09-03 18:52:12 -07:00
96fb5605b2 AHLS -> Apple HTTP Live Streaming 2013-09-03 18:49:35 -07:00
7011de0bc2 Merge pull request #1363 from Rudloff/defense
defense.gouv.fr
2013-09-03 18:23:08 -07:00
c3dd69eab4 Merge remote-tracking branch 'upstream/master' 2013-09-03 12:22:29 -07:00
025171c476 Suggested by @phihag 2013-09-03 12:03:19 +02:00
c8dbccde30 [orf] Remove the test video, they seem to expire in one week 2013-09-03 11:51:01 +02:00
4ff7a0f1f6 [dailymotion] improve the regex for extracting the video info 2013-09-03 11:33:59 +02:00
9c2ade40de [vimeo] Handle Assertions Error when trying to get the description
In some pages the html tags are not closed, python 2.6 cannot handle it.
2013-09-03 11:11:36 +02:00
aa32314d09 [vimeo] add support for videos that embed the download url in the player page (fixes #1364) 2013-09-03 10:48:56 +02:00
52afe99665 Extractor for defense.gouv.fr 2013-09-03 01:51:17 +02:00
b0446d6a33 Merge remote-tracking branch 'upstream/master' 2013-09-03 01:27:49 +02:00
8e4e89f1c2 Add an extractor for VeeHD (closes #1359) 2013-09-02 11:54:09 +02:00
6c758d79de [metacafe] Add more cases for detecting the uploader detection (reported in #1343) 2013-08-31 22:35:39 +02:00
691008087b Add an automatic page generator for the supported sites (related #156)
They are listed in the "supportedsites.html" page.
2013-08-31 15:18:52 +02:00
85f03346eb Merge remote-tracking branch 'upstream/master' 2013-08-30 17:51:59 -07:00
bdc6b3fc64 add support for "-f mp4" for YouTube 2013-08-30 17:51:50 -07:00
847f582290 Merge remote-tracking branch 'upstream/master' 2013-08-31 00:37:29 +02:00
10f5c016ec release 2013.08.30 2013-08-30 21:02:07 +02:00
2e756879f1 [youtube] update algo for length 86 2013-08-30 20:49:51 +02:00
c7a7750d3b [youtube] Fix typo in the _VALID_URL for YoutubeFavouritesIE, it was intended to also match :ytfavourites 2013-08-30 20:13:05 +02:00
9193c1eede Add youtube keywords to the bash completion script 2013-08-30 20:11:53 +02:00
b3f0e53048 Fixed issue #1277 KeyError when no description.
Allows a continue with a warning when an extractor cannot retrieve a description.
2013-08-31 01:53:01 +10:00
3243d0f7b6 release 2013.08.29 2013-08-29 23:29:34 +02:00
23b00bc0e4 [youtube] update algo for length 84
Only appears sometimes, nearly identical to length 86.
2013-08-29 22:44:29 +02:00
52e1eea18b [youtube] update algo for length 86 (fixes #1349) 2013-08-29 22:33:58 +02:00
ee80d66727 [ign] update 1up extractor to work with the updated IGNIE 2013-08-29 21:51:09 +02:00
f1fb2d12b3 [ign] extract videos from articles pages 2013-08-29 21:39:36 +02:00
deb2c73212 Merge pull request #1347 from whydoubt/fix_orf_at
Fix orf.at extractor by adding file coding mark
2013-08-29 11:05:38 -07:00
8928491074 Fix orf.at extractor by adding file coding mark 2013-08-29 12:51:38 -05:00
545434670b Add an extractor for orf.at (closes #1346)
Make find_xpath_attr also accept numbers in the value
2013-08-29 19:16:07 +02:00
54fda45bac Merge pull request #1342 from whydoubt/fix_mit_26
Fix MIT extractor for Python 2.6
2013-08-29 13:42:08 +02:00
c7bf7366bc Update descriptions checksum for some test for Unistra and Youtube 2013-08-29 13:41:59 +02:00
b7052e5087 Also print the field that fails if it is a md5 checksum 2013-08-29 12:15:45 +02:00
0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
b5ba7b9dcf Fix MIT extractor for Python 2.6
The HTML for the MIT page does not parse cleanly for Python 2.6 due
to script tags within an actual script element.  The offending piece
is inside a comment block, so removing all such comment blocks
fixes the parsing.
2013-08-28 14:24:42 -05:00
483e0ddd4d Merge remote-tracking branch 'upstream/master' 2013-08-28 10:19:28 -07:00
2891932bf0 release 2013.08.28.1 2013-08-28 19:00:17 +02:00
591078babf Merge remote-tracking branch 'upstream/master' 2013-08-28 09:57:28 -07:00
9868c781a1 Merge remote-tracking branch 'origin/master' 2013-08-28 18:22:33 +02:00
c257baff85 Merge remote-tracking branch 'rzhxeo/youporn-hd'
Conflicts:
	youtube_dl/utils.py
2013-08-28 18:22:28 +02:00
878e83c5a4 YoupornIE: Clean up extraction of hd video 2013-08-28 16:04:48 +02:00
0012690aae Let aes_decrypt_text return bytes instead of unicode 2013-08-28 16:03:35 +02:00
6e74bc41ca Fix division bug in aes.py 2013-08-28 16:01:43 +02:00
cba892fa1f Add intlist_to_bytes to utils.py 2013-08-28 15:59:07 +02:00
550bfd4cbd Merge pull request #1 from phihag/youporn-hd-pr
Allow changes to run under Python 3
2013-08-28 06:23:33 -07:00
920ef0779b Hide the password and username in verbose mode (closes #1089) 2013-08-28 15:14:02 +02:00
48ea9cea77 Allow changes to run under Python 3 2013-08-28 14:34:49 +02:00
ccf4b799df Merge remote-tracking branch 'origin/master' 2013-08-28 14:02:40 +02:00
f143d86ad2 [sohu] Handle encoding, and fix tests 2013-08-28 14:00:05 +02:00
8ae97d76ee PostProcessingError holds the message in the 'msg' property, not in 'message' (fixes #1323)
Causes DeprecationWarning: http://www.python.org/dev/peps/pep-0352/
2013-08-28 13:37:31 +02:00
f8b362739e Merge remote-tracking branch 'Huarong/master' 2013-08-28 13:10:59 +02:00
6d69d03bac Merge remote-tracking branch 'origin/reuse_ies' 2013-08-28 13:05:21 +02:00
204da0d3e3 Merge remote-tracking branch 'origin/master' 2013-08-28 12:57:44 +02:00
c496ca96e7 Fix platform name in Python 2 with --verbose (Closes #1228) 2013-08-28 12:57:10 +02:00
67b22dd036 Add extractors for video.mit.edu and techtv.mit.edu (closes #1327)
video.mit.edu just embeds the videos from techtv.mit.edu
2013-08-28 12:55:42 +02:00
ce6a696e4d Remove unused imports 2013-08-28 12:47:38 +02:00
a5caba1eb0 [generic] simply use urljoin 2013-08-28 12:47:27 +02:00
cd9c100963 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:20:12 +02:00
edde6c56ac Print playpath with --get-url (Fixes #1334) 2013-08-28 12:14:45 +02:00
b7f89fe692 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:10:34 +02:00
ae3531adf9 [generic] Fix URL concatenation
When the url is something like http://example.org/foo/bar?x=y  and the added is file/video.mp4 , we want http://example.org/foo/file/video.mp4
Fixes #1268.
2013-08-28 12:08:17 +02:00
8cf5ee7831 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-28 11:57:18 +02:00
aa3e950764 Tolerate junk at the end of gzip-compressed content (#1268) 2013-08-28 11:57:13 +02:00
1301a0dd42 Merge remote-tracking branch 'upstream/master' 2013-08-28 11:02:12 +02:00
af8bd6a82d Show the time taken to download in the same format as the ETA 2013-08-28 10:56:11 +02:00
6d38616e67 Merge pull request #1181 from h3xx/master
Add some verbosity when reporting finished downloads

Remove the mixed use of tabs and spaces for indentation.
2013-08-28 10:54:07 +02:00
4f5f18acb9 [addanime] add file 2013-08-28 10:28:16 +02:00
3e223834d9 [youtube] update algo for length 88, thanks to @Ramhack (fixes #1328) 2013-08-28 10:26:44 +02:00
a1bb0f8773 [cnn] remove debug print call. 2013-08-28 10:20:37 +02:00
0e283428f7 HTTPError is in urllib.error in Python 3, not in http.error 2013-08-28 10:18:39 +02:00
2eabb80254 [addanime] improve 2013-08-28 04:25:38 +02:00
44586389e4 [appletrailers] Add support 2013-08-28 02:18:44 +02:00
06a401c845 Merge branch 'master' into subtitles_rework 2013-08-28 00:33:12 +02:00
273f603efb [cnn] Allow more URLs 2013-08-28 00:14:19 +02:00
1619e22f40 release 2013.08.28 2013-08-27 23:31:36 +02:00
88a79ce6a6 Delete default user agent (Fixes #1309) 2013-08-27 23:31:24 +02:00
acebc9cd6b Revert "Install our own HTTPS handler as well (#1309)"
This reverts commit 36399e8576 and fixes #1322.
2013-08-27 23:28:20 +02:00
443c12a703 Merge pull request #1324 from whydoubt/fix_gplus
Initial slash in Google+ photos link was removed
2013-08-27 13:36:39 -07:00
7f3c4f4f65 Initial slash in Google+ photos link was removed 2013-08-27 14:38:50 -05:00
0bc56fa66a Add an extractor for NBC news (closes #1320) 2013-08-27 12:38:57 +02:00
1a582dd49d Add an extractor for CNN (closes #1318) 2013-08-27 11:56:48 +02:00
c5b921b597 Merge remote-tracking branch 'upstream/master' 2013-08-27 10:47:47 +02:00
e86ea47c02 [canalc2] Small improvements 2013-08-27 10:35:20 +02:00
aa5a63a5b5 Merge remote-tracking branch 'Rudloff/canalc2' 2013-08-27 10:31:46 +02:00
2a7b4da9b2 [hark] get the song info in JSON and extract more information. 2013-08-27 10:25:38 +02:00
069d098f84 [canalplus] Accept player.canalplus.fr urls 2013-08-27 10:21:57 +02:00
b3889f7023 release 2013.08.27 2013-08-27 02:30:47 +02:00
65883c8dbd Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-27 02:00:23 +02:00
341ca8d74c [trilulilu] Add support for trilulilu.ro
Fun fact: The ads (not yet supported) are loaded from youtube ;)
2013-08-27 01:59:00 +02:00
99859d436c Merge remote-tracking branch 'upstream/master' 2013-08-26 15:16:13 -07:00
1b01e2b085 Merge pull request #1315 from yasoob/master
fixed tests for c56 and dailymotion
2013-08-26 13:38:48 -07:00
976fc7d137 fixed tests for c56 and dailymotion 2013-08-27 01:00:17 +05:00
c3b7b29c23 Merge remote-tracking branch 'origin/master' 2013-08-26 21:29:44 +02:00
627a91a9a8 [generic] small typo 2013-08-26 21:29:31 +02:00
6dc6302599 Merge pull request #1231 from yasoob/master
Added an IE for hark.com
2013-08-26 12:29:04 -07:00
7a20e2e1f8 Merge remote-tracking branch 'upstream/master' 2013-08-26 03:16:42 +02:00
90648143c3 Merge pull request #1310 from rzhxeo/rtlnow
Add support for http://superrtlnow.de
2013-08-25 15:45:22 -07:00
5c6658d4dd Merge remote-tracking branch 'upstream/master' 2013-08-24 23:01:39 +02:00
9585f890f8 [generic] add support for relative URLs (Fixes #1308) 2013-08-24 22:56:37 +02:00
0838239e8e [generic] Support double slash URLs (Fixes #1309) 2013-08-24 22:52:45 +02:00
36399e8576 Install our own HTTPS handler as well (#1309) 2013-08-24 22:49:22 +02:00
9460db832c [ro220] Add support for 220.ro 2013-08-24 21:10:03 +02:00
d68730a56e Add SUPER RTL NOW to RTLnow extractor 2013-08-24 13:22:28 +02:00
f2aeefe29c [youtube] update algo for length 84 2013-08-24 10:48:12 +02:00
39c6f507df Merge remote-tracking branch 'upstream/master' 2013-08-23 15:33:36 -07:00
d2d1eb5b0a Switch to domain yt-dl.org 2013-08-23 23:57:23 +02:00
8ae7be3ef4 release 2013.08.23 2013-08-23 23:09:53 +02:00
306170518f [youtube] update algo for length 86 (fixes #1302) 2013-08-23 22:36:59 +02:00
aa6a10c44a Allow to specify multiple subtitles languages separated by commas (closes #518) 2013-08-23 18:34:57 +02:00
9af73dc4fc Print a message before embedding the subtitles 2013-08-23 18:17:43 +02:00
fc483bb6af [xhamster] use determine_ext 2013-08-23 17:23:34 +02:00
53b0f3e4e2 Merge pull request #1301 from rzhxeo/xhamster
XHamsterIE: Fix video extension and add video description
2013-08-23 17:21:30 +02:00
4353cf51a0 XHamsterIE: Add video description 2013-08-23 16:40:20 +02:00
ce34e9ce5e XHamsterIE: Fix video extension
Cut off GET parameter
2013-08-23 16:33:41 +02:00
d4051a8e05 Add a post processor for embedding subtitles in mp4 videos (closes #1052) 2013-08-23 15:06:19 +02:00
df3df7fb64 [youtube] Fix download of subtitles with '--all-subs'
If _extract_subtitles is called the option 'write subtitles' is always true.
2013-08-23 13:14:22 +02:00
9e9c164052 Merge pull request #937 from jaimeMF/subtitles_rework
Subtitles rework
2013-08-23 02:40:25 -07:00
066090dd3f [youtube] add algo for length 80 and update player info 2013-08-23 11:33:56 +02:00
614d9c19c1 Merge remote-tracking branch 'upstream/master' 2013-08-22 17:02:41 -07:00
bd2dee6c67 Merge branch 'master' into subtitles_rework 2013-08-23 01:47:10 +02:00
74e6672beb Merge pull request #1297 from iemejia/master
[subtitles] separated subtitle options in their own group
2013-08-22 16:30:14 -07:00
02bcf0d389 release 2013.08.22 2013-08-22 23:29:42 +02:00
18b4e04f1c Merge branch 'master' into subtitles_rework 2013-08-22 23:29:36 +02:00
10204dc898 [videofyme] Add an additional quality (they change between downloads of the info) and update md5 sum of the test video 2013-08-22 23:23:52 +02:00
1865ed31b9 [subtitles] separated subtitle options in their own group 2013-08-22 22:44:04 +02:00
3669cdba10 [youtube] update algo for length 82 (fixes #1296) 2013-08-22 22:35:15 +02:00
939fbd26ac [youtube] fix the order of DASH formats 2013-08-22 19:45:24 +02:00
b4e60dac23 Merge remote-tracking branch 'upstream/master' 2013-08-22 10:43:51 -07:00
e6ddb4e7af Merge pull request #1279 from xanadu/master
Add YouTube DASH formats to YouTubeIE
2013-08-22 19:33:34 +02:00
83390b83d9 Merge pull request #1266 from MiLk/py-generator
Update the youtube algorithm generator
2013-08-22 10:18:58 -07:00
ff2424595a lxml is not part of the standard library. 2013-08-22 14:47:51 +02:00
adeb9c73d6 Merge remote-tracking branch 'upstream/master' 2013-08-22 14:04:30 +02:00
cd0abcc0bb Extractor for canalc2.tv 2013-08-22 13:54:23 +02:00
4a55479fa9 Credit Pierre Rudloff for JeuxVideoIE and UnistraIE 2013-08-22 13:21:32 +02:00
f527115b5f Rename utv.py to unistra.py and extract more info
There are other sites that could be named utv, which would conflict if they are added
2013-08-22 13:19:35 +02:00
75e1b46add Download from utv.unistra.fr (PR #1271)
Squashed to a single commit to keep the file 'youtube-dl' unchanged and remove the revert commit.
2013-08-22 12:58:12 +02:00
05a2926c5c Merge remote-tracking branch 'upstream/master' 2013-08-22 12:55:58 +02:00
7070b83687 Merge remote-tracking branch 'upstream/master' 2013-08-22 12:54:17 +02:00
8d212e604a Merge remote-tracking branch 'upstream/master'
Conflicts:
	youtube_dl/extractor/jeuxvideo.py
2013-08-22 12:52:05 +02:00
063fcc9676 [jeuxvideo] Extract more information and add test 2013-08-22 12:37:34 +02:00
8403612258 Merge pull request #1267 from Rudloff/master
Download videos from jeuxvideo.com

Edited to keep the file 'youtube-dl' unchanged.
2013-08-22 12:25:21 +02:00
25b51c7816 Download videos from jeuxvideo.com 2013-08-22 12:12:34 +02:00
9779b63bb6 Add an extractor for PBS (closes #870 and #873) 2013-08-22 11:57:21 +02:00
d81aef3adf Add an extractor for tv.slashdot.org (closes #1192)
It uses the ooyala platform, so it just extracts the ooyala url.
2013-08-21 21:51:58 +02:00
5af7e056a7 Merge remote-tracking branch 'upstream/master' 2013-08-21 10:53:42 -07:00
45ed795cb0 [youtube] update uploader name for a test video: 'IconaPop' has changed to 'Icona Pop' 2013-08-21 19:28:48 +02:00
683e98a8a4 [statigram] change test video
The old one cannot be accessed.
2013-08-21 19:20:27 +02:00
e0cfeb2ea7 [funnyordie] fix extraction of video url and title 2013-08-21 18:58:25 +02:00
75340ee383 [vevo] Fix urls with a query (#1258) 2013-08-21 18:20:03 +02:00
668de34c6b [soundcloud] Support widget urls (fixes #1252) 2013-08-21 17:06:37 +02:00
a91b954bb4 [vimeo] extract information for Vimeo Pro videos from http://player.vimeo.com/video/{video_id} (fixes #1197)
For some videos https://vimeo.com/{video_id} doesn't work
2013-08-21 13:48:19 +02:00
a3f62b8255 Merge remote-tracking branch 'upstream/master' 2013-08-21 00:07:03 -07:00
37b6d5f684 fix hls test 2013-08-20 23:51:05 -07:00
b7a6838407 address review comment 2013-08-20 21:57:32 -07:00
cde846b3d3 fix code style 2013-08-20 21:42:49 -07:00
6c3e6e88d3 Allow hours in ETA display (Fixes #1280) 2013-08-21 05:44:19 +02:00
739674cd77 [rtlnow] Add support for error message for queries from outside of Germany 2013-08-21 05:24:58 +02:00
4b2d7cae11 release 2013.08.21 2013-08-21 04:33:57 +02:00
7fea7156cb [generic] support HTML5 video 2013-08-21 04:32:22 +02:00
3093468977 [generic] Ignore stupid HTTP servers (#1284) 2013-08-21 04:32:07 +02:00
79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
2013-08-21 04:06:48 +02:00
87f78946a5 [collegehumor] Allow old-style videos (Fixes #1285) 2013-08-21 03:50:56 +02:00
211fbc1328 fix failed tests 2013-08-19 18:57:55 -07:00
836a086ce9 Add YouTube DASH formats to YouTubeIE 2013-08-19 18:22:25 -07:00
90d3989b99 Merge remote-tracking branch 'upstream/master' 2013-08-19 17:11:52 -07:00
d741e55a42 [youtube] Support watch_popup URLs (Fixes #1275) 2013-08-19 10:27:42 +02:00
17d3aaaf16 Merge pull request #1273 from rzhxeo/rtlnow
Add support for http://voxnow.de
2013-08-19 00:19:06 -07:00
ea55b2a4ca Add VOXnow to RTLnow extractor 2013-08-19 08:57:36 +02:00
3f0537dd4a Merge remote-tracking branch 'rzhxeo/rtlnow' 2013-08-19 00:25:34 +02:00
943f7f7a39 Download videos from jeuxvideo.com 2013-08-18 16:11:47 +02:00
12e895fc5a Merge branch 'master' into py-generator 2013-08-18 11:12:38 +02:00
bda2c49d75 Update algo - see #1254
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-18 11:10:39 +02:00
01b32990da Add RTLnow extractor 2013-08-18 08:16:53 +02:00
dbda1b5147 Add RTLnow extractor
Supports http://rtl2now.rtl2.de and http://rtl-now.rtl.de
2013-08-18 08:15:18 +02:00
ddf3bd328b release 2013.08.17 2013-08-17 08:33:36 +02:00
b9c37b92cf Merge pull request #1256 from patrickslin/patch-5
Length 85 changed again? (fixes #1254)
2013-08-16 14:07:49 -07:00
5a27ecdd2e Update AddAnime.py 2013-08-16 23:54:09 +03:00
f9c3c90ca8 Length 85 changed again? (fixes #1254) 2013-08-16 08:54:01 -07:00
6daccbe317 release 2013.08.15 2013-08-15 22:40:00 +02:00
71ea844c0e Merge pull request #1248 from patrickslin/patch-4
Unable to Download Video (fixes #1247)
2013-08-15 13:38:32 -07:00
3a7256697e Unable to Download Video (fixes #1247) 2013-08-15 13:00:20 -07:00
d1ba998274 release 2013.08.14 2013-08-14 10:19:53 +02:00
718ced8d8c Merge pull request #1239 from patrickslin/patch-3
Updated Vevo Signature Length (fixes #1237)
2013-08-14 01:18:58 -07:00
e1842025d0 Updated Vevo Signature Length (fixes #1237) 2013-08-13 17:57:35 -07:00
2b9213cdc1 Update generator
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-12 10:48:40 +02:00
e3a88568b0 Added an IE for hark.com 2013-08-11 22:23:05 +05:00
0577177e3e [vevo] fix testcase 2013-08-11 07:12:38 +02:00
298f833b16 Note update possibility on errors (thanks @chbrown, #1229) 2013-08-11 06:46:24 +02:00
97b3656c2e YoupornIE: Add support for hd videos and update Test 2013-08-09 18:37:33 +02:00
f3bcebb1d2 add an aes implementation 2013-08-09 18:36:01 +02:00
0f399e6e5e release 2013.08.09 2013-08-09 15:49:09 +02:00
5b075e27cb Merge pull request #1218 from patrickslin/patch-2
New sig len 89 algo
2013-08-09 03:42:13 -07:00
8a9d86a2a7 New sig len 89 algo
Fixes new YT encrypted sig len 89.
2013-08-08 21:48:12 -07:00
d80a064eff [subtitles] Added tests to check correct behavior when no subtitles are
available
2013-08-08 22:22:33 +02:00
d468a09789 release 2013.08.08.1 2013-08-08 20:45:16 +02:00
9f4ab73d7f Merge pull request #1216 from patrickslin/patch-5
Invalid signature again (fixes #1215)
2013-08-08 11:44:29 -07:00
02cf62e240 Invalid signature again (fixes #1215) 2013-08-08 11:28:50 -07:00
d55de6eec2 [subtitles] Skips now the subtitles that has already been downloaded.
Just a validation for file exists, I also removed a method that wasn't
been used because it was a copy paste from FileDownloader.
2013-08-08 18:30:04 +02:00
69df680b97 [subtitles] Improved docs + new class for servers who don't support
auto-caption
2013-08-08 11:20:56 +02:00
447591e1ae [test] Cleaned subtitles tests 2013-08-08 11:03:52 +02:00
33eb0ce4c4 [subtitles] removed only-sub option (--skip-download achieves the same
functionality)
2013-08-08 10:06:24 +02:00
505c28aac9 Separated subtitle options in their own group 2013-08-08 09:53:25 +02:00
67fb0c5495 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-08 08:56:59 +02:00
4efba05c56 Clarify template error message (#1209) 2013-08-08 08:55:26 +02:00
8377574c9c [internal] Improved subtitle architecture + (update in
youtube/dailymotion)

The structure of subtitles was refined, you only need to implement one
method that returns a dictionnary of the available subtitles (lang, url) to
support all the subtitle options in a website. I updated the subtitle
downloaders for youtube/dailymotion to show how it works.
2013-08-08 08:54:10 +02:00
0f90943e45 Merge pull request #1189 from cyisfor/master
More informative error
2013-08-07 17:30:28 -07:00
526e638c8a release 2013.08.08 2013-08-08 00:39:23 +02:00
372297e713 Undo the previous commit (it was a mistake) 2013-08-07 21:24:42 +02:00
356e067390 Merge remote-tracking branch 'patrickslin/patch-4' 2013-08-07 20:19:51 +02:00
e2f48f9643 Remove youtube sig tests
The signature algo changes too often for the static test to make sense.
2013-08-07 20:11:40 +02:00
b513a251f8 Merge commit '7a4c6cc92f9ffec9135652a49153caffa5520c29' 2013-08-07 20:11:04 +02:00
953e32b2c1 [dailymotion] Added support for subtitles + new InfoExtractor for
generic subtitle download.

The idea is that all subtitle downloaders must descend from SubtitlesIE
and implement only three basic methods to achieve the complete subtitle
download functionality. This will allow to reduce the code in YoutubeIE
once it is rewritten.
2013-08-07 18:59:11 +02:00
5898e28272 Fixed small type issue 2013-08-07 18:48:24 +02:00
67dfbc0cb9 Added exceptions for the subtitle and video types in .gitignore 2013-08-07 18:42:40 +02:00
36cb11f068 Encrypted sig 87 broken again (fixes #1200) 2013-08-06 21:35:37 -07:00
7a4c6cc92f Updated the 84 length signature decryption
Updated the right 84 length signature decryption 06.08.2013
2013-08-06 15:41:13 +03:00
7edcb8f39c More informative error 2013-08-05 19:43:09 -07:00
d5b00ee6e0 improve sohu extractor 2013-08-06 10:26:57 +08:00
461cead4f7 changes 2013-08-06 04:34:24 +03:00
b5a6d40818 fix parse title bug 2013-08-05 22:51:54 +08:00
968b5e0112 Add some verbosity when reporting finished downloads
For example:

    [download] Resuming download at byte 1868140
    [download] Destination: Entry #1-Bn59FJ4HrmU.flv
    [download] 100% of 3.27MiB in 4s

This format is meant to somewhat mirror the behavior of wget(1) when reporting finished downloads:

    100%[==================>] 54,836,682   788KB/s   in 74s

    2013-08-04 12:32:05 (728 KB/s) - 'google-chrome-stable_current_x86_64.rpm' saved [54836682/54836682]
2013-08-04 12:45:24 -05:00
39b782b390 [collegehumor] support urls in the format www.collegehumor.com/e/{video_id} (fixes #1179) 2013-08-04 16:36:48 +02:00
577664c8e8 Add an extractor from muzu.tv (closes #1177) 2013-08-04 11:10:57 +02:00
bba12cec89 Add an extractor for videofy.me (closes #1171)
Also modify find_xpath_attr to accept values with spaces like for id="HQ on"
2013-08-03 22:50:27 +02:00
70c4c03cb8 [arte] add support for downloading from http://liveweb.arte.tv (fixes #1014) 2013-08-03 19:07:04 +02:00
f5791ed136 [arte] Prefer vídeos without subtitles in the same language (fixes #1173) and fix crash when there's no description 2013-08-03 17:32:29 +02:00
4ec929dc9b use ..utils/clean_html() 2013-08-03 10:29:58 +08:00
fbf189a6ee [myvideo] add support for videos that place the video info inside www.myvideo.de/service/data/video/{id}/config (fixes #616) 2013-08-02 21:09:17 +02:00
09825cb5c0 Add an extractor for Ooyala (closes #833)
Only works for some sites, it doesn't work for videos that use a f4m manifest
2013-08-02 16:53:16 +02:00
ed27d35674 [youtube] don't crash in verbose mode if 'ad3_module' is not defined in age protected videos (fixes #1159) 2013-08-02 14:17:01 +02:00
fd5539eb41 release 2013.08.02 2013-08-02 13:35:13 +02:00
04bca64bde [youtube]: new algo for length 83 (fixes #1164) 2013-08-02 12:38:17 +02:00
03cc7c20c1 [youtube] show which formats are in 3D with "-F" and in the format field 2013-08-02 12:21:28 +02:00
4075311d94 Merge pull request #1163 from xanadu/master
add support for download YouTube 3d format of 3d content
2013-08-02 12:06:34 +02:00
6624a2b07d add an extractor for tv.sohu.com 2013-08-02 17:58:46 +08:00
6d3a7d03e1 fix bug: kankan extractor not support http://vod.kankan.com/v/70/70309.shtml 2013-08-02 15:26:11 +08:00
95fdc7d69c Merge remote-tracking branch 'upstream/master' 2013-08-01 10:57:12 -07:00
86fe61c8f9 add support for download YouTube 3d format of 3d content 2013-08-01 10:47:48 -07:00
9bb6d2f21d Merge pull request #1161 from meyerd/master
Fix regex error when only subtitled video is available on arte.
2013-08-01 04:49:05 -07:00
e3f4593e76 Fix regex error when only subtitled video is available on arte. 2013-08-01 11:48:17 +02:00
1d043b93cf [youtube] Add support for downloading videos with hlsvp (fixes #1083)
They are downloaded with a m3u8 manifest, they seem to be encrypted, but ffmpeg can handle them.
2013-07-31 23:41:05 +02:00
b15d4f624f Allow to download from m3u8 manifests with ffmpeg
They are detected by the extension of the url.
2013-07-31 22:33:37 +02:00
4aa16a50f5 Log a better error message if ffprobe or avconv are not found (related #1134) 2013-07-31 21:22:08 +02:00
bbcbf4d459 Switch some calls to to_stderr to report_error and report_warning 2013-07-31 21:20:46 +02:00
930ad9eecc release 2013.07.31 2013-07-31 10:55:02 +02:00
b072a9defd YoutubeIE: with age protected videos, add a missing "return" to return the signature decrypted with _decrypt_signature 2013-07-31 10:51:00 +02:00
75952c6e3d YoutubeIE: new algo for length 86 (fixes #1156)
Now is using the same length as the flash player used for age protected videos, but the algorithm is different, so now for age protected videos it first tries to use the old algo.
2013-07-31 10:45:13 +02:00
05afc96b73 Print urls from the batch file with --verbose (related #1155) 2013-07-30 23:11:44 +02:00
fa80026915 Disable way and tf1 tests, the whole videos are served sometimes, so the md5 sum doesn't match. 2013-07-30 11:19:07 +02:00
2bc3de0f28 [worldstarhiphop] Small cleanup
The second check for the Vevo id is not necessary.
2013-07-30 11:10:17 +02:00
99c7bc94af Merge pull request #1148 from JohnyMoSwag/master
[worldstarhiphop] support vevo videos
2013-07-30 11:05:40 +02:00
152c8f349d Merge pull request #1149 from pishposhmcgee/patch-3
[vevo] Modified m_urls regex and video_url
2013-07-30 01:57:37 -07:00
d75654c15e using re.search 2013-07-29 14:39:14 -07:00
0725f584e1 [wat] fix the extraction of the video url (fixes #1103)
Use the direct download link for Android.
2013-07-29 23:38:02 +02:00
8cda9241d1 Add an extractor for kankan.com (closes #1133) 2013-07-29 23:13:12 +02:00
a3124ba49f Modified m_urls regex and video_url
Some videos have a leading slash, some do not
2013-07-29 15:45:20 -05:00
579e2691fe detect vevo embed fix 2013-07-29 12:24:26 -07:00
63f05de10b detect vevo embed 2013-07-29 12:11:57 -07:00
caeefc29eb [vimeo] add an extractor for channels 2013-07-29 13:12:09 +02:00
a3c736def2 [dailymotion] Add an extractor for Dailymotion playlists 2013-07-29 12:07:38 +02:00
58261235f0 Add an extractor for roxwell.com (closes #1044) 2013-07-26 13:00:59 +02:00
da70877a1b release 2013.07.25.2 2013-07-25 22:58:40 +02:00
5c468ca8a8 YoutubeIE: add algo for length 79 (fixes #1126) 2013-07-25 22:50:24 +02:00
aedd6bb97d YoutubeIE: new algo for length 81 (fixes #1127) 2013-07-25 22:06:53 +02:00
733d9cacb8 Merge pull request #1120 from pishposhmcgee/patch-1
[collegehumor] Added an option 'e' to go with 'video' or 'embed'
2013-07-25 01:14:43 -07:00
42f2805e48 [keek] Fix testcase (Broken by accident in 6625f82940) 2013-07-25 10:10:37 +02:00
0ffcb7c6fc release 2013.07.25.1 2013-07-25 09:53:15 +02:00
27669bd11d [ina] Allow I at start of video IDs 2013-07-25 09:52:58 +02:00
6625f82940 [keek] Allow httpS URLs (Fixes #1123) 2013-07-25 09:40:19 +02:00
d0866f0bb4 release 2013.07.25 2013-07-25 09:35:25 +02:00
09eeb75130 Merge remote-tracking branch 'pishposhmcgee/patch-2' 2013-07-25 09:34:56 +02:00
0a99956f71 [ina] Fix URL detection (Fixes #1121) 2013-07-25 09:34:12 +02:00
12ef6aefa8 changed video_url regex
Some older videos contain an extra properties such as 'embed' before 'type'.
2013-07-24 21:51:08 -05:00
e93aa81aa6 Added an option 'e' to go with 'video' or 'embed'
Based on links that I've seen, /e/<videoid> also occurs in the wild, and making this substitution yields effective results.
2013-07-24 16:55:28 -05:00
755eb0320e [youtube] use itertools.count instead of a "while True" loop and a manual counter 2013-07-24 22:27:33 +02:00
43ba5456b1 [youtube] add an extractor for the "Watch Later" list 2013-07-24 22:13:39 +02:00
156d5ad6da release 2013.07.24.2 2013-07-24 21:18:41 +02:00
c626a3d9fa Add an extractor for downloading the Youtube favorite videos(closes #127) 2013-07-24 20:45:19 +02:00
b2e8bc1b20 YoutubeIE: Move the code from _real_initialize to a base class
This allows to reuse the code in other IEs without having to overwrite some parts.
2013-07-24 20:40:12 +02:00
771822ebb8 YoutubePlaylistIE: break only if there's no entry field in the response
Otherwise the Favorite videos playlist cannot be downloaded complete.
Also break if it reach the maximum value of the start-index.
2013-07-24 20:14:55 +02:00
eb6a41ba0f ExfmIE: extract Soundcloud songs using SoundcloudIE
Now SouncloudIE accepts api urls.
2013-07-24 14:39:21 +02:00
7d2392691c [soundcloud]: Some improvements
Extract thumbnails.
Make SoundcloudSetIE a subclass of SoundcloudIE to reuse some code.
Directly extract the file url without downloading an extra page.
2013-07-24 14:15:12 +02:00
c216c1894d release 2013.07.24.1 2013-07-24 13:52:55 +02:00
3e1ad508eb Add Youtube player info for length 87 2013-07-24 12:48:25 +02:00
a052c1d785 Merge pull request #1114 from alexvh/traileraddict_hd
[traileraddict] Obtain hd quality stream if available

Updated md5 checksum of the test video.
2013-07-24 10:52:24 +02:00
16484d4923 [traileraddict]: Support clips urls and more trailer urls 2013-07-24 10:43:44 +02:00
32a09b4382 Merge pull request #1113 from alexvh/master
[traileraddict] Allow all types of trailer URLs
2013-07-24 10:37:52 +02:00
870a7e6156 release 2013.07.24 2013-07-24 10:29:34 +02:00
239e3e0cca YoutubeIE: new algo for length 87 (fixes #1105)
Squashed commit from the pull requests #1107, #1109 and #1110.
2013-07-24 10:20:52 +02:00
b1ca5e3ffa [traileraddict] Obtain hd quality stream if available
No clear method for determining if hd is available so opted to just
check for presence of hd toggle function.
2013-07-24 02:42:32 -04:00
b9a1252c96 [traileraddict] Allow all types of trailer URLs
Valid url regex for traileraddict.com is too strict. Need to allow,
e.g. theatrical-trailer, teaser-trailer, feature-read-band-trailer, etc.
2013-07-24 00:48:11 -04:00
fc492de31d release 2013.07.23.1 2013-07-23 18:37:52 +02:00
a9c0f9bc63 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-23 18:37:09 +02:00
b7cc9f5026 [soundcloud] Support URLs with a slash at the end (Fixes #1104) 2013-07-23 18:35:52 +02:00
252580c561 YoutubeChannelE: switch ajax query from channel_ajax to c4_browse_ajax
It wasn't detecting when there aren't more videos
2013-07-23 14:58:01 +02:00
acc47c1a3f Mark WatIE and TF1IE as broken (related #1103) 2013-07-23 14:29:30 +02:00
70fa830e4d CollegeHumorIE: support Youtube videos and embed urls (fixes #1094) 2013-07-23 14:29:29 +02:00
a7af0ebaf5 release 2013.07.23 2013-07-23 14:20:52 +02:00
67ae7b4760 Fix BreakIE
Also detect videos that come from Youtube
2013-07-23 11:41:05 +02:00
de48addae2 Fix CollegHumorIE
Now it downloads the video over http in one file, it doesn't downloads in fragments
Added a test and use the methods in InfoExtractor for downloading webpages
2013-07-23 11:14:11 +02:00
ddbfd0f0c5 ComedyCentralIE: support the extended interviews urls (fixes #1079) 2013-07-21 11:04:56 +02:00
d7ae0639b4 [youtube] Add an extractor for Youtube recommended videos (":ytrec" keyword) (closes #476)
The new extractor and YoutubeSubscriptionsIE are subclasses of YoutubeFeedsInfoExtractor, which allows to fetch videos from http://www.youtube.com/feed_ajax
2013-07-20 19:33:40 +02:00
6804038d06 Don't try to write the subtitles if it's None 2013-07-20 12:59:47 +02:00
2f799533ae YoutubeIE: don't crash when trying to get automatic captions if the videos has standard subtitles. 2013-07-20 12:56:10 +02:00
88ae5991cd YoutubeIE: use the same function for getting the subtitles for the "--write-sub" and "--all-sub" options 2013-07-20 12:56:06 +02:00
5d51a883c2 Use a dictionary for storing the subtitles
The errors while getting the subtitles are reported as warnings, if no subtitles are found return and empty dict.
2013-07-20 12:52:25 +02:00
c4a91be726 Save subtitles using the same code for all the options 2013-07-20 12:52:24 +02:00
0382435990 [exfm] Add IE_* descriptions 2013-07-20 11:26:36 +02:00
b390d85d95 Merge remote-tracking branch 'yasoob/master' 2013-07-20 11:23:56 +02:00
be925dc64c release 2013.07.19 2013-07-19 23:42:29 +02:00
de7a91bfe3 WeiboIE: extract the player urls from a json webpage
Also extract a Sina url that doesn't require to follow a redirection.
2013-07-19 20:43:44 +02:00
a4358cbabd YoutubeIE: new algo for length 85 (closes #1080), thanks to @patrickslin 2013-07-19 17:12:40 +02:00
177ed935a9 TEDIE: fix the title extraction 2013-07-19 16:13:31 +02:00
c364f15ff1 Add WeiboIE (closes #1039)
It just embed video from other sites.
Modified the _VALID_URL of Youku to catch embed urls.
2013-07-19 16:09:14 +02:00
e1f6e61e6a Add an extractor for 56.com (related #1039) 2013-07-19 15:17:34 +02:00
0932300e3a Add SinaIE (related #1039): extractor for video.sina.com.cn 2013-07-18 15:31:50 +02:00
3f40217704 InstagramIE: fix the extraction of the uploader_id and the title
The page title is now 'Instagram', so we build it.
Also extract the description
2013-07-18 13:12:27 +02:00
f631c3311a Hint that --update may need sudo 2013-07-18 12:53:24 +02:00
ad433bb372 release 2013.07.18 2013-07-18 12:41:49 +02:00
3e0b3a1428 Remove the test to signature of lengths 43,43
It's already covered by the test for length 87
2013-07-18 12:29:09 +02:00
444b116597 YoutubeIE: add algo for length 90 (closes #1064)
Order the cases from higher to lower length.
2013-07-18 12:25:41 +02:00
2aea08eda1 Merge pull request #1068 from MiLk/genalgo-youtube-92
[youtube] Add generator for signature 92
2013-07-18 09:54:56 +02:00
8e5e059d7d forgot to import json json 2013-07-18 12:40:56 +05:00
2b1b511f6b removed some unnecessary imports 2013-07-18 12:37:47 +05:00
233ad24ecf corrected a typo and added myself to travis notifications. 2013-07-18 12:37:02 +05:00
c4949c50f9 added test for ex.fm 2013-07-18 12:33:31 +05:00
b6ef402905 added an IE for ex.fm 2013-07-18 12:30:21 +05:00
ccf365475a [youtube] Add generator for signature 92 2013-07-17 17:43:44 +02:00
e1fb245690 Add CondeNastIE
It supports some of the websites of the Condé Nast group: WIRED, GQ, Vogue, Glamour, W Magazine and Vanity Fair.
2013-07-17 14:39:02 +02:00
5a76c6517e YoutubeIE: some encrypted signatures have more than two parts, print the size of all the parts 2013-07-17 12:08:10 +02:00
1bb9568776 release 2013.07.17.1 2013-07-17 11:18:35 +02:00
ecd1c2f7e9 [thisav] add a test for video MD5 2013-07-17 11:18:14 +02:00
466de68801 [thisav] Add IE (Fixes #1056) 2013-07-17 11:16:53 +02:00
88d4111cfa [youtube] Add code for signature 92 (Closes #1060) 2013-07-17 11:06:34 +02:00
51fb64bab1 Mark test_youtube_sig as non-executable (#1066) 2013-07-17 11:04:07 +02:00
be547e1d3b Revert "[youtube] improved decrypt_signature, closes #1060"
This reverts commit fe6fad1242 and closes #1066.
2013-07-17 11:01:40 +02:00
bf85454116 [metacafe] Fix test 2013-07-17 10:50:30 +02:00
5910724b11 [metacafe] New result format 2013-07-17 10:49:49 +02:00
7e24b09da9 [metacafe] Extract description 2013-07-17 10:45:35 +02:00
f085f960e7 [metacafe] Fix uploader detection 2013-07-17 10:45:24 +02:00
f38de77f6e Use unescapeHTML for OpenGraph properties
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
2013-07-17 10:38:23 +02:00
58e7d46d1b Merge remote-tracking branch 'Forever-Young/patch-1' 2013-07-17 09:25:52 +02:00
2a5201638d [youtube] Add sig test for 92 (Thanks to @patrickslin) 2013-07-17 09:23:38 +02:00
fe6fad1242 [youtube] improved decrypt_signature, closes #1060 2013-07-17 10:41:43 +04:00
ec00e1d8a0 [metacafe] Use modern helper methods 2013-07-17 01:35:33 +02:00
de29c4144e Ignore errors in git error handling in verbose mode in Python 3 2013-07-17 01:33:28 +02:00
f3bab0044e Write debugging output to stderr (#1059) 2013-07-17 01:30:34 +02:00
ffd1833b87 release 2013.07.17 2013-07-17 01:14:38 +02:00
896d5b63e8 [metacafe] Add support for AnyClip videos (#1059) 2013-07-17 01:14:30 +02:00
67de24e449 [freesound] Minor improvements 2013-07-15 21:33:45 +02:00
66400c470c Merge pull request #1050 from yasoob/master
Added an IE and test for Freesound.org .
2013-07-15 21:06:51 +02:00
7665010267 added test for freesound.org 2013-07-15 20:17:09 +05:00
5d9b75051a Added an IE for freesound.org 2013-07-15 20:16:44 +05:00
ab2f744b90 GametrailersIE: make it a subclass of MTVIE to reuse most of the extraction process 2013-07-14 14:29:15 +02:00
300fcad8a6 MTVIE: fix xml tags in the media namespace (python2.6) 2013-07-14 14:02:04 +02:00
f7e025958a [mtv]: rework MTVIE and add tests (closes #913)
It uses the same system as ComedyCentralIE to transform ramp urls into http.
2013-07-14 13:41:46 +02:00
0ab5531363 [livestream] fix import statement 2013-07-14 09:25:51 +02:00
b4444d5ca2 Add LivestreamIE (closes #1042) 2013-07-13 23:58:04 +02:00
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often used in correlation with bits rather than bytes.
2013-07-13 16:42:16 -05:00
b9d3e1635f Strip hash info from URL when making requests (Fixes #1038) 2013-07-13 22:52:12 +02:00
aa6b734e02 [instagram] really fix uploader_id detection (Fixes #1038) 2013-07-13 21:45:33 +02:00
73b57f0ccb [instagram] fix uploader_id detection (Fixes #1038) 2013-07-13 20:40:04 +02:00
3c4e6d8337 Improve OpenGraph property matching 2013-07-13 20:39:47 +02:00
36034aecc2 Merge remote-tracking branch 'jaimeMF/opengraph' 2013-07-13 20:33:23 +02:00
ffca4b5c32 Add CanalplusIE (closes #59 and closes #918) 2013-07-13 13:36:15 +02:00
b0e72bcf34 CriterionIE: simplify some parts and use _html_search_regex 2013-07-13 12:26:05 +02:00
7fd930c0c8 Merge pull request #1036 from yasoob/master
Added an IE and test for Criterion videos (closes #1035).
2013-07-13 12:18:03 +02:00
2e78b2bead YouJizzIE: support videos that define the urls in a playlist page (closes #1037) 2013-07-13 12:07:07 +02:00
44dbe89035 Use re.DOTALL by default when searching OpenGraph properties 2013-07-13 11:29:08 +02:00
2d5a8b5512 added test for criterion.com 2013-07-13 09:18:03 +05:00
159736c1b8 added an IE for criterion.com 2013-07-13 09:17:48 +05:00
46720279c2 InfoExtractor: add some helper methods to extract OpenGraph info 2013-07-12 22:12:04 +02:00
d8269e1dfb Don't try to save the thumbnail if it's None
It means the extractor couldn't find it
2013-07-12 22:11:59 +02:00
cbdbb76665 Use determine_ext when saving the thumbnail
Urls that contain a query produced filenames with wrong extensions
2013-07-12 22:08:49 +02:00
6543f0dca5 BrightcoveIE: Use parse_qs to extract the fields of the query (closes #1032)
Add a compat_urlparse to utils.
2013-07-12 14:53:28 +02:00
232eb88bfe GenericIE: allow to match declaration of the Brightocove parameters that use ' instead of " 2013-07-12 14:52:01 +02:00
a95967f8b7 [ign]: support some country versions and add an extractor for 1up.com
1up.com uses the gin video system, the extractor is a subclass of IGNIE, it just replaces the video id
2013-07-12 11:39:40 +02:00
2ef648d3d3 Add IGNIE
Only for www.ign.com, it doesn't support country specific versions (like es.ign.com)
2013-07-12 00:03:59 +02:00
33f6830fd5 release 2013.07.12 2013-07-11 23:54:34 +02:00
606d7e67fd YoutubeIE: add algo for length 81 (closes #1026) 2013-07-11 23:47:54 +02:00
fd87ff26b9 release 2013.07.11 2013-07-11 21:04:59 +02:00
85347e1cb6 YoutubeIE: a new algo for length 83 2013-07-11 20:21:45 +02:00
41897817cc GametrailersIE: support multipart videos
Use xml.etree.ElementTree instead of re when possible
2013-07-11 18:24:53 +02:00
45ff2d51d0 [brightcove] add import 2013-07-11 16:31:29 +02:00
5de3ece225 [brightcove] fix on Python 2.6 2013-07-11 16:16:02 +02:00
df50a41289 [arte] Fix on 2.6 2013-07-11 16:12:16 +02:00
59ae56fad5 Add helper function find_path_attr 2013-07-11 16:12:08 +02:00
690e872c51 Remove video_result helper method
Calling it was more complex then actually including the type in the video info
2013-07-11 12:12:30 +02:00
81082e046e [ehow] improve minor bits 2013-07-11 12:11:00 +02:00
3fa9550837 Merge remote-tracking branch 'yasoob/master' 2013-07-11 12:02:16 +02:00
b1082f01a6 added test for ehow 2013-07-11 14:30:25 +05:00
f35b84c807 added an IE for Ehow videos 2013-07-11 14:25:14 +05:00
117adb0f0f GenericIE: detect more Brightcove videos
In some sites "class" contains more that BrightcoveExperience
2013-07-11 00:25:38 +02:00
abb285fb1b BrightcoveIE: add support for playlists 2013-07-11 00:04:33 +02:00
a431154706 Set the playlist_index and playlist fields for already resolved video results. 2013-07-10 23:36:30 +02:00
cfe50f04ed GenericIE: Detect videos from Brightcove
Brightcove videos info is usually found in an <object class="BrightcoveExperience"></object> node, this is passed to a new method of BrightcoveIE that builds a url to extract the video.
2013-07-10 17:49:11 +02:00
a7055eb956 YoutubeIE: show a more meaningful error when it founds a rtmpe download (related #343) 2013-07-10 14:35:11 +02:00
0a1be1e997 release 2013.07.10 2013-07-10 11:36:11 +02:00
c93898dae9 YoutubeIE: new algo for length 83 (closes #1017 and closes #1016) 2013-07-10 10:44:04 +02:00
ebdf2af727 GameSpotIE: support more urls and download videos in the best quality 2013-07-09 20:07:52 +02:00
c108eb73cc YoutubeIE: Fix vevo explicit videos (closes #956)
When an age restricted video is detected it simulates accessing the video from www.youtube.com/v/{video_id}
2013-07-09 15:43:44 +02:00
3a1375dacf VeohIE: remove debug logging 2013-07-09 11:11:55 +02:00
41bece30b4 DotsubIE: simplify and extract the upload date
Do not declare variables for fields in the info dictionary.
2013-07-08 22:40:42 +02:00
16ea58cbda Merge pull request #1009 from yasoob/master
Added an IE and test for dotsub.com videos. ( closes #1008 )
2013-07-08 22:21:06 +02:00
99e350d902 Add VeohIE (closes #1006) 2013-07-08 22:02:23 +02:00
13e06d298c added an IE and test for dotsub. 2013-07-09 00:05:52 +05:00
56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes #998)
When a IE is added to the list, it's also added to a dictionary. When a IE is requested it first looks in the dictionary and if there's no instance it will create a new one.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
2013-07-08 15:14:27 +02:00
81f0259b9e YoutubeSubscriptionsIE: raise an error if there's no login information. 2013-07-08 11:24:11 +02:00
fefcb5d314 YoutubeIE: use the new method in the base IE for getting the login info 2013-07-08 11:24:11 +02:00
345b0c9b46 Remove dead code 2013-07-08 02:13:50 +02:00
20c3893f0e Do not redefine variables in list comprehensions 2013-07-08 02:12:20 +02:00
29293c1e09 release 2013.07.08.1 2013-07-08 02:05:22 +02:00
5fe3a3c3fb [archive.org] Add extractor (Fixes #1003) 2013-07-08 02:05:02 +02:00
b04621d155 release 2013.07.08 2013-07-08 01:29:16 +02:00
b227060388 [arte] Always look for the JSON URL (Fixes #1002) 2013-07-08 01:28:19 +02:00
d93e4dcbb7 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-08 01:15:19 +02:00
73e79f2a1b [3sat] Add support (Fixes #1001) 2013-07-08 01:13:55 +02:00
fc79158de2 VimeoIE: authentication support (closes #885) and add a method in the base InfoExtractor to get the login info 2013-07-07 23:24:34 +02:00
7763b04e5f YoutubeIE: extract the thumbnail in the best possible quality 2013-07-07 21:21:15 +02:00
9d7b44b4cc release 2013.07.07.01 2013-07-07 17:13:56 +02:00
897f36d179 [youtube:subscriptions] Use colon for differentiation of shortcuts 2013-07-07 17:13:26 +02:00
94c3637f6d release 2013.07.07 2013-07-07 16:55:06 +02:00
04cc96173c [youtube] Add and extractor for the subscriptions feed (closes #498)
It can be downloaded using the ytsubscriptions keyword.
It needs the login information.
2013-07-07 13:58:23 +02:00
fbaaad49d7 Add BrightcoveIE (closes #832)
It only accepts the urls that are use for embedding the video, it doesn't search in generic webpages to find Brightcove videos
2013-07-05 21:31:50 +02:00
b29f3b250d DailymotionIE: extract thumbnail 2013-07-05 19:39:37 +02:00
fa343954d4 release 2013.07.05 2013-07-05 14:46:24 +02:00
2491f5898e DailymotionIE: simplify the extraction of the title and remove an unused assignment of video_uploader 2013-07-05 14:20:15 +02:00
b27c856fbc Dailymotion: fix the download of the video in the max quality (closes #986) 2013-07-05 14:15:26 +02:00
9941ceb331 ArteTVIE: support emission urls that don't contain the video id
Like http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
2013-07-05 12:56:41 +02:00
c536d38059 release 2013.07.04 2013-07-04 18:07:34 +02:00
8de64cac98 [arte] Fix language selection (Fixes #988) 2013-07-04 18:07:03 +02:00
6d6d286539 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-03 16:36:42 +02:00
5d2eac9eba [auengine] Add tests (Fixes #985) 2013-07-03 16:36:36 +02:00
9826925a20 ArteTVIE: extract the video with the correct language
Some urls from the French version of the page could download the German version.

Also instead of extracting the json url from the webpage, build it to skip the download
2013-07-02 17:34:40 +02:00
24a267b562 TudouIE: extract all the segments of the video and download the best quality (closes #975)
Also simplify a bit the extraction of the id from the url and write directly the title for the test video
2013-07-02 12:38:24 +02:00
d4da3d6116 BlipTVIE: download the video in the best quality (closes #215) 2013-07-02 10:40:23 +02:00
d5a62e4f5f release 2013.07.02 2013-07-02 09:14:09 +02:00
9a82b2389f Do not show bug report for errors that are to be expected (Closes #973) 2013-07-02 08:40:21 +02:00
8dba13f7e8 Squelch git not found exception (#973) 2013-07-02 08:36:20 +02:00
deacef651f Improve formatting 2013-07-02 08:35:39 +02:00
2e1b3afeca README.md: Fix markup and some of the text.
(Originally from Rogério Brito <rbrito@ime.usp.br>)
2013-07-02 07:39:54 +02:00
652e776893 setup: PEP-8 fixes.
Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
d055fe4cb0 setup: cosmetics: Add/remove some whitespace for readability.
This also fixes some long lines.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
131842bb0b setup: Move pseudo-docstring to a proper comment.
A string statement is not a docstring if it doesn't occur right at the top
of modules, functions, class definitions etc.

This patch fixes it.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
59fc531f78 Add InstagramIE (related #904) 2013-07-01 21:08:54 +02:00
5c44c15438 GenericIE: match titles that spread across multiple lines (related #904) 2013-07-01 20:50:50 +02:00
62067cb9b8 Shorten --list-extractor-descriptions to --extractor-descriptions 2013-07-01 18:59:29 +02:00
0f81866329 Add --list-extractor-descriptions (human-readable list of IEs) 2013-07-01 18:52:19 +02:00
2db67bc0f4 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-01 18:21:36 +02:00
7dba9cd039 Sort IEs alphabetically in --list-extractors 2013-07-01 18:21:29 +02:00
75dff0eef7 [youtube]: add YoutubeShowIE (closes #14)
It just extracts the playlists urls for each season
2013-07-01 17:59:28 +02:00
d828f3a550 YoutubeIE: use a negative index when accessing the last element of the format list 2013-07-01 17:19:33 +02:00
bcd6e4bd07 YoutubeIE: extract the correct video id for movie URLs (closes #597) 2013-07-01 16:51:18 +02:00
53936f3d57 Merge remote-tracking branch 'yasoob/master'
Conflicts:
	youtube_dl/extractor/__init__.py
2013-07-01 15:19:45 +02:00
0beb3add18 Separate downloader options 2013-07-01 14:53:25 +02:00
f9bd64c098 [update] Add package manager to error message (#959) 2013-07-01 02:36:49 +02:00
d7f44b5bdb [youtube] Warn if URL is most likely wrong (#969) 2013-07-01 02:29:29 +02:00
48bfb5f238 [instagram] Fix title 2013-06-30 14:07:32 +02:00
97ebe8dcaf StatigramIE: update the title of the test video 2013-06-30 13:57:57 +02:00
d4409747ba TumblrIE: update test
The video (once more) is no longer available
2013-06-30 13:52:20 +02:00
37b6a6617f ArteTvIE: support videos from videos.arte.tv
Each source of videos have a different extraction process, they are in different methods of the extractor.
Changed the extension of videos from mp4 to flv.
2013-06-30 13:38:22 +02:00
ca1c9cfe11 release 2013.06.34.4 2013-06-29 20:22:08 +02:00
adeb4d7469 Merge remote-tracking branch 'origin/master' 2013-06-29 20:21:13 +02:00
50587ee8ec [vimeo] fix detection for http://vimeo.com/groups/124584/videos/24973060 2013-06-29 20:20:20 +02:00
8244288dfe WatIE: support videos divided in multiple parts (closes #222 and #659)
The id for the videos is now the full id, no the one in the webpage url.
Also extract more information: description, view_count and upload_date
2013-06-29 18:22:03 +02:00
6ffe72835a [tutv] Fix URL type (for Python 3) 2013-06-29 17:42:15 +02:00
8ba5e990a5 release 2013.06.34.3 2013-06-29 17:30:11 +02:00
9afb1afcc6 [tutv] Add IE (Fixes #965) 2013-06-29 17:29:40 +02:00
0e21093a8f Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-29 16:57:34 +02:00
9c5cd0948f [ted] Fix test checksum 2013-06-29 16:45:56 +02:00
1083705fe8 Update the default output template in the README
It was changed in 08b2ac745a
2013-06-29 16:35:28 +02:00
f3d294617f Document view_count (Closes #963) 2013-06-29 16:32:28 +02:00
de33a30858 Merge pull request #962 from jaimeMF/TF1
Add TF1IE
2013-06-29 07:30:49 -07:00
887a227953 added an IE and test for traileraddict.com 2013-06-29 19:17:27 +05:00
705f6f35bc Move TF1IE to its own file 2013-06-29 15:18:19 +02:00
e648b22dbd Add TF1IE 2013-06-29 15:07:25 +02:00
257a2501fa keep track of the dates and html5player versions of working YT signature algos 2013-06-29 01:05:36 +02:00
99afb3ddd4 Add WatIE 2013-06-28 22:01:47 +02:00
a3c776203f Rewrote error message a bit to clarify 2013-06-28 18:53:31 +02:00
53f350c165 Changed the error message.
I changed the ExtractorError from ```msg = msg + u'; please report this issue on http://yt-dl.org/bug'``` to ```msg = msg + u'; please report this issue on http://yt-dl.org/bug with the complete output by running the same command with --verbose flag'```
Hopefully this will tell the users to report bugs with the complete output.
2013-06-28 18:51:54 +02:00
f46d31f948 Add RingTVIE (Thanks @yasoob) 2013-06-28 18:51:00 +02:00
bf64ff72db Added an IE for gamespot. Although gamespot allows downloading but it is only available to registered users. With this IE no registration is required. 2013-06-28 18:42:45 +02:00
bc2884afc1 Print which IE is being skipped in test_download 2013-06-28 11:20:00 +02:00
023fa8c440 Add function add_default_info_extractors to YoutubeDL
It adds to the list the ies returned by ge_extractors
2013-06-27 23:51:06 +02:00
427023a1e6 Merge branch 'generate-ie-list' 2013-06-27 22:44:02 +02:00
a924876fed Make sure that IEs only accept their own URLs 2013-06-27 21:25:51 +02:00
3f223f7b2e [tumblr] Fix title 2013-06-27 21:19:42 +02:00
fc2c063e1e Move testcase generator to helper 2013-06-27 21:15:16 +02:00
20db33e299 Make sure SoundcloudIE does not match soundcloud sets 2013-06-27 21:11:23 +02:00
c0109aa497 release 2013.06.34.2 2013-06-27 20:50:57 +02:00
ba7a1de04d Credit @gitprojs for auengine 2013-06-27 20:50:34 +02:00
4269e78a80 Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-27 20:47:03 +02:00
6f5ac90cf3 Move tests to the IE definitions 2013-06-27 20:46:46 +02:00
de282fc217 Merge pull request #954 from gitprojs/generic
Augmented Generic IE
2013-06-27 11:44:46 -07:00
ddbd903576 Tests: Add coding to files 2013-06-27 20:32:02 +02:00
0c56a3f773 [googleplus] move tests 2013-06-27 20:31:27 +02:00
9d069c4778 [infoq] move tests 2013-06-27 20:27:08 +02:00
0d843f796b Remove superfluous name declarations 2013-06-27 20:25:56 +02:00
67f51b3d8c [youku] move tests 2013-06-27 20:25:46 +02:00
5c5de1c79a [eighttracks] move test 2013-06-27 20:22:00 +02:00
0821771466 [steam] move test 2013-06-27 20:20:00 +02:00
83f6f68e79 [metacafe] move tests 2013-06-27 20:18:35 +02:00
27473d18da Made 'video' the default title for generic IE 2013-06-27 19:18:15 +01:00
0c6c096c20 [soundcloud] Move tests 2013-06-27 20:17:21 +02:00
52c8ade4ad Made generic IE handle more cases
Added a possible quote after file, so it can now handle cases like:
'file': 'http://www.a.com/b.mp4'
2013-06-27 19:16:09 +01:00
0e853ca4c4 [youtube] Fix tests in 2.x 2013-06-27 19:55:39 +02:00
41beccbab0 Use str every time 2013-06-27 19:43:43 +02:00
2eb88d953f Allow _TESTS attribute for IEs with multiple tests
This also improves the numbering of duplicate tests
2013-06-27 19:13:11 +02:00
1f0483b4b1 Generate the list of IEs automatically
It seems like GenericIE needs to be last, but other than that, the order really does not matter anymore.
To cut down on merge conflicts, generate the list of IEs automatically.
2013-06-27 18:43:32 +02:00
6b47c7f24e Allow moving tests into IE files
Allow adding download tests right in the IE file.
This will cut down on merge conflicts and make it more likely that new IE authors will add tests right away.
2013-06-27 18:28:45 +02:00
d798e1c7a9 [auengine] Rename to official capitalization 2013-06-27 18:19:19 +02:00
3a8736bd74 Merge remote-tracking branch 'gitprojs/master'
Conflicts:
	youtube_dl/extractor/__init__.py
2013-06-27 18:16:41 +02:00
e4decf2750 Updated auengine IE to use compat_urllib* utils 2013-06-27 13:48:28 +01:00
62008f69c1 Added an IE for auengine.com 2013-06-27 12:58:09 +01:00
198 changed files with 13728 additions and 3041 deletions

10
.gitignore vendored
View File

@ -17,4 +17,12 @@ youtube-dl.tar.gz
.coverage
cover/
updates_key.pem
*.egg-info
*.egg-info
*.srt
*.sbv
*.vtt
*.flv
*.mp4
*.part
test/testdata
.tox

View File

@ -3,12 +3,16 @@ python:
- "2.6"
- "2.7"
- "3.3"
before_install:
- sudo apt-get update -qq
- sudo apt-get install -qq rtmpdump
script: nosetests test --verbose
notifications:
email:
- filippo.valsorda@gmail.com
- phihag@phihag.de
- jaime.marquinez.ferrandiz+travis@gmail.com
- yasoob.khld@gmail.com
# irc:
# channels:
# - "irc.freenode.org#youtube-dl"

View File

@ -13,13 +13,13 @@ PYTHON=/usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion
@ -71,6 +71,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \
--exclude '__pycache' \
--exclude '.git' \
--exclude 'testdata' \
-- \
bin devscripts test youtube_dl \
CHANGELOG LICENSE README.md README.txt \

View File

@ -16,23 +16,27 @@ which means you can modify it, redistribute it or use it however you like.
# OPTIONS
-h, --help print this help text and exit
--version print program version and exit
-U, --update update this program to latest version
-i, --ignore-errors continue on download errors
-r, --rate-limit LIMIT maximum download rate (e.g. 50k or 44.6m)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
from an initial value of SIZE.
-U, --update update this program to latest version. Make sure
that you have sufficient permissions (run with
sudo if needed)
-i, --ignore-errors continue on download errors, for example to to
skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error occurs
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access
is restricted to one domain
--list-extractors List all supported extractors and the URLs they
would handle
--extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation.
--cache-dir DIR Location in the filesystem where youtube-dl can
store downloaded information permanently. By
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@ -49,6 +53,20 @@ which means you can modify it, redistribute it or use it however you like.
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options:
-r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
from an initial value of SIZE.
## Filesystem Options:
-t, --title use title in file name (default)
@ -60,7 +78,10 @@ which means you can modify it, redistribute it or use it however you like.
%(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename
extension, %(upload_date)s for the upload date
extension, %(format)s for the format description
(like "22 - 1280x720" or "HD"),%(format_id)s for
the unique id of the format (like Youtube's
itags: "137"),%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id
, %(playlist)s for the playlist the video is in,
@ -71,12 +92,14 @@ which means you can modify it, redistribute it or use it however you like.
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or
--autonumber option is given
--auto-number option is given
--restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin)
-w, --no-overwrites do not overwrite files
-c, --continue resume partially downloaded files
-c, --continue force resume of partially downloaded files. By
default, youtube-dl will resume downloads if
possible.
--no-continue do not resume partially downloaded files (restart
from beginning)
--cookies FILE file to read cookies from and dump cookie jar in
@ -85,6 +108,7 @@ which means you can modify it, redistribute it or use it however you like.
file modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options:
@ -99,34 +123,38 @@ which means you can modify it, redistribute it or use it however you like.
--get-description simulate, quiet but print video description
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
-j, --dump-json simulate, quiet but print JSON information
--newline output progress bar as new lines
--no-progress do not print progress bar
--console-title display progress in console titlebar
-v, --verbose print various debugging information
--dump-intermediate-pages print downloaded pages to debug problems(very
verbose)
--write-pages Write downloaded pages to files in the current
directory
## Video Format Options:
-f, --format FORMAT video format code, specifiy the order of
preference using slashes: "-f 22/17/18"
preference using slashes: "-f 22/17/18". "-f mp4"
and "-f flv" are also supported
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific one
is requested
--max-quality FORMAT highest quality format to download
-F, --list-formats list all available formats (currently youtube
only)
--write-sub write subtitle file (currently youtube only)
--write-auto-sub write automatic subtitle file (currently youtube
only)
--only-sub [deprecated] alias of --skip-download
## Subtitle Options:
--write-sub write subtitle file
--write-auto-sub write automatic subtitle file (youtube only)
--all-subs downloads all the available subtitles of the
video (currently youtube only)
video
--list-subs lists all available subtitles for the video
(currently youtube only)
--sub-format FORMAT subtitle format [srt/sbv/vtt] (default=srt)
(currently youtube only)
--sub-lang LANG language of the subtitles to download (optional)
use IETF language tags like 'en'
--sub-format FORMAT subtitle format (default=srt) ([sbv/vtt] youtube
only)
--sub-lang LANGS languages of the subtitles to download (optional)
separated by commas, use IETF language tags like
'en,pt'
## Authentication Options:
-u, --username USERNAME account username
@ -148,6 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
processing; the video is erased by default
--no-post-overwrites do not overwrite post-processed files; the post-
processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4
videos)
--add-metadata add metadata to the files
# CONFIGURATION
@ -168,7 +199,7 @@ The `-o` option allows users to indicate a template for the output file names. T
- `playlist`: The name or the id of the playlist that contains the video.
- `playlist_index`: The index of the video in the playlist, a five-digit number.
The current default template is `%(id)s.%(ext)s`, but that will be switchted to `%(title)s-%(id)s.%(ext)s` (which can be requested with `-t` at the moment).
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
@ -194,11 +225,11 @@ Examples:
### Can you please put the -b option back?
Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the -b option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you''re interested in. In that case, simply request it with the -f option and youtube-dl will try to download it.
Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the `-b` option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you're interested in. In that case, simply request it with the `-f` option and youtube-dl will try to download it.
### I get HTTP error 402 when trying to download a video. What's this?
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We''re [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
### I have downloaded a video but how can I play it?

View File

@ -1,14 +1,18 @@
__youtube-dl()
__youtube_dl()
{
local cur prev opts
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
opts="{{flags}}"
keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater"
if [[ ${cur} == * ]] ; then
if [[ ${cur} =~ : ]]; then
COMPREPLY=( $(compgen -W "${keywords}" -- ${cur}) )
return 0
elif [[ ${cur} == * ]] ; then
COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
return 0
fi
}
complete -F __youtube-dl youtube-dl
complete -F __youtube_dl youtube-dl

405
devscripts/buildserver.py Normal file
View File

@ -0,0 +1,405 @@
#!/usr/bin/python3
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import argparse
import ctypes
import functools
import sys
import threading
import traceback
import os.path
class BuildHTTPServer(ThreadingMixIn, HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
#args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(stop_event, win_service_handler))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='localhost:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
#==============================================================================
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
pythonVersion = kwargs.pop('python', '2.7')
try:
key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
try:
self.pythonPath, _ = _winreg.QueryValueEx(key, '')
finally:
_winreg.CloseKey(key)
except Exception:
raise BuildError('No such Python version: %s' % pythonVersion)
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class YoutubeDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile']
def __init__(self, **kwargs):
if self.repoName != 'youtube-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(YoutubeDLBuilder, self).__init__(**kwargs)
def build(self):
try:
subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(YoutubeDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
actionDict = { 'build': Builder, 'download': Builder } # They're the same, no more caching.
def do_GET(self):
path = urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = unicode(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
except HTTPError as e:
self.send_response(e.code, str(e))
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
#==============================================================================
if __name__ == '__main__':
main()

39
devscripts/check-porn.py Normal file
View File

@ -0,0 +1,39 @@
#!/usr/bin/env python
"""
This script employs a VERY basic heuristic ('porn' in webpage.lower()) to check
if we are not 'age_limit' tagging some porn site
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.utils import compat_urllib_request
for test in get_testcases():
try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()
except:
print('\nFail: {0}'.format(test['name']))
continue
webpage = webpage.decode('utf8', 'replace')
if 'porn' in webpage.lower() and ('info_dict' not in test
or 'age_limit' not in test['info_dict']
or test['info_dict']['age_limit'] != 18):
print('\nPotential missing age_limit check: {0}'.format(test['name']))
elif 'porn' not in webpage.lower() and ('info_dict' in test and
'age_limit' in test['info_dict'] and
test['info_dict']['age_limit'] == 18):
print('\nPotential false negative: {0}'.format(test['name']))
else:
sys.stdout.write('.')
sys.stdout.flush()
print()

View File

@ -3,31 +3,40 @@
import json
import sys
import hashlib
import urllib.request
import os.path
if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]
with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
f.write(version)
versions_info = json.load(open('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']
del versions_info['signature']
new_version = {}
filenames = {'bin': 'youtube-dl', 'exe': 'youtube-dl.exe', 'tar': 'youtube-dl-%s.tar.gz' % version}
filenames = {
'bin': 'youtube-dl',
'exe': 'youtube-dl.exe',
'tar': 'youtube-dl-%s.tar.gz' % version}
build_dir = os.path.join('..', '..', 'build', version)
for key, filename in filenames.items():
print('Downloading and checksumming %s...' %filename)
url = 'http://youtube-dl.org/downloads/%s/%s' % (version, filename)
data = urllib.request.urlopen(url).read()
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
fn = os.path.join(build_dir, filename)
with open(fn, 'rb') as f:
data = f.read()
if not data:
raise ValueError('File %s is empty!' % fn)
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
versions_info['versions'][version] = new_version
versions_info['latest'] = version
json.dump(versions_info, open('update/versions.json', 'w'), indent=4, sort_keys=True)
with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)

View File

@ -22,7 +22,7 @@ entry_template=textwrap.dedent("""
<atom:link href="http://rg3.github.io/youtube-dl" />
<atom:content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Downloads available at <a href="http://youtube-dl.org/downloads/@VERSION@/">http://youtube-dl.org/downloads/@VERSION@/</a>
Downloads available at <a href="https://yt-dl.org/downloads/@VERSION@/">https://yt-dl.org/downloads/@VERSION@/</a>
</div>
</atom:content>
<atom:author>
@ -54,4 +54,3 @@ atom_template = atom_template.replace('@ENTRIES@', entries_str)
with open('update/releases.atom','w',encoding='utf-8') as atom_file:
atom_file.write(atom_template)

View File

@ -0,0 +1,34 @@
#!/usr/bin/env python3
import sys
import os
import textwrap
# We must be able to import youtube_dl
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import youtube_dl
def main():
with open('supportedsites.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC)
if ie.working() == False:
ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html))
template = template.replace('@SITES@', textwrap.indent('\n'.join(ie_htmls), '\t'))
with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
sitesf.write(template)
if __name__ == '__main__':
main()

View File

@ -55,8 +55,8 @@ git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make youtube-dl youtube-dl.tar.gz
wget "http://jeromelaheurte.net:8142/download/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe || \
wget "http://jeromelaheurte.net:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
read -p "VM running? (y/n) " -n 1
wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
mkdir -p "build/$version"
mv youtube-dl youtube-dl.exe "build/$version"
mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@ -67,7 +67,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to youtube-dl.org..."
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
@ -85,12 +85,9 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git show HEAD
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)

View File

@ -1,76 +0,0 @@
#!/usr/bin/env python
# Generate youtube signature algorithm from test cases
import sys
tests = [
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<",
"J:|}][{=+-_)(*&;%$#@>MNBVCXZASDFGH^KLPOIUYTREWQ0987654321mnbvcxzasdfghrklpoiuytej"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<",
"!?;:|}][{=+-_)(*&^$#@/MNBVCXZASqFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<",
"{>/?;}[.=+-_)(*&^%$#@!MqBVCXZASDFwHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
"<.>?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWe098765432rmnbvcxzasdfghjklpoiuyt1"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<",
"D.>/?;}[{=+_)(*&^%$#!MNBVCXeAS<FGHJKLPOIUYTREWZ0987654321mnbvcxzasdfghjklpoiuytrQ"),
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<",
"Q>/?;}[{=+-(*<^%$#@!MNBVCXZASDFGHKLPOIUY8REWT0q&7654321mnbvcxzasdfghjklpoiuytrew9"),
]
def find_matching(wrong, right):
idxs = [wrong.index(c) for c in right]
return compress(idxs)
return ('s[%d]' % i for i in idxs)
def compress(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = ':%d' % (end+step)
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
step = None
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield 's[%d]' % prev
if step is None:
yield 's[%d]' % i
else:
yield _genslice(start, i, step)
def _assert_compress(inp, exp):
res = list(compress(inp))
if res != exp:
print('Got %r, expected %r' % (res, exp))
assert res == exp
_assert_compress([0,2,4,6], ['s[0]', 's[2]', 's[4]', 's[6]'])
_assert_compress([0,1,2,4,6,7], ['s[:3]', 's[4]', 's[6:8]'])
_assert_compress([8,0,1,2,4,7,6,9], ['s[8]', 's[:3]', 's[4]', 's[7:5:-1]', 's[9]'])
def gen(wrong, right, indent):
code = ' + '.join(find_matching(wrong, right))
return 'if len(s) == %d:\n%s return %s\n' % (len(wrong), indent, code)
def genall(tests):
indent = ' ' * 8
return indent + (indent + 'el').join(gen(wrong, right, indent) for wrong,right in tests)
def main():
print(genall(tests))
if __name__ == '__main__':
main()

View File

@ -8,12 +8,15 @@ import sys
try:
from setuptools import setup
setuptools_available = True
except ImportError:
from distutils.core import setup
setuptools_available = False
try:
# This will create an exe that needs Microsoft Visual C++ 2008
# Redistributable Package
import py2exe
"""This will create an exe that needs Microsoft Visual C++ 2008 Redistributable Package"""
except ImportError:
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
print("Cannot import py2exe", file=sys.stderr)
@ -26,13 +29,15 @@ py2exe_options = {
"dist_dir": '.',
"dll_excludes": ['w9xpopen.exe'],
}
py2exe_console = [{
"script": "./youtube_dl/__main__.py",
"dest_base": "youtube-dl",
}]
py2exe_params = {
'console': py2exe_console,
'options': { "py2exe": py2exe_options },
'options': {"py2exe": py2exe_options},
'zipfile': None
}
@ -40,31 +45,39 @@ if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params
else:
params = {
'scripts': ['bin/youtube-dl'],
'data_files': [('etc/bash_completion.d', ['youtube-dl.bash-completion']), # Installing system-wide would require sudo...
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1/', ['youtube-dl.1'])]
'data_files': [ # Installing system-wide would require sudo...
('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1', ['youtube-dl.1'])
]
}
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}
else:
params['scripts'] = ['bin/youtube-dl']
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(), 'youtube_dl/version.py', 'exec'))
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
setup(
name = 'youtube_dl',
version = __version__,
description = 'YouTube video downloader',
long_description = 'Small command-line program to download videos from YouTube.com and other video sites.',
url = 'https://github.com/rg3/youtube-dl',
author = 'Ricardo Garcia',
maintainer = 'Philipp Hagemeister',
maintainer_email = 'phihag@phihag.de',
packages = ['youtube_dl', 'youtube_dl.extractor'],
name='youtube_dl',
version=__version__,
description='YouTube video downloader',
long_description='Small command-line program to download videos from'
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
packages=['youtube_dl', 'youtube_dl.extractor'],
# Provokes warning on most systems (why?!)
#test_suite = 'nose.collector',
#test_requires = ['nosetest'],
# test_suite = 'nose.collector',
# test_requires = ['nosetest'],
classifiers = [
classifiers=[
"Topic :: Multimedia :: Video",
"Development Status :: 5 - Production/Stable",
"Environment :: Console",

0
test/__init__.py Normal file
View File

View File

@ -1,33 +1,85 @@
import errno
import io
import hashlib
import json
import os.path
import re
import types
import sys
from youtube_dl import YoutubeDL, YoutubeDLHandler
from youtube_dl.utils import (
compat_cookiejar,
compat_urllib_request,
)
import youtube_dl.extractor
from youtube_dl import YoutubeDL
from youtube_dl.utils import preferredencoding
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
def report_warning(message):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header = u'WARNING:'
output = u'%s %s\n' % (_msg_header, message)
if 'b' in getattr(sys.stderr, 'mode', '') or sys.version_info[0] < 3:
output = output.encode(preferredencoding())
sys.stderr.write(output)
class FakeYDL(YoutubeDL):
def __init__(self):
self.result = []
def __init__(self, override=None):
# Different instances of the downloader can't share the same dictionary
# some test set the "sublang" parameter, which would break the md5 checks.
self.params = dict(parameters)
def to_screen(self, s):
params = get_params(override=override)
super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
raise Exception(s)
def download(self, x):
self.result.append(x)
self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
if t:
t['name'] = type(ie).__name__[:-len('IE')]
yield t
for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')]
yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()

View File

@ -38,7 +38,6 @@
"writedescription": false,
"writeinfojson": true,
"writesubtitles": false,
"onlysubtitles": false,
"allsubtitles": false,
"listssubtitles": false
}

145
test/test_YoutubeDL.py Normal file
View File

@ -0,0 +1,145 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
class YDL(FakeYDL):
def __init__(self, *args, **kwargs):
super(YDL, self).__init__(*args, **kwargs)
self.downloaded_info_dicts = []
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
def to_screen(self, msg):
self.msgs.append(msg)
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 460},
{u'ext': u'mp4', u'height': 460},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'webm')
# Different resolution => download best quality (mp4)
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'mp4', u'height': 1080},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'mp4')
# No prefer_free_formats => keep original formats order
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'flv', u'height': 720},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'flv')
def test_format_limit(self):
formats = [
{u'format_id': u'meh', u'url': u'http://example.com/meh'},
{u'format_id': u'good', u'url': u'http://example.com/good'},
{u'format_id': u'great', u'url': u'http://example.com/great'},
{u'format_id': u'excellent', u'url': u'http://example.com/exc'},
]
info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'}
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
ydl = YDL({'format_limit': 'good'})
assert ydl.params['format_limit'] == 'good'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'good')
ydl = YDL({'format_limit': 'great', 'format': 'all'})
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts[0][u'format_id'], u'meh')
self.assertEqual(ydl.downloaded_info_dicts[1][u'format_id'], u'good')
self.assertEqual(ydl.downloaded_info_dicts[2][u'format_id'], u'great')
self.assertTrue('3' in ydl.msgs[0])
ydl = YDL()
ydl.params['format_limit'] = 'excellent'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
def test_format_selection(self):
formats = [
{u'format_id': u'35', u'ext': u'mp4'},
{u'format_id': u'45', u'ext': u'webm'},
{u'format_id': u'47', u'ext': u'webm'},
{u'format_id': u'2', u'ext': u'flv'},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl = YDL({'format': u'20/47'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'20/71/worst'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'2')
ydl = YDL({'format': u'webm/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'3gp/40/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
def test_add_extra_info(self):
test_dict = {
'extractor': 'Foo',
}
extra_info = {
'extractor': 'Bar',
'playlist': 'funny videos',
}
YDL.add_extra_info(test_dict, extra_info)
self.assertEqual(test_dict['extractor'], 'Foo')
self.assertEqual(test_dict['playlist'], 'funny videos')
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,54 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = os.path.splitext(filename)[0] + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()

View File

@ -1,33 +1,66 @@
#!/usr/bin/env python
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.extractor import YoutubeIE, YoutubePlaylistIE, YoutubeChannelIE, JustinTVIE
class TestAllURLsMatching(unittest.TestCase):
def setUp(self):
self.ies = gen_extractors()
def matching_ies(self, url):
return [ie.IE_NAME for ie in self.ies if ie.suitable(url) and ie.IE_NAME != 'generic']
def assertMatch(self, url, ie_list):
self.assertEqual(self.matching_ies(url), ie_list)
def test_youtube_playlist_matching(self):
self.assertTrue(YoutubePlaylistIE.suitable(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'UUBABnxM4Ar9ten8Mdjj1j0Q')) #585
self.assertTrue(YoutubePlaylistIE.suitable(u'PL63F0C78739B09958'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertFalse(YoutubePlaylistIE.suitable(u'PLtS2H6bU1M'))
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist(u'PL63F0C78739B09958')
assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
def test_youtube_matching(self):
self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
def test_youtube_channel_matching(self):
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM/videos'))
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watch_later'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions'])
self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended'])
self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites'])
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
@ -46,9 +79,33 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
def test_youtube_extract(self):
self.assertEqual(YoutubeIE()._extract_id('http://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc'), 'BaW_jenozKc')
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_no_duplicates(self):
ies = gen_extractors()
for tc in get_testcases():
url = tc['url']
for ie in ies:
if type(ie).__name__ in ['GenericIE', tc['name'] + 'IE']:
self.assertTrue(ie.suitable(url), '%s should match URL %r' % (type(ie).__name__, url))
else:
self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
def test_keywords(self):
self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
self.assertMatch(':colbertreport', ['ComedyCentralShows'])
self.assertMatch(':cr', ['ComedyCentralShows'])
if __name__ == '__main__':
unittest.main()

View File

@ -1,45 +1,38 @@
#!/usr/bin/env python
import errno
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
get_testcases,
try_rm,
md5,
report_warning
)
import hashlib
import io
import os
import json
import unittest
import sys
import socket
import binascii
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.utils import *
DEF_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tests.json')
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
from youtube_dl.utils import (
compat_str,
compat_urllib_error,
compat_HTTPError,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
from youtube_dl.extractor import get_info_extractor
RETRIES = 3
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
@ -56,16 +49,12 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
with io.open(DEF_FILE, encoding='utf-8') as deff:
defs = json.load(deff)
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
defs = get_testcases()
class TestDownload(unittest.TestCase):
maxDiff = None
def setUp(self):
self.parameters = parameters
self.defs = defs
### Dynamically generate tests
@ -73,66 +62,86 @@ def generator(test_case):
def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
if not ie._WORKING:
print('Skipping: IE marked as not _WORKING')
return
if 'playlist' not in test_case and not test_case['file']:
print('Skipping: No output file specified')
other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
if 'playlist' not in test_case:
info_dict = test_case.get('info_dict', {})
if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
print_skipping('The output file cannot be know, the "file" '
'key is missing or the info_dict is incomplete')
return
if 'skip' in test_case:
print('Skipping: {0}'.format(test_case['skip']))
print_skipping(test_case['skip'])
return
for other_ie in other_ies:
if not other_ie.working():
print_skipping(u'test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = self.parameters.copy()
params.update(test_case.get('params', {}))
params = get_params(test_case.get('params', {}))
ydl = YoutubeDL(params)
for ie in youtube_dl.extractor.gen_extractors():
ydl.add_info_extractor(ie)
ydl.add_default_info_extractors()
finished_hook_called = set()
def _hook(status):
if status['status'] == 'finished':
finished_hook_called.add(status['filename'])
ydl.fd.add_progress_hook(_hook)
def get_tc_filename(tc):
return tc.get('file') or ydl.prepare_filename(tc.get('info_dict', {}))
test_cases = test_case.get('playlist', [test_case])
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
def try_rm_tcs_files():
for tc in test_cases:
tc_filename = get_tc_filename(tc)
try_rm(tc_filename)
try_rm(tc_filename + '.part')
try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files()
try:
for retry in range(1, RETRIES + 1):
try_num = 1
while True:
try:
ydl.download([test_case['url']])
except (DownloadError, ExtractorError) as err:
if retry == RETRIES: raise
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError):
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise
print('Retrying: {0} failed tries\n\n##########\n\n'.format(retry))
if try_num == RETRIES:
report_warning(u'Failed due to network errors, skipping...')
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
try_num += 1
else:
break
for tc in test_cases:
tc_filename = get_tc_filename(tc)
if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc['file']), msg='Missing file ' + tc['file'])
self.assertTrue(tc['file'] in finished_hook_called)
self.assertTrue(os.path.exists(tc['file'] + '.info.json'))
self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc_filename in finished_hook_called)
info_json_fn = os.path.splitext(tc_filename)[0] + '.info.json'
self.assertTrue(os.path.exists(info_json_fn))
if 'md5' in tc:
md5_for_file = _file_md5(tc['file'])
md5_for_file = _file_md5(tc_filename)
self.assertEqual(md5_for_file, tc['md5'])
with io.open(tc['file'] + '.info.json', encoding='utf-8') as infof:
with io.open(info_json_fn, encoding='utf-8') as infof:
info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertEqual(expected, 'md5:' + md5(info_dict.get(info_field)))
got = 'md5:' + md5(info_dict.get(info_field))
else:
got = info_dict.get(info_field)
self.assertEqual(
expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
@ -144,20 +153,23 @@ def generator(test_case):
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
finally:
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm_tcs_files()
return test_template
### And add them to TestDownload
for n, test_case in enumerate(defs):
test_method = generator(test_case)
test_method.__name__ = "test_{0}".format(test_case["name"])
if getattr(TestDownload, test_method.__name__, False):
test_method.__name__ = "test_{0}_{1}".format(test_case["name"], n)
tname = 'test_' + str(test_case['name'])
i = 1
while hasattr(TestDownload, tname):
tname = 'test_' + str(test_case['name']) + '_' + str(i)
i += 1
test_method.__name__ = tname
setattr(TestDownload, test_method.__name__, test_method)
del test_method

115
test/test_playlists.py Normal file
View File

@ -0,0 +1,115 @@
#!/usr/bin/env python
# encoding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import (
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE,
LivestreamIE,
NHLVideocenterIE,
BambuserChannelIE,
BandcampAlbumIE
)
class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
def test_dailymotion_playlist(self):
dl = FakeYDL()
ie = DailymotionPlaylistIE(dl)
result = ie.extract('http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'SPORT')
self.assertTrue(len(result['entries']) > 20)
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
def test_vimeo_channel(self):
dl = FakeYDL()
ie = VimeoChannelIE(dl)
result = ie.extract('http://vimeo.com/channels/tributes')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Vimeo Tributes')
self.assertTrue(len(result['entries']) > 24)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'5124905')
self.assertTrue(len(result['entries']) >= 11)
def test_soundcloud_set(self):
dl = FakeYDL()
ie = SoundcloudSetIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'The Royal Concept EP')
self.assertTrue(len(result['entries']) >= 6)
def test_soundcloud_user(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
def test_bambuser_channel(self):
dl = FakeYDL()
ie = BambuserChannelIE(dl)
result = ie.extract('http://bambuser.com/channel/pixelversity')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'pixelversity')
self.assertTrue(len(result['entries']) >= 60)
def test_bandcamp_album(self):
dl = FakeYDL()
ie = BandcampAlbumIE(dl)
result = ie.extract('http://mpallante.bandcamp.com/album/nightmare-night-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Nightmare Night EP')
self.assertTrue(len(result['entries']) >= 4)
if __name__ == '__main__':
unittest.main()

210
test/test_subtitles.py Normal file
View File

@ -0,0 +1,210 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, md5
from youtube_dl.extractor import (
YoutubeIE,
DailymotionIE,
TEDIE,
)
class BaseTestSubtitles(unittest.TestCase):
url = None
IE = None
def setUp(self):
self.DL = FakeYDL()
self.ie = self.IE(self.DL)
def getInfoDict(self):
info_dict = self.ie.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict['subtitles']
class TestYoutubeSubtitles(BaseTestSubtitles):
url = 'QRS8MkLhQmM'
IE = YoutubeIE
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestDailymotionSubtitles(BaseTestSubtitles):
url = 'http://www.dailymotion.com/video/xczg00'
IE = DailymotionIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '2154f31ff9b9f89a0aa671537559c21d')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '7616cbc6df20ec2c1204083c83871cf6')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 28)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

View File

@ -1,21 +1,32 @@
#!/usr/bin/env python
# Various small unit tests
import sys
import unittest
# coding: utf-8
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import timeconvert
from youtube_dl.utils import sanitize_filename
from youtube_dl.utils import unescapeHTML
from youtube_dl.utils import orderedSet
from youtube_dl.utils import DateRange
from youtube_dl.utils import unified_strdate
from youtube_dl.utils import (
timeconvert,
sanitize_filename,
unescapeHTML,
orderedSet,
DateRange,
unified_strdate,
find_xpath_attr,
get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
shell_quote,
encodeFilename,
)
if sys.version_info < (3, 0):
_compat_str = lambda b: b.decode('unicode-escape')
@ -112,5 +123,59 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('Dec 14, 2012'), '20121214')
self.assertEqual(unified_strdate('2012/10/11 01:56:38 +0000'), '20121011')
def test_find_xpath_attr(self):
testxml = u'''<root>
<node/>
<node x="a"/>
<node x="a" y="c" />
<node x="b" y="d" />
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None)
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
def test_meta_parser(self):
testhtml = u'''
<head>
<meta name="description" content="foo &amp; bar">
<meta content='Plato' name='author'/>
</head>
'''
get_meta = lambda name: get_meta_content(name, testhtml)
self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename(u'ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), u"""ffmpeg -i 'ñ€ß'"'"'.mp4'""")
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,79 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, try_rm
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) #assertIn only added in python 2.7
#remove the first occurance, there could be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()

View File

@ -1,40 +1,36 @@
#!/usr/bin/env python
# coding: utf-8
import json
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params
import io
import json
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
params = json.load(pf)
params['writeinfojson'] = True
params['skip_download'] = True
params['writedescription'] = True
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
INFO_JSON_FILE = TEST_ID + '.info.json'
DESCRIPTION_FILE = TEST_ID + '.mp4.description'
EXPECTED_DESCRIPTION = u'''test chars: "'/\ä↭𝕐
@ -42,6 +38,7 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files

View File

@ -1,42 +1,55 @@
#!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeUserIE, YoutubePlaylistIE, YoutubeIE, YoutubeChannelIE
from youtube_dl.utils import *
from test.helper import FakeYDL
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
from helper import FakeYDL
class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self,info):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
def test_youtube_playlist(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')[0]
result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'ytdl test PL')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('PLBB231211A4F62143')[0]
result = ie.extract('PLBB231211A4F62143')
self.assertTrue(len(result['entries']) > 25)
def test_youtube_playlist_long(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')[0]
result = ie.extract('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
self.assertIsPlaylist(result)
self.assertTrue(len(result['entries']) >= 799)
@ -44,7 +57,7 @@ class TestYoutubeLists(unittest.TestCase):
#651
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')[0]
result = ie.extract('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertFalse('pElCt5oNDuI' in ytie_results)
self.assertFalse('KdPEApIVdWM' in ytie_results)
@ -52,7 +65,7 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_playlist_empty(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx')[0]
result = ie.extract('https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx')
self.assertIsPlaylist(result)
self.assertEqual(len(result['entries']), 0)
@ -60,7 +73,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')[0]
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = result['entries']
self.assertEqual(YoutubeIE()._extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
@ -70,23 +83,29 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubeChannelIE(dl)
#test paginated channel
result = ie.extract('https://www.youtube.com/channel/UCKfVa3S1e4PHvxWcwyMMg8w')[0]
result = ie.extract('https://www.youtube.com/channel/UCKfVa3S1e4PHvxWcwyMMg8w')
self.assertTrue(len(result['entries']) > 90)
#test autogenerated channel
result = ie.extract('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')[0]
result = ie.extract('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
self.assertTrue(len(result['entries']) >= 18)
def test_youtube_user(self):
dl = FakeYDL()
ie = YoutubeUserIE(dl)
result = ie.extract('https://www.youtube.com/user/TheLinuxFoundation')[0]
result = ie.extract('https://www.youtube.com/user/TheLinuxFoundation')
self.assertTrue(len(result['entries']) >= 320)
def test_youtube_safe_search(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl')[0]
result = ie.extract('PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl')
self.assertEqual(len(result['entries']), 2)
def test_youtube_show(self):
dl = FakeYDL()
ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 3)
if __name__ == '__main__':
unittest.main()

View File

@ -1,57 +0,0 @@
#!/usr/bin/env python
import unittest
import sys
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor.youtube import YoutubeIE
from helper import FakeYDL
sig = YoutubeIE(FakeYDL())._decrypt_signature
class TestYoutubeSig(unittest.TestCase):
def test_43_43(self):
wrong = '5AEEAE0EC39677BC65FD9021CCD115F1F2DBD5A59E4.C0B243A3E2DED6769199AF3461781E75122AE135135'
right = '931EA22157E1871643FA9519676DED253A342B0C.4E95A5DBD2F1F511DCC1209DF56CB77693CE0EAE'
self.assertEqual(sig(wrong), right)
def test_88(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<"
right = "J:|}][{=+-_)(*&;%$#@>MNBVCXZASDFGH^KLPOIUYTREWQ0987654321mnbvcxzasdfghrklpoiuytej"
self.assertEqual(sig(wrong), right)
def test_87(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<"
right = "!?;:|}][{=+-_)(*&^$#@/MNBVCXZASqFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"
self.assertEqual(sig(wrong), right)
def test_86(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<"
right = "ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"
self.assertEqual(sig(wrong), right)
def test_85(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<"
right = "{>/?;}[.=+-_)(*&^%$#@!MqBVCXZASDFwHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"
self.assertEqual(sig(wrong), right)
def test_84(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<"
right = "<.>?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWe098765432rmnbvcxzasdfghjklpoiuyt1"
self.assertEqual(sig(wrong), right)
def test_83(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<"
right = "D.>/?;}[{=+_)(*&^%$#!MNBVCXeAS<FGHJKLPOIUYTREWZ0987654321mnbvcxzasdfghjklpoiuytrQ"
self.assertEqual(sig(wrong), right)
def test_82(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<"
right = "Q>/?;}[{=+-(*<^%$#@!MNBVCXZASDFGHKLPOIUY8REWT0q&7654321mnbvcxzasdfghjklpoiuytrew9"
self.assertEqual(sig(wrong), right)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,81 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import re
import string
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/swfbin/watch_as3-vflg5GhxU.swf',
u'swf',
82,
u':/.-,+*)=\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBAzyxw>utsrqponmlkjihgfedcba987654321'
),
]
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
test_id = m.group(1)
def test_func(self):
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ie = YoutubeIE()
if stype == 'js':
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
for test_spec in _TESTS:
make_tfunc(*test_spec)
if __name__ == '__main__':
unittest.main()

View File

@ -1,95 +0,0 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def setUp(self):
DL = FakeYDL()
DL.params['allsubtitles'] = False
DL.params['writesubtitles'] = False
DL.params['subtitlesformat'] = 'srt'
DL.params['listsubtitles'] = False
def test_youtube_no_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = False
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_it(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitleslang'] = 'it'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_onlysubtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['onlysubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_allsubtitles(self):
DL = FakeYDL()
DL.params['allsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.assertEqual(len(subtitles), 13)
def test_youtube_subtitles_sbv_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'sbv'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'vtt'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
DL = FakeYDL()
DL.params['listsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
DL = FakeYDL()
DL.params['writeautomaticsub'] = True
DL.params['subtitleslang'] = 'it'
IE = YoutubeIE(DL)
info_dict = IE.extract('8YoUxe5ncPo')
sub = info_dict[0]['subtitles'][0]
self.assertTrue(sub[2] is not None)
if __name__ == '__main__':
unittest.main()

View File

@ -1,727 +0,0 @@
[
{
"name": "Youtube",
"url": "http://www.youtube.com/watch?v=BaW_jenozKc",
"file": "BaW_jenozKc.mp4",
"info_dict": {
"title": "youtube-dl test video \"'/\\ä↭𝕐",
"uploader": "Philipp Hagemeister",
"uploader_id": "phihag",
"upload_date": "20121002",
"description": "test chars: \"'/\\ä↭𝕐\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de ."
}
},
{
"name": "Youtube",
"url": "http://www.youtube.com/watch?v=1ltcDfZMA3U",
"file": "1ltcDfZMA3U.flv",
"note": "Test VEVO video (#897)",
"info_dict": {
"upload_date": "20070518",
"title": "Maps - It Will Find You",
"description": "Music video by Maps performing It Will Find You.",
"uploader": "MuteUSA",
"uploader_id": "MuteUSA"
}
},
{
"name": "Youtube",
"url": "http://www.youtube.com/watch?v=UxxajLWwzqY",
"file": "UxxajLWwzqY.mp4",
"note": "Test generic use_cipher_signature video (#897)",
"info_dict": {
"upload_date": "20120506",
"title": "Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]",
"description": "md5:b085c9804f5ab69f4adea963a2dceb3c",
"uploader": "IconaPop",
"uploader_id": "IconaPop"
}
},
{
"name": "Dailymotion",
"md5": "392c4b85a60a90dc4792da41ce3144eb",
"url": "http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech",
"file": "x33vw9.mp4",
"info_dict": {
"uploader": "Alex and Van .",
"title": "Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
{
"name": "Metacafe",
"add_ie": ["Youtube"],
"url": "http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/",
"file": "_aUehQsCQtM.flv",
"info_dict": {
"upload_date": "20090102",
"title": "The Electric Company | \"Short I\" | PBS KIDS GO!",
"description": "md5:2439a8ef6d5a70e380c22f5ad323e5a8",
"uploader": "PBS",
"uploader_id": "PBS"
}
},
{
"name": "BlipTV",
"md5": "b2d849efcf7ee18917e4b4d9ff37cafe",
"url": "http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352",
"file": "5779306.m4v",
"info_dict": {
"upload_date": "20111205",
"description": "md5:9bc31f227219cde65e47eeec8d2dc596",
"uploader": "Comic Book Resources - CBR TV",
"title": "CBR EXCLUSIVE: \"Gotham City Imposters\" Bats VS Jokerz Short 3"
}
},
{
"name": "XVideos",
"md5": "1d0c835822f0a71a7bf011855db929d0",
"url": "http://www.xvideos.com/video939581/funny_porns_by_s_-1",
"file": "939581.flv",
"info_dict": {
"title": "Funny Porns By >>>>S<<<<<< -1"
}
},
{
"name": "YouPorn",
"md5": "c37ddbaaa39058c76a7e86c6813423c1",
"url": "http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/",
"file": "505835.mp4",
"info_dict": {
"upload_date": "20101221",
"description": "Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?",
"uploader": "Ask Dan And Jennifer",
"title": "Sex Ed: Is It Safe To Masturbate Daily?"
}
},
{
"name": "Pornotube",
"md5": "374dd6dcedd24234453b295209aa69b6",
"url": "http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing",
"file": "1689755.flv",
"info_dict": {
"upload_date": "20090708",
"title": "Marilyn-Monroe-Bathing"
}
},
{
"name": "YouJizz",
"md5": "07e15fa469ba384c7693fd246905547c",
"url": "http://www.youjizz.com/videos/zeichentrick-1-2189178.html",
"file": "2189178.flv",
"info_dict": {
"title": "Zeichentrick 1"
}
},
{
"name": "Vimeo",
"md5": "8879b6cc097e987f02484baf890129e5",
"url": "http://vimeo.com/56015672",
"file": "56015672.mp4",
"info_dict": {
"title": "youtube-dl test video - ★ \" ' 幸 / \\ ä ↭ 𝕐",
"uploader": "Filippo Valsorda",
"uploader_id": "user7108434",
"upload_date": "20121220",
"description": "This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: ★ \" ' 幸 / \\ ä ↭ 𝕐"
}
},
{
"name": "Soundcloud",
"md5": "ebef0a451b909710ed1d7787dddbf0d7",
"url": "http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy",
"file": "62986583.mp3",
"info_dict": {
"upload_date": "20121011",
"description": "No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o'd",
"uploader": "E.T. ExTerrestrial Music",
"title": "Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1"
}
},
{
"name": "StanfordOpenClassroom",
"md5": "544a9468546059d4e80d76265b0443b8",
"url": "http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=PracticalUnix&video=intro-environment&speed=100",
"file": "PracticalUnix_intro-environment.mp4",
"info_dict": {
"title": "Intro Environment"
}
},
{
"name": "XNXX",
"md5": "0831677e2b4761795f68d417e0b7b445",
"url": "http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_",
"file": "1135332.flv",
"info_dict": {
"title": "lida » Naked Funny Actress (5)"
}
},
{
"name": "Youku",
"url": "http://v.youku.com/v_show/id_XNDgyMDQ2NTQw.html",
"file": "XNDgyMDQ2NTQw_part00.flv",
"md5": "ffe3f2e435663dc2d1eea34faeff5b5b",
"params": { "test": false },
"info_dict": {
"title": "youtube-dl test video \"'/\\ä↭𝕐"
}
},
{
"name": "NBA",
"url": "http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html",
"file": "0021200253-okc-bkn-recap.nba.mp4",
"md5": "c0edcfc37607344e2ff8f13c378c88a4",
"info_dict": {
"description": "Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.",
"title": "Thunder vs. Nets"
}
},
{
"name": "JustinTV",
"url": "http://www.twitch.tv/thegamedevhub/b/296128360",
"file": "296128360.flv",
"md5": "ecaa8a790c22a40770901460af191c9a",
"info_dict": {
"upload_date": "20110927",
"uploader_id": 25114803,
"uploader": "thegamedevhub",
"title": "Beginner Series - Scripting With Python Pt.1"
}
},
{
"name": "MyVideo",
"url": "http://www.myvideo.de/watch/8229274/bowling_fail_or_win",
"file": "8229274.flv",
"md5": "2d2753e8130479ba2cb7e0a37002053e",
"info_dict": {
"title": "bowling-fail-or-win"
}
},
{
"name": "Escapist",
"url": "http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate",
"file": "6618-Breaking-Down-Baldurs-Gate.mp4",
"md5": "c6793dbda81388f4264c1ba18684a74d",
"info_dict": {
"description": "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
"uploader": "the-escapist-presents",
"title": "Breaking Down Baldur's Gate"
}
},
{
"name": "GooglePlus",
"url": "https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH",
"file": "ZButuJc6CtH.flv",
"info_dict": {
"upload_date": "20120613",
"uploader": "井上ヨシマサ",
"title": "嘆きの天使 降臨"
}
},
{
"name": "FunnyOrDie",
"url": "http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version",
"file": "0732f586d7.mp4",
"md5": "f647e9e90064b53b6e046e75d0241fbd",
"info_dict": {
"description": "Lyrics changed to match the video. Spoken cameo by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a concept by Dustin McLean (DustFilms.com). Performed, edited, and written by David A. Scott.",
"title": "Heart-Shaped Box: Literal Video Version"
}
},
{
"name": "Steam",
"url": "http://store.steampowered.com/video/105600/",
"playlist": [
{
"file": "81300.flv",
"md5": "f870007cee7065d7c76b88f0a45ecc07",
"info_dict": {
"title": "Terraria 1.1 Trailer"
}
},
{
"file": "80859.flv",
"md5": "61aaf31a5c5c3041afb58fb83cbb5751",
"info_dict": {
"title": "Terraria Trailer"
}
}
]
},
{
"name": "Ustream",
"url": "http://www.ustream.tv/recorded/20274954",
"file": "20274954.flv",
"md5": "088f151799e8f572f84eb62f17d73e5c",
"info_dict": {
"title": "Young Americans for Liberty February 7, 2012 2:28 AM",
"uploader": "Young Americans for Liberty"
}
},
{
"name": "InfoQ",
"url": "http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
"file": "12-jan-pythonthings.mp4",
"info_dict": {
"description": "Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
"title": "A Few of My Favorite [Python] Things"
},
"params": {
"skip_download": true
}
},
{
"name": "ComedyCentral",
"url": "http://www.thedailyshow.com/watch/thu-december-13-2012/kristen-stewart",
"file": "422212.mp4",
"md5": "4e2f5cb088a83cd8cdb7756132f9739d",
"info_dict": {
"upload_date": "20121214",
"description": "Kristen Stewart",
"uploader": "thedailyshow",
"title": "thedailyshow-kristen-stewart part 1"
}
},
{
"name": "RBMARadio",
"url": "http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011",
"file": "ford-lopatin-live-at-primavera-sound-2011.mp3",
"md5": "6bc6f9bcb18994b4c983bc3bf4384d95",
"info_dict": {
"title": "Live at Primavera Sound 2011",
"description": "Joel Ford and Daniel \u2019Oneohtrix Point Never\u2019 Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
"uploader": "Ford & Lopatin",
"uploader_id": "ford-lopatin",
"location": "Spain"
}
},
{
"name": "Facebook",
"url": "https://www.facebook.com/photo.php?v=120708114770723",
"file": "120708114770723.mp4",
"md5": "48975a41ccc4b7a581abd68651c1a5a8",
"info_dict": {
"title": "PEOPLE ARE AWESOME 2013",
"duration": 279
}
},
{
"name": "EightTracks",
"url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
"playlist": [
{
"file": "11885610.m4a",
"md5": "96ce57f24389fc8734ce47f4c1abcc55",
"info_dict": {
"title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885608.m4a",
"md5": "4ab26f05c1f7291ea460a3920be8021f",
"info_dict": {
"title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885679.m4a",
"md5": "d30b5b5f74217410f4689605c35d1fd7",
"info_dict": {
"title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885680.m4a",
"md5": "4eb0a669317cd725f6bbd336a29f923a",
"info_dict": {
"title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885682.m4a",
"md5": "1893e872e263a2705558d1d319ad19e8",
"info_dict": {
"title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885683.m4a",
"md5": "b673c46f47a216ab1741ae8836af5899",
"info_dict": {
"title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885684.m4a",
"md5": "1d74534e95df54986da7f5abf7d842b7",
"info_dict": {
"title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885685.m4a",
"md5": "f081f47af8f6ae782ed131d38b9cd1c0",
"info_dict": {
"title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
}
]
},
{
"name": "Keek",
"url": "http://www.keek.com/ytdl/keeks/NODfbab",
"file": "NODfbab.mp4",
"md5": "9b0636f8c0f7614afa4ea5e4c6e57e83",
"info_dict": {
"uploader": "ytdl",
"title": "test chars: \"'/\\ä<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de ."
}
},
{
"name": "TED",
"url": "http://www.ted.com/talks/dan_dennett_on_our_consciousness.html",
"file": "102.mp4",
"md5": "8cd9dfa41ee000ce658fd48fb5d89a61",
"info_dict": {
"title": "Dan Dennett: The illusion of consciousness",
"description": "md5:c6fa72e6eedbd938c9caf6b2702f5922"
}
},
{
"name": "MySpass",
"url": "http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/",
"file": "11741.mp4",
"md5": "0b49f4844a068f8b33f4b7c88405862b",
"info_dict": {
"description": "Wer kann in die Fußstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
"title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2"
}
},
{
"name": "Generic",
"url": "http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html",
"file": "13601338388002.mp4",
"md5": "85b90ccc9d73b4acd9138d3af4c27f89",
"info_dict": {
"uploader": "www.hodiho.fr",
"title": "Régis plante sa Jeep"
}
},
{
"name": "Spiegel",
"url": "http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html",
"file": "1259285.mp4",
"md5": "2c2754212136f35fb4b19767d242f66e",
"info_dict": {
"title": "Vulkanausbruch in Ecuador: Der \"Feuerschlund\" ist wieder aktiv"
}
},
{
"name": "LiveLeak",
"md5": "0813c2430bea7a46bf13acf3406992f4",
"url": "http://www.liveleak.com/view?i=757_1364311680",
"file": "757_1364311680.mp4",
"info_dict": {
"title": "Most unlucky car accident",
"description": "extremely bad day for this guy..!",
"uploader": "ljfriel2"
}
},
{
"name": "WorldStarHipHop",
"url": "http://www.worldstarhiphop.com/videos/video.php?v=wshh6a7q1ny0G34ZwuIO",
"file": "wshh6a7q1ny0G34ZwuIO.mp4",
"md5": "9d04de741161603bf7071bbf4e883186",
"info_dict": {
"title": "Video: KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!"
}
},
{
"name": "ARD",
"url": "http://www.ardmediathek.de/das-erste/tagesschau-in-100-sek?documentId=14077640",
"file": "14077640.mp4",
"md5": "6ca8824255460c787376353f9e20bbd8",
"info_dict": {
"title": "11.04.2013 09:23 Uhr - Tagesschau in 100 Sekunden"
},
"skip": "Requires rtmpdump"
},
{
"name": "Tumblr",
"url": "http://resigno.tumblr.com/post/53364321212/e-de-extrema-importancia-que-esse-video-seja",
"file": "53364321212.mp4",
"md5": "0716d3dd51baf68a28b40fdf1251494e",
"info_dict": {
"title": "Rafael Lemos | Tumblr"
}
},
{
"name": "SoundcloudSet",
"url":"https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep",
"playlist":[
{
"file":"30510138.mp3",
"md5":"f9136bf103901728f29e419d2c70f55d",
"info_dict": {
"upload_date": "20111213",
"description": "The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
"uploader": "The Royal Concept",
"title": "D-D-Dance"
}
},
{
"file":"47127625.mp3",
"md5":"09b6758a018470570f8fd423c9453dd8",
"info_dict": {
"upload_date": "20120521",
"description": "The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
"uploader": "The Royal Concept",
"title": "The Royal Concept - Gimme Twice"
}
},
{
"file":"47127627.mp3",
"md5":"154abd4e418cea19c3b901f1e1306d9c",
"info_dict": {
"upload_date": "20120521",
"uploader": "The Royal Concept",
"title": "Goldrushed"
}
},
{
"file":"47127629.mp3",
"md5":"2f5471edc79ad3f33a683153e96a79c1",
"info_dict": {
"upload_date": "20120521",
"description": "The Royal Concept from Stockholm\r\nFilip / Povel / David / Magnus\r\nwww.royalconceptband.com",
"uploader": "The Royal Concept",
"title": "In the End"
}
},
{
"file":"47127631.mp3",
"md5":"f9ba87aa940af7213f98949254f1c6e2",
"info_dict": {
"upload_date": "20120521",
"description": "The Royal Concept from Stockholm\r\nFilip / David / Povel / Magnus\r\nwww.theroyalconceptband.com",
"uploader": "The Royal Concept",
"title": "Knocked Up"
}
},
{
"file":"75206121.mp3",
"md5":"f9d1fe9406717e302980c30de4af9353",
"info_dict": {
"upload_date": "20130116",
"description": "The unreleased track World on Fire premiered on the CW's hit show Arrow (8pm/7pm central). \r\nAs a gift to our fans we would like to offer you a free download of the track! ",
"uploader": "The Royal Concept",
"title": "World On Fire"
}
}
]
},
{
"name":"Bandcamp",
"url":"http://youtube-dl.bandcamp.com/track/youtube-dl-test-song",
"file":"1812978515.mp3",
"md5":"cdeb30cdae1921719a3cbcab696ef53c",
"info_dict": {
"title":"youtube-dl test song \"'/\\ä↭"
},
"skip": "There is a limit of 200 free downloads / month for the test song"
},
{
"name": "RedTube",
"url": "http://www.redtube.com/66418",
"file": "66418.mp4",
"md5": "7b8c22b5e7098a3e1c09709df1126d2d",
"info_dict":{
"title":"Sucked on a toilet"
}
},
{
"name": "Photobucket",
"url": "http://media.photobucket.com/user/rachaneronas/media/TiredofLinkBuildingTryBacklinkMyDomaincom_zpsc0c3b9fa.mp4.html?filters[term]=search&filters[primary]=videos&filters[secondary]=images&sort=1&o=0",
"file": "zpsc0c3b9fa.mp4",
"md5": "7dabfb92b0a31f6c16cebc0f8e60ff99",
"info_dict": {
"upload_date": "20130504",
"uploader": "rachaneronas",
"title": "Tired of Link Building? Try BacklinkMyDomain.com!"
}
},
{
"name": "Ina",
"url": "www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html",
"file": "I12055569.mp4",
"md5": "a667021bf2b41f8dc6049479d9bb38a3",
"info_dict":{
"title":"François Hollande \"Je crois que c'est clair\""
}
},
{
"name": "Yahoo",
"url": "http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html",
"file": "214727115.flv",
"md5": "2e717f169c1be93d84d3794a00d4a325",
"info_dict": {
"title": "Julian Smith & Travis Legg Watch Julian Smith"
},
"skip": "Requires rtmpdump"
},
{
"name": "Howcast",
"url": "http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly",
"file": "390161.mp4",
"md5": "1d7ba54e2c9d7dc6935ef39e00529138",
"info_dict":{
"title":"How to Tie a Square Knot Properly",
"description":"The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here's the proper way to tie a square knot."
}
},
{
"name": "Vine",
"url": "https://vine.co/v/b9KOOWX7HUx",
"file": "b9KOOWX7HUx.mp4",
"md5": "2f36fed6235b16da96ce9b4dc890940d",
"info_dict":{
"title": "Chicken.",
"uploader": "Jack Dorsey"
}
},
{
"name": "Flickr",
"url": "http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/",
"file": "5645318632.mp4",
"md5": "6fdc01adbc89d72fc9c4f15b4a4ba87b",
"info_dict":{
"title": "Dark Hollow Waterfalls",
"uploader_id": "forestwander-nature-pictures",
"description": "Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up."
}
},
{
"name": "Teamcoco",
"url": "http://teamcoco.com/video/louis-ck-interview-george-w-bush",
"file": "19705.mp4",
"md5": "27b6f7527da5acf534b15f21b032656e",
"info_dict":{
"title": "Louis C.K. Interview Pt. 1 11/3/11",
"description": "Louis C.K. got starstruck by George W. Bush, so what? Part one."
}
},
{
"name": "XHamster",
"url": "http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html",
"file": "1509445.flv",
"md5": "9f48e0e8d58e3076bb236ff412ab62fa",
"info_dict": {
"upload_date": "20121014",
"uploader_id": "Ruseful2011",
"title": "FemaleAgent Shy beauty takes the bait"
}
},
{
"name": "Hypem",
"url": "http://hypem.com/track/1v6ga/BODYWORK+-+TAME",
"file": "1v6ga.mp3",
"md5": "b9cc91b5af8995e9f0c1cee04c575828",
"info_dict":{
"title":"Tame"
}
},
{
"name": "Vbox7",
"url": "http://vbox7.com/play:249bb972c2",
"file": "249bb972c2.flv",
"md5": "9c70d6d956f888bdc08c124acc120cfe",
"info_dict":{
"title":"Смях! Чудо - чист за секунди - Скрита камера"
}
},
{
"name": "Gametrailers",
"url": "http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer",
"file": "zbvr8i.flv",
"md5": "c3edbc995ab4081976e16779bd96a878",
"info_dict": {
"title": "E3 2013: Debut Trailer"
},
"skip": "Requires rtmpdump"
},
{
"name": "Statigram",
"url": "http://statigr.am/p/484091715184808010_284179915",
"file": "484091715184808010_284179915.mp4",
"md5": "deda4ff333abe2e118740321e992605b",
"info_dict": {
"uploader_id": "videoseconds",
"title": "Instagram photo by @videoseconds (Videos)"
}
},
{
"name": "Break",
"url": "http://www.break.com/video/when-girls-act-like-guys-2468056",
"file": "2468056.mp4",
"md5": "a3513fb1547fba4fb6cfac1bffc6c46b",
"info_dict": {
"title": "When Girls Act Like D-Bags"
}
},
{
"name": "Vevo",
"url": "http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280",
"file": "GB1101300280.mp4",
"md5": "06bea460acb744eab74a9d7dcb4bfd61",
"info_dict": {
"title": "Somebody To Die For",
"upload_date": "20130624",
"uploader": "Hurts"
}
},
{
"name": "Tudou",
"url": "http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html",
"file": "159447792.f4v",
"md5": "ad7c358a01541e926a1e413612c6b10a",
"info_dict": {
"title": "卡马乔国足开大脚长传冲吊集锦"
}
},
{
"name": "CSpan",
"url": "http://www.c-spanvideo.org/program/HolderonV",
"file": "315139.flv",
"md5": "74a623266956f69e4df0068ab6c80fe4",
"info_dict": {
"title": "Attorney General Eric Holder on Voting Rights Act Decision"
},
"skip": "Requires rtmpdump"
},
{
"name": "Wimp",
"url": "http://www.wimp.com/deerfence/",
"file": "deerfence.flv",
"md5": "8b215e2e0168c6081a1cf84b2846a2b5",
"info_dict": {
"title": "Watch Till End: Herd of deer jump over a fence."
}
},
{
"name": "HotNewHipHop",
"url": "http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html'",
"file": "1435540.mp3",
"md5": "2c2cd2f76ef11a9b3b581e8b232f3d96",
"info_dict": {
"title": "Freddie Gibbs Songs - Lay It Down"
}
}
]

8
tox.ini Normal file
View File

@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@ -1,15 +1,19 @@
import math
import os
import re
import subprocess
import sys
import time
import traceback
if os.name == 'nt':
import ctypes
from .utils import *
from .utils import (
compat_urllib_error,
compat_urllib_request,
ContentTooShortError,
determine_ext,
encodeFilename,
format_bytes,
sanitize_open,
timeconvert,
)
class FileDownloader(object):
@ -50,45 +54,56 @@ class FileDownloader(object):
self.params = params
@staticmethod
def format_bytes(bytes):
if bytes is None:
return 'N/A'
if type(bytes) is str:
bytes = float(bytes)
if bytes == 0.0:
exponent = 0
def format_seconds(seconds):
(mins, secs) = divmod(seconds, 60)
(hours, mins) = divmod(mins, 60)
if hours > 99:
return '--:--:--'
if hours == 0:
return '%02d:%02d' % (mins, secs)
else:
exponent = int(math.log(bytes, 1024.0))
suffix = ['B','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][exponent]
converted = float(bytes) / float(1024 ** exponent)
return '%.2f%s' % (converted, suffix)
return '%02d:%02d:%02d' % (hours, mins, secs)
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
return None
return float(byte_counter) / float(data_len) * 100.0
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
if total is None:
return '--:--'
return None
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return '--:--'
return None
rate = float(current) / dif
eta = int((float(total) - float(current)) / rate)
(eta_mins, eta_secs) = divmod(eta, 60)
if eta_mins > 99:
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return '%02d:%02d' % (eta_mins, eta_secs)
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
return None
return float(bytes) / dif
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
return '%10s' % ('%s/s' % format_bytes(speed))
@staticmethod
def best_block_size(elapsed_time, bytes):
@ -119,16 +134,8 @@ class FileDownloader(object):
def to_stderr(self, message):
self.ydl.to_screen(message)
def to_cons_title(self, message):
"""Set console/terminal window title to message."""
if not self.params.get('consoletitle', False):
return
if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
self.to_screen('\033]0;%s\007' % message, skip_eol=True)
def to_console_title(self, message):
self.ydl.to_console_title(message)
def trouble(self, *args, **kargs):
self.ydl.trouble(*args, **kargs)
@ -169,7 +176,7 @@ class FileDownloader(object):
if old_filename == new_filename:
return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err:
except (IOError, OSError):
self.report_error(u'unable to rename file')
def try_utime(self, filename, last_modified_hdr):
@ -197,18 +204,27 @@ class FileDownloader(object):
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
if eta is not None:
eta_str = self.format_eta(eta)
else:
eta_str = 'Unknown ETA'
if percent is not None:
percent_str = self.format_percent(percent)
else:
percent_str = 'Unknown %'
speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str))
else:
self.to_screen(u'\r%s[download] %s of %s at %s ETA %s' %
(clear_line, percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
self.to_cons_title(u'youtube-dl - %s of %s at %s ETA %s' %
self.to_console_title(u'youtube-dl - %s of %s at %s ETA %s' %
(percent_str.strip(), data_len_str.strip(), speed_str.strip(), eta_str.strip()))
def report_resuming_byte(self, resume_len):
@ -223,23 +239,81 @@ class FileDownloader(object):
"""Report file has already been fully downloaded."""
try:
self.to_screen(u'[download] %s has already been downloaded' % file_name)
except (UnicodeEncodeError) as err:
except UnicodeEncodeError:
self.to_screen(u'[download] The file has already been downloaded')
def report_unable_to_resume(self):
"""Report it was impossible to resume download."""
self.to_screen(u'[download] Unable to resume')
def report_finish(self):
def report_finish(self, data_len_str, tot_time):
"""Report download finished."""
if self.params.get('noprogress', False):
self.to_screen(u'[download] Download completed')
else:
self.to_screen(u'')
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
self.to_screen(u'\r%s[download] 100%% of %s in %s' %
(clear_line, data_len_str, self.format_seconds(tot_time)))
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live):
def run_rtmpdump(args):
start = time.time()
resume_percent = None
resume_downloaded_data_len = None
proc = subprocess.Popen(args, stderr=subprocess.PIPE)
cursor_in_new_line = True
proc_stderr_closed = False
while not proc_stderr_closed:
# read line from stderr
line = u''
while True:
char = proc.stderr.read(1)
if not char:
proc_stderr_closed = True
break
if char in [b'\r', b'\n']:
break
line += char.decode('ascii', 'replace')
if not line:
# proc_stderr_closed is True
continue
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
if mobj:
downloaded_data_len = int(float(mobj.group(1))*1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
eta = self.calc_eta(start, time.time(), 100-resume_percent, percent-resume_percent)
speed = self.calc_speed(start, time.time(), downloaded_data_len-resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
data_len_str = u'~' + format_bytes(data_len)
self.report_progress(percent, data_len_str, speed, eta)
cursor_in_new_line = False
self._hook_progress({
'downloaded_bytes': downloaded_data_len,
'total_bytes': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
elif self.params.get('verbose', False):
if not cursor_in_new_line:
self.to_screen(u'')
cursor_in_new_line = True
self.to_screen(u'[rtmpdump] '+line)
proc.wait()
if not cursor_in_new_line:
self.to_screen(u'')
return proc.returncode
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first
try:
@ -247,12 +321,11 @@ class FileDownloader(object):
except (OSError, IOError):
self.report_error(u'RTMP download detected but "rtmpdump" could not be run')
return False
verbosity_option = '--verbose' if self.params.get('verbose', False) else '--quiet'
# Download using rtmpdump. rtmpdump returns exit code 2 when
# the connection was interrumpted and resuming appears to be
# possible. This is part of rtmpdump's normal usage, AFAIK.
basic_args = ['rtmpdump', verbosity_option, '-r', url, '-o', tmpfilename]
basic_args = ['rtmpdump', '--verbose', '-r', url, '-o', tmpfilename]
if player_url is not None:
basic_args += ['--swfVfy', player_url]
if page_url is not None:
@ -261,31 +334,53 @@ class FileDownloader(object):
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
if live:
basic_args += ['--live']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if sys.platform == 'win32' and sys.version_info < (3, 0):
# Windows subprocess module does not actually support Unicode
# on Python 2.x
# See http://stackoverflow.com/a/9951851/35070
subprocess_encoding = sys.getfilesystemencoding()
args = [a.encode(subprocess_encoding, 'ignore') for a in args]
else:
subprocess_encoding = None
if self.params.get('verbose', False):
if subprocess_encoding:
str_args = [
a.decode(subprocess_encoding) if isinstance(a, bytes) else a
for a in args]
else:
str_args = args
try:
import pipes
shell_quote = lambda args: ' '.join(map(pipes.quote, args))
shell_quote = lambda args: ' '.join(map(pipes.quote, str_args))
except ImportError:
shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args)
while retval == 2 or retval == 1:
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(str_args))
retval = run_rtmpdump(args)
while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
self.to_screen(u'[rtmpdump] %s bytes' % prevsize)
time.sleep(5.0) # This seems to be needed
retval = subprocess.call(basic_args + ['-e'] + [[], ['-k', '1']][retval == 1])
retval = run_rtmpdump(basic_args + ['-e'] + [[], ['-k', '1']][retval == 1])
cursize = os.path.getsize(encodeFilename(tmpfilename))
if prevsize == cursize and retval == 1:
break
# Some rtmp streams seem abort after ~ 99.8%. Don't complain for those
if prevsize == cursize and retval == 2 and cursize > 1024:
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
self.to_screen(u'[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0
break
if retval == 0:
if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.to_screen(u'[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
@ -329,6 +424,40 @@ class FileDownloader(object):
self.report_error(u'mplayer exited with code %d' % retval)
return False
def _download_m3u8_with_ffmpeg(self, filename, url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
args = ['-y', '-i', url, '-f', 'mp4', '-c', 'copy',
'-bsf:a', 'aac_adtstoasc', tmpfilename]
for program in ['avconv', 'ffmpeg']:
try:
subprocess.call([program, '-version'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
break
except (OSError, IOError):
pass
else:
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found')
cmd = [program] + args
retval = subprocess.call(cmd)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize))
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
return True
else:
self.to_stderr(u"\n")
self.report_error(u'ffmpeg exited with code %d' % retval)
return False
def _do_download(self, filename, info_dict):
url = info_dict['url']
@ -339,6 +468,7 @@ class FileDownloader(object):
self._hook_progress({
'filename': filename,
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
})
return True
@ -348,12 +478,17 @@ class FileDownloader(object):
info_dict.get('player_url', None),
info_dict.get('page_url', None),
info_dict.get('play_path', None),
info_dict.get('tc_url', None))
info_dict.get('tc_url', None),
info_dict.get('rtmp_live', False))
# Attempt to download using mplayer
if url.startswith('mms') or url.startswith('rtsp'):
return self._download_with_mplayer(filename, url)
# m3u8 manifest are downloaded with ffmpeg
if determine_ext(url) == u'm3u8':
return self._download_m3u8_with_ffmpeg(filename, url)
tmpfilename = self.temp_name(filename)
stream = None
@ -448,7 +583,7 @@ class FileDownloader(object):
self.to_screen(u'\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
data_len_str = self.format_bytes(data_len)
data_len_str = format_bytes(data_len)
byte_counter = 0 + resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
@ -481,13 +616,13 @@ class FileDownloader(object):
block_size = self.best_block_size(after - before, len(data_block))
# Progress message
speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
eta = percent = None
else:
percent_str = self.calc_percent(byte_counter, data_len)
eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent_str, data_len_str, speed_str, eta_str)
percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({
'downloaded_bytes': byte_counter,
@ -495,6 +630,8 @@ class FileDownloader(object):
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
# Apply rate limit
@ -505,7 +642,7 @@ class FileDownloader(object):
self.report_error(u'Did not get any data blocks')
return False
stream.close()
self.report_finish()
self.report_finish(data_len_str, (time.time() - start))
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
@ -537,6 +674,8 @@ class FileDownloader(object):
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.

View File

@ -3,7 +3,14 @@ import subprocess
import sys
import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object):
@ -71,12 +78,19 @@ class FFmpegPostProcessor(PostProcessor):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
return dict((program, executable(program)) for program in programs)
def run_ffmpeg(self, path, out_path, opts):
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
if not self._exes['ffmpeg'] and not self._exes['avconv']:
raise FFmpegPostProcessorError(u'ffmpeg or avconv not found. Please install one.')
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y', '-i', encodeFilename(path)]
files_cmd = []
for path in input_paths:
files_cmd.extend(['-i', encodeFilename(path)])
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y'] + files_cmd
+ opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate()
if p.returncode != 0:
@ -84,6 +98,9 @@ class FFmpegPostProcessor(PostProcessor):
msg = stderr.strip().split('\n')[-1]
raise FFmpegPostProcessorError(msg)
def run_ffmpeg(self, path, out_path, opts):
self.run_ffmpeg_multiple_files([path], out_path, opts)
def _ffmpeg_filename_argument(self, fn):
# ffmpeg broke --, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details
if fn.startswith(u'-'):
@ -100,7 +117,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
self._nopostoverwrites = nopostoverwrites
def get_audio_codec(self, path):
if not self._exes['ffprobe'] and not self._exes['avprobe']: return None
if not self._exes['ffprobe'] and not self._exes['avprobe']:
raise PostProcessingError(u'ffprobe or avprobe not found. Please install one.')
try:
cmd = [self._exes['avprobe'] or self._exes['ffprobe'], '-show_streams', encodeFilename(self._ffmpeg_filename_argument(path))]
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE)
@ -128,7 +146,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
FFmpegPostProcessor.run_ffmpeg(self, path, out_path, opts)
except FFmpegPostProcessorError as err:
raise AudioConversionError(err.message)
raise AudioConversionError(err.msg)
def run(self, information):
path = information['filepath']
@ -168,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:
if int(self._preferredquality) < 10:
# The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@ -198,7 +217,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
except:
etype,e,tb = sys.exc_info()
if isinstance(e, AudioConversionError):
msg = u'audio conversion failed: ' + e.message
msg = u'audio conversion failed: ' + e.msg
else:
msg = u'error running ' + (self._exes['avconv'] and 'avconv' or 'ffmpeg')
raise PostProcessingError(msg)
@ -208,7 +227,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
os.utime(encodeFilename(new_path), (time.time(), information['filetime']))
except:
self._downloader.to_stderr(u'WARNING: Cannot update utime of audio file')
self._downloader.report_warning(u'Cannot update utime of audio file')
information['filepath'] = new_path
return self._nopostoverwrites,information
@ -231,3 +250,262 @@ class FFmpegVideoConvertor(FFmpegPostProcessor):
information['format'] = self._preferedformat
information['ext'] = self._preferedformat
return False,information
class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
# See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
_lang_map = {
'aa': 'aar',
'ab': 'abk',
'ae': 'ave',
'af': 'afr',
'ak': 'aka',
'am': 'amh',
'an': 'arg',
'ar': 'ara',
'as': 'asm',
'av': 'ava',
'ay': 'aym',
'az': 'aze',
'ba': 'bak',
'be': 'bel',
'bg': 'bul',
'bh': 'bih',
'bi': 'bis',
'bm': 'bam',
'bn': 'ben',
'bo': 'bod',
'br': 'bre',
'bs': 'bos',
'ca': 'cat',
'ce': 'che',
'ch': 'cha',
'co': 'cos',
'cr': 'cre',
'cs': 'ces',
'cu': 'chu',
'cv': 'chv',
'cy': 'cym',
'da': 'dan',
'de': 'deu',
'dv': 'div',
'dz': 'dzo',
'ee': 'ewe',
'el': 'ell',
'en': 'eng',
'eo': 'epo',
'es': 'spa',
'et': 'est',
'eu': 'eus',
'fa': 'fas',
'ff': 'ful',
'fi': 'fin',
'fj': 'fij',
'fo': 'fao',
'fr': 'fra',
'fy': 'fry',
'ga': 'gle',
'gd': 'gla',
'gl': 'glg',
'gn': 'grn',
'gu': 'guj',
'gv': 'glv',
'ha': 'hau',
'he': 'heb',
'hi': 'hin',
'ho': 'hmo',
'hr': 'hrv',
'ht': 'hat',
'hu': 'hun',
'hy': 'hye',
'hz': 'her',
'ia': 'ina',
'id': 'ind',
'ie': 'ile',
'ig': 'ibo',
'ii': 'iii',
'ik': 'ipk',
'io': 'ido',
'is': 'isl',
'it': 'ita',
'iu': 'iku',
'ja': 'jpn',
'jv': 'jav',
'ka': 'kat',
'kg': 'kon',
'ki': 'kik',
'kj': 'kua',
'kk': 'kaz',
'kl': 'kal',
'km': 'khm',
'kn': 'kan',
'ko': 'kor',
'kr': 'kau',
'ks': 'kas',
'ku': 'kur',
'kv': 'kom',
'kw': 'cor',
'ky': 'kir',
'la': 'lat',
'lb': 'ltz',
'lg': 'lug',
'li': 'lim',
'ln': 'lin',
'lo': 'lao',
'lt': 'lit',
'lu': 'lub',
'lv': 'lav',
'mg': 'mlg',
'mh': 'mah',
'mi': 'mri',
'mk': 'mkd',
'ml': 'mal',
'mn': 'mon',
'mr': 'mar',
'ms': 'msa',
'mt': 'mlt',
'my': 'mya',
'na': 'nau',
'nb': 'nob',
'nd': 'nde',
'ne': 'nep',
'ng': 'ndo',
'nl': 'nld',
'nn': 'nno',
'no': 'nor',
'nr': 'nbl',
'nv': 'nav',
'ny': 'nya',
'oc': 'oci',
'oj': 'oji',
'om': 'orm',
'or': 'ori',
'os': 'oss',
'pa': 'pan',
'pi': 'pli',
'pl': 'pol',
'ps': 'pus',
'pt': 'por',
'qu': 'que',
'rm': 'roh',
'rn': 'run',
'ro': 'ron',
'ru': 'rus',
'rw': 'kin',
'sa': 'san',
'sc': 'srd',
'sd': 'snd',
'se': 'sme',
'sg': 'sag',
'si': 'sin',
'sk': 'slk',
'sl': 'slv',
'sm': 'smo',
'sn': 'sna',
'so': 'som',
'sq': 'sqi',
'sr': 'srp',
'ss': 'ssw',
'st': 'sot',
'su': 'sun',
'sv': 'swe',
'sw': 'swa',
'ta': 'tam',
'te': 'tel',
'tg': 'tgk',
'th': 'tha',
'ti': 'tir',
'tk': 'tuk',
'tl': 'tgl',
'tn': 'tsn',
'to': 'ton',
'tr': 'tur',
'ts': 'tso',
'tt': 'tat',
'tw': 'twi',
'ty': 'tah',
'ug': 'uig',
'uk': 'ukr',
'ur': 'urd',
'uz': 'uzb',
've': 'ven',
'vi': 'vie',
'vo': 'vol',
'wa': 'wln',
'wo': 'wol',
'xh': 'xho',
'yi': 'yid',
'yo': 'yor',
'za': 'zha',
'zh': 'zho',
'zu': 'zul',
}
def __init__(self, downloader=None, subtitlesformat='srt'):
super(FFmpegEmbedSubtitlePP, self).__init__(downloader)
self._subformat = subtitlesformat
@classmethod
def _conver_lang_code(cls, code):
"""Convert language code from ISO 639-1 to ISO 639-2/T"""
return cls._lang_map.get(code[:2])
def run(self, information):
if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information
if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
opts = ['-map', '0:0', '-map', '0:1', '-c:v', 'copy', '-c:a', 'copy']
for (i, lang) in enumerate(sub_langs):
opts.extend(['-map', '%d:0' % (i+1), '-c:s:%d' % i, 'mov_text'])
lang_code = self._conver_lang_code(lang)
if lang_code is not None:
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
opts.extend(['-f', 'mp4'])
temp_filename = filename + u'.temp'
self._downloader.to_screen(u'[ffmpeg] Embedding subtitles in \'%s\'' % filename)
self.run_ffmpeg_multiple_files(input_files, temp_filename, opts)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s=%s' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info

View File

@ -3,18 +3,55 @@
from __future__ import absolute_import
import errno
import io
import json
import os
import platform
import re
import shutil
import subprocess
import socket
import sys
import time
import traceback
from .utils import *
from .extractor import get_info_extractor
if os.name == 'nt':
import ctypes
from .utils import (
compat_cookiejar,
compat_http_client,
compat_print,
compat_str,
compat_urllib_error,
compat_urllib_request,
ContentTooShortError,
date_from_str,
DateRange,
determine_ext,
DownloadError,
encodeFilename,
ExtractorError,
format_bytes,
locked_file,
make_HTTPS_handler,
MaxDownloadsReached,
PostProcessingError,
platform_name,
preferredencoding,
SameFileError,
sanitize_filename,
subtitles_filename,
takewhile_inclusive,
UnavailableVideoError,
write_json_file,
write_string,
YoutubeDLHandler,
)
from .extractor import get_info_extractor, gen_extractors
from .FileDownloader import FileDownloader
from .version import __version__
class YoutubeDL(object):
@ -56,6 +93,7 @@ class YoutubeDL(object):
forcethumbnail: Force printing thumbnail URL.
forcedescription: Force printing description.
forcefilename: Force printing final filename.
forcejson: Force printing info_dict as JSON.
simulate: Do not download the video files.
format: Video format code.
format_limit: Highest quality format to try.
@ -67,20 +105,34 @@ class YoutubeDL(object):
playlistend: Playlist item to end at.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance.
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
subtitlesformat: Subtitle format [srt/sbv/vtt] (default=srt)
subtitleslang: Language of the subtitles to download
subtitleslangs: List of languages of the subtitles to download
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem.
None to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Unsuitable videos for the given age are skipped.
download_archive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
cookiefile: File name where cookies should be read from and dumped to.
nocheckcertificate:Do not verify SSL certificates
proxy: URL of the proxy server to use
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
@ -94,25 +146,59 @@ class YoutubeDL(object):
_num_downloads = None
_screen_file = None
def __init__(self, params):
def __init__(self, params={}):
"""Create a FileDownloader object with the given options."""
self._ies = []
self._ies_instances = {}
self._pps = []
self._progress_hooks = []
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
if (sys.version_info >= (3,) and sys.platform != 'win32' and
sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params['restrictfilenames']):
# On Python 3, the Unicode filesystem API will throw errors (#1474)
self.report_warning(
u'Assuming --restrict-filenames since file system encoding '
u'cannot encode all charactes. '
u'Set the LC_ALL environment variable to fix this.')
params['restrictfilenames'] = True
self.params = params
self.fd = FileDownloader(self, self.params)
if '%(stitle)s' in self.params['outtmpl']:
if '%(stitle)s' in self.params.get('outtmpl', ''):
self.report_warning(u'%(stitle)s is deprecated. Use the %(title)s and the --restrict-filenames flag(which also secures %(uploader)s et al) instead.')
self._setup_opener()
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
Get an instance of an IE with name ie_key, it will try to get one from
the _ies list, if there's no instance it will create a new one and add
it to the extractor list.
"""
ie = self._ies_instances.get(ie_key)
if ie is None:
ie = get_info_extractor(ie_key)()
self.add_info_extractor(ie)
return ie
def add_default_info_extractors(self):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
"""
for ie in gen_extractors():
self.add_info_extractor(ie)
def add_post_processor(self, pp):
"""Add a PostProcessor object to the end of the chain."""
self._pps.append(pp)
@ -120,26 +206,57 @@ class YoutubeDL(object):
def to_screen(self, message, skip_eol=False):
"""Print message to stdout if not in quiet mode."""
assert type(message) == type(u'')
if not self.params.get('quiet', False):
if self.params.get('logger'):
self.params['logger'].debug(message)
elif not self.params.get('quiet', False):
terminator = [u'\n', u''][skip_eol]
output = message + terminator
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding(), 'ignore')
self._screen_file.write(output)
self._screen_file.flush()
write_string(output, self._screen_file)
def to_stderr(self, message):
"""Print message to stderr."""
assert type(message) == type(u'')
output = message + u'\n'
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding())
sys.stderr.write(output)
if self.params.get('logger'):
self.params['logger'].error(message)
else:
output = message + u'\n'
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding())
sys.stderr.write(output)
def fixed_template(self):
"""Checks if the output template is fixed."""
return (re.search(u'(?u)%\\(.+?\\)s', self.params['outtmpl']) is None)
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
return
if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
write_string(u'\033]0;%s\007' % message, self._screen_file)
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Save the title on stack
write_string(u'\033[22;0t', self._screen_file)
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Restore the title from stack
write_string(u'\033[23;0t', self._screen_file)
def __enter__(self):
self.save_console_title()
return self
def __exit__(self, *args):
self.restore_console_title()
if self.params.get('cookiefile') is not None:
self.cookiejar.save()
def trouble(self, message=None, tb=None):
"""Determine action to take when a download problem appears.
@ -177,10 +294,10 @@ class YoutubeDL(object):
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header=u'\033[0;33mWARNING:\033[0m'
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header=u'WARNING:'
warning_message=u'%s %s' % (_msg_header,message)
_msg_header = u'WARNING:'
warning_message = u'%s %s' % (_msg_header, message)
self.to_stderr(warning_message)
def report_error(self, message, tb=None):
@ -195,19 +312,6 @@ class YoutubeDL(object):
error_message = u'%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
if rate_limit is None or byte_counter == 0:
return
now = time.time()
elapsed = now - start_time
if elapsed <= 0.0:
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep((byte_counter - rate_limit * (now - start_time)) / rate_limit)
def report_writedescription(self, descfn):
""" Report that the description file is being written """
self.to_screen(u'[info] Writing video description to: ' + descfn)
@ -220,11 +324,15 @@ class YoutubeDL(object):
""" Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_writeannotations(self, annofn):
""" Report that the annotations file has been written. """
self.to_screen(u'[info] Writing video annotations to: ' + annofn)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
try:
self.to_screen(u'[download] %s has already been downloaded' % file_name)
except (UnicodeEncodeError) as err:
except UnicodeEncodeError:
self.to_screen(u'[download] The file has already been downloaded')
def increment_downloads(self):
@ -242,54 +350,69 @@ class YoutubeDL(object):
autonumber_size = 5
autonumber_templ = u'%0' + str(autonumber_size) + u'd'
template_dict['autonumber'] = autonumber_templ % self._num_downloads
if template_dict['playlist_index'] is not None:
if template_dict.get('playlist_index') is not None:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k,v: sanitize_filename(
sanitize = lambda k, v: sanitize_filename(
u'NA' if v is None else compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k==u'id'))
template_dict = dict((k, sanitize(k, v)) for k,v in template_dict.items())
is_id=(k == u'id'))
template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items())
filename = self.params['outtmpl'] % template_dict
tmpl = os.path.expanduser(self.params['outtmpl'])
filename = tmpl % template_dict
return filename
except KeyError as err:
self.report_error(u'Erroneous output template')
return None
except ValueError as err:
self.report_error(u'Insufficient system charset ' + repr(preferredencoding()))
self.report_error(u'Error in output template: ' + str(err) + u' (encoding: ' + repr(preferredencoding()) + ')')
return None
def _match_entry(self, info_dict):
""" Returns None iff the file should be downloaded """
title = info_dict['title']
matchtitle = self.params.get('matchtitle', False)
if matchtitle:
if not re.search(matchtitle, title, re.IGNORECASE):
return u'[download] "' + title + '" title did not match pattern "' + matchtitle + '"'
rejecttitle = self.params.get('rejecttitle', False)
if rejecttitle:
if re.search(rejecttitle, title, re.IGNORECASE):
return u'"' + title + '" title matched reject pattern "' + rejecttitle + '"'
if 'title' in info_dict:
# This can happen when we're just evaluating the playlist
title = info_dict['title']
matchtitle = self.params.get('matchtitle', False)
if matchtitle:
if not re.search(matchtitle, title, re.IGNORECASE):
return u'[download] "' + title + '" title did not match pattern "' + matchtitle + '"'
rejecttitle = self.params.get('rejecttitle', False)
if rejecttitle:
if re.search(rejecttitle, title, re.IGNORECASE):
return u'"' + title + '" title matched reject pattern "' + rejecttitle + '"'
date = info_dict.get('upload_date', None)
if date is not None:
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return (u'%s has already been recorded in archive'
% info_dict.get('title', info_dict.get('id', u'video')))
return None
@staticmethod
def add_extra_info(info_dict, extra_info):
'''Set the keys from extra_info in info dict if they are missing'''
for key, value in extra_info.items():
info_dict.setdefault(key, value)
def extract_info(self, url, download=True, ie_key=None, extra_info={}):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
if ie_key:
ie = get_info_extractor(ie_key)()
ie.set_downloader(self)
ies = [ie]
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
@ -307,17 +430,17 @@ class YoutubeDL(object):
break
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
for result in ie_result:
result.update(extra_info)
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
else:
ie_result.update(extra_info)
if 'extractor' not in ie_result:
ie_result['extractor'] = ie.IE_NAME
return self.process_ie_result(ie_result, download=download)
self.add_extra_info(ie_result,
{
'extractor': ie.IE_NAME,
'webpage_url': url,
'extractor_key': ie.ie_key(),
})
return self.process_ie_result(ie_result, download, extra_info)
except ExtractorError as de: # An error we somewhat expected
self.report_error(compat_str(de), de.format_traceback())
break
@ -329,7 +452,7 @@ class YoutubeDL(object):
raise
else:
self.report_error(u'no suitable InfoExtractor: %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
Take the result of the ie(may be modified) and resolve all unresolved
@ -341,13 +464,8 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system
if result_type == 'video':
if 'playlist' not in ie_result:
# It isn't part of a playlist
ie_result['playlist'] = None
ie_result['playlist_index'] = None
if download:
self.process_info(ie_result)
return ie_result
self.add_extra_info(ie_result, extra_info)
return self.process_video_result(ie_result, download=download)
elif result_type == 'url':
# We have to add extra_info to the results because it may be
# contained in a playlist
@ -356,9 +474,10 @@ class YoutubeDL(object):
ie_key=ie_result.get('ie_key'),
extra_info=extra_info)
elif result_type == 'playlist':
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
playlist_results = []
@ -376,17 +495,21 @@ class YoutubeDL(object):
self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
for i,entry in enumerate(entries,1):
self.to_screen(u'[download] Downloading video #%s of %s' %(i, n_entries))
for i, entry in enumerate(entries, 1):
self.to_screen(u'[download] Downloading video #%s of %s' % (i, n_entries))
extra = {
'playlist': playlist,
'playlist_index': i + playliststart,
}
if not 'extractor' in entry:
# We set the extractor, if it's an url it will be set then to
# the new extractor, but if it's already a video we must make
# sure it's present: see issue #877
entry['extractor'] = ie_result['extractor']
'playlist': playlist,
'playlist_index': i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry)
if reason is not None:
self.to_screen(u'[download] ' + reason)
continue
entry_result = self.process_ie_result(entry,
download=download,
extra_info=extra)
@ -395,16 +518,122 @@ class YoutubeDL(object):
return ie_result
elif result_type == 'compat_list':
def _fixup(r):
r.setdefault('extractor', ie_result['extractor'])
self.add_extra_info(r,
{
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'extractor_key': ie_result['extractor_key'],
})
return r
ie_result['entries'] = [
self.process_ie_result(_fixup(r), download=download)
self.process_ie_result(_fixup(r), download, extra_info)
for r in ie_result['entries']
]
return ie_result
else:
raise Exception('Invalid result type: %s' % result_type)
def select_format(self, format_spec, available_formats):
if format_spec == 'best' or format_spec is None:
return available_formats[-1]
elif format_spec == 'worst':
return available_formats[0]
else:
extensions = [u'mp4', u'flv', u'webm', u'3gp']
if format_spec in extensions:
filter_f = lambda f: f['ext'] == format_spec
else:
filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, available_formats))
if matches:
return matches[-1]
return None
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
info_dict['playlist_index'] = None
# This extractors handle format selection themselves
if info_dict['extractor'] in [u'youtube', u'Youku']:
if download:
self.process_info(info_dict)
return info_dict
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
# There's only one format available
formats = [info_dict]
else:
formats = info_dict['formats']
# We check that all the formats have the format and format_id fields
for (i, format) in enumerate(formats):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
format['format'] = u'{id} - {res}{note}'.format(
id=format['format_id'],
res=self.format_resolution(format),
note=u' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
)
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url'])
if self.params.get('listformats', None):
self.list_formats(info_dict)
return
format_limit = self.params.get('format_limit', None)
if format_limit:
formats = list(takewhile_inclusive(
lambda f: f['format_id'] != format_limit, formats
))
if self.params.get('prefer_free_formats'):
def _free_formats_key(f):
try:
ext_ord = [u'flv', u'mp4', u'webm'].index(f['ext'])
except ValueError:
ext_ord = -1
# We only compare the extension if they have the same height and width
return (f.get('height'), f.get('width'), ext_ord)
formats = sorted(formats, key=_free_formats_key)
req_format = self.params.get('format', 'best')
if req_format is None:
req_format = 'best'
formats_to_download = []
# The -1 is for supporting YoutubeIE
if req_format in ('-1', 'all'):
formats_to_download = formats
else:
# We can accept formats requestd in the format: 34/5/best, we pick
# the first that is available, starting from left
req_formats = req_format.split('/')
for rf in req_formats:
selected_format = self.select_format(rf, formats)
if selected_format is not None:
formats_to_download = [selected_format]
break
if not formats_to_download:
raise ExtractorError(u'requested format not available',
expected=True)
if download:
if len(formats_to_download) > 1:
self.to_screen(u'[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
for format in formats_to_download:
new_info = dict(info_dict)
new_info.update(format)
self.process_info(new_info)
# We update the info dict with the best quality format (backwards compatibility)
info_dict.update(formats_to_download[-1])
return info_dict
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@ -436,19 +665,22 @@ class YoutubeDL(object):
# Forced printings
if self.params.get('forcetitle', False):
compat_print(info_dict['title'])
compat_print(info_dict['fulltitle'])
if self.params.get('forceid', False):
compat_print(info_dict['id'])
if self.params.get('forceurl', False):
compat_print(info_dict['url'])
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict:
# For RTMP URLs, also include the playpath
compat_print(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict:
if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
compat_print(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
compat_print(filename)
if self.params.get('forceformat', False):
compat_print(info_dict['format'])
if self.params.get('forcejson', False):
compat_print(json.dumps(info_dict))
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
@ -471,68 +703,70 @@ class YoutubeDL(object):
self.report_writedescription(descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(info_dict['description'])
except (KeyError, TypeError):
self.report_warning(u'There\'s no description to write.')
except (OSError, IOError):
self.report_error(u'Cannot write description file ' + descfn)
return
if (self.params.get('writesubtitles', False) or self.params.get('writeautomaticsub')) and 'subtitles' in info_dict and info_dict['subtitles']:
if self.params.get('writeannotations', False):
try:
annofn = filename + u'.annotations.xml'
self.report_writeannotations(annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning(u'There are no annotations to write.')
except (OSError, IOError):
self.report_error(u'Cannot write annotations file: ' + annofn)
return
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitle = info_dict['subtitles'][0]
(sub_error, sub_lang, sub) = subtitle
sub_format = self.params.get('subtitlesformat')
if sub_error:
self.report_warning("Some error while getting the subtitles")
else:
subtitles = info_dict['subtitles']
sub_format = self.params.get('subtitlesformat', 'srt')
for sub_lang in subtitles.keys():
sub = subtitles[sub_lang]
if sub is None:
continue
try:
sub_filename = filename.rsplit('.', 1)[0] + u'.' + sub_lang + u'.' + sub_format
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
self.report_writesubtitles(sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
subfile.write(sub)
except (OSError, IOError):
self.report_error(u'Cannot write subtitles file ' + descfn)
return
if self.params.get('allsubtitles', False) and 'subtitles' in info_dict and info_dict['subtitles']:
subtitles = info_dict['subtitles']
sub_format = self.params.get('subtitlesformat')
for subtitle in subtitles:
(sub_error, sub_lang, sub) = subtitle
if sub_error:
self.report_warning("Some error while getting the subtitles")
else:
try:
sub_filename = filename.rsplit('.', 1)[0] + u'.' + sub_lang + u'.' + sub_format
self.report_writesubtitles(sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
except (OSError, IOError):
self.report_error(u'Cannot write subtitles file ' + descfn)
return
if self.params.get('writeinfojson', False):
infofn = filename + u'.info.json'
infofn = os.path.splitext(filename)[0] + u'.info.json'
self.report_writeinfojson(infofn)
try:
json_info_dict = dict((k, v) for k,v in info_dict.items() if not k in ['urlhandle'])
json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
write_json_file(json_info_dict, encodeFilename(infofn))
except (OSError, IOError):
self.report_error(u'Cannot write metadata to JSON file ' + infofn)
return
if self.params.get('writethumbnail', False):
if 'thumbnail' in info_dict:
thumb_format = info_dict['thumbnail'].rpartition(u'/')[2].rpartition(u'.')[2]
if not thumb_format:
thumb_format = 'jpg'
if info_dict.get('thumbnail') is not None:
thumb_format = determine_ext(info_dict['thumbnail'], u'jpg')
thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
(info_dict['extractor'], info_dict['id']))
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
try:
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning(u'Unable to download thumbnail "%s": %s' %
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@ -540,11 +774,11 @@ class YoutubeDL(object):
else:
try:
success = self.fd._do_download(filename, info_dict)
except (OSError, IOError) as err:
raise UnavailableVideoError()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error(u'unable to download video data: %s' % str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (ContentTooShortError, ) as err:
self.report_error(u'content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
@ -556,15 +790,19 @@ class YoutubeDL(object):
self.report_error(u'postprocessing: %s' % str(err))
return
self.record_download_archive(info_dict)
def download(self, url_list):
"""Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template():
if (len(url_list) > 1 and
'%' not in self.params['outtmpl']
and self.params.get('max_downloads') != 1):
raise SameFileError(self.params['outtmpl'])
for url in url_list:
try:
#It also downloads the videos
videos = self.extract_info(url)
self.extract_info(url)
except UnavailableVideoError:
self.report_error(u'unable to download video')
except MaxDownloadsReached:
@ -580,7 +818,7 @@ class YoutubeDL(object):
keep_video = None
for pp in self._pps:
try:
keep_video_wish,new_info = pp.run(info)
keep_video_wish, new_info = pp.run(info)
if keep_video_wish is not None:
if keep_video_wish:
keep_video = keep_video_wish
@ -588,10 +826,184 @@ class YoutubeDL(object):
# No clear decision yet, let IE decide
keep_video = keep_video_wish
except PostProcessingError as e:
self.to_stderr(u'ERROR: ' + e.msg)
self.report_error(e.msg)
if keep_video is False and not self.params.get('keepvideo', False):
try:
self.to_screen(u'Deleting original file %s (pass -k to keep)' % filename)
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.report_warning(u'Unable to remove downloaded video file')
def _make_archive_id(self, info_dict):
# Future-proof against any change in case
# and backwards compatibility with prior versions
extractor = info_dict.get('extractor_key')
if extractor is None:
if 'id' in info_dict:
extractor = info_dict.get('ie_key') # key in a playlist
if extractor is None:
return None # Incomplete video information
return extractor.lower() + u' ' + info_dict['id']
def in_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return False
vid_id = self._make_archive_id(info_dict)
if vid_id is None:
return False # Incomplete video information
try:
with locked_file(fn, 'r', encoding='utf-8') as archive_file:
for line in archive_file:
if line.strip() == vid_id:
return True
except IOError as ioe:
if ioe.errno != errno.ENOENT:
raise
return False
def record_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return
vid_id = self._make_archive_id(info_dict)
assert vid_id
with locked_file(fn, 'a', encoding='utf-8') as archive_file:
archive_file.write(vid_id + u'\n')
@staticmethod
def format_resolution(format, default='unknown'):
if format.get('vcodec') == 'none':
return 'audio only'
if format.get('_resolution') is not None:
return format['_resolution']
if format.get('height') is not None:
if format.get('width') is not None:
res = u'%sx%s' % (format['width'], format['height'])
else:
res = u'%sp' % format['height']
else:
res = default
return res
def list_formats(self, info_dict):
def format_note(fdict):
res = u''
if fdict.get('format_note') is not None:
res += fdict['format_note'] + u' '
if (fdict.get('vcodec') is not None and
fdict.get('vcodec') != 'none'):
res += u'%-5s' % fdict['vcodec']
elif fdict.get('vbr') is not None:
res += u'video'
if fdict.get('vbr') is not None:
res += u'@%4dk' % fdict['vbr']
if fdict.get('acodec') is not None:
if res:
res += u', '
res += u'%-5s' % fdict['acodec']
elif fdict.get('abr') is not None:
if res:
res += u', '
res += 'audio'
if fdict.get('abr') is not None:
res += u'@%3dk' % fdict['abr']
if fdict.get('filesize') is not None:
if res:
res += u', '
res += format_bytes(fdict['filesize'])
return res
def line(format, idlen=20):
return ((u'%-' + compat_str(idlen + 1) + u's%-10s%-12s%s') % (
format['format_id'],
format['ext'],
self.format_resolution(format),
format_note(format),
))
formats = info_dict.get('formats', [info_dict])
idlen = max(len(u'format code'),
max(len(f['format_id']) for f in formats))
formats_s = [line(f, idlen) for f in formats]
if len(formats) > 1:
formats_s[0] += (' ' if format_note(formats[0]) else '') + '(worst)'
formats_s[-1] += (' ' if format_note(formats[-1]) else '') + '(best)'
header_line = line({
'format_id': u'format code', 'ext': u'extension',
'_resolution': u'resolution', 'format_note': u'note'}, idlen=idlen)
self.to_screen(u'[info] Available formats for %s:\n%s\n%s' %
(info_dict['id'], header_line, u"\n".join(formats_s)))
def urlopen(self, req):
""" Start an HTTP download """
return self._opener.open(req)
def print_debug_header(self):
if not self.params.get('verbose'):
return
write_string(u'[debug] youtube-dl version ' + __version__ + u'\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
write_string(u'[debug] Git HEAD: ' + out + u'\n')
except:
try:
sys.exc_clear()
except:
pass
write_string(u'[debug] Python version %s - %s' %
(platform.python_version(), platform_name()) + u'\n')
proxy_map = {}
for handler in self._opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string(u'[debug] Proxy map: ' + compat_str(proxy_map) + u'\n')
def _setup_opener(self, timeout=20):
opts_cookiefile = self.params.get('cookiefile')
opts_proxy = self.params.get('proxy')
if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar()
else:
self.cookiejar = compat_cookiejar.MozillaCookieJar(
opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK):
self.cookiejar.load()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(
self.cookiejar)
if opts_proxy is not None:
if opts_proxy == '':
proxies = {}
else:
proxies = {'http': opts_proxy, 'https': opts_proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(
self.params.get('nocheckcertificate', False))
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
# TODO remove this global modification
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)

View File

@ -26,7 +26,17 @@ __authors__ = (
'Julien Fraichard',
'Johny Mo Swag',
'Axel Noack',
)
'Albert Kim',
'Pierre Rudloff',
'Huarong Huo',
'Ismael Mejía',
'Steffan \'Ruirize\' James',
'Andras Elso',
'Jelle van der Waa',
'Marcin Cieślak',
'Anton Larionov',
'Takuya Tsuchida',
)
__license__ = 'Public Domain'
@ -34,21 +44,40 @@ import codecs
import getpass
import optparse
import os
import random
import re
import shlex
import socket
import subprocess
import sys
import warnings
import platform
from .utils import *
from .utils import (
compat_print,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
MaxDownloadsReached,
preferredencoding,
SameFileError,
std_headers,
write_string,
)
from .update import update_self
from .version import __version__
from .FileDownloader import *
from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .version import __version__
from .YoutubeDL import YoutubeDL
from .PostProcessor import *
from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes):
@ -80,6 +109,9 @@ def parseOpts(overrideArguments=None):
return "".join(opts)
def _comma_separated_values_options_callback(option, opt_str, value, parser):
setattr(parser.values, option.dest, value.split(','))
def _find_term_columns():
columns = os.environ.get('COLUMNS', None)
if columns:
@ -93,6 +125,16 @@ def parseOpts(overrideArguments=None):
pass
return None
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try:
i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>'
except ValueError:
pass
return opts
max_width = 80
max_help_position = 80
@ -117,6 +159,8 @@ def parseOpts(overrideArguments=None):
selection = optparse.OptionGroup(parser, 'Video Selection')
authentication = optparse.OptionGroup(parser, 'Authentication Options')
video_format = optparse.OptionGroup(parser, 'Video Format Options')
subtitles = optparse.OptionGroup(parser, 'Subtitle Options')
downloader = optparse.OptionGroup(parser, 'Download Options')
postproc = optparse.OptionGroup(parser, 'Post-processing Options')
filesystem = optparse.OptionGroup(parser, 'Filesystem Options')
verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
@ -126,18 +170,12 @@ def parseOpts(overrideArguments=None):
general.add_option('-v', '--version',
action='version', help='print program version and exit')
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version')
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
general.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate (e.g. 50k or 44.6m)')
general.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
general.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024")
general.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
@ -149,9 +187,18 @@ def parseOpts(overrideArguments=None):
general.add_option('--list-extractors',
action='store_true', dest='list_extractors',
help='List all supported extractors and the URLs they would handle', default=False)
general.add_option('--extractor-descriptions',
action='store_true', dest='list_extractor_descriptions',
help='Output descriptions of all supported extractors', default=False)
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option('--test', action='store_true', dest='test', default=False, help=optparse.SUPPRESS_HELP)
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
selection.add_option('--playlist-start',
dest='playliststart', metavar='NUMBER', help='playlist video to start at (default is %default)', default=1)
@ -159,12 +206,21 @@ def parseOpts(overrideArguments=None):
dest='playlistend', metavar='NUMBER', help='playlist video to end at (default is last)', default=-1)
selection.add_option('--match-title', dest='matchtitle', metavar='REGEX',help='download only matching titles (regex or caseless sub-string)')
selection.add_option('--reject-title', dest='rejecttitle', metavar='REGEX',help='skip download for matching titles (regex or caseless sub-string)')
selection.add_option('--max-downloads', metavar='NUMBER', dest='max_downloads', help='Abort after downloading NUMBER files', default=None)
selection.add_option('--max-downloads', metavar='NUMBER',
dest='max_downloads', type=int, default=None,
help='Abort after downloading NUMBER files')
selection.add_option('--min-filesize', metavar='SIZE', dest='min_filesize', help="Do not download any videos smaller than SIZE (e.g. 50k or 44.6m)", default=None)
selection.add_option('--max-filesize', metavar='SIZE', dest='max_filesize', help="Do not download any videos larger than SIZE (e.g. 50k or 44.6m)", default=None)
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username',
@ -178,8 +234,8 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT',
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18"')
action='store', dest='format', metavar='FORMAT', default='best',
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@ -188,27 +244,37 @@ def parseOpts(overrideArguments=None):
action='store', dest='format_limit', metavar='FORMAT', help='highest quality format to download')
video_format.add_option('-F', '--list-formats',
action='store_true', dest='listformats', help='list all available formats (currently youtube only)')
video_format.add_option('--write-sub', '--write-srt',
subtitles.add_option('--write-sub', '--write-srt',
action='store_true', dest='writesubtitles',
help='write subtitle file (currently youtube only)', default=False)
video_format.add_option('--write-auto-sub', '--write-automatic-sub',
help='write subtitle file', default=False)
subtitles.add_option('--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub',
help='write automatic subtitle file (currently youtube only)', default=False)
video_format.add_option('--only-sub',
action='store_true', dest='skip_download',
help='[deprecated] alias of --skip-download', default=False)
video_format.add_option('--all-subs',
help='write automatic subtitle file (youtube only)', default=False)
subtitles.add_option('--all-subs',
action='store_true', dest='allsubtitles',
help='downloads all the available subtitles of the video (currently youtube only)', default=False)
video_format.add_option('--list-subs',
help='downloads all the available subtitles of the video', default=False)
subtitles.add_option('--list-subs',
action='store_true', dest='listsubtitles',
help='lists all available subtitles for the video (currently youtube only)', default=False)
video_format.add_option('--sub-format',
help='lists all available subtitles for the video', default=False)
subtitles.add_option('--sub-format',
action='store', dest='subtitlesformat', metavar='FORMAT',
help='subtitle format [srt/sbv/vtt] (default=srt) (currently youtube only)', default='srt')
video_format.add_option('--sub-lang', '--srt-lang',
action='store', dest='subtitleslang', metavar='LANG',
help='language of the subtitles to download (optional) use IETF language tags like \'en\'')
help='subtitle format (default=srt) ([sbv/vtt] youtube only)', default='srt')
subtitles.add_option('--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
downloader.add_option('--test', action='store_true', dest='test', default=False, help=optparse.SUPPRESS_HELP)
verbosity.add_option('-q', '--quiet',
action='store_true', dest='quiet', help='activates quiet mode', default=False)
@ -234,6 +300,9 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--get-format',
action='store_true', dest='getformat',
help='simulate, quiet but print output format', default=False)
verbosity.add_option('-j', '--dump-json',
action='store_true', dest='dumpjson',
help='simulate, quiet but print JSON information', default=False)
verbosity.add_option('--newline',
action='store_true', dest='progress_with_newline', help='output progress bar as new lines', default=False)
verbosity.add_option('--no-progress',
@ -246,6 +315,13 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems(very verbose)')
verbosity.add_option('--write-pages',
action='store_true', dest='write_pages', default=False,
help='Write downloaded pages to files in the current directory')
verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP)
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
@ -261,7 +337,10 @@ def parseOpts(overrideArguments=None):
help=('output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), '
'%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD"),'
'%(format_id)s for the unique id of the format (like Youtube\'s itags: "137"),'
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id , %(playlist)s for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist and %% for a literal percent. '
@ -269,7 +348,7 @@ def parseOpts(overrideArguments=None):
'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
filesystem.add_option('--autonumber-size',
dest='autonumber_size', metavar='NUMBER',
help='Specifies the number of digits in %(autonumber)s when it is present in output filename template or --autonumber option is given')
help='Specifies the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given')
filesystem.add_option('--restrict-filenames',
action='store_true', dest='restrictfilenames',
help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
@ -278,7 +357,7 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
filesystem.add_option('-c', '--continue',
action='store_true', dest='continue_dl', help='resume partially downloaded files', default=True)
action='store_true', dest='continue_dl', help='force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.', default=True)
filesystem.add_option('--no-continue',
action='store_false', dest='continue_dl',
help='do not resume partially downloaded files (restart from beginning)')
@ -295,6 +374,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotation file', default=False)
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
@ -312,35 +394,45 @@ def parseOpts(overrideArguments=None):
help='keeps the video file on disk after the post-processing; the video is erased by default')
postproc.add_option('--no-post-overwrites', action='store_true', dest='nopostoverwrites', default=False,
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general)
parser.add_option_group(selection)
parser.add_option_group(downloader)
parser.add_option_group(filesystem)
parser.add_option_group(verbosity)
parser.add_option_group(video_format)
parser.add_option_group(subtitles)
parser.add_option_group(authentication)
parser.add_option_group(postproc)
if overrideArguments is not None:
opts, args = parser.parse_args(overrideArguments)
if opts.verbose:
sys.stderr.write(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
write_string(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
else:
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:]
commandLineConf = sys.argv[1:]
argv = systemConf + userConf + commandLineConf
opts, args = parser.parse_args(argv)
if opts.verbose:
sys.stderr.write(u'[debug] System config: ' + repr(systemConf) + '\n')
sys.stderr.write(u'[debug] User config: ' + repr(userConf) + '\n')
sys.stderr.write(u'[debug] Command-line args: ' + repr(commandLineConf) + '\n')
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')
write_string(u'[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n')
write_string(u'[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n')
return parser, opts, args
@ -352,23 +444,10 @@ def _real_main(argv=None):
parser, opts, args = parseOpts(argv)
# Open appropriate CookieJar
if opts.cookiefile is None:
jar = compat_cookiejar.CookieJar()
else:
try:
jar = compat_cookiejar.MozillaCookieJar(opts.cookiefile)
if os.access(opts.cookiefile, os.R_OK):
jar.load()
except (IOError, OSError) as err:
if opts.verbose:
traceback.print_exc()
sys.stderr.write(u'ERROR: unable to open cookie file\n')
sys.exit(101)
# Set user agent
if opts.user_agent is not None:
std_headers['User-Agent'] = opts.user_agent
# Set referer
if opts.referer is not None:
std_headers['Referer'] = opts.referer
@ -389,39 +468,37 @@ def _real_main(argv=None):
batchurls = batchfd.readlines()
batchurls = [x.strip() for x in batchurls]
batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
if opts.verbose:
write_string(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
except IOError:
sys.exit(u'ERROR: batch file could not be read')
all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls]
# General configuration
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
extractors = gen_extractors()
if opts.list_extractors:
for ie in extractors:
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
matchedUrls = [url for url in all_urls if ie.suitable(url)]
all_urls = [url for url in all_urls if url not in matchedUrls]
for mu in matchedUrls:
compat_print(u' ' + mu)
sys.exit(0)
if opts.list_extractor_descriptions:
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all')
desc += u' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc)
sys.exit(0)
# Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None):
@ -452,7 +529,7 @@ def _real_main(argv=None):
if opts.retries is not None:
try:
opts.retries = int(opts.retries)
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid retry count specified')
if opts.buffersize is not None:
numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
@ -463,13 +540,13 @@ def _real_main(argv=None):
opts.playliststart = int(opts.playliststart)
if opts.playliststart <= 0:
raise ValueError(u'Playlist start must be positive')
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid playlist start number specified')
try:
opts.playlistend = int(opts.playlistend)
if opts.playlistend != -1 and (opts.playlistend <= 0 or opts.playlistend < opts.playliststart):
raise ValueError(u'Playlist end must be greater than playlist start')
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid playlist end number specified')
if opts.extractaudio:
if opts.audioformat not in ['best', 'aac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
@ -486,6 +563,11 @@ def _real_main(argv=None):
else:
date = DateRange(opts.dateafter, opts.datebefore)
# --all-sub automatically sets --write-sub if --write-auto-sub is not given
# this was the old behaviour if only --all-sub was given.
if opts.allsubtitles and (opts.writeautomaticsub == False):
opts.writesubtitles = True
if sys.version_info < (3,):
# In Python 2, sys.argv is a bytestring (also note http://bugs.python.org/issue2128 for Windows systems)
if opts.outtmpl is not None:
@ -498,14 +580,17 @@ def _real_main(argv=None):
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
if '%(ext)s' not in outtmpl and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "%%(ext)s" instead of %r' %
determine_ext(outtmpl, u''))
# YoutubeDL
ydl = YoutubeDL({
ydl_opts = {
'usenetrc': opts.usenetrc,
'username': opts.username,
'password': opts.password,
'videopassword': opts.videopassword,
'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
'forceurl': opts.geturl,
'forcetitle': opts.gettitle,
'forceid': opts.getid,
@ -513,8 +598,9 @@ def _real_main(argv=None):
'forcedescription': opts.getdescription,
'forcefilename': opts.getfilename,
'forceformat': opts.getformat,
'forcejson': opts.dumpjson,
'simulate': opts.simulate,
'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
'format': opts.format,
'format_limit': opts.format_limit,
'listformats': opts.listformats,
@ -532,11 +618,13 @@ def _real_main(argv=None):
'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle,
'nopart': opts.nopart,
'updatetime': opts.updatetime,
'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles,
@ -544,69 +632,62 @@ def _real_main(argv=None):
'allsubtitles': opts.allsubtitles,
'listsubtitles': opts.listsubtitles,
'subtitlesformat': opts.subtitlesformat,
'subtitleslang': opts.subtitleslang,
'subtitleslangs': opts.subtitleslangs,
'matchtitle': decodeOption(opts.matchtitle),
'rejecttitle': decodeOption(opts.rejecttitle),
'max_downloads': opts.max_downloads,
'prefer_free_formats': opts.prefer_free_formats,
'verbose': opts.verbose,
'dump_intermediate_pages': opts.dump_intermediate_pages,
'write_pages': opts.write_pages,
'test': opts.test,
'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize,
'daterange': date,
})
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
'cookiefile': opts.cookiefile,
'nocheckcertificate': opts.no_check_certificate,
}
with YoutubeDL(ydl_opts) as ydl:
ydl.print_debug_header()
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
ydl.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
if opts.embedsubtitles:
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
if len(all_urls) < 1:
if not opts.update_self:
parser.error(u'you must provide at least one URL')
else:
sys.exit()
if opts.verbose:
ydl.to_screen(u'[debug] youtube-dl version ' + __version__)
try:
sp = subprocess.Popen(['git', 'rev-parse', '--short', 'HEAD'], stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
ydl.to_screen(u'[debug] Git HEAD: ' + out)
except:
pass
ydl.to_screen(u'[debug] Python version %s - %s' %(platform.python_version(), platform.platform()))
ydl.to_screen(u'[debug] Proxy map: ' + str(proxy_handler.proxies))
for extractor in extractors:
ydl.add_info_extractor(extractor)
# PostProcessors
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
ydl.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, sys.argv[0])
# Maybe do nothing
if len(all_urls) < 1:
if not opts.update_self:
parser.error(u'you must provide at least one URL')
else:
sys.exit()
try:
retcode = ydl.download(all_urls)
except MaxDownloadsReached:
ydl.to_screen(u'--max-download limit reached, aborting.')
retcode = 101
# Dump cookie jar if requested
if opts.cookiefile is not None:
try:
jar.save()
except (IOError, OSError) as err:
sys.exit(u'ERROR: unable to save cookie jar')
retcode = ydl.download(all_urls)
except MaxDownloadsReached:
ydl.to_screen(u'--max-download limit reached, aborting.')
retcode = 101
sys.exit(retcode)
def main(argv=None):
try:
_real_main(argv)

202
youtube_dl/aes.py Normal file
View File

@ -0,0 +1,202 @@
__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
import base64
from math import ceil
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
def aes_ctr_decrypt(data, key, counter):
"""
Decrypt with aes in counter mode
@param {int[]} data cipher
@param {int[]} key 16/24/32-Byte cipher key
@param {instance} counter Instance whose next_value function (@returns {int[]} 16-Byte block)
returns the next counter block
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
decrypted_data=[]
for i in range(block_count):
counter_block = counter.next_value()
block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
block += [0]*(BLOCK_SIZE_BYTES - len(block))
cipher_counter_block = aes_encrypt(counter_block, expanded_key)
decrypted_data += xor(block, cipher_counter_block)
decrypted_data = decrypted_data[:len(data)]
return decrypted_data
def key_expansion(data):
"""
Generate key schedule
@param {int[]} data 16/24/32-Byte cipher key
@returns {int[]} 176/208/240-Byte expanded key
"""
data = data[:] # copy
rcon_iteration = 1
key_size_bytes = len(data)
expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES
while len(data) < expanded_key_size_bytes:
temp = data[-4:]
temp = key_schedule_core(temp, rcon_iteration)
rcon_iteration += 1
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
if key_size_bytes == 32:
temp = data[-4:]
temp = sub_bytes(temp)
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
data = data[:expanded_key_size_bytes]
return data
def aes_encrypt(data, expanded_key):
"""
Encrypt one block with aes
@param {int[]} data 16-Byte state
@param {int[]} expanded_key 176/208/240-Byte expanded key
@returns {int[]} 16-Byte cipher
"""
rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
for i in range(1, rounds+1):
data = sub_bytes(data)
data = shift_rows(data)
if i != rounds:
data = mix_columns(data)
data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
return data
def aes_decrypt_text(data, password, key_size_bytes):
"""
Decrypt text
- The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter
- The cipher key is retrieved by encrypting the first 16 Byte of 'password'
with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's)
- Mode of operation is 'counter'
@param {str} data Base64 encoded string
@param {str,unicode} password Password (will be encoded with utf-8)
@param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit or 32 for 256-Bit
@returns {str} Decrypted data
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(base64.b64decode(data))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0]*(key_size_bytes - len(password))
key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * (key_size_bytes // BLOCK_SIZE_BYTES)
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
__value = nonce + [0]*(BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):
temp = self.__value
self.__value = inc(self.__value)
return temp
decrypted_data = aes_ctr_decrypt(cipher, key, Counter())
plaintext = intlist_to_bytes(decrypted_data)
return plaintext
RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
MIX_COLUMN_MATRIX = ((2,3,1,1),
(1,2,3,1),
(1,1,2,3),
(3,1,1,2))
def sub_bytes(data):
return [SBOX[x] for x in data]
def rotate(data):
return data[1:] + [data[0]]
def key_schedule_core(data, rcon_iteration):
data = rotate(data)
data = sub_bytes(data)
data[0] = data[0] ^ RCON[rcon_iteration]
return data
def xor(data1, data2):
return [x^y for x, y in zip(data1, data2)]
def mix_column(data):
data_mixed = []
for row in range(4):
mixed = 0
for column in range(4):
addend = data[column]
if MIX_COLUMN_MATRIX[row][column] in (2,3):
addend <<= 1
if addend > 0xff:
addend &= 0xff
addend ^= 0x1b
if MIX_COLUMN_MATRIX[row][column] == 3:
addend ^= data[column]
mixed ^= addend & 0xff
data_mixed.append(mixed)
return data_mixed
def mix_columns(data):
data_mixed = []
for i in range(4):
column = data[i*4 : (i+1)*4]
data_mixed += mix_column(column)
return data_mixed
def shift_rows(data):
data_shifted = []
for column in range(4):
for row in range(4):
data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
return data_shifted
def inc(data):
data = data[:] # copy
for i in range(len(data)-1,-1,-1):
if data[i] == 255:
data[i] = 0
else:
data[i] = data[i] + 1
break
return data

View File

@ -1,145 +1,210 @@
from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE
from .anitube import AnitubeIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import ArteTvIE
from .bandcamp import BandcampIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .cnn import CNNIE
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
from .cspan import CSpanIE
from .dailymotion import DailymotionIE
from .d8 import D8IE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .escapist import EscapistIE
from .exfm import ExfmIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
France2IE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gamespot import GameSpotIE
from .gametrailers import GametrailersIE
from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .hypem import HypemIE
from .ign import IGNIE, OneUPIE
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .kankan import KankanIE
from .keezmovies import KeezMoviesIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE
from .mofosex import MofosexIE
from .mtv import MTVIE
from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE
from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .pornhub import PornHubIE
from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .sina import SinaIE
from .slashdot import SlashdotIE
from .slideshare import SlideshareIE
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import (
SouthParkStudiosIE,
SouthparkDeIE,
)
from .space import SpaceIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tf1 import TF1IE
from .thisav import ThisAVIE
from .toutv import TouTvIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .tube8 import Tube8IE
from .tudou import TudouIE
from .tumblr import TumblrIE
from .ustream import UstreamIE
from .tutv import TutvIE
from .tvp import TvpIE
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vevo import VevoIE
from .vimeo import VimeoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
from .xvideos import XVideosIE
from .xtube import XTubeIE
from .yahoo import YahooIE, YahooSearchIE
from .youjizz import YouJizzIE
from .youku import YoukuIE
from .youporn import YouPornIE
from .youtube import YoutubeIE, YoutubePlaylistIE, YoutubeSearchIE, YoutubeUserIE, YoutubeChannelIE
from .youtube import (
YoutubeIE,
YoutubePlaylistIE,
YoutubeSearchIE,
YoutubeSearchDateIE,
YoutubeUserIE,
YoutubeChannelIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
)
from .zdf import ZDFIE
_ALL_CLASSES = [
klass
for name, klass in globals().items()
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE)
def gen_extractors():
""" Return a list of an instance of every supported extractor.
The order does matter; the first extractor matched is the one handling the URL.
"""
return [
YoutubePlaylistIE(),
YoutubeChannelIE(),
YoutubeUserIE(),
YoutubeSearchIE(),
YoutubeIE(),
MetacafeIE(),
DailymotionIE(),
GoogleSearchIE(),
PhotobucketIE(),
YahooIE(),
YahooSearchIE(),
DepositFilesIE(),
FacebookIE(),
BlipTVIE(),
BlipTVUserIE(),
VimeoIE(),
MyVideoIE(),
ComedyCentralIE(),
EscapistIE(),
CollegeHumorIE(),
XVideosIE(),
SoundcloudSetIE(),
SoundcloudIE(),
InfoQIE(),
MixcloudIE(),
StanfordOpenClassroomIE(),
MTVIE(),
YoukuIE(),
XNXXIE(),
YouJizzIE(),
PornotubeIE(),
YouPornIE(),
GooglePlusIE(),
ArteTvIE(),
NBAIE(),
WorldStarHipHopIE(),
JustinTVIE(),
FunnyOrDieIE(),
SteamIE(),
UstreamIE(),
RBMARadioIE(),
EightTracksIE(),
KeekIE(),
TEDIE(),
MySpassIE(),
SpiegelIE(),
LiveLeakIE(),
ARDIE(),
ZDFIE(),
TumblrIE(),
BandcampIE(),
RedTubeIE(),
InaIE(),
HowcastIE(),
VineIE(),
FlickrIE(),
TeamcocoIE(),
XHamsterIE(),
HypemIE(),
Vbox7IE(),
GametrailersIE(),
StatigramIE(),
BreakIE(),
VevoIE(),
JukeboxIE(),
TudouIE(),
CSpanIE(),
WimpIE(),
HotNewHipHopIE(),
GenericIE()
]
return [klass() for klass in _ALL_CLASSES]
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""

View File

@ -0,0 +1,86 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
ExtractorError,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.mp4',
u'md5': u'72954ea10bc979ab5e2eb288b21425a0',
u'info_dict': {
u"description": u"One Piece 606",
u"title": u"One Piece 606"
}
}
def _real_extract(self, url):
try:
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, u'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, u'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError(u'Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + u'://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id)
formats = []
for format_id in ('normal', 'hq'):
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, u'video file URLx',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
})
if not formats:
raise ExtractorError(u'Cannot find any video format!')
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'formats': formats,
'title': video_title,
'description': video_description
}

View File

@ -0,0 +1,55 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
class AnitubeIE(InfoExtractor):
IE_NAME = u'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.anitube.se/video/36621',
u'md5': u'59d0eeae28ea0bc8c05e7af429998d43',
u'file': u'36621.mp4',
u'info_dict': {
u'id': u'36621',
u'ext': u'mp4',
u'title': u'Recorder to Randoseru 01',
},
u'skip': u'Blocked in the US',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
key = self._html_search_regex(r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)',
webpage, u'key')
webpage_config = self._download_webpage('http://www.anitube.se/nuevo/econfig.php?key=%s' % key,
key)
config_xml = xml.etree.ElementTree.fromstring(webpage_config.encode('utf-8'))
video_title = config_xml.find('title').text
formats = []
video_url = config_xml.find('file')
if video_url is not None:
formats.append({
'format_id': 'sd',
'url': video_url.text,
})
video_url = config_xml.find('filehd')
if video_url is not None:
formats.append({
'format_id': 'hd',
'url': video_url.text,
})
return {
'id': video_id,
'title': video_title,
'formats': formats
}

View File

@ -0,0 +1,138 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
class AppleTrailersIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TEST = {
u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
u"playlist": [
{
u"file": u"manofsteel-trailer4.mov",
u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": {
u"duration": 111,
u"title": u"Trailer 4",
u"upload_date": u"20130523",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer3.mov",
u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": {
u"duration": 182,
u"title": u"Trailer 3",
u"upload_date": u"20130417",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer.mov",
u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": {
u"duration": 148,
u"title": u"Trailer",
u"upload_date": u"20121212",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-teaser.mov",
u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": {
u"duration": 93,
u"title": u"Teaser",
u"upload_date": u"20120721",
u"uploader_id": u"wb",
},
}
]
}
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# with xml.etree.ElementTree.fromstring
# like: http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>'
doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
runtime = trailer_info['runtime']
m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
duration = None
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = []
for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
})
formats = sorted(formats, key=lambda f: (f['height'], f['width']))
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'title': title,
'duration': duration,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader_id': uploader_id,
'user_agent': 'QuickTime compatible (youtube-dl)',
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = formats[-1]['ext']
playlist.append(info)
return {
'_type': 'playlist',
'id': movie,
'entries': playlist,
}

View File

@ -0,0 +1,68 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_strdate,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'(?:https?://)?(?:www\.)?archive.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
_TEST = {
u"url": u"http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect",
u'file': u'XD300-23_68HighlightsAResearchCntAugHumanIntellect.ogv',
u'md5': u'8af1d4cf447933ed3c7f4871162602db',
u'info_dict': {
u"title": u"1968 Demo - FJCC Conference Presentation Reel #1",
u"description": u"Reel 1 of 3: Also known as the \"Mother of All Demos\", Doug Engelbart's presentation at the Fall Joint Computer Conference in San Francisco, December 9, 1968 titled \"A Research Center for Augmenting Human Intellect.\" For this presentation, Doug and his team astonished the audience by not only relating their research, but demonstrating it live. This was the debut of the mouse, interactive computing, hypermedia, computer supported software engineering, video teleconferencing, etc. See also <a href=\"http://dougengelbart.org/firsts/dougs-1968-demo.html\" rel=\"nofollow\">Doug's 1968 Demo page</a> for more background, highlights, links, and the detailed paper published in this conference proceedings. Filmed on 3 reels: Reel 1 | <a href=\"http://www.archive.org/details/XD300-24_68HighlightsAResearchCntAugHumanIntellect\" rel=\"nofollow\">Reel 2</a> | <a href=\"http://www.archive.org/details/XD300-25_68HighlightsAResearchCntAugHumanIntellect\" rel=\"nofollow\">Reel 3</a>",
u"upload_date": u"19681210",
u"uploader": u"SRI International"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = url + (u'?' if u'?' in url else '&') + u'output=json'
json_data = self._download_webpage(json_url, video_id)
data = json.loads(json_data)
title = data['metadata']['title'][0]
description = data['metadata']['description'][0]
uploader = data['metadata']['creator'][0]
upload_date = unified_strdate(data['metadata']['date'][0])
formats = [{
'format': fdata['format'],
'url': 'http://' + data['server'] + data['dir'] + fn,
'file_size': int(fdata['size']),
}
for fn,fdata in data['files'].items()
if 'Video' in fdata['format']]
formats.sort(key=lambda fdata: fdata['file_size'])
for f in formats:
f['ext'] = determine_ext(f['url'])
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
}
thumbnail = data.get('misc', {}).get('image')
if thumbnail:
info['thumbnail'] = thumbnail
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -9,6 +9,15 @@ class ARDIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TITLE = r'<h1(?: class="boxTopHeadline")?>(?P<title>.*)</h1>'
_MEDIA_STREAM = r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)'
_TEST = {
u'url': u'http://www.ardmediathek.de/das-erste/tagesschau-in-100-sek?documentId=14077640',
u'file': u'14077640.mp4',
u'md5': u'6ca8824255460c787376353f9e20bbd8',
u'info_dict': {
u"title": u"11.04.2013 09:23 Uhr - Tagesschau in 100 Sekunden"
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
# determine video id from url
@ -23,7 +32,7 @@ class ARDIE(InfoExtractor):
# determine title and media streams from webpage
html = self._download_webpage(url, video_id)
title = re.search(self._TITLE, html).group('title')
streams = [m.groupdict() for m in re.finditer(self._MEDIA_STREAM, html)]
streams = [mo.groupdict() for mo in re.finditer(self._MEDIA_STREAM, html)]
if not streams:
assert '"fsk"' in html
raise ExtractorError(u'This video is only available after 8:00 pm')

View File

@ -1,22 +1,35 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
# This is used by the not implemented extractLiveStream method
compat_urllib_parse,
ExtractorError,
find_xpath_attr,
unified_strdate,
determine_ext,
get_element_by_id,
compat_str,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
_VALID_URL = r'(?:http://)?www\.arte.tv/guide/(?:fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
IE_NAME = u'arte.tv'
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
# def extractLiveStream(self, url):
# video_lang = url.split('/')[-4]
# info = self.grep_webpage(
@ -44,18 +57,93 @@ class ArteTvIE(InfoExtractor):
# video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
lang = mobj.group('lang')
return self._extract_video(url, id, lang)
if re.search(self._LIVE_URL, video_id) is not None:
mobj = re.match(self._LIVEWEB_URL, url)
if mobj is not None:
name = mobj.group('name')
lang = mobj.group('lang')
return self._extract_liveweb(url, name, lang)
if re.search(self._LIVE_URL, url) is not None:
raise ExtractorError(u'Arte live streams are not yet supported, sorry')
# self.extractLiveStream(url)
# return
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml = self._download_webpage(ref_xml_url, video_id, note=u'Downloading metadata')
ref_xml_doc = xml.etree.ElementTree.fromstring(ref_xml)
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config_xml = self._download_webpage(config_xml_url, video_id, note=u'Downloading configuration')
video_urls = list(re.finditer(r'<url quality="(?P<quality>.*?)">(?P<url>.*?)</url>', config_xml))
def _key(m):
quality = m.group('quality')
if quality == 'hd':
return 2
else:
return 1
# We pick the best quality
video_urls = sorted(video_urls, key=_key)
video_url = list(video_urls)[-1].group('url')
title = self._html_search_regex(r'<name>(.*?)</name>', config_xml, 'title')
thumbnail = self._html_search_regex(r'<firstThumbnailUrl>(.*?)</firstThumbnailUrl>',
config_xml, 'thumbnail')
return {'id': video_id,
'title': title,
'thumbnail': thumbnail,
'url': video_url,
'ext': 'flv',
}
def _extract_liveweb(self, url, name, lang):
"""Extract form http://liveweb.arte.tv/"""
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'eventId=(\d+?)("|&)', webpage, u'event id')
config_xml = self._download_webpage('http://download.liveweb.arte.tv/o21/liveweb/events/event-%s.xml' % video_id,
video_id, u'Downloading information')
config_doc = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
event_doc = config_doc.find('event')
url_node = event_doc.find('video').find('urlHd')
if url_node is None:
url_node = event_doc.find('urlSd')
return {'id': video_id,
'title': event_doc.find('name%s' % lang.capitalize()).text,
'url': url_node.text.replace('MP4', 'mp4'),
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
@ -63,24 +151,112 @@ class ArteTvIE(InfoExtractor):
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info['VDE'],
'upload_date': unified_strdate(player_info['VDA'].split(' ')[0]),
'thumbnail': player_info['programImage'],
}
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
formats = player_info['VSR'].values()
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f['height']))
# Pick the best quality
format_info = formats[-1]
if format_info['mediaType'] == u'rtmp':
info_dict['url'] = format_info['streamer']
info_dict['play_path'] = 'mp4:' + format_info['url']
info_dict['ext'] = 'mp4'
all_formats = player_info['VSR'].values()
# Some formats use the m3u8 protocol
all_formats = list(filter(lambda f: f.get('videoFormat') != 'M3U8', all_formats))
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, all_formats)
formats = list(formats) # in python3 filter returns an iterator
if not formats:
# Some videos are only available in the 'Originalversion'
# they aren't tagged as being in French or German
if all(f['versionCode'] == 'VO' for f in all_formats):
formats = all_formats
else:
raise ExtractorError(u'The formats list is empty')
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
def sort_key(f):
return ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
info_dict['url'] = format_info['url']
info_dict['ext'] = 'mp4'
def sort_key(f):
return (
# Sort first by quality
int(f.get('height',-1)),
int(f.get('bitrate',-1)),
# The original version with subtitles has lower relevance
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
)
formats = sorted(formats, key=sort_key)
def _format(format_info):
quality = ''
height = format_info.get('height')
if height is not None:
quality = compat_str(height)
bitrate = format_info.get('bitrate')
if bitrate is not None:
quality += '-%d' % bitrate
if format_info.get('versionCode') is not None:
format_id = u'%s-%s' % (quality, format_info['versionCode'])
else:
format_id = quality
info = {
'format_id': format_id,
'format_note': format_info.get('versionLibelle'),
'width': format_info.get('width'),
'height': height,
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)

View File

@ -0,0 +1,49 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
ExtractorError,
)
class AUEngineIE(InfoExtractor):
_TEST = {
u'url': u'http://auengine.com/embed.php?file=lfvlytY6&w=650&h=370',
u'file': u'lfvlytY6.mp4',
u'md5': u'48972bdbcf1a3a2f5533e62425b41d4f',
u'info_dict': {
u"title": u"[Commie]The Legend of the Legendary Heroes - 03 - Replication Eye (Alpha Stigma)[F9410F5A]"
}
}
_VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed.php\?.*?file=([^&]+).*?'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>',
webpage, u'title')
title = title.strip()
links = re.findall(r'\s(?:file|url):\s*["\']([^\'"]+)["\']', webpage)
links = map(compat_urllib_parse.unquote, links)
thumbnail = None
video_url = None
for link in links:
if link.endswith('.png'):
thumbnail = link
elif '/videos/' in link:
video_url = link
if not video_url:
raise ExtractorError(u'Could not find video URL')
ext = u'.' + determine_ext(video_url)
if ext == title[-len(ext):]:
title = title[:-len(ext)]
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
}

View File

@ -0,0 +1,86 @@
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
)
class BambuserIE(InfoExtractor):
IE_NAME = u'bambuser'
_VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
_API_KEY = '005f64509e19a868399060af746a00aa'
_TEST = {
u'url': u'http://bambuser.com/v/4050584',
# MD5 seems to be flaky, see https://travis-ci.org/rg3/youtube-dl/jobs/14051016#L388
#u'md5': u'fba8f7693e48fd4e8641b3fd5539a641',
u'info_dict': {
u'id': u'4050584',
u'ext': u'flv',
u'title': u'Education engineering days - lightning talks',
u'duration': 3741,
u'uploader': u'pixelversity',
u'uploader_id': u'344706',
},
u'params': {
# It doesn't respect the 'Range' header, it would download the whole video
# caused the travis builds to fail: https://travis-ci.org/rg3/youtube-dl/jobs/14493845#L59
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_url = ('http://player-c.api.bambuser.com/getVideo.json?'
'&api_key=%s&vid=%s' % (self._API_KEY, video_id))
info_json = self._download_webpage(info_url, video_id)
info = json.loads(info_json)['result']
return {
'id': video_id,
'title': info['title'],
'url': info['url'],
'thumbnail': info.get('preview'),
'duration': int(info['length']),
'view_count': int(info['views_total']),
'uploader': info['username'],
'uploader_id': info['uid'],
}
class BambuserChannelIE(InfoExtractor):
IE_NAME = u'bambuser:channel'
_VALID_URL = r'http://bambuser.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
urls = []
last_id = ''
for i in itertools.count(1):
req_url = ('http://bambuser.com/xhr-api/index.php?username={user}'
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = compat_urllib_request.Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
info_json = self._download_webpage(req, user,
u'Downloading page %d' % i)
results = json.loads(info_json)['result']
if len(results) == 0:
break
last_id = results[-1]['vid']
urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
return {
'_type': 'playlist',
'title': user,
'entries': urls,
}

View File

@ -3,12 +3,24 @@ import re
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urlparse,
ExtractorError,
)
class BandcampIE(InfoExtractor):
IE_NAME = u'Bandcamp'
_VALID_URL = r'http://.*?\.bandcamp\.com/track/(?P<title>.*)'
_TESTS = [{
u'url': u'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
u'file': u'1812978515.mp3',
u'md5': u'cdeb30cdae1921719a3cbcab696ef53c',
u'info_dict': {
u"title": u"youtube-dl test song \"'/\\\u00e4\u21ad"
},
u'skip': u'There is a limit of 200 free downloads / month for the test song'
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -17,6 +29,23 @@ class BandcampIE(InfoExtractor):
# We get the link to the free download page
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if m_download is None:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)
for d in data:
formats = [{
'format_id': 'format_id',
'url': format_url,
'ext': format_id.partition('-')[0]
} for format_id, format_url in sorted(d['file'].items())]
return {
'id': compat_str(d['id']),
'title': d['title'],
'formats': formats,
}
else:
raise ExtractorError(u'No free songs found')
download_link = m_download.group(1)
@ -52,3 +81,49 @@ class BandcampIE(InfoExtractor):
}
return [track_info]
class BandcampAlbumIE(InfoExtractor):
IE_NAME = u'Bandcamp:album'
_VALID_URL = r'http://.*?\.bandcamp\.com/album/(?P<title>.*)'
_TEST = {
u'url': u'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
u'playlist': [
{
u'file': u'1353101989.mp3',
u'md5': u'39bc1eded3476e927c724321ddf116cf',
u'info_dict': {
u'title': u'Intro',
}
},
{
u'file': u'38097443.mp3',
u'md5': u'1a2c32e2691474643e912cc6cd4bffaa',
u'info_dict': {
u'title': u'Kero One - Keep It Alive (Blazo remix)',
}
},
],
u'params': {
u'playlistend': 2
},
u'skip': u'Bancamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
if not tracks_paths:
raise ExtractorError(u'The page doesn\'t contain any track')
entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths]
title = self._search_regex(r'album_title : "(.*?)"', webpage, u'title')
return {
'_type': 'playlist',
'title': title,
'entries': entries,
}

View File

@ -24,6 +24,17 @@ class BlipTVIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(.+)$'
_URL_EXT = r'^.*\.([a-z0-9]+)$'
IE_NAME = u'blip.tv'
_TEST = {
u'url': u'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
u'file': u'5779306.m4v',
u'md5': u'80baf1ec5c3d2019037c1c707d676b9f',
u'info_dict': {
u"upload_date": u"20111205",
u"description": u"md5:9bc31f227219cde65e47eeec8d2dc596",
u"uploader": u"Comic Book Resources - CBR TV",
u"title": u"CBR EXCLUSIVE: \"Gotham City Imposters\" Bats VS Jokerz Short 3"
}
}
def report_direct_download(self, title):
"""Report information extraction."""
@ -92,14 +103,19 @@ class BlipTVIE(InfoExtractor):
data = json_data
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
video_url = data['media']['url']
if 'additionalMedia' in data:
formats = sorted(data['additionalMedia'], key=lambda f: int(f['media_height']))
best_format = formats[-1]
video_url = best_format['url']
else:
video_url = data['media']['url']
umobj = re.match(self._URL_EXT, video_url)
if umobj is None:
raise ValueError('Can not determine filename extension')
ext = umobj.group(1)
info = {
'id': data['item_id'],
'id': compat_str(data['item_id']),
'url': video_url,
'uploader': data['display_name'],
'upload_date': upload_date,
@ -173,5 +189,5 @@ class BlipTVUserIE(InfoExtractor):
pagenum += 1
urls = [u'http://blip.tv/%s' % video_id for video_id in video_ids]
url_entries = [self.url_result(url, 'BlipTV') for url in urls]
url_entries = [self.url_result(vurl, 'BlipTV') for vurl in urls]
return [self.playlist_result(url_entries, playlist_title = username)]

View File

@ -0,0 +1,27 @@
import re
from .common import InfoExtractor
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?).html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
u'file': u'12bzhqZTqQHmmlA8I-i0NpzJgcG5NNYX.mp4',
u'info_dict': {
u'title': u'Shah\'s Presentation on Foreign-Exchange Strategies',
u'description': u'md5:abc86e5236f9f0e4866c59ad36736686',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
ooyala_url = self._og_search_video_url(webpage)
return self.url_result(ooyala_url, ie='Ooyala')

View File

@ -1,25 +1,38 @@
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class BreakIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?break\.com/video/([^/]+)'
_TEST = {
u'url': u'http://www.break.com/video/when-girls-act-like-guys-2468056',
u'file': u'2468056.mp4',
u'md5': u'a3513fb1547fba4fb6cfac1bffc6c46b',
u'info_dict': {
u"title": u"When Girls Act Like D-Bags"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1).split("-")[-1]
webpage = self._download_webpage(url, video_id)
video_url = re.search(r"videoPath: '(.+?)',",webpage).group(1)
key = re.search(r"icon: '(.+?)',",webpage).group(1)
final_url = str(video_url)+"?"+str(key)
thumbnail_url = re.search(r"thumbnailURL: '(.+?)'",webpage).group(1)
title = re.search(r"sVidTitle: '(.+)',",webpage).group(1)
ext = video_url.split('.')[-1]
embed_url = 'http://www.break.com/embed/%s' % video_id
webpage = self._download_webpage(embed_url, video_id)
info_json = self._search_regex(r'var embedVars = ({.*?});', webpage,
u'info json', flags=re.DOTALL)
info = json.loads(info_json)
video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
final_url = video_url + '?' + info['AuthToken']
return [{
'id': video_id,
'url': final_url,
'ext': ext,
'title': title,
'thumbnail': thumbnail_url,
'ext': determine_ext(final_url),
'title': info['contentName'],
'thumbnail': info['thumbUri'],
}]

View File

@ -0,0 +1,177 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
find_xpath_attr,
compat_urlparse,
compat_str,
compat_urllib_request,
ExtractorError,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
_TESTS = [
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'8eccab865181d29ec2958f32a6a754f5',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
u'url': u'http://c.brightcove.com/services/viewer/federated_f9?&playerID=1265504713001&publisherID=AQ%7E%7E%2CAAABBzUwv1E%7E%2CxP-xFHVUstiMFlNYfvF4G9yFnNaqCw_9&videoID=2750934548001',
u'info_dict': {
u'id': u'2750934548001',
u'ext': u'mp4',
u'title': u'This Bracelet Acts as a Personal Thermostat',
u'description': u'md5:547b78c64f4112766ccf4e151c20b6a0',
u'uploader': u'Mashable',
},
},
]
@classmethod
def _build_brighcove_url(cls, object_str):
"""
Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace(u'<--', u'<!--')
object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'],
'playerID': find_xpath_attr(object_doc, './param', 'name', 'playerID').attrib['value'],
}
def find_param(name):
node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None:
return node.attrib['value']
return None
playerKey = find_param('playerKey')
# Not all pages define this value
if playerKey is not None:
params['playerKey'] = playerKey
# The three fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID')
if videoPlayer is not None:
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')
if linkBase is not None:
params['linkBaseURL'] = linkBase
data = compat_urllib_parse.urlencode(params)
return cls._FEDERATED_URL_TEMPLATE % data
@classmethod
def _extract_brightcove_url(cls, webpage):
"""Try to extract the brightcove url from the wepbage, returns None
if it can't be found
"""
m_brightcove = re.search(
r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>',
webpage, re.DOTALL)
if m_brightcove is not None:
return cls._build_brighcove_url(m_brightcove.group())
else:
return None
def _real_extract(self, url):
# Change the 'videoId' and others field to '@videoPlayer'
url = re.sub(r'(?<=[?&])(videoI(d|D)|bctid)', '%40videoPlayer', url)
# Change bckey (used by bcove.me urls) to playerKey
url = re.sub(r'(?<=[?&])bckey', 'playerKey', url)
mobj = re.match(self._VALID_URL, url)
query_str = mobj.group('query')
query = compat_urlparse.parse_qs(query_str)
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
return self._get_video_info(videoPlayer[0], query_str, query)
else:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
def _get_video_info(self, video_id, query_str, query):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
req = compat_urllib_request.Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
req.add_header('Referer', linkBase[0])
webpage = self._download_webpage(req, video_id)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, u'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
raise ExtractorError(u'Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id=playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
info = {
'id': compat_str(video_info['id']),
'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
info['formats'] = [{
'url': rend['defaultURL'],
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
} for rend in renditions]
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info

View File

@ -0,0 +1,36 @@
# coding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class C56IE(InfoExtractor):
_VALID_URL = r'https?://((www|player)\.)?56\.com/(.+?/)?(v_|(play_album.+-))(?P<textid>.+?)\.(html|swf)'
IE_NAME = u'56.com'
_TEST ={
u'url': u'http://www.56.com/u39/v_OTM0NDA3MTY.html',
u'file': u'93440716.flv',
u'md5': u'e59995ac63d0457783ea05f93f12a866',
u'info_dict': {
u'title': u'网事知多少 第32期车怒',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
text_id = mobj.group('textid')
info_page = self._download_webpage('http://vxml.56.com/json/%s/' % text_id,
text_id, u'Downloading video info')
info = json.loads(info_page)['info']
best_format = sorted(info['rfiles'], key=lambda f: int(f['filesize']))[-1]
video_url = best_format['url']
return {'id': info['vid'],
'title': info['Subject'],
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': info.get('bimg') or info.get('img'),
}

View File

@ -0,0 +1,37 @@
# coding: utf-8
import re
from .common import InfoExtractor
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
_TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
u'file': u'12163.mp4',
u'md5': u'060158428b650f896c542dfbb3d6487f',
u'info_dict': {
u'title': u'Terrasses du Numérique'
}
}
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group('id')
# We need to set the voir field for getting the file name
url = 'http://www.canalc2.tv/video.asp?idVideo=%s&voir=oui' % video_id
webpage = self._download_webpage(url, video_id)
file_name = self._search_regex(
r"so\.addVariable\('file','(.*?)'\);",
webpage, 'file name')
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title')
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

View File

@ -0,0 +1,55 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import unified_strdate
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = u'canalplus.fr'
_TEST = {
u'url': u'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
u'file': u'922470.flv',
u'info_dict': {
u'title': u'Zapping - 26/08/13',
u'description': u'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
u'upload_date': u'20130826',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'videoId = "(\d+)";', webpage, u'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
info_page = self._download_webpage(info_url,video_id,
u'Downloading video info')
self.report_extraction(video_id)
doc = xml.etree.ElementTree.fromstring(info_page.encode('utf-8'))
video_info = [video for video in doc if video.find('ID').text == video_id][0]
infos = video_info.find('INFOS')
media = video_info.find('MEDIA')
formats = [media.find('VIDEOS/%s' % format)
for format in ['BAS_DEBIT', 'HAUT_DEBIT', 'HD']]
video_url = [format.text for format in formats if format is not None][-1]
return {'id': video_id,
'title': u'%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'url': video_url,
'ext': 'flv',
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
}

View File

@ -0,0 +1,84 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'md5': u'f9bb7ede54d1229c9846e197b4737e06',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
}
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'md5': u'9509ee44dcaa7c1068604817c19a9e50',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if len(video_description) == 0:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
url = self._html_search_regex(r'\'streamer\': \'(?P<url>[^\']+)\'', playerdata, u'url')
sd_file = self._html_search_regex(r'\'file\': \'(?P<sd_file>[^\']+)\'', playerdata, u'sd_file')
hd_file = self._html_search_regex(r'\'?file\'?: "(?P<hd_file>[^"]+)"', playerdata, u'hd_file')
video_thumbnail = self._html_search_regex(r'\'image\': \'(?P<thumbnail>[^\']+)\'', playerdata, u'thumbnail', fatal=False)
formats = [
{
'url': url,
'play_path': 'mp4:' + sd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': url,
'play_path': 'mp4:' + hd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
return {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}

View File

@ -0,0 +1,53 @@
import re
import time
import xml.etree.ElementTree
from .common import InfoExtractor
class ClipfishIE(InfoExtractor):
IE_NAME = u'clipfish'
_VALID_URL = r'^https?://(?:www\.)?clipfish\.de/.*?/video/(?P<id>[0-9]+)/'
_TEST = {
u'url': u'http://www.clipfish.de/special/supertalent/video/4028320/supertalent-2013-ivana-opacak-singt-nobodys-perfect/',
u'file': u'4028320.f4v',
u'md5': u'5e38bda8c329fbfb42be0386a3f5a382',
u'info_dict': {
u'title': u'Supertalent 2013: Ivana Opacak singt Nobody\'s Perfect',
u'duration': 399,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
info_url = ('http://www.clipfish.de/devxml/videoinfo/%s?ts=%d' %
(video_id, int(time.time())))
info_xml = self._download_webpage(
info_url, video_id, note=u'Downloading info page')
doc = xml.etree.ElementTree.fromstring(info_xml)
title = doc.find('title').text
video_url = doc.find('filename').text
thumbnail = doc.find('imageurl').text
duration_str = doc.find('duration').text
m = re.match(
r'^(?P<hours>[0-9]+):(?P<minutes>[0-9]{2}):(?P<seconds>[0-9]{2}):(?P<ms>[0-9]*)$',
duration_str)
if m:
duration = (
(int(m.group('hours')) * 60 * 60) +
(int(m.group('minutes')) * 60) +
(int(m.group('seconds')))
)
else:
duration = None
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'duration': duration,
}

View File

@ -0,0 +1,58 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://((edition|www)\.)?cnn\.com/video/(data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))'''
_TESTS = [{
u'url': u'http://edition.cnn.com/video/?/video/sports/2013/06/09/nadal-1-on-1.cnn',
u'file': u'sports_2013_06_09_nadal-1-on-1.cnn.mp4',
u'md5': u'3e6121ea48df7e2259fe73a0628605c4',
u'info_dict': {
u'title': u'Nadal wins 8th French Open title',
u'description': u'World Sport\'s Amanda Davies chats with 2013 French Open champion Rafael Nadal.',
},
},
{
u"url": u"http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
u"file": u"us_2013_08_21_sot-student-gives-epic-speech.georgia-institute-of-technology.mp4",
u"md5": u"b5cc60c60a3477d185af8f19a2a26f4e",
u"info_dict": {
u"title": "Student's epic speech stuns new freshmen",
u"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\""
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = u'http://cnn.com/video/data/3.0/%s/index.xml' % path
info_xml = self._download_webpage(info_url, page_title)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
formats = []
for f in info.findall('files/file'):
mf = re.match(r'(\d+)x(\d+)(?:_(.*)k)?',f.attrib['bitrate'])
if mf is not None:
formats.append((int(mf.group(1)), int(mf.group(2)), int(mf.group(3) or 0), f.text))
formats = sorted(formats)
(_,_,_, video_path) = formats[-1]
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % video_path
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
return {'id': info.attrib['id'],
'title': info.find('headline').text,
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'description': info.find('description').text,
}

View File

@ -1,26 +1,35 @@
import re
import socket
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_urllib_request,
determine_ext,
ExtractorError,
)
class CollegeHumorIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/video/(?P<videoid>[0-9]+)/(?P<shorttitle>.*)$'
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
def report_manifest(self, video_id):
"""Report information extraction."""
self.to_screen(u'%s: Downloading XML manifest' % video_id)
_TESTS = [{
u'url': u'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
u'file': u'6902724.mp4',
u'md5': u'1264c12ad95dca142a9f0bf7968105a0',
u'info_dict': {
u'title': u'Comic-Con Cosplay Catastrophe',
u'description': u'Fans get creative this year at San Diego. Too creative. And yes, that\'s really Joss Whedon.',
},
},
{
u'url': u'http://www.collegehumor.com/video/3505939/font-conference',
u'file': u'3505939.mp4',
u'md5': u'c51ca16b82bb456a4397987791a835f5',
u'info_dict': {
u'title': u'Font Conference',
u'description': u'This video wasn\'t long enough, so we made it double-spaced.',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -36,39 +45,38 @@ class CollegeHumorIE(InfoExtractor):
self.report_extraction(video_id)
xmlUrl = 'http://www.collegehumor.com/moogaloop/video/' + video_id
try:
metaXml = compat_urllib_request.urlopen(xmlUrl).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to download video info XML: %s' % compat_str(err))
mdoc = self._download_xml(xmlUrl, video_id,
u'Downloading info XML',
u'Unable to download video info XML')
mdoc = xml.etree.ElementTree.fromstring(metaXml)
try:
videoNode = mdoc.findall('./video')[0]
youtubeIdNode = videoNode.find('./youtubeID')
if youtubeIdNode is not None:
return self.url_result(youtubeIdNode.text, 'Youtube')
info['description'] = videoNode.findall('./description')[0].text
info['title'] = videoNode.findall('./caption')[0].text
info['thumbnail'] = videoNode.findall('./thumbnail')[0].text
manifest_url = videoNode.findall('./file')[0].text
next_url = videoNode.findall('./file')[0].text
except IndexError:
raise ExtractorError(u'Invalid metadata XML file')
manifest_url += '?hdcore=2.10.3'
self.report_manifest(video_id)
try:
manifestXml = compat_urllib_request.urlopen(manifest_url).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to download video info XML: %s' % compat_str(err))
if next_url.endswith(u'manifest.f4m'):
manifest_url = next_url + '?hdcore=2.10.3'
adoc = self._download_xml(manifest_url, video_id,
u'Downloading XML manifest',
u'Unable to download video info XML')
adoc = xml.etree.ElementTree.fromstring(manifestXml)
try:
media_node = adoc.findall('./{http://ns.adobe.com/f4m/1.0}media')[0]
node_id = media_node.attrib['url']
video_id = adoc.findall('./{http://ns.adobe.com/f4m/1.0}id')[0].text
except IndexError as err:
raise ExtractorError(u'Invalid manifest file')
try:
video_id = adoc.findall('./{http://ns.adobe.com/f4m/1.0}id')[0].text
except IndexError:
raise ExtractorError(u'Invalid manifest file')
url_pr = compat_urllib_parse_urlparse(info['thumbnail'])
info['url'] = url_pr.scheme + '://' + url_pr.netloc + video_id[:-2].replace('.csmil','').replace(',','')
info['ext'] = 'mp4'
else:
# Old-style direct links
info['url'] = next_url
info['ext'] = determine_ext(info['url'])
url_pr = compat_urllib_parse_urlparse(manifest_url)
url = url_pr.scheme + '://' + url_pr.netloc + '/z' + video_id[:-2] + '/' + node_id + 'Seg1-Frag1'
info['url'] = url
info['ext'] = 'f4f'
return [info]
return info

View File

@ -2,6 +2,7 @@ import re
import xml.etree.ElementTree
from .common import InfoExtractor
from .mtv import MTVIE, _media_xml_tag
from ..utils import (
compat_str,
compat_urllib_parse,
@ -11,9 +12,38 @@ from ..utils import (
)
class ComedyCentralIE(InfoExtractor):
"""Information extractor for The Daily Show and Colbert Report """
class ComedyCentralIE(MTVIE):
_VALID_URL = r'http://www.comedycentral.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
_FEED_URL = u'http://comedycentral.com/feeds/mrss/'
_TEST = {
u'url': u'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
u'md5': u'4167875aae411f903b751a21f357f1ee',
u'info_dict': {
u'id': u'cef0cbb3-e776-4bc9-b62e-8016deccb354',
u'ext': u'mp4',
u'title': u'Uncensored - Greg Fitzsimmons - Too Good of a Mother',
u'description': u'After a certain point, breastfeeding becomes c**kblocking.',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
return itemdoc.find(search_path).attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
mgid = self._search_regex(r'data-mgid="(?P<mgid>mgid:.*?)"',
webpage, u'mgid')
return self._get_videos_info(mgid)
class ComedyCentralShowsIE(InfoExtractor):
IE_DESC = u'The Daily Show / Colbert Report'
# urls can be abbreviations like :thedailyshow or :colbert
# urls for episodes like:
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
@ -25,8 +55,21 @@ class ComedyCentralIE(InfoExtractor):
(full-episodes/(?P<episode>.*)|
(?P<clip>
(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*)))))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*)))|
(?P<interview>
extended-interviews/(?P<interID>[0-9]+)/playlist_tds_extended_(?P<interview_title>.*?)/.*?)))
$"""
_TEST = {
u'url': u'http://www.thedailyshow.com/watch/thu-december-13-2012/kristen-stewart',
u'file': u'422212.mp4',
u'md5': u'4e2f5cb088a83cd8cdb7756132f9739d',
u'info_dict': {
u"upload_date": u"20121214",
u"description": u"Kristen Stewart",
u"uploader": u"thedailyshow",
u"title": u"thedailyshow-kristen-stewart part 1"
}
}
_available_formats = ['3500', '2200', '1700', '1200', '750', '400']
@ -39,12 +82,12 @@ class ComedyCentralIE(InfoExtractor):
'400': 'mp4',
}
_video_dimensions = {
'3500': '1280x720',
'2200': '960x540',
'1700': '768x432',
'1200': '640x360',
'750': '512x288',
'400': '384x216',
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
@classmethod
@ -52,11 +95,13 @@ class ComedyCentralIE(InfoExtractor):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def _print_formats(self, formats):
print('Available formats:')
for x in formats:
print('%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'mp4'), self._video_dimensions.get(x, '???')))
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@ -77,6 +122,9 @@ class ComedyCentralIE(InfoExtractor):
else:
epTitle = mobj.group('cntitle')
dlNewest = False
elif mobj.group('interview'):
epTitle = mobj.group('interview_title')
dlNewest = False
else:
dlNewest = not mobj.group('episode')
if dlNewest:
@ -140,40 +188,31 @@ class ComedyCentralIE(InfoExtractor):
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue
if self._downloader.params.get('listformats', None):
self._print_formats([i[0] for i in turls])
return
# For now, just pick the highest bitrate
format,rtmp_video_url = turls[-1]
# Get the format arg from the arg stream
req_format = self._downloader.params.get('format', None)
# Select format if we can find one
for f,v in turls:
if f == req_format:
format, rtmp_video_url = f, v
break
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
video_url = base + m.group('finalid')
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = {
'id': shortMediaId,
'url': video_url,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'ext': 'mp4',
'format': format,
'thumbnail': None,
'description': compat_str(officialTitle),
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info)
return results

View File

@ -3,18 +3,23 @@ import os
import re
import socket
import sys
import netrc
import xml.etree.ElementTree
from ..utils import (
compat_http_client,
compat_urllib_error,
compat_urllib_request,
compat_str,
clean_html,
compiled_regex_type,
ExtractorError,
RegexNotFoundError,
sanitize_filename,
unescapeHTML,
)
class InfoExtractor(object):
"""Information Extractor class.
@ -33,9 +38,13 @@ class InfoExtractor(object):
title: Video title, unescaped.
ext: Video filename extension.
Instead of url and ext, formats can also specified.
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
thumbnails: A list of dictionaries (with the entries "resolution" and
"url") for the varying thumbnails
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
@ -43,11 +52,36 @@ class InfoExtractor(object):
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
player_url: SWF Player URL (used for rtmpdump).
subtitles: The subtitle file contents.
subtitles: The subtitle file contents as a dictionary in the format
{language: subtitles}.
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height.
and format_note fields if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* format_note Additional info about the format
("3D" or "DASH video")
* width Width of the video, if known
* height Height of the video, if known
* abr Average audio bitrate in KBit/s
* acodec Name of the audio codec in use
* vbr Average video bitrate in KBit/s
* vcodec Name of the video codec in use
* filesize The number of bytes, if known in advance
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
The fields should all be Unicode strings.
Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
@ -72,7 +106,13 @@ class InfoExtractor(object):
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url) is not None
# This does not use has/getattr intentionally - we want to know whether
# we have cached the regexp for *this* class, whereas getattr would also
# match the superclass
if '_VALID_URL_RE' not in cls.__dict__:
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
return cls._VALID_URL_RE.match(url) is not None
@classmethod
def working(cls):
@ -102,6 +142,11 @@ class InfoExtractor(object):
"""Real extraction process. Redefine in subclasses."""
pass
@classmethod
def ie_key(cls):
"""A string for getting the InfoExtractor with get_info_extractor"""
return cls.__name__[:-2]
@property
def IE_NAME(self):
return type(self).__name__[:-2]
@ -113,22 +158,32 @@ class InfoExtractor(object):
elif note is not False:
self.to_screen(u'%s: %s' % (video_id, note))
try:
return compat_urllib_request.urlopen(url_or_request)
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if errnote is None:
errnote = u'Unable to download webpage'
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2])
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
urlh = self._request_webpage(url_or_request, video_id, note, errnote)
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
if m:
encoding = m.group(1)
else:
encoding = 'utf-8'
webpage_bytes = urlh.read()
m = re.search(br'<meta[^>]+charset=[\'"]?([^\'")]+)[ /\'">]',
webpage_bytes[:1024])
if m:
encoding = m.group(1).decode('ascii')
else:
encoding = 'utf-8'
if self._downloader.params.get('dump_intermediate_pages', False):
try:
url = url_or_request.get_full_url()
@ -137,6 +192,17 @@ class InfoExtractor(object):
self.to_screen(u'Dumping request to ' + url)
dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump)
if self._downloader.params.get('write_pages', False):
try:
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace')
return (content, urlh)
@ -144,6 +210,11 @@ class InfoExtractor(object):
""" Returns the data of the page as a string """
return self._download_webpage_handle(url_or_request, video_id, note, errnote)[0]
def _download_xml(self, url_or_request, video_id, note=u'Downloading XML', errnote=u'Unable to downloand XML'):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(url_or_request, video_id, note, errnote)
return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
def to_screen(self, msg):
"""Print msg to screen, prefixing it with '[ie_name]'"""
self._downloader.to_screen(u'[%s] %s' % (self.IE_NAME, msg))
@ -160,18 +231,19 @@ class InfoExtractor(object):
"""Report attempt to confirm age."""
self.to_screen(u'Confirming age')
def report_login(self):
"""Report attempt to log in."""
self.to_screen(u'Logging in')
#Methods for following #608
#They set the correct value of the '_type' key
def video_result(self, video_info):
"""Returns a video"""
video_info['_type'] = 'video'
return video_info
def url_result(self, url, ie=None):
def url_result(self, url, ie=None, video_id=None):
"""Returns a url that points to a page that should be processed"""
#TODO: ie should be the class used for getting the info
video_info = {'_type': 'url',
'url': url,
'ie_key': ie}
if video_id is not None:
video_info['id'] = video_id
return video_info
def playlist_result(self, entries, playlist_id=None, playlist_title=None):
"""Returns a playlist"""
@ -188,7 +260,7 @@ class InfoExtractor(object):
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
In case of failure return a default value or raise a WARNING or a
ExtractorError, depending on fatal, specifying the field name.
RegexNotFoundError, depending on fatal, specifying the field name.
"""
if isinstance(pattern, (str, compat_str, compiled_regex_type)):
mobj = re.search(pattern, string, flags)
@ -208,7 +280,7 @@ class InfoExtractor(object):
elif default is not None:
return default
elif fatal:
raise ExtractorError(u'Unable to extract %s' % _name)
raise RegexNotFoundError(u'Unable to extract %s' % _name)
else:
self._downloader.report_warning(u'unable to extract %s; '
u'please report this issue on http://yt-dl.org/bug' % _name)
@ -224,6 +296,106 @@ class InfoExtractor(object):
else:
return res
def _get_login_info(self):
"""
Get the the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value
If there's no info available, return (None, None)
"""
if self._downloader is None:
return (None, None)
username = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(u'parsing .netrc: %s' % compat_str(err))
return (username, password)
# Helper functions for extracting OpenGraph info
@staticmethod
def _og_regexes(prop):
content_re = r'content=(?:"([^>]+?)"|\'(.+?)\')'
property_re = r'property=[\'"]og:%s[\'"]' % re.escape(prop)
template = r'<meta[^>]+?%s[^>]+?%s'
return [
template % (property_re, content_re),
template % (content_re, property_re),
]
def _og_search_property(self, prop, html, name=None, **kargs):
if name is None:
name = 'OpenGraph %s' % prop
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
if escaped is None:
return None
return unescapeHTML(escaped)
def _og_search_thumbnail(self, html, **kargs):
return self._og_search_property('image', html, u'thumbnail url', fatal=False, **kargs)
def _og_search_description(self, html, **kargs):
return self._og_search_property('description', html, fatal=False, **kargs)
def _og_search_title(self, html, **kargs):
return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
regexes = self._og_regexes('video')
if secure: regexes = self._og_regexes('video:secure_url') + regexes
return self._html_search_regex(regexes, html, name, **kargs)
def _html_search_meta(self, name, html, display_name=None):
if display_name is None:
display_name = name
return self._html_search_regex(
r'''(?ix)<meta(?=[^>]+(?:name|property)=["\']%s["\'])
[^>]+content=["\']([^"\']+)["\']''' % re.escape(name),
html, display_name, fatal=False)
def _dc_search_uploader(self, html):
return self._html_search_meta('dc.creator', html, 'uploader')
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
def _media_rating_search(self, html):
# See http://www.tjg-designs.com/WP/metadata-code-examples-adding-metadata-to-your-web-pages/
rating = self._html_search_meta('rating', html)
if not rating:
return None
RATING_TABLE = {
'safe for kids': 0,
'general': 8,
'14 years': 14,
'mature': 17,
'restricted': 19,
}
return RATING_TABLE.get(rating.lower(), None)
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.
@ -261,4 +433,8 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by sublclasses")
raise NotImplementedError("This method must be implemented by subclasses")
@property
def SEARCH_KEY(self):
return self._SEARCH_KEY

View File

@ -0,0 +1,106 @@
# coding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
orderedSet,
compat_urllib_parse_urlparse,
compat_urlparse,
)
class CondeNastIE(InfoExtractor):
"""
Condé Nast is a media group, some of its sites use a custom HTML5 player
that works the same in all of them.
"""
# The keys are the supported sites and the values are the name to be shown
# to the user and in the extractor description.
_SITES = {'wired': u'WIRED',
'gq': u'GQ',
'vogue': u'Vogue',
'glamour': u'Glamour',
'wmagazine': u'W Magazine',
'vanityfair': u'Vanity Fair',
}
_VALID_URL = r'http://(video|www).(?P<site>%s).com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
IE_DESC = u'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
_TEST = {
u'url': u'http://video.wired.com/watch/3d-printed-speakers-lit-with-led',
u'file': u'5171b343c2b4c00dd0c1ccb3.mp4',
u'md5': u'1921f713ed48aabd715691f774c451f7',
u'info_dict': {
u'title': u'3D Printed Speakers Lit With LED',
u'description': u'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
}
}
def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, u'series title', flags=re.DOTALL)
url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]',
webpage, flags=re.DOTALL)
paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage):
description = self._html_search_regex([r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, u'description',
fatal=False, flags=re.DOTALL)
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
u'player params', flags=re.DOTALL)
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, u'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, u'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, u'target')
data = compat_urllib_parse.urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]',
webpage, u'base info url',
default='http://player.cnevids.com/player/loader.js?')
info_url = base_info_url + data
info_page = self._download_webpage(info_url, video_id,
u'Downloading video info')
video_info = self._search_regex(r'var video = ({.+?});', info_page, u'video info')
video_info = json.loads(video_info)
def _formats_sort_key(f):
type_ord = 1 if f['type'] == 'video/mp4' else 0
quality_ord = 1 if f['quality'] == 'high' else 0
return (quality_ord, type_ord)
best_format = sorted(video_info['sources'][0], key=_formats_sort_key)[-1]
return {'id': video_id,
'url': best_format['src'],
'ext': best_format['type'].split('/')[-1],
'title': video_info['title'],
'thumbnail': video_info['poster_frame'],
'description': description,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
url_type = mobj.group('type')
id = mobj.group('id')
self.to_screen(u'Extracting from %s with the Condé Nast extractor' % self._SITES[site])
webpage = self._download_webpage(url, id)
if url_type == 'series':
return self._extract_series(url, webpage)
else:
return self._extract_video(webpage)

View File

@ -0,0 +1,40 @@
# -*- coding: utf-8 -*-
import re
from .common import InfoExtractor
from ..utils import determine_ext
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(\d*)-.+'
_TEST = {
u'url': u'http://www.criterion.com/films/184-le-samourai',
u'file': u'184.mp4',
u'md5': u'bc51beba55685509883a9a7830919ec3',
u'info_dict': {
u"title": u"Le Samouraï",
u"description" : u'md5:a2b4b116326558149bef81f76dcbb93f',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(r'so.addVariable\("videoURL", "(.+?)"\)\;',
webpage, 'video url')
title = self._html_search_regex(r'<meta content="(.+?)" property="og:title" />',
webpage, 'video title')
description = self._html_search_regex(r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {'id': video_id,
'url' : final_url,
'title': title,
'ext': determine_ext(final_url),
'description': description,
'thumbnail': thumbnail,
}

View File

@ -7,6 +7,15 @@ from ..utils import (
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://www.c-spanvideo.org/program/(.*)'
_TEST = {
u'url': u'http://www.c-spanvideo.org/program/HolderonV',
u'file': u'315139.flv',
u'md5': u'74a623266956f69e4df0068ab6c80fe4',
u'info_dict': {
u"title": u"Attorney General Eric Holder on Voting Rights Act Decision"
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -25,8 +34,6 @@ class CSpanIE(InfoExtractor):
description = self._html_search_regex(r'<meta (?:property="og:|name=")description" content="(.*?)"',
webpage, 'description',
flags=re.MULTILINE|re.DOTALL)
thumbnail = self._html_search_regex(r'<meta property="og:image" content="(.*?)"',
webpage, 'thumbnail')
url = self._search_regex(r'<string name="URL">(.*?)</string>',
video_info, 'video url')
@ -40,5 +47,5 @@ class CSpanIE(InfoExtractor):
'url': url,
'play_path': path,
'description': description,
'thumbnail': thumbnail,
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -0,0 +1,22 @@
# encoding: utf-8
from .canalplus import CanalplusIE
class D8IE(CanalplusIE):
_VALID_URL = r'https?://www\.d8\.tv/.*?/(?P<path>.*)'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/d8/%s'
IE_NAME = u'd8.tv'
_TEST = {
u'url': u'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
u'file': u'966289.flv',
u'info_dict': {
u'title': u'Campagne intime - Documentaire exceptionnel',
u'description': u'md5:d2643b799fb190846ae09c61e59a859f',
u'upload_date': u'20131108',
},
u'params': {
# rtmp
u'skip_download': True,
},
}

View File

@ -1,77 +1,228 @@
import re
import json
import itertools
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_urllib_parse,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
ExtractorError,
unescapeHTML,
)
class DailymotionIE(InfoExtractor):
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request.add_header('Cookie', 'ff=off')
return request
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^/]+)'
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion'
_FORMATS = [
(u'stream_h264_ld_url', u'ld'),
(u'stream_h264_url', u'standard'),
(u'stream_h264_hq_url', u'hq'),
(u'stream_h264_hd_url', u'hd'),
(u'stream_h264_hd1080_url', u'hd180'),
]
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
# age-restricted video
{
u'url': u'http://www.dailymotion.com/video/xyh2zz_leanna-decker-cyber-girl-of-the-year-desires-nude-playboy-plus_redband',
u'file': u'xyh2zz.mp4',
u'md5': u'0d667a7b9cebecc3c89ee93099c4159d',
u'info_dict': {
u'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
u'uploader': 'HotWaves1012',
u'age_limit': 18,
}
}
]
def _real_extract(self, url):
# Extract id and simplified title from URL
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
mobj = re.search(r'\s*var flashvars = (.*)', webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
flashvars = compat_urllib_parse.unquote(mobj.group(1))
for key in ['hd1080URL', 'hd720URL', 'hqURL', 'sdURL', 'ldURL', 'video_url']:
if key in flashvars:
max_quality = key
self.to_screen(u'Using %s' % key)
break
else:
raise ExtractorError(u'Unable to extract video URL')
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
mobj = re.search(r'"' + max_quality + r'":"(.+?)"', flashvars)
if mobj is None:
raise ExtractorError(u'Unable to extract video URL')
video_url = compat_urllib_parse.unquote(mobj.group(1)).replace('\\/', '/')
# TODO: support choosing qualities
mobj = re.search(r'<meta property="og:title" content="(?P<title>[^"]*)" />', webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract title')
video_title = unescapeHTML(mobj.group('title'))
video_uploader = None
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
webpage, 'video uploader')
webpage, 'video uploader', fatal=False)
age_limit = self._rta_search(webpage)
video_upload_date = None
mobj = re.search(r'<div class="[^"]*uploaded_cont[^"]*" title="[^"]*">([0-9]{2})-([0-9]{2})-([0-9]{4})</div>', webpage)
if mobj is not None:
video_upload_date = mobj.group(3) + mobj.group(2) + mobj.group(1)
return [{
embed_url = 'http://www.dailymotion.com/embed/video/%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = m_size.group(1), m_size.group(2)
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL')
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, webpage)
return
return {
'id': video_id,
'url': video_url,
'formats': formats,
'uploader': video_uploader,
'upload_date': video_upload_date,
'title': video_title,
'ext': video_extension,
}]
'title': self._og_search_title(webpage),
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
}
def _get_available_subtitles(self, video_id, webpage):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], l['url']) for l in info['list'])
return sub_lang_list
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _extract_entries(self, id):
video_ids = []
for pagenum in itertools.count(1):
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'row video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}

View File

@ -0,0 +1,74 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
_TEST = {
u'url': u'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
u'file': u'52554690.mp4',
u'info_dict': {
u'title': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'description': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'upload_date': u'20130831',
u'duration': 3868,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(r'<link rel="video_src" href=".+?vid=(.+?)"',
webpage, u'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
info_xml = self._download_webpage(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
u'Downloading video info')
urls_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
self.to_screen(u'%s: Getting video urls' % video_id)
formats = []
for format_el in urls.findall('result/output_list/output_list'):
profile = format_el.attrib['profile']
format_query = compat_urllib_parse.urlencode({
'vid': full_id,
'profile': profile,
})
url_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note=False)
url_doc = xml.etree.ElementTree.fromstring(url_xml.encode('utf-8'))
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format_id': profile,
})
info = {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': info.find('CONTENTS').text,
'duration': int(info.find('DURATION').text),
'upload_date': info.find('REGDTTM').text[:8],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -0,0 +1,39 @@
import re
import json
from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
_IE_NAME = 'defense.gouv.fr'
_VALID_URL = (r'http://.*?\.defense\.gouv\.fr/layout/set/'
r'ligthboxvideo/base-de-medias/webtv/(.*)')
_TEST = {
u'url': (u'http://www.defense.gouv.fr/layout/set/ligthboxvideo/'
u'base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1'),
u'file': u'11213.mp4',
u'md5': u'75bba6124da7e63d2d60b5244ec9430c',
"info_dict": {
"title": "attaque-chimique-syrienne-du-21-aout-2013-1"
}
}
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r"flashvars.pvg_id=\"(\d+)\";",
webpage, 'ID')
json_url = ('http://static.videos.gouv.fr/brightcovehub/export/json/'
+ video_id)
info = self._download_webpage(json_url, title,
'Downloading JSON config')
video_url = json.loads(info)['renditions'][0]['url']
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

View File

@ -25,7 +25,7 @@ class DepositFilesIE(InfoExtractor):
url = 'http://depositfiles.com/en/files/' + file_id
# Retrieve file webpage with 'Free download' button pressed
free_download_indication = { 'gateway_result' : '1' }
free_download_indication = {'gateway_result' : '1'}
request = compat_urllib_request.Request(url, compat_urllib_parse.urlencode(free_download_indication))
try:
self.report_download_webpage(file_id)

View File

@ -0,0 +1,41 @@
import re
import json
import time
from .common import InfoExtractor
class DotsubIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?dotsub\.com/view/([^/]+)'
_TEST = {
u'url': u'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
u'file': u'aed3b8b2-1889-4df5-ae63-ad85f5572f27.flv',
u'md5': u'0914d4d69605090f623b7ac329fea66e',
u'info_dict': {
u"title": u"Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary",
u"uploader": u"4v4l0n42",
u'description': u'Pyramids of Waste (2010) also known as "The lightbulb conspiracy" is a documentary about how our economic system based on consumerism and planned obsolescence is breaking our planet down.\r\n\r\nSolutions to this can be found at:\r\nhttp://robotswillstealyourjob.com\r\nhttp://www.federicopistono.org\r\n\r\nhttp://opensourceecology.org\r\nhttp://thezeitgeistmovement.com',
u'thumbnail': u'http://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
u'upload_date': u'20101213',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
info_url = "https://dotsub.com/api/media/%s/metadata" %(video_id)
webpage = self._download_webpage(info_url, video_id)
info = json.loads(webpage)
date = time.gmtime(info['dateCreated']/1000) # The timestamp is in miliseconds
return [{
'id': video_id,
'url': info['mediaURI'],
'ext': 'flv',
'title': info['title'],
'thumbnail': info['screenshotURI'],
'description': info['description'],
'uploader': info['user'],
'view_count': info['numberOfViews'],
'upload_date': u'%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday),
}]

View File

@ -0,0 +1,85 @@
# coding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_strdate,
)
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat.de/mediathek/index.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
u'file': u'36983.webm',
u'md5': u'57c97d0469d71cf874f6815aa2b7c944',
u'info_dict': {
u"title": u"Kaffeeland Schweiz",
u"description": u"Über 80 Kaffeeröstereien liefern in der Schweiz das Getränk, in das das Land so vernarrt ist: Mehr als 1000 Tassen trinkt ein Schweizer pro Jahr. SCHWEIZWEIT nimmt die Kaffeekultur unter die...",
u"uploader": u"3sat",
u"upload_date": u"20130622"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
details_xml = self._download_webpage(details_url, video_id, note=u'Downloading video details')
details_doc = xml.etree.ElementTree.fromstring(details_xml.encode('utf-8'))
thumbnail_els = details_doc.findall('.//teaserimage')
thumbnails = [{
'width': te.attrib['key'].partition('x')[0],
'height': te.attrib['key'].partition('x')[2],
'url': te.text,
} for te in thumbnail_els]
information_el = details_doc.find('.//information')
video_title = information_el.find('./title').text
video_description = information_el.find('./detail').text
details_el = details_doc.find('.//details')
video_uploader = details_el.find('./channel').text
upload_date = unified_strdate(details_el.find('./airtime').text)
format_els = details_doc.findall('.//formitaet')
formats = [{
'format_id': fe.attrib['basetype'],
'width': int(fe.find('./width').text),
'height': int(fe.find('./height').text),
'url': fe.find('./url').text,
'ext': determine_ext(fe.find('./url').text),
'filesize': int(fe.find('./filesize').text),
'video_bitrate': int(fe.find('./videoBitrate').text),
'3sat_qualityname': fe.find('./quality').text,
} for fe in format_els
if not fe.find('./url').text.startswith('http://www.metafilegenerator.de/')]
def _sortkey(format):
qidx = ['low', 'med', 'high', 'veryhigh'].index(format['3sat_qualityname'])
prefer_http = 1 if 'rtmp' in format['url'] else 0
return (qidx, prefer_http, format['video_bitrate'])
formats.sort(key=_sortkey)
info = {
'_type': 'video',
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'thumbnails': thumbnails,
'thumbnail': thumbnails[-1]['url'],
'uploader': video_uploader,
'upload_date': upload_date,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -0,0 +1,37 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config_xml = self._download_webpage(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
video_url = config.find('file').text
return {
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,
}

View File

@ -0,0 +1,46 @@
import re
from ..utils import (
compat_urllib_parse,
determine_ext
)
from .common import InfoExtractor
class EHowIE(InfoExtractor):
IE_NAME = u'eHow'
_VALID_URL = r'(?:https?://)?(?:www\.)?ehow\.com/[^/_?]*_(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://www.ehow.com/video_12245069_hardwood-flooring-basics.html',
u'file': u'12245069.flv',
u'md5': u'9809b4e3f115ae2088440bcb4efbf371',
u'info_dict': {
u"title": u"Hardwood Flooring Basics",
u"description": u"Hardwood flooring may be time consuming, but its ultimately a pretty straightforward concept. Learn about hardwood flooring basics with help from a hardware flooring business owner in this free video...",
u"uploader": u"Erick Nathan"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'(?:file|source)=(http[^\'"&]*)',
webpage, u'video URL')
final_url = compat_urllib_parse.unquote(video_url)
uploader = self._search_regex(r'<meta name="uploader" content="(.+?)" />',
webpage, u'uploader')
title = self._og_search_title(webpage).replace(' | eHow', '')
ext = determine_ext(final_url)
return {
'_type': 'video',
'id': video_id,
'url': final_url,
'ext': ext,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'uploader': uploader,
}

View File

@ -1,4 +1,3 @@
import itertools
import json
import random
import re
@ -12,6 +11,77 @@ from ..utils import (
class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
_VALID_URL = r'https?://8tracks.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
u"name": u"EightTracks",
u"url": u"http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
u"playlist": [
{
u"file": u"11885610.m4a",
u"md5": u"96ce57f24389fc8734ce47f4c1abcc55",
u"info_dict": {
u"title": u"youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885608.m4a",
u"md5": u"4ab26f05c1f7291ea460a3920be8021f",
u"info_dict": {
u"title": u"youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885679.m4a",
u"md5": u"d30b5b5f74217410f4689605c35d1fd7",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885680.m4a",
u"md5": u"4eb0a669317cd725f6bbd336a29f923a",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885682.m4a",
u"md5": u"1893e872e263a2705558d1d319ad19e8",
u"info_dict": {
u"title": u"PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885683.m4a",
u"md5": u"b673c46f47a216ab1741ae8836af5899",
u"info_dict": {
u"title": u"PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885684.m4a",
u"md5": u"1d74534e95df54986da7f5abf7d842b7",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885685.m4a",
u"md5": u"f081f47af8f6ae782ed131d38b9cd1c0",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
}
]
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -30,7 +100,7 @@ class EightTracksIE(InfoExtractor):
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url
res = []
for i in itertools.count():
for i in range(track_count):
api_json = self._download_webpage(next_url, playlist_id,
note=u'Downloading song information %s/%s' % (str(i+1), track_count),
errnote=u'Failed to download song information')
@ -45,7 +115,5 @@ class EightTracksIE(InfoExtractor):
'ext': 'm4a',
}
res.append(info)
if api_data['set']['at_last_track']:
break
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id'])
return res

View File

@ -0,0 +1,37 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from .brightcove import BrightcoveIE
from ..utils import ExtractorError
class EitbIE(InfoExtractor):
IE_NAME = u'eitb.tv'
_VALID_URL = r'https?://www\.eitb\.tv/(eu/bideoa|es/video)/[^/]+/(?P<playlist_id>\d+)/(?P<chapter_id>\d+)'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/2677100210001/2743577154001/lasa-y-zabala-30-anos/',
u'md5': u'edf4436247185adee3ea18ce64c47998',
u'info_dict': {
u'id': u'2743577154001',
u'ext': u'mp4',
u'title': u'60 minutos (Lasa y Zabala, 30 años)',
# All videos from eitb has this description in the brightcove info
u'description': u'.',
u'uploader': u'Euskal Telebista',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
chapter_id = mobj.group('chapter_id')
webpage = self._download_webpage(url, chapter_id)
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is None:
raise ExtractorError(u'Could not extract the Brightcove url')
# The BrightcoveExperience object doesn't contain the video id, we set
# it manually
bc_url += '&%40videoPlayer={0}'.format(chapter_id)
return self.url_result(bc_url, BrightcoveIE.ie_key())

View File

@ -11,58 +11,74 @@ from ..utils import (
class EscapistIE(InfoExtractor):
_VALID_URL = r'^(https?://)?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<episode>[^/?]+)[/?]?.*$'
_VALID_URL = r'^https?://?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<episode>[^/?]+)[/?]?.*$'
_TEST = {
u'url': u'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
u'file': u'6618-Breaking-Down-Baldurs-Gate.mp4',
u'md5': u'ab3a706c681efca53f0a35f1415cf0d1',
u'info_dict': {
u"description": u"Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
u"uploader": u"the-escapist-presents",
u"title": u"Breaking Down Baldur's Gate"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
showName = mobj.group('showname')
videoId = mobj.group('episode')
self.report_extraction(videoId)
webpage = self._download_webpage(url, videoId)
videoDesc = self._html_search_regex('<meta name="description" content="([^"]*)"',
videoDesc = self._html_search_regex(
r'<meta name="description" content="([^"]*)"',
webpage, u'description', fatal=False)
imgUrl = self._html_search_regex('<meta property="og:image" content="([^"]*)"',
webpage, u'thumbnail', fatal=False)
playerUrl = self._og_search_video_url(webpage, name=u'player URL')
playerUrl = self._html_search_regex('<meta property="og:video" content="([^"]*)"',
webpage, u'player url')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*)"',
webpage, u'title').split(' : ')[-1]
title = self._html_search_regex('<meta name="title" content="([^"]*)"',
webpage, u'player url').split(' : ')[-1]
configUrl = self._search_regex('config=(.*)$', playerUrl, u'config url')
configUrl = self._search_regex('config=(.*)$', playerUrl, u'config URL')
configUrl = compat_urllib_parse.unquote(configUrl)
configJSON = self._download_webpage(configUrl, videoId,
u'Downloading configuration',
u'unable to download configuration')
formats = []
# Technically, it's JavaScript, not JSON
configJSON = configJSON.replace("'", '"')
def _add_format(name, cfgurl):
configJSON = self._download_webpage(
cfgurl, videoId,
u'Downloading ' + name + ' configuration',
u'Unable to download ' + name + ' configuration')
# Technically, it's JavaScript, not JSON
configJSON = configJSON.replace("'", '"')
try:
config = json.loads(configJSON)
except (ValueError,) as err:
raise ExtractorError(u'Invalid JSON in configuration file: ' + compat_str(err))
playlist = config['playlist']
formats.append({
'url': playlist[1]['url'],
'format_id': name,
})
_add_format(u'normal', configUrl)
hq_url = (configUrl +
('&hq=1' if '?' in configUrl else configUrl + '?hq=1'))
try:
config = json.loads(configJSON)
except (ValueError,) as err:
raise ExtractorError(u'Invalid JSON in configuration file: ' + compat_str(err))
_add_format(u'hq', hq_url)
except ExtractorError:
pass # That's fine, we'll just use normal quality
playlist = config['playlist']
videoUrl = playlist[1]['url']
info = {
return {
'id': videoId,
'url': videoUrl,
'formats': formats,
'uploader': showName,
'upload_date': None,
'title': title,
'ext': 'mp4',
'thumbnail': imgUrl,
'thumbnail': self._og_search_thumbnail(webpage),
'description': videoDesc,
'player_url': playerUrl,
}
return [info]

View File

@ -0,0 +1,56 @@
import re
import json
from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = u'exfm'
IE_DESC = u'ex.fm'
_VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
_TESTS = [
{
u'url': u'http://ex.fm/song/eh359',
u'file': u'44216187.mp3',
u'md5': u'e45513df5631e6d760970b14cc0c11e7',
u'info_dict': {
u"title": u"Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive",
u"uploader": u"deadjournalist",
u'upload_date': u'20120424',
u'description': u'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
},
u'note': u'Soundcloud song',
u'skip': u'The site is down too often',
},
{
u'url': u'http://ex.fm/song/wddt8',
u'file': u'wddt8.mp3',
u'md5': u'966bd70741ac5b8570d8e45bfaed3643',
u'info_dict': {
u'title': u'Safe and Sound',
u'uploader': u'Capital Cities',
},
u'skip': u'The site is down too often',
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group(1)
info_url = "http://ex.fm/api/v3/song/%s" %(song_id)
webpage = self._download_webpage(info_url, song_id)
info = json.loads(webpage)
song_url = info['song']['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream',''), 'Soundcloud')
return [{
'id': song_id,
'url': song_url,
'ext': 'mp3',
'title': info['song']['title'],
'thumbnail': info['song']['image']['large'],
'uploader': info['song']['artist'],
'view_count': info['song']['loved_count'],
}]

View File

@ -0,0 +1,50 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
class ExtremeTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
u'file': u'652431.mp4',
u'md5': u'1fb9228f5e3332ec8c057d6ac36f33e0',
u'info_dict': {
u"title": u"Music Video 14 british euro brit european cumshots swallow",
u"uploader": u"unknown",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, u'title')
uploader = self._html_search_regex(r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, u'uploader', fatal=False)
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
return {
'id': video_id,
'title': video_title,
'uploader': uploader,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

View File

@ -1,5 +1,4 @@
import json
import netrc
import re
import socket
@ -19,58 +18,74 @@ class FacebookIE(InfoExtractor):
"""Information Extractor for Facebook"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
_LOGIN_URL = 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php&'
_LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook'
IE_NAME = u'facebook'
_TEST = {
u'url': u'https://www.facebook.com/photo.php?v=120708114770723',
u'file': u'120708114770723.mp4',
u'md5': u'48975a41ccc4b7a581abd68651c1a5a8',
u'info_dict': {
u"duration": 279,
u"title": u"PEOPLE ARE AWESOME 2013"
}
}
def report_login(self):
"""Report attempt to log in."""
self.to_screen(u'Logging in')
def _real_initialize(self):
if self._downloader is None:
return
useremail = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
useremail = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
useremail = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(u'parsing .netrc: %s' % compat_str(err))
return
def _login(self):
(useremail, password) = self._get_login_info()
if useremail is None:
return
# Log in
login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
self.report_login()
login_page = self._download_webpage(login_page_req, None, note=False,
errnote=u'Unable to download login page')
lsd = self._search_regex(r'"lsd":"(\w*?)"', login_page, u'lsd')
lgnrnd = self._search_regex(r'name="lgnrnd" value="([^"]*?)"', login_page, u'lgnrnd')
login_form = {
'email': useremail,
'pass': password,
'login': 'Log+In'
'lsd': lsd,
'lgnrnd': lgnrnd,
'next': 'http://facebook.com/home.php',
'default_persistent': '0',
'legacy_return': '1',
'timezone': '-60',
'trynum': '1',
}
request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
self.report_login()
login_results = compat_urllib_request.urlopen(request).read()
if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
self._downloader.report_warning(u'unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
return
check_form = {
'fb_dtsg': self._search_regex(r'"fb_dtsg":"(.*?)"', login_results, u'fb_dtsg'),
'nh': self._search_regex(r'name="nh" value="(\w*?)"', login_results, u'nh'),
'name_action_selected': 'dont_save',
'submit[Continue]': self._search_regex(r'<input value="(.*?)" name="submit\[Continue\]"', login_results, u'continue'),
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, compat_urllib_parse.urlencode(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = compat_urllib_request.urlopen(check_req).read()
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
self._downloader.report_warning(u'Unable to confirm login, you have to login in your brower and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
return
def _real_initialize(self):
self._login()
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
@ -84,7 +99,13 @@ class FacebookIE(InfoExtractor):
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m:
raise ExtractorError(u'Cannot parse data')
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
u'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
else:
raise ExtractorError(u'Cannot parse data')
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params'])
params = json.loads(params_raw)
@ -97,8 +118,8 @@ class FacebookIE(InfoExtractor):
video_duration = int(video_data['video_duration'])
thumbnail = video_data['thumbnail_src']
video_title = self._html_search_regex('<h2 class="uiHeaderTitle">([^<]+)</h2>',
webpage, u'title')
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, u'title')
info = {
'id': video_id,

View File

@ -0,0 +1,58 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr = self._html_search_regex(r'<p class="Content Copy">(.*?)</p>', webpage, u'description')
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': descr,
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -0,0 +1,78 @@
import re
import random
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
u'file': u'00011.flv',
u'info_dict': {
u'title': u'Folge 1 vom 10. April 2007',
u'description': u'md5:fb4818139c7cfe6907d4b83412a6864f',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
u'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
videos.append({
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
u'md5': u'262f0adbac80317412f7e57b4808e5c4',
u'info_dict': {
u"title": u"Postecke 120"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_id = '%04d' % episode
video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
video_title = 'Postecke %d' % episode
return {
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': video_title,
}

View File

@ -9,7 +9,17 @@ from ..utils import (
class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4',
u'md5': u'6fdc01adbc89d72fc9c4f15b4a4ba87b',
u'info_dict': {
u"description": u"Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.",
u"uploader_id": u"forestwander-nature-pictures",
u"title": u"Dark Hollow Waterfalls"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -37,21 +47,12 @@ class FlickrIE(InfoExtractor):
raise ExtractorError(u'Unable to extract video url')
video_url = mobj.group(1) + unescapeHTML(mobj.group(2))
video_title = self._html_search_regex(r'<meta property="og:title" content=(?:"([^"]+)"|\'([^\']+)\')',
webpage, u'video title')
video_description = self._html_search_regex(r'<meta property="og:description" content=(?:"([^"]+)"|\'([^\']+)\')',
webpage, u'description', fatal=False)
thumbnail = self._html_search_regex(r'<meta property="og:image" content=(?:"([^"]+)"|\'([^\']+)\')',
webpage, u'thumbnail', fatal=False)
return [{
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'description': video_description,
'thumbnail': thumbnail,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': video_uploader_id,
}]

View File

@ -0,0 +1,129 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
def _extract_video(self, video_id):
xml_desc = self._download_webpage(
'http://www.francetvinfo.fr/appftv/webservices/video/'
'getInfosOeuvre.php?id-diffusion='
+ video_id, video_id, 'Downloading XML config')
info = xml.etree.ElementTree.fromstring(xml_desc.encode('utf-8'))
manifest_url = info.find('videos/video/url').text
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
thumbnail_path = info.find('image').text
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
# Can't use tests, videos expire in 7 days
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r'data-diffusion="(\d+)"', webpage, 'ID')
return self._extract_video(video_id)
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
u'file': u'84981923.mp4',
u'info_dict': {
u'title': u'Soir 3',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
return self._extract_video(video_id)
class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr'
_VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
},
u'params': {
# It uses Dailymotion
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % name)
info_json = self._download_webpage(info_url, name)
info = json.loads(info_json)
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'],
ie='Dailymotion')

View File

@ -0,0 +1,36 @@
import re
from .common import InfoExtractor
from ..utils import determine_ext
class FreesoundIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
_TEST = {
u'url': u'http://www.freesound.org/people/miklovan/sounds/194503/',
u'file': u'194503.mp3',
u'md5': u'12280ceb42c81f19a515c745eae07650',
u'info_dict': {
u"title": u"gulls in the city.wav",
u"uploader" : u"miklovan",
u'description': u'the sounds of seagulls in the city',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
webpage, 'music title', flags=re.DOTALL)
music_url = self._og_search_property('audio', webpage, 'music url')
description = self._html_search_regex(r'<div id="sound_description">(.*?)</div>',
webpage, 'description', fatal=False, flags=re.DOTALL)
return [{
'id': music_id,
'title': title,
'url': music_url,
'uploader': self._og_search_property('audio:artist', webpage, 'music uploader'),
'ext': determine_ext(music_url),
'description': description,
}]

View File

@ -5,6 +5,15 @@ from .common import InfoExtractor
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?funnyordie\.com/videos/(?P<id>[0-9a-f]+)/.*$'
_TEST = {
u'url': u'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
u'file': u'0732f586d7.mp4',
u'md5': u'f647e9e90064b53b6e046e75d0241fbd',
u'info_dict': {
u"description": u"Lyrics changed to match the video. Spoken cameo by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a concept by Dustin McLean (DustFilms.com). Performed, edited, and written by David A. Scott.",
u"title": u"Heart-Shaped Box: Literal Video Version"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -12,20 +21,15 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(r'<video[^>]*>\s*<source[^>]*>\s*<source src="(?P<url>[^"]+)"',
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
title = self._html_search_regex((r"<h1 class='player_page_h1'.*?>(?P<title>.*?)</h1>",
r'<title>(?P<title>[^<]+?)</title>'), webpage, 'title', flags=re.DOTALL)
video_description = self._html_search_regex(r'<meta property="og:description" content="(?P<desc>.*?)"',
webpage, u'description', fatal=False, flags=re.DOTALL)
info = {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': video_description,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
return [info]

View File

@ -0,0 +1,38 @@
import re
from .common import InfoExtractor
class GamekingsIE(InfoExtractor):
_VALID_URL = r'http?://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
_TEST = {
u"url": u"http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/",
u'file': u'20130811.mp4',
# MD5 is flaky, seems to change regularly
#u'md5': u'2f32b1f7b80fdc5cb616efb4f387f8a3',
u'info_dict': {
u"title": u"Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review",
u"description": u"Melle en Steven hebben voor de review een week in de rechtbank doorbracht met Phoenix Wright: Ace Attorney - Dual Destinies.",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
video_url = self._og_search_video_url(webpage)
video = re.search(r'[0-9]+', video_url)
video_id = video.group(0)
# Todo: add medium format
video_url = video_url.replace(video_id, 'large/' + video_id)
return {
'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}

View File

@ -0,0 +1,59 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
)
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": {
u"title": u"Arma 3 - Community Guide: SITREP I",
u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
data_video = json.loads(unescapeHTML(data_video_json))
# Transform the manifest url to a link to the mp4 files
# they are used in mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
info = {
'id': data_video['guid'],
'title': compat_urllib_parse.unquote(data_video['title']),
'formats': formats,
'description': get_meta_content('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -1,59 +1,36 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
from .mtv import MTVIE, _media_xml_tag
ExtractorError,
)
class GametrailersIE(InfoExtractor):
class GametrailersIE(MTVIE):
"""
Gametrailers use the same videos system as MTVIE, it just changes the feed
url, where the uri is and the method to get the thumbnails.
"""
_VALID_URL = r'http://www.gametrailers.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
_TEST = {
u'url': u'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
u'file': u'70e9a5d7-cf25-4a10-9104-6f3e7342ae0d.mp4',
u'md5': u'4c8e67681a0ea7ec241e8c09b3ea8cf7',
u'info_dict': {
u'title': u'E3 2013: Debut Trailer',
u'description': u'Faith is back! Check out the World Premiere trailer for Mirror\'s Edge 2 straight from the EA Press Conference at E3 2013!',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
_FEED_URL = 'http://www.gametrailers.com/feeds/mrss'
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
return itemdoc.find(search_path).attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('id')
video_type = mobj.group('type')
webpage = self._download_webpage(url, video_id)
if video_type == 'full-episodes':
mgid_re = r'data-video="(?P<mgid>mgid:.*?)"'
else:
mgid_re = r'data-contentId=\'(?P<mgid>mgid:.*?)\''
mgid = self._search_regex(mgid_re, webpage, u'mgid')
data = compat_urllib_parse.urlencode({'uri': mgid, 'acceptMethods': 'fms'})
info_page = self._download_webpage('http://www.gametrailers.com/feeds/mrss?' + data,
video_id, u'Downloading video info')
links_webpage = self._download_webpage('http://www.gametrailers.com/feeds/mediagen/?' + data,
video_id, u'Downloading video urls info')
self.report_extraction(video_id)
info_re = r'''<title><!\[CDATA\[(?P<title>.*?)\]\]></title>.*
<description><!\[CDATA\[(?P<description>.*?)\]\]></description>.*
<image>.*
<url>(?P<thumb>.*?)</url>.*
</image>'''
m_info = re.search(info_re, info_page, re.VERBOSE|re.DOTALL)
if m_info is None:
raise ExtractorError(u'Unable to extract video info')
video_title = m_info.group('title')
video_description = m_info.group('description')
video_thumb = m_info.group('thumb')
m_urls = list(re.finditer(r'<src>(?P<url>.*)</src>', links_webpage))
if m_urls is None or len(m_urls) == 0:
raise ExtractorError(u'Unable to extract video url')
# They are sorted from worst to best quality
video_url = m_urls[-1].group('url')
return {'url': video_url,
'id': video_id,
'title': video_title,
# Videos are actually flv not mp4
'ext': 'flv',
'thumbnail': video_thumb,
'description': video_description,
}
mgid = self._search_regex([r'data-video="(?P<mgid>mgid:.*?)"',
r'data-contentId=\'(?P<mgid>mgid:.*?)\''],
webpage, u'mgid')
return self._get_videos_info(mgid)

View File

@ -1,3 +1,5 @@
# encoding: utf-8
import os
import re
@ -6,15 +8,70 @@ from ..utils import (
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
ExtractorError,
smuggle_url,
unescapeHTML,
)
from .brightcove import BrightcoveIE
class GenericIE(InfoExtractor):
"""Generic last-resort information extractor."""
IE_DESC = u'Generic downloader that works on some sites'
_VALID_URL = r'.*'
IE_NAME = u'generic'
_TESTS = [
{
u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
u'file': u'13601338388002.mp4',
u'md5': u'6e15c93721d7ec9e9ca3fdbf07982cfd',
u'info_dict': {
u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep"
}
},
# embedded vimeo video
{
u'add_ie': ['Vimeo'],
u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'file': u'22444065.mp4',
u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': {
u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter",
}
},
# bandcamp page with custom domain
{
u'add_ie': ['Bandcamp'],
u'url': u'http://bronyrock.com/track/the-pony-mash',
u'file': u'3235767654.mp3',
u'info_dict': {
u'title': u'The Pony Mash',
u'uploader': u'M_Pallante',
},
u'skip': u'There is a limit of 200 free downloads / month for the test song',
},
# embedded brightcove video
# it also tests brightcove videos that need to set the 'Referer' in the
# http requests
{
u'add_ie': ['Brightcove'],
u'url': u'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
u'info_dict': {
u'id': u'2765128793001',
u'ext': u'mp4',
u'title': u'Le cours de bourse : lanalyse technique',
u'description': u'md5:7e9ad046e968cb2d1114004aba466fd9',
u'uploader': u'BFM BUSINESS',
},
u'params': {
u'skip_download': True,
},
},
]
def report_download_webpage(self, video_id):
"""Report webpage download."""
@ -83,8 +140,18 @@ class GenericIE(InfoExtractor):
return new_url
def _real_extract(self, url):
new_url = self._test_redirect(url)
if new_url: return [self.url_result(new_url)]
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
try:
new_url = self._test_redirect(url)
if new_url:
return [self.url_result(new_url)]
except compat_urllib_error.HTTPError:
# This may be a stupid server that doesn't like HEAD, our UA, or so
pass
video_id = url.split('/')[-1]
try:
@ -92,9 +159,49 @@ class GenericIE(InfoExtractor):
except ValueError:
# since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
video_title = self._html_search_regex(r'<title>(.*)</title>',
webpage, u'video title', default=u'video', flags=re.DOTALL)
# Look for BrightCove:
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is not None:
self.to_screen(u'Brightcove video detected.')
return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Look for embedded YouTube player
matches = re.findall(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube.com/embed/.+?)\1', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
burl = unescapeHTML(mobj.group(1))
# Don't set the extractor because it can be a track url or an album
return self.url_result(burl)
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@ -102,7 +209,7 @@ class GenericIE(InfoExtractor):
mobj = re.search(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)
if mobj is None:
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file:\s*["\'](http[^\'"&]*)', webpage)
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http[^\'"]*)', webpage)
if mobj is None:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
@ -114,38 +221,31 @@ class GenericIE(InfoExtractor):
if m_video_type is not None:
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
# HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = compat_urllib_parse.unquote(mobj.group(1))
video_id = os.path.basename(video_url)
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
# here's a fun little line of code for you:
video_extension = os.path.splitext(video_id)[1][1:]
video_id = os.path.splitext(video_id)[0]
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
video_title = self._html_search_regex(r'<title>(.*)</title>',
webpage, u'video title')
# video uploader is domain name
video_uploader = self._search_regex(r'(?:https?://)?([^/]*)/.*',
url, u'video uploader')
return [{
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'upload_date': None,
'title': video_title,
'ext': video_extension,
}]
}

View File

@ -1,3 +1,5 @@
# coding: utf-8
import datetime
import re
@ -8,10 +10,18 @@ from ..utils import (
class GooglePlusIE(InfoExtractor):
"""Information extractor for plus.google.com."""
IE_DESC = u'Google Plus'
_VALID_URL = r'(?:https://)?plus\.google\.com/(?:[^/]+/)*?posts/(\w+)'
IE_NAME = u'plus.google'
_TEST = {
u"url": u"https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH",
u"file": u"ZButuJc6CtH.flv",
u"info_dict": {
u"upload_date": u"20120613",
u"uploader": u"井上ヨシマサ",
u"title": u"嘆きの天使 降臨"
}
}
def _real_extract(self, url):
# Extract id from URL
@ -30,8 +40,10 @@ class GooglePlusIE(InfoExtractor):
self.report_extraction(video_id)
# Extract update date
upload_date = self._html_search_regex('title="Timestamp">(.*?)</a>',
webpage, u'upload date', fatal=False)
upload_date = self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
if upload_date:
# Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")
@ -47,8 +59,8 @@ class GooglePlusIE(InfoExtractor):
webpage, 'title', default=u'NA')
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com'
video_page = self._search_regex(r'<a href="((?:%s)?/photos/.*?)"' % re.escape(DOMAIN),
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, u'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page

View File

@ -8,7 +8,7 @@ from ..utils import (
class GoogleSearchIE(SearchInfoExtractor):
"""Information Extractor for Google Video search queries."""
IE_DESC = u'Google Video search'
_MORE_PAGES_INDICATOR = r'id="pnnext" class="pn"'
_MAX_RESULTS = 1000
IE_NAME = u'video.google:search'

View File

@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(.+?)-.+'
_TEST = {
u'url': u'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
u'file': u'mmbzyhkgny.mp3',
u'md5': u'6783a58491b47b92c7c1af5a77d4cbee',
u'info_dict': {
u'title': u"Obama: 'Beyond The Afghan Theater, We Only Target Al Qaeda' on May 23, 2013",
u'description': u'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
u'duration': 11,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
json_url = "http://www.hark.com/clips/%s.json" %(video_id)
info_json = self._download_webpage(json_url, video_id)
info = json.loads(info_json)
final_url = info['url']
return {'id': video_id,
'url' : final_url,
'title': info['name'],
'ext': determine_ext(final_url),
'description': info['description'],
'thumbnail': info['image_original'],
'duration': info['duration'],
}

View File

@ -6,6 +6,14 @@ from .common import InfoExtractor
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_TEST = {
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html",
u'file': u'1435540.mp3',
u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
u'info_dict': {
u"title": u"Freddie Gibbs - Lay It Down"
}
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
@ -25,16 +33,12 @@ class HotNewHipHopIE(InfoExtractor):
video_title = self._html_search_regex(r"<title>(.*)</title>",
webpage_src, u'title')
# Getting thumbnail and if not thumbnail sets correct title for WSHH candy video.
thumbnail = self._html_search_regex(r'"og:image" content="(.*)"',
webpage_src, u'thumbnail', fatal=False)
results = [{
'id': video_id,
'url' : video_url,
'title' : video_title,
'thumbnail' : thumbnail,
'thumbnail' : self._og_search_thumbnail(webpage_src),
'ext' : 'mp3',
}]
return results
return results

View File

@ -5,13 +5,21 @@ from .common import InfoExtractor
class HowcastIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
u'file': u'390161.mp4',
u'md5': u'8b743df908c42f60cf6496586c7f12c3',
u'info_dict': {
u"description": u"The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here's the proper way to tie a square knot.",
u"title": u"How to Tie a Square Knot Properly"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.howcast.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)

View File

@ -15,6 +15,14 @@ from ..utils import (
class HypemIE(InfoExtractor):
"""Information Extractor for hypem"""
_VALID_URL = r'(?:http://)?(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_TEST = {
u'url': u'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
u'file': u'1v6ga.mp3',
u'md5': u'b9cc91b5af8995e9f0c1cee04c575828',
u'info_dict': {
u"title": u"Tame"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -22,7 +30,7 @@ class HypemIE(InfoExtractor):
raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group(1)
data = { 'ax': 1, 'ts': time.time() }
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded
request = compat_urllib_request.Request(complete_url)
@ -60,4 +68,4 @@ class HypemIE(InfoExtractor):
'ext': "mp3",
'title': title,
'artist': artist,
}]
}]

129
youtube_dl/extractor/ign.py Normal file
View File

@ -0,0 +1,129 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class IGNIE(InfoExtractor):
"""
Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com.
Some videos of it.ign.com are also supported
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
_DESCRIPTION_RE = [r'<span class="page-object-description">(.+?)</span>',
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_TESTS = [
{
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
},
},
]
def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
]
return self._search_regex(res_id, webpage, 'video id')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name_or_id = mobj.group('name_or_id')
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
return [self.url_result(u, ie='IGN') for u in multiple_urls]
video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE,
webpage, 'video description',
flags=re.DOTALL)
result['description'] = description
return result
def _get_video_info(self, video_id):
config_url = self._CONFIG_URL_TEMPLATE % video_id
config = json.loads(self._download_webpage(config_url, video_id,
u'Downloading video info'))
media = config['playlist']['media']
video_url = media['url']
return {'id': media['metadata']['videoId'],
'url': video_url,
'ext': determine_ext(video_url),
'title': media['metadata']['title'],
'thumbnail': media['poster'][0]['url'].replace('{size}', 'grande'),
}
class OneUPIE(IGNIE):
"""Extractor for 1up.com, it uses the ign videos system."""
_VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
IE_NAME = '1up.com'
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
_TEST = {
u'url': u'http://gamevideos.1up.com/video/id/34976',
u'file': u'34976.mp4',
u'md5': u'68a54ce4ebc772e4b71e3123d413163d',
u'info_dict': {
u'title': u'Sniper Elite V2 - Trailer',
u'description': u'md5:5d289b722f5a6d940ca3136e9dae89cf',
}
}
# Override IGN tests
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id')
result = super(OneUPIE, self)._real_extract(url)
result['id'] = id
return result

View File

@ -5,7 +5,15 @@ from .common import InfoExtractor
class InaIE(InfoExtractor):
"""Information Extractor for Ina.fr"""
_VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I[0-9]+)/.*'
_VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I?[A-F0-9]+)/.*'
_TEST = {
u'url': u'www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
u'file': u'I12055569.mp4',
u'md5': u'a667021bf2b41f8dc6049479d9bb38a3',
u'info_dict': {
u"title": u"Fran\u00e7ois Hollande \"Je crois que c'est clair\""
}
}
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)

View File

@ -11,6 +11,18 @@ from ..utils import (
class InfoQIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?infoq\.com/[^/]+/[^/]+$'
_TEST = {
u"name": u"InfoQ",
u"url": u"http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
u"file": u"12-jan-pythonthings.mp4",
u"info_dict": {
u"description": u"Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
u"title": u"A Few of My Favorite [Python] Things"
},
u"params": {
u"skip_download": True
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@ -0,0 +1,35 @@
import re
from .common import InfoExtractor
class InstagramIE(InfoExtractor):
_VALID_URL = r'(?:http://)?instagram.com/p/(.*?)/'
_TEST = {
u'url': u'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
u'file': u'aye83DjauH.mp4',
u'md5': u'0d2da106a9d2631273e192b372806516',
u'info_dict': {
u"uploader_id": u"naomipq",
u"title": u"Video by naomipq",
u'description': u'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
webpage, u'uploader id', fatal=False)
desc = self._search_regex(r'"caption":"(.*?)"', webpage, u'description',
fatal=False)
return [{
'id': video_id,
'url': self._og_search_video_url(webpage, secure=False),
'ext': 'mp4',
'title': u'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id' : uploader_id,
'description': desc,
}]

View File

@ -0,0 +1,84 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
xpath_with_ns,
determine_ext,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_TEST = {
u'url': u'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
u'file': u'452693.mp4',
u'info_dict': {
u'title': u'SKYFALL',
u'description': u'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
u'duration': 153,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k,v[0]) for (k,v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration_xml = self._download_webpage(url, video_id,
u'Downloading flash configuration')
flashconfiguration = xml.etree.ElementTree.fromstring(flashconfiguration_xml.encode('utf-8'))
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info_xml = self._download_webpage(file_url, video_id,
u'Downloading video info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(p,
{'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'})
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'width': int(attr['width']),
'bitrate': int(attr['bitrate']),
})
formats = sorted(formats, key=lambda f: f['bitrate'])
return {
'id': video_id,
'title': item.find('title').text,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
}

View File

@ -0,0 +1,52 @@
# coding: utf-8
import json
import re
import xml.etree.ElementTree
from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4',
u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group(1)
webpage = self._download_webpage(url, title)
xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
webpage, u'config URL')
video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_config = self._download_webpage(
xml_link, title, u'Downloading XML config')
config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8'))
info_json = self._search_regex(
r'(?sm)<format\.json>(.*?)</format\.json>',
xml_config, u'JSON information')
info = json.loads(info_json)['versions'][0]
video_url = 'http://video720.jeuxvideo.com/' + info['file']
return {
'id': video_id,
'title': config.find('titre_video').text,
'ext': 'mp4',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}

View File

@ -26,6 +26,17 @@ class JustinTVIE(InfoExtractor):
"""
_JUSTIN_PAGE_LIMIT = 100
IE_NAME = u'justin.tv'
_TEST = {
u'url': u'http://www.twitch.tv/thegamedevhub/b/296128360',
u'file': u'296128360.flv',
u'md5': u'ecaa8a790c22a40770901460af191c9a',
u'info_dict': {
u"upload_date": u"20110927",
u"uploader_id": 25114803,
u"uploader": u"thegamedevhub",
u"title": u"Beginner Series - Scripting With Python Pt.1"
}
}
def report_download_page(self, channel, offset):
"""Report attempt to download a single page of videos."""

Some files were not shown because too many files have changed in this diff Show More