Compare commits

1755 Commits

Author SHA1 Message Date
00b350d209 [test] tell Travis to install rtmpdump and add initial support to rtmp testing 2013-11-25 17:46:33 -05:00
d8ec4959c8 Merge pull request #1830 from jaimeMF/download-archive
Use the 'extractor_key' field for the download archive file
2013-11-25 14:14:25 -08:00
d31209a144 Use the 'extractor_key' field for the download archive file
It has the same value as the ie_key.
2013-11-25 22:57:15 +01:00
529a2e2cc3 Fix typo in the documentation of the 'download_archive' param 2013-11-25 22:52:09 +01:00
781a7d0546 release 2013.11.25.3 2013-11-25 22:36:18 +01:00
fb04e40396 [soundcloud] Support for listing of audio-only files 2013-11-25 22:34:56 +01:00
d9b011f201 Fix rtmpdump with non-ASCII filenames on Windows on 2.x
Reported in #1798
2013-11-25 22:31:38 +01:00
b0b9eaa196 Merge pull request #1829 from jaimeMF/ydl-empty-params
Allow to initialize a YoutubeDL object without parameters
2013-11-25 13:19:59 -08:00
8b134b1062 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-25 22:16:07 +01:00
0c75c3fa7a Do not warn about fixed output template if --max-downloads is 1
Fixes #1828
2013-11-25 22:15:33 +01:00
a3927cf7ee Allow to initialize a YoutubeDL object without parameters
Having to pass the 'outtmpl' parameter feels really strange when you just want to extract the info of a video.
2013-11-25 22:03:39 +01:00
1a62c18f65 [bambuser] Skip the download in the test
It doesn't respect the 'Range' header.
2013-11-25 22:03:20 +01:00
2a15e7063b [soundcloud] Prefer HTTP over RTMP (#1798) 2013-11-25 20:30:41 +01:00
d46cc192d7 Reduce socket timeout 2013-11-25 19:11:01 +01:00
bb2bebdbe1 release 2013.11.25.2 2013-11-25 15:47:14 +01:00
5db07df634 Fix --download-archive (Fixes #1826) 2013-11-25 15:46:54 +01:00
ea36cbac5e Merge remote-tracking branch 'rbrito/swap-dimensions' 2013-11-25 06:19:15 +01:00
d0d2b49ab7 [FileDownloader] use moved format_bytes method 2013-11-25 06:17:41 +01:00
31cb6d8fef Merge remote-tracking branch 'rzhxeo/rtmpdump' 2013-11-25 06:16:18 +01:00
daa0dd2973 release 2013.11.25.1 2013-11-25 06:06:39 +01:00
de79c46c8f [viki] Fix subtitle extraction 2013-11-25 06:06:18 +01:00
94ccb6fa2e [viki] Fix subtitles extraction 2013-11-25 05:58:04 +01:00
07e4035879 [viki] Fix uploader extraction 2013-11-25 05:57:55 +01:00
d0efb9ec9a [tests] Remove global_setup function 2013-11-25 03:47:32 +01:00
ac05067d3d release 2013.11.25 2013-11-25 03:37:49 +01:00
113577e155 [generic] Improve detection
Allow download of http://goo.gl/7X5tOk
Fixes #1818
2013-11-25 03:35:53 +01:00
79d09f47c2 Merge branch 'opener-to-ydl' 2013-11-25 03:30:37 +01:00
c059bdd432 Remove quality_name field and improve zdf extractor 2013-11-25 03:28:55 +01:00
02dbf93f0e [zdf/common] Use API in ZDF extractor.
This also comes with a lot of extra format fields
Fixes #1518
2013-11-25 03:13:22 +01:00
1fb2bcbbf7 [viki] Make uploader field optional (#1813) 2013-11-25 02:02:34 +01:00
16e055849e Update the keywords tests for the rename of the old ComedyCentralIE 2013-11-24 22:13:20 +01:00
66cfab4226 [comedycentral] Add support for comedycentral.com videos (closes #1824)
It's a subclass of MTVIE

The extractor for colbertnation.com and thedailyshow.com is now called ComedyCentralShowsIE
2013-11-24 21:18:35 +01:00
6d88bc37a3 [viki] Skip travis test
Also provide a better error message for geoblocked videos.
2013-11-24 15:28:50 +01:00
b7553b2554 [viki] Clarify output 2013-11-24 15:20:16 +01:00
e03db0a077 Merge branch 'master' into opener-to-ydl 2013-11-24 15:18:44 +01:00
a1ee09e815 Document proxy 2013-11-24 15:03:25 +01:00
267ed0c5d3 [collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring (fixes #1822)
Uses a new helper method in InfoExtractor: _download_xml
2013-11-24 14:59:19 +01:00
f459d17018 [youtube] Add an extractor for downloading the watch history (closes #1821) 2013-11-24 14:33:50 +01:00
dc65dcbb6d [mixcloud] The description field may be missing (fixes #1819) 2013-11-24 11:28:44 +01:00
d214fdb8fe [brightcove] Don't use 'or' with the xml nodes, use the 'value' attribute instead 2013-11-24 11:02:34 +01:00
138df537ff release 2013.11.24.1 2013-11-24 07:51:56 +01:00
0c7c19d6bc [clipfish] Add extractor (Fixes #1760) 2013-11-24 07:51:44 +01:00
eaaafc59c2 release 2013.11.24 2013-11-24 07:30:34 +01:00
382ed50e0e [viki] Add extractor (fixes #1813) 2013-11-24 07:30:05 +01:00
66ec019240 [youtube] do not use variable name twice 2013-11-24 06:54:26 +01:00
bd49928f7a [niconico] Clarify download 2013-11-24 06:53:50 +01:00
23e6d50d73 [bandcamp] Remove unused variable 2013-11-24 06:52:53 +01:00
2e767313e4 [update] fix error 2013-11-24 06:52:21 +01:00
38b2db6a66 Credit @takuya0301 for niconico 2013-11-24 06:39:49 +01:00
13ebea791f [niconico] Simplify and make work with old Python versions
The website requires SSLv3, otherwise it just times out during SSL negotiation.
2013-11-24 06:39:10 +01:00
4c9c57428f Merge remote-tracking branch 'takuya0301/niconico' 2013-11-24 06:09:11 +01:00
8bf9319e9c Simplify logger code (#1811) 2013-11-24 06:08:11 +01:00
4914120727 Merge remote-tracking branch 'iTaybb/master' 2013-11-24 06:07:12 +01:00
36de0a0e1a [brightcove] Set the 'videoPlayer' value to the 'videoId' if it's missing in the parameters (fixes #1815) 2013-11-23 23:27:15 +01:00
e5c146d586 [streamcloud] skip test on travis 2013-11-23 15:57:42 +01:00
52ad14aeb0 Add support for niconico 2013-11-23 18:19:44 +09:00
43afe28588 Log to an external logger (fixes #1810)
Sadly applications using youtube-dl's python sources can't directly
access its log stream. It's pretty much limited to stdout and stderr
only.

It should log to logging.Logger instance passed to YoutubeDL's params
dictionary.
2013-11-23 10:22:18 +02:00
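A minimal sketch of the idea in the commit above, not the project's actual code. `MiniDL` is a hypothetical stand-in for YoutubeDL, and the `'logger'` key name is an assumption; the commit only says that a `logging.Logger` instance is passed in the params dictionary.

```python
import logging
import sys


class MiniDL(object):
    """Hypothetical stand-in for YoutubeDL; not the project's real class."""

    def __init__(self, params=None):
        self.params = params or {}

    def to_screen(self, message):
        logger = self.params.get('logger')  # key name is an assumption
        if logger is not None:
            logger.debug(message)
        else:
            sys.stdout.write(message + '\n')

    def report_warning(self, message):
        logger = self.params.get('logger')
        if logger is not None:
            logger.warning(message)
        else:
            sys.stderr.write('WARNING: ' + message + '\n')


class ListHandler(logging.Handler):
    """Stores (level name, message) pairs so the application can inspect them."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        self.lines.append((record.levelname, record.getMessage()))


app_logger = logging.getLogger('my_app.downloader')
app_logger.setLevel(logging.DEBUG)
app_logger.propagate = False  # keep the example quiet on the console
handler = ListHandler()
app_logger.addHandler(handler)

ydl = MiniDL({'logger': app_logger})
ydl.to_screen('[download] Destination: video.mp4')
ydl.report_warning('unable to extract description')
```

With a logger supplied, nothing is written to stdout or stderr; the embedding application decides where the messages go.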
a87b0615aa release 2013.11.22.2 2013-11-22 23:08:15 +01:00
d7386f6276 [update] Check if version from repository is newer before updating
Closes #1704
2013-11-22 23:05:58 +01:00
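One way to perform the check described above, sketched under the assumption that versions are dotted integers such as the release names in this log. The helper names `version_tuple` and `is_newer` are made up for illustration.

```python
def version_tuple(version_str):
    """Turn '2013.11.22.2' into (2013, 11, 22, 2) for numeric comparison."""
    return tuple(int(part) for part in version_str.split('.'))


def is_newer(repo_version, current_version):
    """Return True only if the repository version is strictly newer."""
    return version_tuple(repo_version) > version_tuple(current_version)
```

Comparing integer tuples rather than raw strings matters for components of different widths: as strings, '2013.11.9' sorts after '2013.11.25'.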
081640940e Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-22 22:46:57 +01:00
7012b23c94 Match --download-archive during playlist processing (Fixes #1745) 2013-11-22 22:46:46 +01:00
d3b30148ed [bambuser:channel] Update test 2013-11-22 21:26:31 +01:00
9f79463803 [howcast] update test's checksum 2013-11-22 21:25:12 +01:00
d35dc6d3b5 [bandcamp] move the album test to the album extractor and return a single track instead of a playlist 2013-11-22 21:19:31 +01:00
50123be421 release 2013.11.22.1 2013-11-22 20:23:55 +01:00
3f8ced5144 Merge remote-tracking branch 'jaimeMF/yt-playlists' 2013-11-22 20:11:54 +01:00
00ea0f11eb Print full title in --get-title output (#1806) 2013-11-22 20:00:35 +01:00
dca0872056 Move the opener to the YoutubeDL object.
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like #1805.
2013-11-22 19:57:52 +01:00
0b63aed8df [update] do not assign to unused variables 2013-11-22 19:15:36 +01:00
15c3adbb16 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-22 19:08:33 +01:00
f143a42fe6 [bandcamp] Skip album test 2013-11-22 19:08:25 +01:00
241650c7ff [vimeo] Fix the extraction of vimeo pro and player.vimeo.com videos 2013-11-22 18:20:31 +01:00
bfe7439a20 release 2013.11.22 2013-11-22 17:46:26 +01:00
cffa6aa107 [bandcamp] Support trackinfo-style songs (Fixes #1270) 2013-11-22 17:44:55 +01:00
02e4ebbbad [streamcloud] Add IE (Fixes #1801) 2013-11-22 17:19:22 +01:00
ab009f59ef [toutv] Fix a typo 2013-11-22 17:18:03 +01:00
0980426559 [bandcamp] add support for albums (reported in #1270) 2013-11-22 16:05:14 +01:00
b1c9c66936 Remove unnecessary slash in setup.py (Fixes #1778) 2013-11-21 23:26:28 +01:00
a6a173c2fd utils.shell_quote: Convert the args to unicode strings
The youtube test video failed with `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 34: ordinal not in range(128)`, the problem was with the filenames being encoded.
2013-11-21 14:09:28 +01:00
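A Python 3 sketch of the fix described above: decode any byte-string arguments before quoting and joining them. The original code targeted Python 2 as well; the `encoding` default of UTF-8 here is an assumption, and `shlex.quote` stands in for whatever quoting helper the project used.

```python
import shlex


def shell_quote(args, encoding='utf-8'):
    """Quote a command line for display, decoding byte-string arguments first."""
    quoted = []
    for arg in args:
        if isinstance(arg, bytes):
            # Encoded filenames must be decoded before joining with text.
            arg = arg.decode(encoding)
        quoted.append(shlex.quote(arg))
    return ' '.join(quoted)
```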
2bb683c201 release 2013.11.21 2013-11-21 13:59:33 +01:00
64bb5187f5 [soundcloud] Retrieve the file url using the client_id for the iPhone (fixes #1798)
The desktop's client_id always give the rtmp url, but with the iPhone one it returns the http url if it's available.
2013-11-21 13:16:19 +01:00
9e4f50a8ae [sztv] skip test, site is undergoing mid-term maintenance 2013-11-20 09:59:03 +01:00
0190eecc00 [nhl] Make NHLVideocenter IE_DESC fit with other descriptions 2013-11-20 09:45:29 +01:00
ca872a4c0b [spankwire] Fix description search 2013-11-20 09:23:53 +01:00
f2e87ef4fa [anitube] Skip test (on travis) 2013-11-20 07:46:44 +01:00
0ad97bbc05 [spankwire] fix check for description 2013-11-20 07:45:32 +01:00
c4864091a1 [videopremium] Support new crazy redirect scheme 2013-11-20 07:43:21 +01:00
9a98a466b3 [toutv] really skip test 2013-11-20 07:37:22 +01:00
f99e0f1ed6 Adapt age restriction tests to new .info.json filenames 2013-11-20 07:37:07 +01:00
d323bcb152 release 2013.11.20 2013-11-20 07:25:17 +01:00
da6a795fdb [escapist] Fix title search 2013-11-20 07:23:23 +01:00
c5edcde21f [escapist] upper-case URL 2013-11-20 06:56:59 +01:00
15ff3c831e [escapist] Fix syntax error 2013-11-20 06:55:07 +01:00
100959a6d9 [escapist] Add support for HD format (Closes #1755) 2013-11-20 06:52:08 +01:00
0a120f74b2 Credit @diffycat for anitube 2013-11-20 06:36:00 +01:00
8f05351984 [anitube] Minor fixes (#1776) 2013-11-20 06:35:02 +01:00
4eb92208a3 Adapt test to changed .info.json name 2013-11-20 06:34:48 +01:00
71791f414c Merge remote-tracking branch 'diffycat/master' 2013-11-20 06:28:13 +01:00
f3682997d7 Clean up unused imports and other minor mistakes 2013-11-20 06:27:48 +01:00
cc13cc0251 [teamcoco] Correct error 2013-11-20 06:25:33 +01:00
86bd5f2ca9 Merge remote-tracking branch 'dz0ny/patch-1' 2013-11-20 06:21:05 +01:00
8694c60000 import json for --dump-json 2013-11-20 06:18:24 +01:00
9d1538182f Add an option to dump json information 2013-11-20 06:14:57 +01:00
5904088811 Add support for tou.tv (Fixes #1792) 2013-11-20 06:13:19 +01:00
69545c2aff [d8] inherit from CanalplusIE
it reuses the same extraction process
2013-11-19 20:44:20 +01:00
495da337ae Merge pull request #1758 from migbac/master
Add support for d8.tv
2013-11-19 20:43:14 +01:00
34b3afc7be release 2013.11.19 2013-11-19 12:41:01 +01:00
00373a4c5d Merge pull request #1790 from rg3/console-title
Correctly write and restore the console title on the stack (fixes #1782)
2013-11-18 07:50:10 -08:00
cb7dfeeac4 [youtube] only allow domain name to be upper-case (#1786) 2013-11-18 16:42:35 +01:00
efd6c574a2 Correctly write and restore the console title on the stack (fixes #1782) 2013-11-18 16:35:41 +01:00
4113e6ab56 [auengine] Do not return unnecessary ext 2013-11-18 14:36:01 +01:00
9a942a4671 release 2013.11.18.1 2013-11-18 13:56:53 +01:00
9906d397a0 [auengine] Simplify 2013-11-18 13:56:45 +01:00
ae8f787141 Remove iPhone from user agent. This breaks a lot of extractors
In the future, it might be worth investigating whether we get better content when we claim to be an iPhone.
2013-11-18 13:52:26 +01:00
a81b4d5c8f release 2013.11.18 2013-11-18 13:30:43 +01:00
887c6acdf2 Support multiple embedded YouTube URLs (Fixes #1787) 2013-11-18 13:28:26 +01:00
83aa529330 Support protocol-independent URLs (#1787) 2013-11-18 13:18:17 +01:00
96b31b6533 Add iPhone to UA (#1746) 2013-11-18 13:05:58 +01:00
fccd377198 Support embed-only videos (Fixes #1746) 2013-11-18 13:05:18 +01:00
2b35c9ef74 Merge branch 'master' into rtmpdump
Conflicts:
	youtube_dl/FileDownloader.py

Merge
2013-11-18 00:27:06 +01:00
73c566695f release 2013.11.17 2013-11-17 22:14:13 +01:00
63b7b7224a [MTVIE] Try with RTMP URL if download fails
This fixes youtube-dl http://www.southpark.de/clips/155251/cartman-vs-the-dog-whisperer
2013-11-17 22:11:40 +01:00
ce80c8b8ee Merge pull request #1784 from rzhxeo/southpark
Add support for southpark.de
2013-11-17 12:15:13 -08:00
749febf4d1 Allow --console-title when --quiet is given (Fixes #1783) 2013-11-17 21:12:50 +01:00
bdde425cbe Save and restore console title (Fixes #1782) 2013-11-17 21:10:11 +01:00
746f491f82 Add support for southpark.de 2013-11-17 17:54:47 +01:00
1672647ade [SouthParkStudiosIE] Move from _TEST to _TESTS 2013-11-17 17:43:58 +01:00
90b6bbc38c [SouthParkStudiosIE] Also detect urls without http:// or www 2013-11-17 17:42:24 +01:00
ce02ed60f2 Remove * imports 2013-11-17 16:47:52 +01:00
1e5b9a95fd Move console_title to YoutubeDL 2013-11-17 11:39:52 +01:00
1d699755e0 [youtube] Add view_count (Fixes #1781) 2013-11-17 11:06:16 +01:00
ddf49c6344 [arte] remove two typos 2013-11-17 11:05:49 +01:00
ba3881dffd Add support for anitube.se (#1417) 2013-11-16 18:26:34 +04:00
d1c252048b [redtube] Do not test md5, seems to vary 2013-11-16 10:30:09 +01:00
eab2724138 [gamekings] Do not test md5 sum, precise file changes regularly 2013-11-16 02:32:23 +01:00
21ea3e06c9 [gamekings] remove unnecessary import 2013-11-16 02:31:02 +01:00
52d703d3d1 [tvp] Skip tests 2013-11-16 02:09:30 +01:00
ce152341a1 [bambuser] Do not test for MD5, seems to be flaky 2013-11-16 01:59:28 +01:00
f058e34011 [dailymotion] Fix playlists 2013-11-16 01:56:23 +01:00
b5349e8721 Fix indentation of (best) and (worst) in --list-formats 2013-11-16 01:39:45 +01:00
7150858d49 [spiegel] Implement format selection 2013-11-16 01:33:12 +01:00
91c7271aab Add automatic generation of format note based on bitrate and codecs 2013-11-16 01:08:43 +01:00
aa13b2dffd release 2013.11.15.1 2013-11-15 14:35:00 +01:00
fc2ef392be [ted] Fix playlists (Fixes #1770) 2013-11-15 14:33:51 +01:00
463a908705 [ted] simplify 2013-11-15 14:06:38 +01:00
d24ffe1cfa [rtlnow] Remove the test for nitro
The videos expire.
2013-11-15 12:57:59 +01:00
78fb87b283 Don't accept '>' inside the content attribute in OpenGraph regexes 2013-11-15 12:54:13 +01:00
ab2d524780 Improve the OpenGraph regex
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
2013-11-15 12:24:54 +01:00
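The two-regex approach from the commit above can be sketched as follows. The patterns are illustrative, not the ones in the repository: one expects `property` before `content`, the other the reverse, and neither allows a '>' between the attributes or inside the content value.

```python
import re

QUOTE = r"""["']"""
CONTENT = r"""content=(?P<q>["'])(?P<content>[^>]*?)(?P=q)"""


def og_regexes(prop):
    """Build two patterns: property-then-content and content-then-property."""
    property_re = 'property=' + QUOTE + 'og:' + re.escape(prop) + QUOTE
    return [
        '<meta[^>]+?' + property_re + '[^>]+?' + CONTENT,
        '<meta[^>]+?' + CONTENT + '[^>]+?' + property_re,
    ]


def og_search_property(prop, page):
    """Return the first matching OpenGraph value, or None if absent."""
    for pattern in og_regexes(prop):
        match = re.search(pattern, page)
        if match:
            return match.group('content')
    return None
```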
85d61685f1 [tvp] Update the title and the description of the test video 2013-11-15 12:10:22 +01:00
b9643eed7c [youtube:channel] Fix the extraction of autogenerated channels
The ajax pages are empty, now it looks directly in the channel's /videos page
2013-11-15 11:51:45 +01:00
feee2ecfa9 Pass the 'download' argument to 'process_video_result' (fixes #1769) 2013-11-15 11:04:26 +01:00
a25a5cfeec release 2013.11.15 2013-11-15 01:47:15 +01:00
0e145dd541 Merge branch 'master' of github.com:rg3/youtube-dl 2013-11-15 01:46:50 +01:00
9f9be844fc [youtube] Fix protocol-independent URLs (Fixes #1768) 2013-11-15 01:45:39 +01:00
e3b9ab5e18 [soundcloud] Set the correct extension for the tracks (fixes #1766)
Some tracks are not in mp3 format, they can be wav files.
2013-11-14 19:45:39 +01:00
c66d2baa9c [livestream] Add an extractor for the original version of livestream (closes #1764)
The two versions use different systems.
2013-11-14 13:16:32 +01:00
08bc37cdd0 Update test_write_info_json.py 2013-11-13 18:55:49 +01:00
9771cceb2c Fix filename extension leaking to json filename
Makes writeinfojson behave exactly like writethumbnail when the filename contains the media file extension.

Case:

video.mp4 converted to music.mp3 would yield music.mp4.info.json instead of music.mp3.info.json or music.info.json
2013-11-13 18:34:03 +01:00
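A small sketch of the naming rule the commit above describes, assuming the media extension is stripped before '.info.json' is appended. The function name `info_json_name` is made up for illustration.

```python
import os.path


def info_json_name(media_filename):
    """Drop the media extension before appending '.info.json'."""
    return os.path.splitext(media_filename)[0] + '.info.json'
```

With this rule the metadata filename no longer depends on which media extension the file currently has.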
ca715127a2 Don't assume the 'subtitlesformat' is set in the params dict (fixes #1750) 2013-11-13 17:14:10 +01:00
ea7a7af1d4 [gamekings] Fix the test video checksum 2013-11-13 17:13:06 +01:00
880e1c529d [youtube:playlist] Login into youtube if requested (fixes #1757)
Allows to download private playlists
2013-11-13 16:39:11 +01:00
dcbb45803f [youtube:playlist] Don't use the gdata api (closes #1508)
Parse the playlist pages instead
2013-11-13 16:26:50 +01:00
80b9bbce86 release 2013.11.13 2013-11-13 11:09:04 +01:00
d37936386f Credit @saper for tvp IE (#1730) 2013-11-13 11:08:07 +01:00
c3a3028f9f [tvp] Minor improvements (#1730) 2013-11-13 11:06:53 +01:00
6c5ad80cdc Merge remote-tracking branch 'saper/tvp' 2013-11-13 11:03:49 +01:00
b5bdc2699a Credit @jelly for gamekings extractor (#1759) 2013-11-13 10:52:22 +01:00
384b98cd8f [gamekings] Minor fixes (#1759) 2013-11-13 10:51:00 +01:00
eb9b5bffef Add extractor for gamekings.tv 2013-11-13 10:38:47 +01:00
0bd59f3723 Add support for d8.tv 2013-11-12 23:32:03 +01:00
8b8cbd8f6d [vine] Fix uploader extraction 2013-11-12 20:50:52 +01:00
72b18c5d34 FFmpegMetadataPP: don't enclose the values with " (fixes #1756) 2013-11-12 20:38:13 +01:00
eb0a839866 [common] Simplify og_search_property 2013-11-12 10:36:23 +01:00
1777d5a952 release 2013.11.11 2013-11-11 18:28:17 +01:00
d4b7da84c3 Clarify -c. Do not pass it in if you don't know what you're doing
Suggested in #1743
2013-11-11 14:21:14 +01:00
801dbbdffd Use avconv for downloading with m3u8 manifests if it's available (fixes #1735) 2013-11-10 16:47:03 +01:00
0ed05a1d2d Use the 'rtmp_live' field for the live parameter of rtmpdump 2013-11-10 12:45:17 +01:00
1008bebade Merge remote-tracking branch 'rzhxeo/rtmpdump_live' 2013-11-10 12:38:40 +01:00
ae84f879d7 Merge all the subtitles test into a single file
They reuse a base class
2013-11-10 12:28:21 +01:00
be6dfd1b49 [ted] Return a single info_dict for talks urls
It failed with the --list-subs option
2013-11-10 12:09:12 +01:00
231516b6c9 Merge pull request #1705 from iemejia/master
[ted] support for subtitles
2013-11-10 11:54:18 +01:00
fb53d58dcf Merge pull request #1726 from saper/escaped
Fix AssertionError when og property not found
2013-11-10 02:51:52 -08:00
2a9e9b210b Fix the documentation of '--autonumber-size' (#1743)
it's '--auto-number' not '--autonumber'
2013-11-09 19:21:30 +01:00
897d6cc43a Improve format listing for long format ids
Now arte.tv videos have quite long ids.
2013-11-09 19:07:34 +01:00
f470c6c812 [arte] Improve the format sorting
Also use the bitrate.
Prefer normal version and sourds/mal version over original version with subtitles.
2013-11-09 19:05:19 +01:00
566d4e0425 [arte] Make sure the format_id is unique (closes #1739)
Include the bitrate and use the height instead of the quality field.
2013-11-09 19:01:23 +01:00
81be02d2f9 [cnn] Accept www.cnn.com urls (fixes #1740) 2013-11-09 18:16:32 +01:00
c2b6a482d5 [brightcove] the format function requires to specify the index in python2.6 2013-11-09 18:10:11 +01:00
12c167c881 [soundcloud] Allow to download tracks marked as not 'streamable'
They use the rtmp protocol but if they are marked as 'downloadable' it can use the direct download link.
2013-11-09 18:08:03 +01:00
20aafee7fa [kankan] Fix the video url
It now requires two additional parameters, one is a timestamp we get from the getCdnresource_flv page and the other is a key we have to build.
2013-11-09 16:51:11 +01:00
be07375b66 Don't recode the video with m3u8 downloads (fixes #1741) 2013-11-09 16:40:00 +01:00
4894fe8c5b Report download progress of rtmpdump 2013-11-09 11:14:40 +01:00
dd5bcdc4c9 [brightcove] Set the 'Referer' header if the url has the 'linkBaseUrl' parameter (fixes #1553) 2013-11-07 21:06:48 +01:00
6161d17579 release 2013.11.07 2013-11-07 11:06:34 +01:00
4ac5306ae7 Fix the report progress when file_size is unknown (#1731)
The report_progress function will accept eta and percent with None value and will set the message to 'Unknown ETA' or 'Unknown %'.
Otherwise the values must be numbers.
2013-11-07 08:03:35 +01:00
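A sketch of the behaviour described above: None is treated as "unknown" and anything else is formatted as a number. The helper names and the exact number formats are assumptions, not the project's code.

```python
def format_percent(percent):
    """Format a percentage, or report it as unknown when it is None."""
    if percent is None:
        return 'Unknown %'
    return '%5.1f%%' % percent


def format_eta(eta):
    """Format an ETA given in seconds as MM:SS, or report it as unknown."""
    if eta is None:
        return 'Unknown ETA'
    minutes, seconds = divmod(int(eta), 60)
    return '%02d:%02d' % (minutes, seconds)
```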
b1a80ec1a9 [xnxx] Accept urls that start with 'www' (fixes #1734) 2013-11-06 23:45:01 +01:00
672fe94dcb release 2013.11.06.1 2013-11-06 22:11:46 +01:00
51040b72ed [brightcove] Support redirected urls from bcove.me (fixes #1732)
'bctid' needs to be changed to '@videoPlayer', and 'bckey' to 'playerKey'.
2013-11-06 22:03:00 +01:00
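The parameter renaming mentioned above can be sketched with the standard library. This is an illustration only: the helper name is made up, and the extractor itself may rewrite the URL differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Mapping taken from the commit message above.
RENAMES = {'bctid': '@videoPlayer', 'bckey': 'playerKey'}


def rename_brightcove_params(url):
    """Rename query parameters, leaving every other part of the URL alone."""
    parts = urlsplit(url)
    pairs = [(RENAMES.get(key, key), value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Note that `urlencode` percent-encodes the '@', so the key appears as `%40videoPlayer` in the resulting URL and decodes back to `@videoPlayer`.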
4f045eef8f [youtube:channel] Fix the extraction
The page doesn't include the 'load more' button anymore, now we directly get the 'c4_browse_ajax' pages.
2013-11-06 21:42:33 +01:00
5d7b253ea0 Add an extractor for eitb.tv (fixes #1608)
The BrightcoveExperience object doesn't contain the video id, the extractor adds it and passes the url to BrightcoveIE.
2013-11-06 20:06:14 +01:00
b0759f0c19 [brightcove] Extract all the available formats 2013-11-06 19:05:41 +01:00
065472936a Add an extractor for space.com (fixes #1718)
It uses Brightcove, but requires some special process for getting a url with the playerKey field in some videos
2013-11-06 17:37:39 +01:00
fc4a0c2aec [brightcove] Change the 'videoId' or 'videoID' field to '@videoPlayer' (fixes #1697)
It seems to be needed when using the htmlFederated page
2013-11-06 17:31:47 +01:00
eeb165e674 [brightcove] Add the extraction of the url from generic 2013-11-06 16:58:03 +01:00
9ee2b5f6f2 tests: don't run the test if any of the extractors listed in the 'add_ie' field is marked as not working 2013-11-06 16:43:26 +01:00
da54be877a release 2013.11.06 2013-11-06 14:02:52 +01:00
50a886b7ab Fix reporting when file size is unknown (Fixes #1731) 2013-11-06 14:02:33 +01:00
76e67c2cb6 Clean up imports 2013-11-06 14:01:43 +01:00
5137ebac0b [tvp] Telewizja Polska: new extractor for tvp.pl, fixes #1719
Thanks-To: mplonski

https://github.com/mplonski/linux/blob/master/tvp-dl.py
2013-11-05 23:47:40 +01:00
a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
2013-11-05 23:19:29 +01:00
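A self-contained sketch of the failure mode in the traceback above and one way to avoid it. All names here are simplified stand-ins for the project's helpers, and the regex is illustrative: the point is that a non-fatal search can return None, which must not be passed on for unescaping.

```python
import html
import re


class RegexNotFoundError(Exception):
    """Stand-in for the exception named in the commit message."""


def search_regex(pattern, string, name, fatal=True):
    match = re.search(pattern, string)
    if match:
        return match.group(1)
    if fatal:
        raise RegexNotFoundError('unable to extract %s' % name)
    return None


def og_search_property(prop, page, fatal=True):
    quote = r"""["']"""
    pattern = ('<meta[^>]+?property=' + quote + 'og:' + re.escape(prop) + quote
               + '[^>]+?content=' + quote + '([^>"\']*)' + quote)
    escaped = search_regex(pattern, page, 'OpenGraph %s' % prop, fatal=fatal)
    if escaped is None:
        # Nothing was found and the search was non-fatal: skip unescaping.
        return None
    return html.unescape(escaped)
```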
4ed3e51080 [ted] fixed error in case of no subtitles present
I created a test, but I leave it commented since TED videos get
new subtitles frequently.
2013-11-05 12:00:13 +01:00
7f34001d57 Merge pull request #1724 from rzhxeo/generic_youtube
[GenericIE] Also detect youtube if src url of iframe is embedded in ' instead of "
2013-11-04 23:00:46 -08:00
2dcf7d8f99 [GenericIE] Also detect youtube if src url of iframe is embedded in ' instead of " 2013-11-05 02:08:02 +01:00
19b0668251 [canal2c] Accept more urls (fixes #1723)
The url only needs to have the 'idVideo' field in the query, in any position.
We have to set the 'void=oui' in the webpage url, so that we get the file name.
2013-11-04 22:26:19 +01:00
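To illustrate "in any position": parsing the query string finds the 'idVideo' field wherever it appears. This uses the standard library rather than the URL regex an extractor would normally rely on, and the function name and example URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit


def video_id_from_query(url):
    """Return the value of the 'idVideo' query field, or None if absent."""
    values = parse_qs(urlsplit(url).query).get('idVideo')
    return values[0] if values else None
```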
e7e6b54d8a [teamcoco] Parse the xml file and extract all the formats 2013-11-03 17:48:12 +01:00
2a1a8ffe41 Merge pull request #1693 from alexvh/teamcoco_fix
[teamcoco] Fix video url extraction for some videos
2013-11-03 17:19:51 +01:00
08fb86c49b [youtube] Add description for YoutubeSearchDateIE (#1710) 2013-11-03 15:59:10 +01:00
3633d77c0f Merge remote-tracking branch 'CBGoodBuddy/ytsearchtime' 2013-11-03 15:56:55 +01:00
165e179764 release 2013.11.03 2013-11-03 15:50:36 +01:00
12ebdd1506 [viddler] Support non-digit IDs (Fixes #1714) 2013-11-03 15:49:59 +01:00
1baf9a5938 Merge pull request #1698 from rzhxeo/cinemassacre
[CinemassacreIE] Support more embed urls
2013-11-03 05:17:12 -08:00
a56f9de156 Style fixes for extractors: remove spaces around (,),{ and } 2013-11-03 14:06:47 +01:00
fa5d47af4b Merge pull request #1679 from rzhxeo/mofosex
Add support for http://www.mofosex.com
2013-11-03 05:04:14 -08:00
d607038753 Merge pull request #1677 from rzhxeo/xtube
Add support for http://www.xtube.com
2013-11-03 03:28:02 -08:00
9ac6a01aaf Merge pull request #1676 from rzhxeo/extremetube
Add support for http://www.extremetube.com
2013-11-03 03:25:46 -08:00
be97abc247 Set the 'extractor_key' field in the info_dict
It's the string returned by the class method 'ie_key', which allows to retrieve the extractor with 'get_info_extractor'
2013-11-03 12:14:44 +01:00
9103bbc5cd Add the 'webpage_url' field to info_dict
The url for the video page, it must allow to reproduce the result.
It's automatically set by YoutubeDL if it's missing.
2013-11-03 12:11:13 +01:00
b6c45014ae Set the extra_info inside YoutubeDL.process_ie_result and set only if the keys are missing 2013-11-03 11:57:04 +01:00
a3dd924871 Add YoutubeSearchDateIE extractor to youtube.py & __init__.py, which searches by publication date. 2013-11-02 22:40:48 -04:00
137bbb3e37 [XTubeIE] Add description to TEST 2013-11-02 22:45:48 +01:00
86ad94bb2e [ExtremeTubeIE] Set age_limit to 18 and fix uploader extraction 2013-11-02 22:33:49 +01:00
3e56add7c9 Merge pull request #1678 from rzhxeo/keezmovies
[KeezMoviesIE] Detect URLs with numbers in the SEO part correctly
2013-11-02 14:15:52 -07:00
f52f01b5d2 [brightcove] Don't set the extension
If the video only has the 'FLVFullLengthURL' key, it can still be an mp4 file.
2013-11-02 21:20:46 +01:00
98d7efb537 [exfm] skip tests
The site is down too often.
2013-11-02 20:51:09 +01:00
cf51923545 [youtube] Remove vevo test
The video is no longer available and it seems that vevo videos don't use encrypted signatures anymore.
2013-11-02 20:46:26 +01:00
38fcd4597a Merge remote-tracking branch 'iemejia/master' 2013-11-02 19:56:06 +01:00
165e3bb67a [bambuser] Add an extractor for channels (closes #1702) 2013-11-02 19:50:57 +01:00
38db46794f Merge branch 'ted_subtitles' 2013-11-02 19:50:45 +01:00
a9a3876d55 [ted] Added support for subtitle download 2013-11-02 19:48:39 +01:00
1f343eaabb [subtitles] refactor to support websites with subtitle information in the
webpage.

I added the parameter webpage, so now it's similar to the way automatic
captions are handled. This is an improvement needed for websites like
TED.
2013-11-02 19:29:25 +01:00
72a5b4f702 Add an extractor for bambuser.com (#1702) 2013-11-02 19:01:01 +01:00
0a43ddf320 [CinemassacreIE] Add live parameter to extracted info as a workaround 2013-11-02 18:08:35 +01:00
31366066bd Add support for live parameter to rtmpdump 2013-11-02 18:08:16 +01:00
aa2484e390 release 2013.11.02 2013-11-02 11:21:36 +01:00
8eddf3e91d [youtube] Encode subtitle track name in request (Fixes #1700) 2013-11-02 11:21:05 +01:00
60d142aa8d Add an extractor for vk.com (closes #1635) 2013-11-01 22:34:18 +01:00
66cf3ac342 [metacafe] Fix support for age-restricted videos (fixes #1696)
The 'Content-Type' header must be set for disabling the family filter.
The 'flashversion' cookie is only needed for AnyClip videos.
Added tests for standard metacafe videos and for age-restricted videos.
Also set the 'age_limit' field.
2013-11-01 11:56:15 +01:00
ab4e151347 [CinemassacreIE] Support more embed urls 2013-11-01 01:24:23 +01:00
ac2547f5ff [teamcoco] Fix video url extraction for some videos
Video url extraction failed for some videos,
e.g. http://teamcoco.com/video/old-time-baseball

The url extracted was also occasionally suboptimal quality,
e.g. http://teamcoco.com/video/louis-ck-interview-george-w-bush
2013-10-31 15:41:14 -04:00
5f1ea943ab [livestream] fix the extraction of events
It now uses a json dictionary from the webpage.
2013-10-31 08:07:26 +01:00
0ef7ad5cd4 Fix the test for dailymotion subtitles
The extractor returns a single info_dict now.
2013-10-31 07:55:03 +01:00
9f1109a564 [dailymotion] Fix support for age-restricted videos (Fixes #1688) 2013-10-31 00:20:49 +01:00
33b1d9595d release 2013.10.30 2013-10-30 01:17:20 +01:00
7193498811 Use index in format string (Fixes vevo test on Python 2.6) 2013-10-30 01:17:00 +01:00
72321ead7b [vevo] Readd support for SMIL (Fixes #1683) 2013-10-30 01:14:17 +01:00
b5d0d817bc Remove superfluous space 2013-10-30 01:09:44 +01:00
94badb2599 Fix output indenting for --list-formats 2013-10-30 01:09:26 +01:00
b9a836515f Update the Vimeo test vector md5
confirmed that this is indeed the first 10241 (we went off by one with
byte range 0-10240) of the full, playing mp4, so they probably
reencoded or something
2013-10-29 16:44:35 -04:00
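The off-by-one mentioned above comes from HTTP byte ranges being inclusive at both ends, so the range 0-10240 covers 10241 bytes. A small sketch of that arithmetic and of hashing the matching prefix; the helper names are made up and this is not the test suite's own code.

```python
import hashlib


def range_length(first, last):
    """Number of bytes in an inclusive HTTP byte range 'bytes=first-last'."""
    return last - first + 1


def md5_of_range(data, first, last):
    """MD5 of the bytes a server would return for 'bytes=first-last'."""
    return hashlib.md5(data[first:last + 1]).hexdigest()
```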
21c924f406 [arte] Download the 'Originalversion' version if it's the only one available (fixes #1682) 2013-10-29 20:58:49 +01:00
e54fd4b23b [vevo] Add more format details 2013-10-29 15:10:09 +01:00
57dd9a8f2f Nicer --list-formats output 2013-10-29 15:09:45 +01:00
912cbf5d4e [vevo] Fix timestamp handling
( / 1000 is implicit float division )
2013-10-29 14:00:23 +01:00
43d7895ea0 release 2013.10.29 2013-10-29 06:48:39 +01:00
f7ff55aa78 Merge remote-tracking branch 'origin/master' 2013-10-29 06:48:18 +01:00
795f28f871 [youtube] Fix login (Fixes #1681) 2013-10-29 06:45:54 +01:00
f6cc16f5d8 [tests] a HTTP 503 is a transient issue 2013-10-28 19:07:16 -04:00
321a01f971 [mtv] Remove the templates from the mediagen url 2013-10-28 23:37:01 +01:00
646e17a53d Fix YouTubeDL test 2013-10-28 23:18:13 +01:00
dd508b7c4f [tests] don't fail on network errors
This is suboptimal, but at least this way we will need to look at the logs
only to check for network errors that happen too often, instead of
parsing a ton of lines each time to see if there is some true test failing
2013-10-28 18:03:26 -04:00
2563bcc85c Add an extractor for MySpace (closes #1666) 2013-10-28 22:02:17 +01:00
702665c085 tests: build the filename from the info_dict if the 'file' key is missing
It will need to have the 'id' and 'ext' keys to work.
2013-10-28 22:01:37 +01:00
dcc2a706ef Add support for http://www.xtube.com 2013-10-28 19:23:48 +01:00
2bc67c35ac [KeezMoviesIE] Detect URLs with numbers in the SEO part correctly 2013-10-28 18:22:55 +01:00
77ae65877e Add support for http://www.mofosex.com 2013-10-28 18:18:58 +01:00
32a35e4418 Add support for http://www.extremetube.com 2013-10-28 17:35:01 +01:00
369a759acc setup.py: Make sure the setuptools_available variable is set
Otherwise it would crash if it can't import setuptools.
2013-10-28 16:54:48 +01:00
79b3f61228 Merge pull request #1675 from rzhxeo/fix
Check if description and thumbnail are None to prevent crash
2013-10-28 08:35:40 -07:00
216d71d001 Check if description and thumbnail are None to prevent crash 2013-10-28 16:28:35 +01:00
78a3a9f89e Make "requested format not available" expected (#1655) 2013-10-28 11:41:59 +01:00
a7685f3bf4 mixcloud does not do any format selection 2013-10-28 11:41:32 +01:00
f088ea5486 release 2013.10.28 2013-10-28 11:34:21 +01:00
1003d108d5 [vimeo] Support hash in URL (Fixes #1669) 2013-10-28 11:32:22 +01:00
8abeeb9449 Nicer --list-formats output 2013-10-28 11:31:12 +01:00
c1002e96e9 Let extractors omit ext in formats 2013-10-28 11:28:02 +01:00
77d0a82fef [addanime] Use new formats system 2013-10-28 11:24:47 +01:00
ebc14f251c Merge remote-tracking branch 'origin/master' 2013-10-28 10:44:13 +01:00
d41e6efc85 New debug option --write-pages 2013-10-28 10:44:02 +01:00
8ffa13e03e [Instagram] get the non-https link, as they are serving Akamai cert from a instagram.com domain 2013-10-28 02:34:29 -04:00
db477d3a37 Merge pull request #1620 from jaimeMF/console_script
Use the console_scripts entry point if setuptools is available
2013-10-27 23:08:59 -07:00
750e9833b8 Add the missing age_limit tags; added a devscript to do a superficial check for porn sites without the age_limit tag in the test 2013-10-28 01:50:17 -04:00
82f0ac657c Merge pull request #1657 by @rzhxeo
[YouPornIE] Extract all encrypted links and remove doubles at the end
2013-10-28 01:45:52 -04:00
eb6a2277a2 Merge pull request #1659 by @rzhxeo
Add support for http://www.tube8.com
2013-10-28 01:38:28 -04:00
f8778fb0fa Merge pull request #1663 by @rzhxeo
Add support for http://www.spankwire.com
2013-10-28 01:35:11 -04:00
e2f9de207c Merge pull request #1664 by @rzhxeo
Add support for http://www.keezmovies.com
2013-10-28 01:25:46 -04:00
a93cc0d943 Merge pull request #1661 by @rzhxeo
Add support for http://www.pornhub.com
2013-10-28 00:50:39 -04:00
7d8c2e07f2 [Exfm] replace the failing Soundcloud test vector (broken also in browser) 2013-10-28 00:33:43 -04:00
efb4c36b18 Merge pull request #1660 from pyed/master
[addanime] try to download HQ before normal
2013-10-27 21:14:19 -07:00
29526d0d2b Merge pull request #1656 from rzhxeo/xhamster
[XHamsterIE] Extract SD and HD video
2013-10-27 10:12:59 -07:00
198e370f23 [addanime] better regex. 2013-10-27 19:48:02 +03:00
c19f7764a5 [generic] Detect bandcamp pages that use custom domains (closes #1662)
They embed the original url in the 'og:url' property.
2013-10-27 14:40:25 +01:00
bc63d9d329 [rtlnow] Change the test for rtlnitronow 2013-10-27 14:26:19 +01:00
aa929c37d5 [generic] Fix test video's checksum 2013-10-27 14:21:37 +01:00
af4d506eb3 [faz] Use a regex for getting the description
The page cannot be parsed in python2.6 with the html parser.
2013-10-27 14:18:55 +01:00
5da0549581 [KeezMoviesIE] Correct return value for embedded videos 2013-10-27 12:48:09 +01:00
749a4fd2fd [facebook] Don't recommend to report the issue if the video is private. 2013-10-27 12:13:55 +01:00
6f71ef580c [facebook] Report a more meaningful message if the video cannot be accessed (closes #1658) 2013-10-27 12:09:46 +01:00
67874aeffa [facebook] Fix the login process (fixes #1244) 2013-10-27 12:07:58 +01:00
3e6a330d38 [addanime] fix md5sum 2013-10-27 13:51:26 +03:00
aee5e18c8f [addanime] catch 'RegexNotFoundError' 2013-10-27 13:36:43 +03:00
5b11143d05 Add support for http://www.keezmovies.com 2013-10-27 10:10:28 +01:00
7b2212e954 Add support for http://www.spankwire.com 2013-10-27 01:59:26 +02:00
71865091ab [Tube8IE] Fix regex for uploader extraction 2013-10-27 01:08:03 +02:00
125cfd78e8 Add support for http://www.pornhub.com 2013-10-27 01:04:22 +02:00
8cb57d9b91 [Tube8IE] Escape dot in regex 2013-10-27 00:21:27 +02:00
14e10b2b6e [addanime] try to download HQ before normal 2013-10-27 01:19:38 +03:00
6e76104d66 [YouPornIE] Make webpage download more robust 2013-10-26 23:33:32 +02:00
1d45a23b74 Add support for http://www.tube8.com 2013-10-26 23:27:30 +02:00
7df286540f [YouPornIE] Extract all encrypted links and remove doubles at the end 2013-10-26 21:57:10 +02:00
5d0c97541a [XHamsterIE] Extract SD and HD video 2013-10-26 20:38:54 +02:00
49a25557b0 [8tracks] Use track count instead of looking at at_last_track property
This fixes the error:

$ youtube-dl http://8tracks.com/vladmc/counting-stars
[8tracks] counting-stars: Downloading webpage
[8tracks] counting-stars: Downloading song information 1/4
[8tracks] counting-stars: Downloading song information 2/4
[8tracks] counting-stars: Downloading song information 3/4
[8tracks] counting-stars: Downloading song information 4/4
[8tracks] counting-stars: Downloading song information 5/4
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/phihag/projects/youtube-dl/youtube_dl/__main__.py", line 18, in <module>
    youtube_dl.main()
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 761, in main
    _real_main(argv)
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 714, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 701, in download
    videos = self.extract_info(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 342, in extract_info
    ie_result = ie.extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/common.py", line 121, in extract
    return self._real_extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/eighttracks.py", line 111, in _real_extract
    'id': track_data['id'],
KeyError: 'id'
2013-10-25 23:46:19 +02:00
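The fix described above (looping over the known track count rather than waiting for an at_last_track flag) can be sketched roughly as follows. This is only an illustration, not the extractor's code: `fetch_track` is a hypothetical callable standing in for the per-song request, and `track_count` is assumed to come from the mix metadata.

```python
def collect_tracks(track_count, fetch_track):
    """Fetch exactly track_count songs; fetch_track is a hypothetical callable."""
    tracks = []
    for index in range(track_count):
        # Stop at the known count instead of trusting an end-of-mix flag,
        # so no request is made for a song past the end (the "5/4" case).
        tracks.append(fetch_track(index))
    return tracks
```

Bounding the loop by the count means a missing or stale flag can no longer trigger an extra request whose response lacks the 'id' key.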
b5936c0059 Document the %(format_id)s field for the output template 2013-10-25 17:18:06 +02:00
600cc1a4f0 [youtube] Set the format_id field to the itag of the format (closes #1624) 2013-10-25 17:17:46 +02:00
ea32fbacc8 Fix the extensions of two tests with youtube videos
The best quality is now an mp4 video.
2013-10-25 16:55:37 +02:00
00fe14fc75 [youtube] Also use the 'adaptative_fmts' field from the /get_video_info page (fixes #1649)
The 'adaptative_fmts' field from the video page is not added to the 'url_encoded_fmt_stream_map'
2013-10-25 16:52:58 +02:00
fcc28edb2f [cinemassacre] Simplify
* Remove some rtmp parameters that are not needed.
* Remove the md5 checksums, the video is not downloaded.
* Remove the code used before the current format system.
2013-10-23 20:21:41 +02:00
fac6be2dd5 Merge pull request #1632 from rzhxeo/cinemassacre
[Cinemassacre] Download video that is shown in flash player
2013-10-23 20:15:39 +02:00
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
0a89b2852e release 2013.10.23.1 2013-10-23 15:12:33 +02:00
55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
365bcf6d97 Merge remote-tracking branch 'origin/master' 2013-10-23 11:40:46 +02:00
71907db3ba [vimeo] Fix normal videos (Fixes #1642)
Vimeo Pro Videos are still broken
2013-10-23 11:38:53 +02:00
6803655ced Merge pull request #1622 from rbrito/fix-extension
extractor: youtube: Set extension of AAC audio formats to m4a.
2013-10-22 15:16:26 -07:00
df1c39ec5c release 2013.10.23 2013-10-23 00:07:27 +02:00
80f55a9511 release 2013.10.22 2013-10-22 22:35:13 +02:00
7853cc5ae1 Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/YoutubeDL.py
2013-10-22 22:30:06 +02:00
586a91b67f Expand tilde in template (Fixes #1639) 2013-10-22 22:28:26 +02:00
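A minimal sketch of tilde expansion for an output template, assuming the standard library's `os.path.expanduser` is what does the work; the helper name `expand_output_template` is hypothetical and not taken from the commit.

```python
import os


def expand_output_template(outtmpl):
    """Expand a leading '~' or '~user' in the output template."""
    # Only a tilde at the start of the path is expanded; the
    # %(field)s placeholders are left untouched.
    return os.path.expanduser(outtmpl)
```

For example, `expand_output_template('~/videos/%(title)s.%(ext)s')` would yield a path under the current user's home directory.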
b028e96144 [arte.tv:creative] Update the title of the test 2013-10-22 21:06:06 +02:00
ce68b5907c [nhl:videocenter] Fix playlist title extraction 2013-10-22 21:01:16 +02:00
fe7e0c9825 Style fixes in YoutubeDL.py
Fixed some of the problems reported by pep8
2013-10-22 14:49:34 +02:00
12893efe01 Respect the download parameter in YoutubeDL.process_video_result if the extractor handles the format selection 2013-10-22 00:01:59 +02:00
a6387bfd3c [vimeo] Implement the new format selection system (closes PR #996)
Rebased and deleted some parts to use the new system instead of copying the one from YoutubeIE
2013-10-21 23:16:11 +02:00
f6a54188c2 [youtube] Use 'node is None' when checking if the video has automatic captions
It had stopped working and it reports a FutureWarning
2013-10-21 16:28:55 +02:00
cbbd9a9c69 Fix the duration field for the VideoDetective and InternetVideoArchive tests
Also remove the use of the old format system and the comment
2013-10-21 15:07:33 +02:00
685a9cd2f1 [googleplus] Fix upload_date extraction 2013-10-21 15:00:21 +02:00
182a107877 [arte] Set the format_note and the format_id fields (closes #1628) 2013-10-21 14:42:30 +02:00
8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
3fd39e37f2 YoutubeDL: remove method that came from FileDownloader 2013-10-21 13:52:24 +02:00
49e86983e7 Allow to use the extension for the format selection
The best format with the extension is downloaded.
2013-10-21 13:31:55 +02:00
a9c58ad945 Accept requested formats to be in the format 35/best (closes #1552)
The format selection code is now an independent function.
2013-10-21 13:19:58 +02:00
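The two format-selection commits above (fallback chains such as 35/best, and selection by extension) might be sketched like this. It is a simplified illustration under stated assumptions, not the project's implementation: `select_format` is a hypothetical name, and the list is assumed to be ordered from worst to best quality.

```python
def select_format(format_spec, formats):
    """Pick a format for a spec such as '35/best'.

    formats is assumed to be ordered from worst to best quality.
    """
    if not formats:
        return None
    for choice in format_spec.split('/'):
        if choice == 'best':
            return formats[-1]
        # First try an exact format_id, then fall back to the extension.
        matches = [f for f in formats if f.get('format_id') == choice]
        if not matches:
            matches = [f for f in formats if f.get('ext') == choice]
        if matches:
            return matches[-1]  # best among the matching formats
    return None
```

Each alternative is tried left to right, so '35/best' only falls back to the best available format when no format with id 35 exists.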
f8b45beacc Merge remote-tracking branch 'rbrito/set-age'
Conflicts:
	youtube_dl/extractor/xhamster.py
2013-10-19 21:16:14 +02:00
9d92015d43 [xhamster] Add support for age_limit (Instead of #1627) 2013-10-19 21:09:48 +02:00
50a6150ed9 extractor: Set age limit on some adult-related extractors.
More age limit of videos for adult-related sites.

Note that, for redtube, I explicitly left the variable containing the age
limit, since the comment justifying the age limit is a good thing to have.

That being said, I included the age limit field on the test, to better
reflect what the information extractor does (even if it may not break the
automated tests).

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:19:25 -03:00
d5a9bb4ea9 extractor: youtube: Swap video dimensions to match standard practice.
While working on this, I thought about simplifying things like changing
480x854 to 480p, and that seemed like a good option, until I realized that
people (me included) usually link the concept of some number followed by a p
with the video being 16:9.

So, we would be losing some information and, as we all know,
[explicit is better than implicit][*].

[*]: http://www.python.org/dev/peps/pep-0020/

This closes #1446.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:04:44 -03:00
b0505eb611 [CinemassacreIE] Fix information extraction 2013-10-19 16:46:17 +02:00
284acd57d6 Add an author email 2013-10-19 11:14:20 +02:00
8ed6b34477 extractor: Set age limit on some adult-related extractors.
This is similar in spirit to what was done in commit 8e590a117f.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 19:32:37 -03:00
f6f1fc9286 extractor: youtube: Fix extension of dash formats.
While we are at it, separate the audio formats from the video formats.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 18:53:00 -03:00
8e590a117f [xnxx] Add age_limit 2013-10-18 23:35:17 +02:00
d5594202aa Simplify release process 2013-10-18 23:34:55 +02:00
b186d949cf release 2013.10.18.2 2013-10-18 23:22:54 +02:00
3d2986063c [bash-completion] Do not use dash in function name (Fixes #1623) 2013-10-18 23:13:46 +02:00
41fd7c7e60 Add new option --abort-on-error 2013-10-18 23:09:32 +02:00
fdefe96bf2 Document %(format)s (#1612) 2013-10-18 23:09:08 +02:00
16f36a6fc9 extractor: youtube: Set extension of AAC audio formats to m4a.
This, in particular, eases downloading both audio and videos in DASH formats
before muxing them, which alleviates the problem that I exposed on issue

Furthermore, one may argue that this is, indeed, the case for correctness's
sake.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 17:50:55 -03:00
f44415360e Use the console_scripts entry point if setuptools is available 2013-10-18 13:49:25 +02:00
cce722b79c Add metavar to --cache-dir 2013-10-18 11:50:48 +02:00
82697fb2ab release 2013.10.18.1 2013-10-18 11:45:30 +02:00
53c1d3ef49 Check for embedded YouTube player (Fixes #1616) 2013-10-18 11:44:57 +02:00
8e55e9abfc release 2013.10.18 2013-10-18 11:17:21 +02:00
7c58ef3275 [tudou] Fix title regex (Fixes #1614) 2013-10-18 11:16:20 +02:00
416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
f4d96df0f1 Extend #980 with --max-quality support 2013-10-18 00:46:35 +02:00
5d254f776a Fix test 2013-10-18 00:27:51 +02:00
1c1218fefc Merge remote-tracking branch 'jaimeMF/format_selection' 2013-10-18 00:17:03 +02:00
d21ab29200 Add an extractor for techtalks.tv (closes #1606) 2013-10-17 08:20:58 +02:00
54ed626cf8 release 2013.10.17 2013-10-17 02:20:26 +02:00
a733eb6c53 [youtube] Do not crash if caption info is missing altogether (Fixes #1610) 2013-10-17 02:19:19 +02:00
591454798d [brightcove] Raise error if playlist is empty (#1608) 2013-10-17 01:02:17 +02:00
38604f1a4f Merge remote-tracking branch 'origin/master' 2013-10-17 00:55:06 +02:00
2d0efe70a6 [brightcove] Fix more broken XML (#1608) 2013-10-17 00:46:11 +02:00
bfd14b1b2f Add an extractor for rutube.ru (closes #1136)
It downloads with an m3u8 manifest, requires ffmpeg.
2013-10-16 16:57:40 +02:00
76965512da Fix the indentation of the Makefile
It uses tabs, not spaces.
2013-10-15 23:15:15 +02:00
996d1c3242 Don't include the test/testdata directory in the youtube-dl.tar.gz
The last releases included big files that increased the size of the compressed file.
2013-10-15 23:08:52 +02:00
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
8381a92120 [websurg] Skip the test
It needs login information.
2013-10-15 08:12:30 +02:00
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added support for downloading annotations - closes #1321
2013-10-13 22:03:22 -07:00
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
d3f46b9aa5 Add support for single-test tox runs
Use a syntax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
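The failure mode described above can be reproduced with a small, self-contained example. The patterns and HTML below are illustrative stand-ins, not the extractor's actual regex: a non-greedy `.*?` still begins at the first `<object`, whereas `[^>]*` keeps the class attribute inside a single opening tag.

```python
import re

# Hypothetical page: an unrelated Flash object precedes the Brightcove one.
html = ('<object class="flash-banner"><param name="x"></object>'
        '<p>text</p>'
        '<object class="BrightcoveExperience"><param name="y"></object>')

# Non-greedy, but the match still starts at the first <object and runs forward.
loose = re.compile(r'<object.*?class="BrightcoveExperience".*?</object>', re.DOTALL)
# Excluding ">" forces the class attribute to sit in the same opening tag.
strict = re.compile(r'<object[^>]*class="BrightcoveExperience".*?</object>', re.DOTALL)

loose_match = loose.search(html).group(0)
strict_match = strict.search(html).group(0)
```

Here `loose_match` spans both objects and the intervening HTML, while `strict_match` contains only the Brightcove object.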
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
8016c92297 Fix the default values of format_id and format 2013-10-11 16:34:49 +02:00
e028d0d1e3 Implement the prefer_free_formats in YoutubeDL 2013-10-11 16:34:49 +02:00
79819f58f2 Default 'format' field to {width}x{height}
If width is None, use {height}p and if height is None, '???'
2013-10-11 16:34:49 +02:00
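The defaulting rule in the commit above could look roughly like the sketch below. The function name `default_format` is hypothetical, and checking the height first (so that a format with neither dimension yields '???') is an assumption, since the message does not say which case wins.

```python
def default_format(fmt):
    """Derive a 'format' label from the width and height of a format dict."""
    width = fmt.get('width')
    height = fmt.get('height')
    if height is None:
        return '???'  # assumption: checked first, covers "both missing"
    if width is None:
        return '%sp' % height
    return '%sx%s' % (width, height)
```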
6ff000b888 Do not handle format selection for IEs that already handle it 2013-10-11 16:34:48 +02:00
99e206d508 Implement the max quality option in YoutubeDL 2013-10-11 16:34:48 +02:00
dd82ffea0c Implement format selection in YoutubeDL
Now the IEs can set a formats field in the info_dict, with the formats ordered from worst to best quality. It's a list of dicts with the following fields:
* Mandatory: url and ext
* Optional: format and format_id

The format_id is used for choosing which formats have to be downloaded.

Now a video result is processed by the method process_video_result.
2013-10-11 16:34:48 +02:00
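As an illustration of the structure described above, here is a hedged sketch of an info_dict carrying a formats list, plus a small check of the mandatory fields as of this commit. The helper `check_formats`, the example URLs, and the example values are hypothetical.

```python
def check_formats(info_dict):
    """Return the formats list after checking the mandatory fields."""
    formats = info_dict.get('formats', [])
    for index, fmt in enumerate(formats):
        for key in ('url', 'ext'):
            if key not in fmt:
                raise ValueError('format %d is missing %r' % (index, key))
    return formats


# Ordered from worst to best quality; format and format_id are optional.
example_info = {
    'id': 'abc123',
    'title': 'Example video',
    'formats': [
        {'url': 'http://example.com/low.flv', 'ext': 'flv', 'format_id': 'low'},
        {'url': 'http://example.com/high.mp4', 'ext': 'mp4', 'format_id': 'high'},
    ],
}
```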
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny tpo
2013-10-09 07:44:06 -07:00
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny tpo 2013-10-09 21:56:09 +08:00
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an smil manifest, extract the video urls directly from the json and use the last version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intended audience, by specifying their age with --age-limit YEARS.
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
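A rough sketch of how such an age check might work, assuming each extracted result carries an `age_limit` value in years. The function name `is_age_restricted` and the treatment of a missing value as "suitable for everyone" are assumptions, not taken from the commit.

```python
def is_age_restricted(info_dict, user_age):
    """Return True if the video should be skipped for a viewer of user_age."""
    if user_age is None:
        return False  # no --age-limit given, nothing is filtered
    # Assumption: a missing or empty age_limit means the video suits all ages.
    age_limit = info_dict.get('age_limit') or 0
    return user_age < age_limit
```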
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
c1c9a79c49 Add basic --download-archive option
Often, users want to be able to download only videos they haven't seen before, despite the video files having been deleted or moved in the meantime.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If it is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
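The archive behaviour described above can be sketched as a plain text file with one entry per download. The line format ("extractor id") and the helper names below are assumptions made for illustration; the commit message does not specify them.

```python
import io
import os


def archive_entry(extractor, video_id):
    # Hypothetical line format: "<extractor> <id>".
    return '%s %s' % (extractor, video_id)


def in_archive(archive_path, extractor, video_id):
    """Return True if the download was already recorded in the archive file."""
    if not os.path.exists(archive_path):
        return False
    entry = archive_entry(extractor, video_id)
    with io.open(archive_path, 'r', encoding='utf-8') as archive:
        return any(line.strip() == entry for line in archive)


def record_download(archive_path, extractor, video_id):
    """Append the download to the archive file."""
    with io.open(archive_path, 'a', encoding='utf-8') as archive:
        archive.write(archive_entry(extractor, video_id) + u'\n')
```

Because the record lives outside the download directory, a video stays "seen" even after its file has been deleted or moved.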
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed, make the video ids extraction more solid.
2013-10-04 14:07:29 +02:00
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user, just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
44d466559e Properly handle stream map not being present 2013-10-01 14:44:09 +02:00
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
843530568f [appletrailers] Rework extraction (fixes #1387)
The extraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available, other olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
b98d6a1e19 release 2013.09.24.2 2013-09-24 21:55:34 +02:00
29c7a63df8 Remove debugging code 2013-09-24 21:55:25 +02:00
8b25323ae2 release 2013.09.24.1 2013-09-24 21:40:47 +02:00
f426de8460 Merge remote-tracking branch 'origin/master' 2013-09-24 21:40:30 +02:00
695dc094ab Merge branch 'automatic-signatures' 2013-09-24 21:40:08 +02:00
e80d861064 Revert "[southparkstudios] Fix mgid extraction"
This reverts commit 0fd49457f5.

It seems that the redesign was temporary.
2013-09-24 21:39:38 +02:00
2cdeb20135 release 2013.09.24 2013-09-24 21:28:06 +02:00
7f74773254 Add option --no-cache-dir 2013-09-24 21:26:10 +02:00
f2c327fd39 Fix 86 signature (#1494) 2013-09-24 21:20:42 +02:00
e35e4ddc9a Fix output of --youtube-print-sig-code when counting down to 0 2013-09-24 21:18:03 +02:00
c3c88a2664 Allow opts.cachedir == None to disable cache 2013-09-24 21:04:43 +02:00
bb0eee71e7 [youtube] Update one of the test's description 2013-09-24 21:04:13 +02:00
6f56389b88 [youtube] update algos for length 86 and 84 (fixes #1494) 2013-09-24 21:02:00 +02:00
5b333c1ce6 [francetv] Add an extractor for Generation Quoi (closes #1475) 2013-09-23 21:41:54 +02:00
a825f33030 [francetv] Add an extractor for France2 2013-09-23 21:28:33 +02:00
92f618f2e2 Merge remote-tracking branch 'origin/master' 2013-09-23 11:24:49 +02:00
81ec7c7901 [facebook] Allow untitled videos (Fixes #1484) 2013-09-23 11:24:33 +02:00
dd5d2eb03c If the file is already downloaded include the size in the progress hook 2013-09-22 23:39:30 +02:00
4ae720042c Include the eta and the speed in the progress hooks
Useful when listening to the progress hook, for example in a GUI.
2013-09-22 23:31:39 +02:00
c705320f48 Correct test strings 2013-09-22 12:18:16 +02:00
d2d8f89531 Do not warn if fallback is without alternatives (because we did not get the flash player URL) 2013-09-22 12:18:10 +02:00
bdde940e90 [youtube] Improve flash player URL handling 2013-09-22 12:17:42 +02:00
45f4a76dbc Work around nosetests nosiness 2013-09-22 11:45:29 +02:00
13dc64ce74 [youtube] Remove _decrypt_signature_age_gate 2013-09-22 11:17:21 +02:00
c35f9e72ce Move cachedir doc 2013-09-22 11:09:25 +02:00
f8061589e6 [youtube] Actually pass in cachedir option 2013-09-22 10:51:33 +02:00
0ca96d48c7 [youtube] Improve source code quality 2013-09-22 10:37:23 +02:00
4ba146f35d Update static signatures 2013-09-22 10:31:25 +02:00
edf3e38ebd [youtube] Improve cache and add an option to print the extracted signatures 2013-09-22 10:30:02 +02:00
c4417ddb61 [youtube] Add filesystem signature cache 2013-09-22 00:35:03 +02:00
4a2080e407 [youku] better error handling
blocked videos used to cause death by TypeError, now we report what the
server says
2013-09-21 20:50:31 +02:00
2f2ffea9ca Clarify a couple of calls 2013-09-21 15:34:29 +02:00
ba552f542f Use reader instead of indexing 2013-09-21 15:32:37 +02:00
8379969834 Prepare signature function caching 2013-09-21 15:19:48 +02:00
95dbd2f990 Change test target (Verified with node.js) 2013-09-21 15:10:38 +02:00
a7177865b1 Implement more opcodes 2013-09-21 14:48:12 +02:00
e0df6211cc Restore accidentally deleted commits
That's what happens if you let Windows machines write :(
2013-09-21 14:40:35 +02:00
b00ca882a4 [livestream] Fix events extraction (fixes #1467) 2013-09-21 13:50:52 +02:00
39baacc49f [dailymotion] Add an extractor for users (closes #1476) 2013-09-21 12:45:53 +02:00
3a1d48d6de [dailymotion] Raise ExtractorError if the dailymotion response reports an error 2013-09-21 12:15:54 +02:00
34308b30d6 Warn if no locale is set (#1474) 2013-09-21 11:48:07 +02:00
bc1506f8c0 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-21 11:10:30 +02:00
b61067fa4f Abort if extractaudio is given without a variable extension (#1470) 2013-09-21 11:10:22 +02:00
69b227a9bc [southparkstudios] add support for http://www.southparkstudios.com/full-episodes/* urls (closes #1469) 2013-09-21 10:58:43 +02:00
0fd49457f5 [southparkstudios] Fix mgid extraction 2013-09-21 10:51:25 +02:00
58f289d013 release 2013.09.20.1 2013-09-20 22:59:14 +02:00
3d60bb96e1 Add an extractor for ebaumsworld.com (closes #1462) 2013-09-20 16:55:50 +02:00
38d025b3f0 [youtube] add algo for length 91 2013-09-20 14:43:16 +02:00
c40c6aaaaa Catch socket.error before IOError
Since python 2.6 it's a child class.
2013-09-20 13:26:03 +02:00
1a810f0d4e [funnyordie] Fix video url extraction 2013-09-20 13:05:34 +02:00
63037593c0 release 2013.09.20 2013-09-20 10:24:48 +02:00
7a878d47fa Merge pull request #1464 from patrickslin/patch-7
Unable to decrypt signature length 93 (fixes #1461)
2013-09-20 08:25:10 +02:00
bc4b900898 Unable to decrypt signature length 93 (fixes #1461) 2013-09-19 21:49:06 -07:00
c5e743f66f [fktv] support videos split into any number of parts and some style changes 2013-09-18 23:32:37 +02:00
6c36d8d6fb Merge pull request #1438 from rzhxeo/fktv
Add support for http://fernsehkritik.tv
2013-09-18 23:05:56 +02:00
71c82637e7 [youtube] apply the fix for lists with number of videos multiple of _MAX_RESULTS to user extraction
Copied from the playlist extractor.
2013-09-18 23:00:32 +02:00
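One way a paged listing can cope with a total that is an exact multiple of the page size is sketched below. This is not the extractor's code: `fetch_all`, `fetch_page` and `make_source` are hypothetical names, and `page_size` stands in for `_MAX_RESULTS`.

```python
def fetch_all(fetch_page, page_size):
    """Collect every item from a paged source.

    fetch_page(start, count) is a hypothetical callable returning a list.
    """
    results = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        # An empty page ends the loop; this is the case reached when the
        # total is an exact multiple of page_size.
        if not page:
            break
        results.extend(page)
        # A short page means there is nothing left to request.
        if len(page) < page_size:
            break
        start += page_size
    return results


def make_source(items):
    """Build a fake paged source over a list, for illustration only."""
    return lambda start, count: items[start:start + count]
```

With four items and a page size of two, the loop requests a third, empty page and stops there instead of failing.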
2dad310e2c Credit @Ruirize for newgrounds 2013-09-18 22:30:22 +02:00
d0ae9e3a8d [newgrounds] simplify 2013-09-18 22:14:43 +02:00
a19413c311 Changed file hash. 2013-09-18 17:17:12 +01:00
1ef80b55dd Fixes test fail
Was unaware of --id being passed to test.
2013-09-18 16:23:38 +01:00
eb03f4dad3 Added Newgrounds support 2013-09-18 15:54:45 +01:00
830dd1944a Clarify -i help (#1453) 2013-09-18 13:23:04 +02:00
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
1237c9a3a5 XHamsterIE: Fix support for new HD video url format and add test (closes PR #1443) 2013-09-17 23:08:01 +02:00
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
5d13df79a5 [francetv] Remove Pluzz test
Videos expire in 7 days
2013-09-17 22:49:43 +02:00
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
6523223a4c [hotnewhiphop] Fix test case title 2013-09-17 21:10:57 +02:00
4a67aafb7e [youtube] Don't search the flash player version for videos with age gate activated 2013-09-17 20:59:55 +02:00
f3f34c5b0f release 2013.09.17 2013-09-17 17:00:20 +02:00
6ae8ee3f54 Update 85 signature (Fixes #1449)
This is the first signature algorithm to have been parsed automatically, although that only works for HTML5 players for now, and is not yet integrated into master.
2013-09-17 16:59:13 +02:00
e8f8e80097 Add an extractor for vice.com (closes #1051) 2013-09-16 20:58:36 +02:00
4dc0ff3ecf [ooyala] prefer ipad url
It has better quality with m3u8 manifests
2013-09-16 20:38:54 +02:00
4b6462fc1e Add an extractor for Bloomberg (closes #1436) 2013-09-16 20:38:48 +02:00
c4ece78564 [ooyala] add support for more types of video urls, like m3u8 manifests. 2013-09-16 19:34:10 +02:00
0761d02b0b Add FKTV extractor 2013-09-16 14:46:19 +02:00
71c107fc57 Add FKTV extractor
Support for Fernsehkritik-TV (incl. Postecke)
2013-09-16 14:45:14 +02:00
7459e3a290 Always correct encoding when writing to sys.stderr (Fixes #1435) 2013-09-16 06:55:41 +02:00
f9e66fb993 release 2013.09.16 2013-09-16 04:12:57 +02:00
6c603ccce3 [devscripts/release] temporary workarounds 2013-09-16 04:12:43 +02:00
ef66b0c6ef Merge remote-tracking branch 'origin/master' 2013-09-16 03:32:53 +02:00
22b50ecb2f Start of a Windows service 2013-09-16 03:32:45 +02:00
5a6fecc3de Add an extractor for southparkstudios.com (closes #1434)
It uses the MTV system
2013-09-15 23:30:58 +02:00
cdbccafed9 Merge pull request #1422 from rzhxeo/xhamster
XHamsterIE: Add support for new URL format (download in hd by default)
2013-09-15 12:18:39 +02:00
e69ae5b9e7 [youtube] support youtube.googleapis.com/v/* urls (fixes #1425) 2013-09-15 12:14:59 +02:00
92790f4e54 [soundcloud] Add an extractor for users (closes #1426) 2013-09-14 21:41:49 +02:00
471a5ee908 Set the ext field for each format 2013-09-14 14:45:04 +02:00
19e1d35989 [mixcloud] Rewrite extractor (fixes #278) 2013-09-14 14:26:42 +02:00
0b7f31184d Now --all-sub is a modifier to --write-sub and --write-auto-sub (closes #1412)
For keeping backwards compatibility --all-sub sets --write-sub if --write-auto-sub is not given
2013-09-14 11:14:40 +02:00
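The backwards-compatibility rule described in this commit can be sketched as a small function. The helper name and its boolean parameters are hypothetical and only mirror the three command-line options named in the message.

```python
def resolve_subtitle_options(write_sub, write_auto_sub, all_sub):
    """Return the effective (write_sub, write_auto_sub) pair.

    Hypothetical helper: --all-sub acts as a modifier, and on its own it
    implies --write-sub so that older command lines keep working.
    """
    if all_sub and not write_auto_sub:
        write_sub = True
    return write_sub, write_auto_sub
```

Passing only `all_sub=True` yields `(True, False)`, while combining it with `write_auto_sub=True` leaves `write_sub` untouched.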
fad84d50fe [googleplus] Fix upload date extraction 2013-09-14 11:10:01 +02:00
9a1c32dc54 XHamsterIE: Add support for new URL format 2013-09-14 05:42:00 +02:00
a921f40799 [ustream] Simplify channel extraction
the ChannelParser has been moved to a new function in utils get_meta_content
Instead of the SocialStreamParser now it uses a regex
2013-09-13 22:05:29 +02:00
74ac9bdd82 Merge pull request #1413 from tewe/master
Add Ustream channel support
2013-09-13 21:34:31 +02:00
94518f2087 Merge pull request #1409 from JohnyMoSwag/master (closes #1404)
added kickstarter IE
2013-09-13 19:52:56 +02:00
535f59bbcf Merge pull request #1350 from Jaiz909/description-keyerror-fix
Fixed issue #1277 KeyError when no description.
2013-09-13 18:20:42 +02:00
71cedb3c0c [buildserver] Service installation and uninstallation 2013-09-13 02:25:12 +02:00
dd01d6558a [gamespot] Update test video title 2013-09-12 22:18:39 +02:00
ce85f022d2 [youtube] update algo for length 82 (fixes #1416) 2013-09-12 22:04:09 +02:00
ad94a6fe44 [canalplus] accept urls that don't include the video id (fixes #1415), extract more info and update test 2013-09-12 21:56:36 +02:00
353ba14060 [buildserver] Rely on repository license 2013-09-12 16:34:24 +02:00
83de794223 Add original buildserver from @fraca7 2013-09-12 16:30:43 +02:00
bfd5c93af9 Add Ustream channel support 2013-09-12 12:30:14 +02:00
c247d87ef3 [funnyordie] fix video url extraction 2013-09-12 11:31:27 +02:00
07ac9e2cc2 release 2013.09.12 2013-09-12 11:26:44 +02:00
6bc520c207 Check for both automatic captions and subtitles with options --write-sub and --write-auto-sub (fixes #1224) 2013-09-12 11:15:25 +02:00
f1d20fa39f added kickstarter IE 2013-09-11 14:50:38 -07:00
e3dc22ca3a [youtube] Fix detection of videos with automatic captions 2013-09-11 19:24:56 +02:00
d665f8d3cb [subtitles] Also list the available automatic captions languages with '--list-sub' 2013-09-11 19:17:30 +02:00
055e6f3657 [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages. 2013-09-11 19:08:43 +02:00
ac4f319ba1 Credit @iemejia 2013-09-11 17:58:51 +02:00
542cca0e8c Merge branch 'subtitles_rework' (closes PR #1326) 2013-09-11 17:41:24 +02:00
6a2449df3b [howcast] Do not download from http://www.howcast.com/videos/{video_id}
It takes too long to follow the redirection.
2013-09-11 17:36:23 +02:00
7fad1c6328 [subtitles] Use self._download_webpage for extracting the subtitles
It raises ExtractorError for the same exceptions we have to catch.
2013-09-11 16:24:47 +02:00
d82134c339 [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption
Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 16:05:49 +02:00
54d39d8b2f [subtitles] rename SubitlesIE to SubtitlesInfoExtractor
Otherwise it can be automatically detected as an IE ready for use.
2013-09-11 15:51:04 +02:00
de7f3446e0 [youtube] move subtitles methods from the base extractor to YoutubeIE 2013-09-11 15:48:23 +02:00
f8e52269c1 [subtitles] made inheritance hierarchy flat as requested 2013-09-11 15:21:09 +02:00
cf1dd0c59e Merge branch 'master' into subtitles_rework 2013-09-11 14:26:48 +02:00
22c8b52545 In the supported sites page, sort the extractors case-insensitively 2013-09-11 12:04:27 +02:00
1f7dc42cd0 release 2013.11.09 2013-09-11 11:30:10 +02:00
aa8f2641da [youtube] update algo for length 85 (fixes #1408 and fixes #1406) 2013-09-11 11:24:58 +02:00
648d25d43d [francetv] Add an extractor for francetvinfo.fr (closes #1317)
It uses the same system as Pluzz, create a base class for both extractors.
2013-09-10 15:50:34 +02:00
df3e61003a Merge pull request #1402 from Rudloff/canalc2
Wrong property name
2013-09-10 03:19:37 -07:00
6b361ad5ee Wrong property name 2013-09-10 12:13:22 +02:00
5d8afe69f7 Add an extractor for pluzz.francetv.fr (closes PR #1399) 2013-09-10 12:00:00 +02:00
a1ab553858 release 2013.09.10 2013-09-10 11:25:11 +02:00
07463ea162 Add an extractor for Slideshare (closes #1400) 2013-09-10 11:19:58 +02:00
6d2d21f713 [sohu] add support for my.tv.sohu.com urls (fixes #1398) 2013-09-09 19:56:16 +02:00
061b2889a9 Fix the minutes part in FileDownloader.format_seconds (fixed #1397)
It printed the result of (seconds // 60) as the minutes.
2013-09-09 10:38:54 +02:00
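The bug described here is easy to reproduce once the duration exceeds an hour: `seconds // 60` gives total minutes, not the minutes within the hour. A minimal sketch of a corrected formatter follows; it is a standalone function written for illustration, not the actual `FileDownloader.format_seconds` body.

```python
def format_seconds(seconds):
    """Format a duration as MM:SS, or HH:MM:SS once it reaches an hour."""
    # Split off the seconds first, then split the minutes into hours and
    # minutes, so the minutes field always stays below 60.
    mins, secs = divmod(int(seconds), 60)
    hours, mins = divmod(mins, 60)
    if hours == 0:
        return '%02d:%02d' % (mins, secs)
    return '%02d:%02d:%02d' % (hours, mins, secs)
```

For 3725 seconds the naive `seconds // 60` would print 62 in the minutes field, while the version above prints `01:02:05`.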
8963d9c266 [youtube] Modify the regex to match ids of length 11 (fixes #1396)
In urls like http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930 you can't split the query string and ids always have that length.
2013-09-09 10:33:12 +02:00
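Fixing the id length in the pattern is enough to handle the URL quoted above. The regex below is a deliberately simplified assumption, far shorter than the extractor's real `_VALID_URL`, and `extract_video_id` is a hypothetical helper.

```python
import re

# Simplified, hypothetical pattern: a "v" query parameter followed by
# exactly 11 id characters; anything after them is ignored.
_VIDEO_ID_RE = re.compile(r'[?&]v=([0-9A-Za-z_-]{11})')


def extract_video_id(url):
    """Return the 11-character video id, or None if the URL has none."""
    match = _VIDEO_ID_RE.search(url)
    return match.group(1) if match else None
```

Applied to the URL from the commit message, the trailing `sharePLED17F32AD9753930` is left out and only `BaW_jenozKc` is returned.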
890f62e868 Revert "[youtube] Fix detection of tags from HLS videos."
They have undone the change

This reverts commit 0638ad9999.
2013-09-08 18:50:07 +02:00
8f362589a5 release 2013.09.07 2013-09-07 22:29:15 +02:00
a27a2470cd Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-07 22:28:54 +02:00
72836fcee4 Merge branch 'master' into subtitles_rework 2013-09-06 23:24:41 +02:00
a7130543fa [generic] If the url doesn't specify the protocol, then try to extract prepending 'http://' 2013-09-06 18:39:35 +02:00
a490fda746 [dailymotion] accept embed urls (fixes #1386) 2013-09-06 18:36:07 +02:00
7e77275293 Add an extractor for Metacritic 2013-09-06 18:08:07 +02:00
d6e203b3dc [subtitles] fixed multiple subtitles language separated by comma after merge
As mentioned in the pull request, I forgot to include these changes.
aa6a10c44a
2013-09-06 16:30:13 +02:00
e3ea479087 [youtube] Fix some issues with the detection of playlist/channel urls (reported in #1374)
They were being caught by YoutubeUserIE; now it only extracts a url if none of the other extractors is suitable.
Now the url tests check that each url can only be extracted with a specific extractor.
2013-09-06 16:24:24 +02:00
faab1d3836 [youtube] Fix detection of feeds urls (fixes #1294)
Urls like https://www.youtube.com/feed/watch_later were being treated as users (before the last changes to YoutubeUserIE, as videos)
2013-09-06 14:45:49 +02:00
8851a574a3 Fix add-versions 2013-09-06 11:07:34 +02:00
59282080c8 release 2013.09.06.1 2013-09-06 10:53:35 +02:00
98f3da4040 Merge remote-tracking branch 'origin/master' 2013-09-06 10:53:24 +02:00
1d213233cd Do not re-download files for hashsum generation (Fixes #1383) 2013-09-06 10:51:53 +02:00
fd9cf73836 [youtube] Users: download from the api in json to simplify extraction (fixes #1358)
There could be duplicate videos or other videos if the description has links.
2013-09-06 10:43:02 +02:00
0638ad9999 [youtube] Fix detection of tags from HLS videos. 2013-09-06 10:25:31 +02:00
1eb527692a release 2013.09.06 2013-09-06 10:13:33 +02:00
09bb17e108 Merge pull request #1378 from patrickslin/patch-6
Vevo sig changed again, please update for us! Thanks very much! (fixes #...
2013-09-06 09:53:23 +02:00
1cf911bc82 Vevo sig changed again, please update for us! Thanks very much! (fixes #1375) 2013-09-05 17:38:03 -07:00
f4b052321b [youtube] Urls like youtube.com/NASA are now interpreted as users (fixes #1069)
Video urls like http://youtube.com/BaW_jenozKc are not valid, but http://youtu.be/BaW_jenozKc is correct.
2013-09-05 22:39:15 +02:00
a636203ea5 release 2013.09.05 2013-09-05 22:30:50 +02:00
c215217e39 [youtube] Playlists: extract the videos id from ['media$group']['yt$videoid'] (fixes #1374)
'media$player' is not defined for private videos.
2013-09-05 21:40:04 +02:00
08e291b54d [generic] Recognize html5 video in the format '<video src=".+?"' and only unquote the url when extracting the id (fixes #1372) 2013-09-05 18:02:17 +02:00
6b95b065be Add extractor for tvcast.naver.com (closes #1331) 2013-09-05 10:53:40 +02:00
9363169b67 [daum] Get the video page from a canonical url to extract the full id (fixes #1373) and extract description. 2013-09-05 10:08:17 +02:00
085bea4513 Credit @Huarong for tv.sohu.com 2013-09-04 22:09:22 +02:00
150f20828b Add extractor for daum.net (closes #1330) 2013-09-04 22:06:50 +02:00
08523ee20a release 2013.09.04 2013-09-04 14:33:32 +02:00
5d5171d26a Merge pull request #1341 from xanadu/master
add support for "-f mp4" for YouTube
2013-09-03 18:52:12 -07:00
96fb5605b2 AHLS -> Apple HTTP Live Streaming 2013-09-03 18:49:35 -07:00
7011de0bc2 Merge pull request #1363 from Rudloff/defense
defense.gouv.fr
2013-09-03 18:23:08 -07:00
c3dd69eab4 Merge remote-tracking branch 'upstream/master' 2013-09-03 12:22:29 -07:00
025171c476 Suggested by @phihag 2013-09-03 12:03:19 +02:00
c8dbccde30 [orf] Remove the test video, they seem to expire in one week 2013-09-03 11:51:01 +02:00
4ff7a0f1f6 [dailymotion] improve the regex for extracting the video info 2013-09-03 11:33:59 +02:00
9c2ade40de [vimeo] Handle Assertions Error when trying to get the description
In some pages the html tags are not closed, python 2.6 cannot handle it.
2013-09-03 11:11:36 +02:00
aa32314d09 [vimeo] add support for videos that embed the download url in the player page (fixes #1364) 2013-09-03 10:48:56 +02:00
52afe99665 Extractor for defense.gouv.fr 2013-09-03 01:51:17 +02:00
b0446d6a33 Merge remote-tracking branch 'upstream/master' 2013-09-03 01:27:49 +02:00
8e4e89f1c2 Add an extractor for VeeHD (closes #1359) 2013-09-02 11:54:09 +02:00
6c758d79de [metacafe] Add more cases for detecting the uploader (reported in #1343) 2013-08-31 22:35:39 +02:00
691008087b Add an automatic page generator for the supported sites (related #156)
They are listed in the "supportedsites.html" page.
2013-08-31 15:18:52 +02:00
85f03346eb Merge remote-tracking branch 'upstream/master' 2013-08-30 17:51:59 -07:00
bdc6b3fc64 add support for "-f mp4" for YouTube 2013-08-30 17:51:50 -07:00
847f582290 Merge remote-tracking branch 'upstream/master' 2013-08-31 00:37:29 +02:00
10f5c016ec release 2013.08.30 2013-08-30 21:02:07 +02:00
2e756879f1 [youtube] update algo for length 86 2013-08-30 20:49:51 +02:00
c7a7750d3b [youtube] Fix typo in the _VALID_URL for YoutubeFavouritesIE, it was intended to also match :ytfavourites 2013-08-30 20:13:05 +02:00
9193c1eede Add youtube keywords to the bash completion script 2013-08-30 20:11:53 +02:00
b3f0e53048 Fixed issue #1277 KeyError when no description.
Allows continuing with a warning when an extractor cannot retrieve a description.
2013-08-31 01:53:01 +10:00
3243d0f7b6 release 2013.08.29 2013-08-29 23:29:34 +02:00
23b00bc0e4 [youtube] update algo for length 84
Only appears sometimes, nearly identical to length 86.
2013-08-29 22:44:29 +02:00
52e1eea18b [youtube] update algo for length 86 (fixes #1349) 2013-08-29 22:33:58 +02:00
ee80d66727 [ign] update 1up extractor to work with the updated IGNIE 2013-08-29 21:51:09 +02:00
f1fb2d12b3 [ign] extract videos from articles pages 2013-08-29 21:39:36 +02:00
deb2c73212 Merge pull request #1347 from whydoubt/fix_orf_at
Fix orf.at extractor by adding file coding mark
2013-08-29 11:05:38 -07:00
8928491074 Fix orf.at extractor by adding file coding mark 2013-08-29 12:51:38 -05:00
545434670b Add an extractor for orf.at (closes #1346)
Make find_xpath_attr also accept numbers in the value
2013-08-29 19:16:07 +02:00
54fda45bac Merge pull request #1342 from whydoubt/fix_mit_26
Fix MIT extractor for Python 2.6
2013-08-29 13:42:08 +02:00
c7bf7366bc Update description checksums for some tests for Unistra and Youtube 2013-08-29 13:41:59 +02:00
b7052e5087 Also print the field that fails if it is a md5 checksum 2013-08-29 12:15:45 +02:00
0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
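Accepting either quote character comes down to one character class in the pattern. The sketch below assumes a simplified regex and a hypothetical `detect_charset` helper; the project's actual detection code may differ.

```python
import re

# Hypothetical pattern: the value may be wrapped in single quotes, double
# quotes, or no quotes at all.
_CHARSET_RE = re.compile(r'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)')


def detect_charset(html, default='utf-8'):
    """Return the charset declared in a <meta> tag, or the default."""
    match = _CHARSET_RE.search(html)
    return match.group(1) if match else default
```

Both `<meta charset='utf-8'/>` and `<meta charset="utf-8"/>` yield `utf-8` with this pattern.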
b5ba7b9dcf Fix MIT extractor for Python 2.6
The HTML for the MIT page does not parse cleanly for Python 2.6 due
to script tags within an actual script element.  The offending piece
is inside a comment block, so removing all such comment blocks
fixes the parsing.
2013-08-28 14:24:42 -05:00
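Removing comment blocks before parsing can be done with a single non-greedy substitution. This is a minimal sketch with a hypothetical helper name, not the extractor's own code.

```python
import re


def strip_html_comments(html):
    """Remove every <!-- ... --> block, including multi-line ones."""
    # DOTALL lets "." match newlines; the non-greedy ".*?" stops at the
    # first closing "-->" so separate comments are removed separately.
    return re.sub(r'<!--.*?-->', '', html, flags=re.DOTALL)
```

Any script tags hidden inside a comment disappear with it, so the remaining markup no longer trips a strict parser.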
483e0ddd4d Merge remote-tracking branch 'upstream/master' 2013-08-28 10:19:28 -07:00
2891932bf0 release 2013.08.28.1 2013-08-28 19:00:17 +02:00
591078babf Merge remote-tracking branch 'upstream/master' 2013-08-28 09:57:28 -07:00
9868c781a1 Merge remote-tracking branch 'origin/master' 2013-08-28 18:22:33 +02:00
c257baff85 Merge remote-tracking branch 'rzhxeo/youporn-hd'
Conflicts:
	youtube_dl/utils.py
2013-08-28 18:22:28 +02:00
878e83c5a4 YoupornIE: Clean up extraction of hd video 2013-08-28 16:04:48 +02:00
0012690aae Let aes_decrypt_text return bytes instead of unicode 2013-08-28 16:03:35 +02:00
6e74bc41ca Fix division bug in aes.py 2013-08-28 16:01:43 +02:00
cba892fa1f Add intlist_to_bytes to utils.py 2013-08-28 15:59:07 +02:00
550bfd4cbd Merge pull request #1 from phihag/youporn-hd-pr
Allow changes to run under Python 3
2013-08-28 06:23:33 -07:00
920ef0779b Hide the password and username in verbose mode (closes #1089) 2013-08-28 15:14:02 +02:00
48ea9cea77 Allow changes to run under Python 3 2013-08-28 14:34:49 +02:00
ccf4b799df Merge remote-tracking branch 'origin/master' 2013-08-28 14:02:40 +02:00
f143d86ad2 [sohu] Handle encoding, and fix tests 2013-08-28 14:00:05 +02:00
8ae97d76ee PostProcessingError holds the message in the 'msg' property, not in 'message' (fixes #1323)
Causes DeprecationWarning: http://www.python.org/dev/peps/pep-0352/
2013-08-28 13:37:31 +02:00
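PEP 352 deprecated the implicit `message` attribute on exceptions, which is why an explicit attribute is preferable. The class below is a sketch: only the name `PostProcessingError` and its `msg` attribute come from the commit message, the rest is assumed.

```python
class PostProcessingError(Exception):
    """Sketch of an error type that stores its text in an explicit attribute."""

    def __init__(self, msg):
        super().__init__(msg)
        # Read this attribute instead of the deprecated "message".
        self.msg = msg


def describe(err):
    """Hypothetical caller that reports the error text."""
    return 'ERROR: ' + err.msg
```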
f8b362739e Merge remote-tracking branch 'Huarong/master' 2013-08-28 13:10:59 +02:00
6d69d03bac Merge remote-tracking branch 'origin/reuse_ies' 2013-08-28 13:05:21 +02:00
204da0d3e3 Merge remote-tracking branch 'origin/master' 2013-08-28 12:57:44 +02:00
c496ca96e7 Fix platform name in Python 2 with --verbose (Closes #1228) 2013-08-28 12:57:10 +02:00
67b22dd036 Add extractors for video.mit.edu and techtv.mit.edu (closes #1327)
video.mit.edu just embeds the videos from techtv.mit.edu
2013-08-28 12:55:42 +02:00
ce6a696e4d Remove unused imports 2013-08-28 12:47:38 +02:00
a5caba1eb0 [generic] simply use urljoin 2013-08-28 12:47:27 +02:00
cd9c100963 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:20:12 +02:00
edde6c56ac Print playpath with --get-url (Fixes #1334) 2013-08-28 12:14:45 +02:00
b7f89fe692 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:10:34 +02:00
ae3531adf9 [generic] Fix URL concatenation
When the url is something like http://example.org/foo/bar?x=y and the added part is file/video.mp4, we want http://example.org/foo/file/video.mp4
Fixes #1268.
2013-08-28 12:08:17 +02:00
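The behaviour wanted here matches what the standard library's `urljoin` does, shown below with the two values from the commit message (Python 3 import path; the variable names are illustrative).

```python
from urllib.parse import urljoin

page_url = 'http://example.org/foo/bar?x=y'
relative = 'file/video.mp4'

# urljoin drops the query string and the last path segment of the base
# before appending the relative reference.
video_url = urljoin(page_url, relative)
```

Plain string concatenation would keep `bar?x=y` in the result, which is the failure this commit addresses.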
8cf5ee7831 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-28 11:57:18 +02:00
aa3e950764 Tolerate junk at the end of gzip-compressed content (#1268) 2013-08-28 11:57:13 +02:00
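One way to tolerate trailing bytes after a gzip stream is to decompress with a `zlib` decompression object, which stops at the end of the stream. This is a sketch of that technique under a hypothetical helper name and is not necessarily how the commit implements it.

```python
import zlib


def gunzip_tolerant(data):
    """Decompress gzip data, ignoring any bytes after the stream ends."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Bytes past the end of the stream land in decompressor.unused_data
    # instead of raising an error.
    return decompressor.decompress(data)
```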
1301a0dd42 Merge remote-tracking branch 'upstream/master' 2013-08-28 11:02:12 +02:00
af8bd6a82d Show the time taken to download in the same format as the ETA 2013-08-28 10:56:11 +02:00
6d38616e67 Merge pull request #1181 from h3xx/master
Add some verbosity when reporting finished downloads

Remove the mixed use of tabs and spaces for indentation.
2013-08-28 10:54:07 +02:00
4f5f18acb9 [addanime] add file 2013-08-28 10:28:16 +02:00
3e223834d9 [youtube] update algo for length 88, thanks to @Ramhack (fixes #1328) 2013-08-28 10:26:44 +02:00
a1bb0f8773 [cnn] remove debug print call. 2013-08-28 10:20:37 +02:00
0e283428f7 HTTPError is in urllib.error in Python 3, not in http.error 2013-08-28 10:18:39 +02:00
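A compatibility import along these lines picks the right module on either interpreter. It is a generic sketch; the project may name or organise its compatibility aliases differently.

```python
try:
    # Python 3: HTTPError lives in urllib.error.
    from urllib.error import HTTPError
except ImportError:
    # Python 2: the same class is exposed by urllib2.
    from urllib2 import HTTPError


def is_not_found(err):
    """Hypothetical helper: True for an HTTP 404 error."""
    return isinstance(err, HTTPError) and err.code == 404
```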
2eabb80254 [addanime] improve 2013-08-28 04:25:38 +02:00
44586389e4 [appletrailers] Add support 2013-08-28 02:18:44 +02:00
06a401c845 Merge branch 'master' into subtitles_rework 2013-08-28 00:33:12 +02:00
273f603efb [cnn] Allow more URLs 2013-08-28 00:14:19 +02:00
1619e22f40 release 2013.08.28 2013-08-27 23:31:36 +02:00
88a79ce6a6 Delete default user agent (Fixes #1309) 2013-08-27 23:31:24 +02:00
acebc9cd6b Revert "Install our own HTTPS handler as well (#1309)"
This reverts commit 36399e8576 and fixes #1322.
2013-08-27 23:28:20 +02:00
443c12a703 Merge pull request #1324 from whydoubt/fix_gplus
Initial slash in Google+ photos link was removed
2013-08-27 13:36:39 -07:00
7f3c4f4f65 Initial slash in Google+ photos link was removed 2013-08-27 14:38:50 -05:00
0bc56fa66a Add an extractor for NBC news (closes #1320) 2013-08-27 12:38:57 +02:00
1a582dd49d Add an extractor for CNN (closes #1318) 2013-08-27 11:56:48 +02:00
c5b921b597 Merge remote-tracking branch 'upstream/master' 2013-08-27 10:47:47 +02:00
e86ea47c02 [canalc2] Small improvements 2013-08-27 10:35:20 +02:00
aa5a63a5b5 Merge remote-tracking branch 'Rudloff/canalc2' 2013-08-27 10:31:46 +02:00
2a7b4da9b2 [hark] get the song info in JSON and extract more information. 2013-08-27 10:25:38 +02:00
069d098f84 [canalplus] Accept player.canalplus.fr urls 2013-08-27 10:21:57 +02:00
b3889f7023 release 2013.08.27 2013-08-27 02:30:47 +02:00
65883c8dbd Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-27 02:00:23 +02:00
341ca8d74c [trilulilu] Add support for trilulilu.ro
Fun fact: The ads (not yet supported) are loaded from youtube ;)
2013-08-27 01:59:00 +02:00
99859d436c Merge remote-tracking branch 'upstream/master' 2013-08-26 15:16:13 -07:00
1b01e2b085 Merge pull request #1315 from yasoob/master
fixed tests for c56 and dailymotion
2013-08-26 13:38:48 -07:00
976fc7d137 fixed tests for c56 and dailymotion 2013-08-27 01:00:17 +05:00
c3b7b29c23 Merge remote-tracking branch 'origin/master' 2013-08-26 21:29:44 +02:00
627a91a9a8 [generic] small typo 2013-08-26 21:29:31 +02:00
6dc6302599 Merge pull request #1231 from yasoob/master
Added an IE for hark.com
2013-08-26 12:29:04 -07:00
7a20e2e1f8 Merge remote-tracking branch 'upstream/master' 2013-08-26 03:16:42 +02:00
90648143c3 Merge pull request #1310 from rzhxeo/rtlnow
Add support for http://superrtlnow.de
2013-08-25 15:45:22 -07:00
5c6658d4dd Merge remote-tracking branch 'upstream/master' 2013-08-24 23:01:39 +02:00
9585f890f8 [generic] add support for relative URLs (Fixes #1308) 2013-08-24 22:56:37 +02:00
0838239e8e [generic] Support double slash URLs (Fixes #1309) 2013-08-24 22:52:45 +02:00
36399e8576 Install our own HTTPS handler as well (#1309) 2013-08-24 22:49:22 +02:00
9460db832c [ro220] Add support for 220.ro 2013-08-24 21:10:03 +02:00
d68730a56e Add SUPER RTL NOW to RTLnow extractor 2013-08-24 13:22:28 +02:00
f2aeefe29c [youtube] update algo for length 84 2013-08-24 10:48:12 +02:00
39c6f507df Merge remote-tracking branch 'upstream/master' 2013-08-23 15:33:36 -07:00
d2d1eb5b0a Switch to domain yt-dl.org 2013-08-23 23:57:23 +02:00
8ae7be3ef4 release 2013.08.23 2013-08-23 23:09:53 +02:00
306170518f [youtube] update algo for length 86 (fixes #1302) 2013-08-23 22:36:59 +02:00
aa6a10c44a Allow to specify multiple subtitles languages separated by commas (closes #518) 2013-08-23 18:34:57 +02:00
9af73dc4fc Print a message before embedding the subtitles 2013-08-23 18:17:43 +02:00
fc483bb6af [xhamster] use determine_ext 2013-08-23 17:23:34 +02:00
53b0f3e4e2 Merge pull request #1301 from rzhxeo/xhamster
XHamsterIE: Fix video extension and add video description
2013-08-23 17:21:30 +02:00
4353cf51a0 XHamsterIE: Add video description 2013-08-23 16:40:20 +02:00
ce34e9ce5e XHamsterIE: Fix video extension
Cut off GET parameter
2013-08-23 16:33:41 +02:00
d4051a8e05 Add a post processor for embedding subtitles in mp4 videos (closes #1052) 2013-08-23 15:06:19 +02:00
df3df7fb64 [youtube] Fix download of subtitles with '--all-subs'
If _extract_subtitles is called the option 'write subtitles' is always true.
2013-08-23 13:14:22 +02:00
9e9c164052 Merge pull request #937 from jaimeMF/subtitles_rework
Subtitles rework
2013-08-23 02:40:25 -07:00
066090dd3f [youtube] add algo for length 80 and update player info 2013-08-23 11:33:56 +02:00
614d9c19c1 Merge remote-tracking branch 'upstream/master' 2013-08-22 17:02:41 -07:00
bd2dee6c67 Merge branch 'master' into subtitles_rework 2013-08-23 01:47:10 +02:00
74e6672beb Merge pull request #1297 from iemejia/master
[subtitles] separated subtitle options in their own group
2013-08-22 16:30:14 -07:00
02bcf0d389 release 2013.08.22 2013-08-22 23:29:42 +02:00
18b4e04f1c Merge branch 'master' into subtitles_rework 2013-08-22 23:29:36 +02:00
10204dc898 [videofyme] Add an additional quality (they change between downloads of the info) and update md5 sum of the test video 2013-08-22 23:23:52 +02:00
1865ed31b9 [subtitles] separated subtitle options in their own group 2013-08-22 22:44:04 +02:00
3669cdba10 [youtube] update algo for length 82 (fixes #1296) 2013-08-22 22:35:15 +02:00
939fbd26ac [youtube] fix the order of DASH formats 2013-08-22 19:45:24 +02:00
b4e60dac23 Merge remote-tracking branch 'upstream/master' 2013-08-22 10:43:51 -07:00
e6ddb4e7af Merge pull request #1279 from xanadu/master
Add YouTube DASH formats to YouTubeIE
2013-08-22 19:33:34 +02:00
83390b83d9 Merge pull request #1266 from MiLk/py-generator
Update the youtube algorithm generator
2013-08-22 10:18:58 -07:00
ff2424595a lxml is not part of the standard library. 2013-08-22 14:47:51 +02:00
adeb9c73d6 Merge remote-tracking branch 'upstream/master' 2013-08-22 14:04:30 +02:00
cd0abcc0bb Extractor for canalc2.tv 2013-08-22 13:54:23 +02:00
4a55479fa9 Credit Pierre Rudloff for JeuxVideoIE and UnistraIE 2013-08-22 13:21:32 +02:00
f527115b5f Rename utv.py to unistra.py and extract more info
There are other sites that could be named utv, which would conflict if they are added
2013-08-22 13:19:35 +02:00
75e1b46add Download from utv.unistra.fr (PR #1271)
Squashed to a single commit to keep the file 'youtube-dl' unchanged and remove the revert commit.
2013-08-22 12:58:12 +02:00
05a2926c5c Merge remote-tracking branch 'upstream/master' 2013-08-22 12:55:58 +02:00
7070b83687 Merge remote-tracking branch 'upstream/master' 2013-08-22 12:54:17 +02:00
8d212e604a Merge remote-tracking branch 'upstream/master'
Conflicts:
	youtube_dl/extractor/jeuxvideo.py
2013-08-22 12:52:05 +02:00
063fcc9676 [jeuxvideo] Extract more information and add test 2013-08-22 12:37:34 +02:00
8403612258 Merge pull request #1267 from Rudloff/master
Download videos from jeuxvideo.com

Edited to keep the file 'youtube-dl' unchanged.
2013-08-22 12:25:21 +02:00
25b51c7816 Download videos from jeuxvideo.com 2013-08-22 12:12:34 +02:00
9779b63bb6 Add an extractor for PBS (closes #870 and #873) 2013-08-22 11:57:21 +02:00
d81aef3adf Add an extractor for tv.slashdot.org (closes #1192)
It uses the ooyala platform, so it just extracts the ooyala url.
2013-08-21 21:51:58 +02:00
5af7e056a7 Merge remote-tracking branch 'upstream/master' 2013-08-21 10:53:42 -07:00
45ed795cb0 [youtube] update uploader name for a test video: 'IconaPop' has changed to 'Icona Pop' 2013-08-21 19:28:48 +02:00
683e98a8a4 [statigram] change test video
The old one cannot be accessed.
2013-08-21 19:20:27 +02:00
e0cfeb2ea7 [funnyordie] fix extraction of video url and title 2013-08-21 18:58:25 +02:00
75340ee383 [vevo] Fix urls with a query (#1258) 2013-08-21 18:20:03 +02:00
668de34c6b [soundcloud] Support widget urls (fixes #1252) 2013-08-21 17:06:37 +02:00
a91b954bb4 [vimeo] extract information for Vimeo Pro videos from http://player.vimeo.com/video/{video_id} (fixes #1197)
For some videos https://vimeo.com/{video_id} doesn't work
2013-08-21 13:48:19 +02:00
a3f62b8255 Merge remote-tracking branch 'upstream/master' 2013-08-21 00:07:03 -07:00
37b6d5f684 fix hls test 2013-08-20 23:51:05 -07:00
b7a6838407 address review comment 2013-08-20 21:57:32 -07:00
cde846b3d3 fix code style 2013-08-20 21:42:49 -07:00
6c3e6e88d3 Allow hours in ETA display (Fixes #1280) 2013-08-21 05:44:19 +02:00
739674cd77 [rtlnow] Add support for error message for queries from outside of Germany 2013-08-21 05:24:58 +02:00
4b2d7cae11 release 2013.08.21 2013-08-21 04:33:57 +02:00
7fea7156cb [generic] support HTML5 video 2013-08-21 04:32:22 +02:00
3093468977 [generic] Ignore stupid HTTP servers (#1284) 2013-08-21 04:32:07 +02:00
79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
2013-08-21 04:06:48 +02:00
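The caching idea can be sketched as compiling each class's pattern once and storing it on the class. The `Extractor` class and its example URL pattern are hypothetical; only the names `_VALID_URL` and `suitable` echo the project's vocabulary.

```python
import re


class Extractor(object):
    # Hypothetical pattern used only for this illustration.
    _VALID_URL = r'https?://example\.org/video/(?P<id>\d+)'

    @classmethod
    def suitable(cls, url):
        """Return True if this extractor's pattern matches the URL."""
        # Look in cls.__dict__ rather than using getattr so a subclass with
        # its own _VALID_URL does not reuse the parent's compiled pattern.
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        return cls._VALID_URL_RE.match(url) is not None
```

The cost is one compiled pattern object per extractor class, which fits the "minimal memory overhead" noted in the commit message.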
87f78946a5 [collegehumor] Allow old-style videos (Fixes #1285) 2013-08-21 03:50:56 +02:00
211fbc1328 fix failed tests 2013-08-19 18:57:55 -07:00
836a086ce9 Add YouTube DASH formats to YouTubeIE 2013-08-19 18:22:25 -07:00
90d3989b99 Merge remote-tracking branch 'upstream/master' 2013-08-19 17:11:52 -07:00
d741e55a42 [youtube] Support watch_popup URLs (Fixes #1275) 2013-08-19 10:27:42 +02:00
17d3aaaf16 Merge pull request #1273 from rzhxeo/rtlnow
Add support for http://voxnow.de
2013-08-19 00:19:06 -07:00
ea55b2a4ca Add VOXnow to RTLnow extractor 2013-08-19 08:57:36 +02:00
3f0537dd4a Merge remote-tracking branch 'rzhxeo/rtlnow' 2013-08-19 00:25:34 +02:00
943f7f7a39 Download videos from jeuxvideo.com 2013-08-18 16:11:47 +02:00
12e895fc5a Merge branch 'master' into py-generator 2013-08-18 11:12:38 +02:00
bda2c49d75 Update algo - see #1254
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-18 11:10:39 +02:00
01b32990da Add RTLnow extractor 2013-08-18 08:16:53 +02:00
dbda1b5147 Add RTLnow extractor
Supports http://rtl2now.rtl2.de and http://rtl-now.rtl.de
2013-08-18 08:15:18 +02:00
ddf3bd328b release 2013.08.17 2013-08-17 08:33:36 +02:00
b9c37b92cf Merge pull request #1256 from patrickslin/patch-5
Length 85 changed again? (fixes #1254)
2013-08-16 14:07:49 -07:00
5a27ecdd2e Update AddAnime.py 2013-08-16 23:54:09 +03:00
f9c3c90ca8 Length 85 changed again? (fixes #1254) 2013-08-16 08:54:01 -07:00
6daccbe317 release 2013.08.15 2013-08-15 22:40:00 +02:00
71ea844c0e Merge pull request #1248 from patrickslin/patch-4
Unable to Download Video (fixes #1247)
2013-08-15 13:38:32 -07:00
3a7256697e Unable to Download Video (fixes #1247) 2013-08-15 13:00:20 -07:00
d1ba998274 release 2013.08.14 2013-08-14 10:19:53 +02:00
718ced8d8c Merge pull request #1239 from patrickslin/patch-3
Updated Vevo Signature Length (fixes #1237)
2013-08-14 01:18:58 -07:00
e1842025d0 Updated Vevo Signature Length (fixes #1237) 2013-08-13 17:57:35 -07:00
2b9213cdc1 Update generator
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-12 10:48:40 +02:00
e3a88568b0 Added an IE for hark.com 2013-08-11 22:23:05 +05:00
0577177e3e [vevo] fix testcase 2013-08-11 07:12:38 +02:00
298f833b16 Note update possibility on errors (thanks @chbrown, #1229) 2013-08-11 06:46:24 +02:00
97b3656c2e YoupornIE: Add support for hd videos and update Test 2013-08-09 18:37:33 +02:00
f3bcebb1d2 add an aes implementation 2013-08-09 18:36:01 +02:00
0f399e6e5e release 2013.08.09 2013-08-09 15:49:09 +02:00
5b075e27cb Merge pull request #1218 from patrickslin/patch-2
New sig len 89 algo
2013-08-09 03:42:13 -07:00
8a9d86a2a7 New sig len 89 algo
Fixes new YT encrypted sig len 89.
2013-08-08 21:48:12 -07:00
d80a064eff [subtitles] Added tests to check correct behavior when no subtitles are
available
2013-08-08 22:22:33 +02:00
d468a09789 release 2013.08.08.1 2013-08-08 20:45:16 +02:00
9f4ab73d7f Merge pull request #1216 from patrickslin/patch-5
Invalid signature again (fixes #1215)
2013-08-08 11:44:29 -07:00
02cf62e240 Invalid signature again (fixes #1215) 2013-08-08 11:28:50 -07:00
d55de6eec2 [subtitles] Now skips subtitles that have already been downloaded.
Just a check that the file exists. I also removed a method that wasn't
being used because it was a copy-paste from FileDownloader.
2013-08-08 18:30:04 +02:00
69df680b97 [subtitles] Improved docs + new class for servers who don't support
auto-caption
2013-08-08 11:20:56 +02:00
447591e1ae [test] Cleaned subtitles tests 2013-08-08 11:03:52 +02:00
33eb0ce4c4 [subtitles] removed only-sub option (--skip-download achieves the same
functionality)
2013-08-08 10:06:24 +02:00
505c28aac9 Separated subtitle options in their own group 2013-08-08 09:53:25 +02:00
67fb0c5495 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-08 08:56:59 +02:00
4efba05c56 Clarify template error message (#1209) 2013-08-08 08:55:26 +02:00
8377574c9c [internal] Improved subtitle architecture + (update in
youtube/dailymotion)

The structure of subtitles was refined, you only need to implement one
method that returns a dictionary of the available subtitles (lang, url) to
support all the subtitle options in a website. I updated the subtitle
downloaders for youtube/dailymotion to show how it works.
2013-08-08 08:54:10 +02:00
0f90943e45 Merge pull request #1189 from cyisfor/master
More informative error
2013-08-07 17:30:28 -07:00
526e638c8a release 2013.08.08 2013-08-08 00:39:23 +02:00
372297e713 Undo the previous commit (it was a mistake) 2013-08-07 21:24:42 +02:00
356e067390 Merge remote-tracking branch 'patrickslin/patch-4' 2013-08-07 20:19:51 +02:00
e2f48f9643 Remove youtube sig tests
The signature algo changes too often for the static test to make sense.
2013-08-07 20:11:40 +02:00
b513a251f8 Merge commit '7a4c6cc92f9ffec9135652a49153caffa5520c29' 2013-08-07 20:11:04 +02:00
953e32b2c1 [dailymotion] Added support for subtitles + new InfoExtractor for
generic subtitle download.

The idea is that all subtitle downloaders must descend from SubtitlesIE
and implement only three basic methods to achieve the complete subtitle
download functionality. This will make it possible to reduce the code in YoutubeIE
once it is rewritten.
2013-08-07 18:59:11 +02:00
5898e28272 Fixed small type issue 2013-08-07 18:48:24 +02:00
67dfbc0cb9 Added exceptions for the subtitle and video types in .gitignore 2013-08-07 18:42:40 +02:00
36cb11f068 Encrypted sig 87 broken again (fixes #1200) 2013-08-06 21:35:37 -07:00
7a4c6cc92f Updated the 84 length signature decryption
Updated the right 84 length signature decryption 06.08.2013
2013-08-06 15:41:13 +03:00
7edcb8f39c More informative error 2013-08-05 19:43:09 -07:00
d5b00ee6e0 improve sohu extractor 2013-08-06 10:26:57 +08:00
461cead4f7 changes 2013-08-06 04:34:24 +03:00
b5a6d40818 fix parse title bug 2013-08-05 22:51:54 +08:00
968b5e0112 Add some verbosity when reporting finished downloads
For example:

    [download] Resuming download at byte 1868140
    [download] Destination: Entry #1-Bn59FJ4HrmU.flv
    [download] 100% of 3.27MiB in 4s

This format is meant to somewhat mirror the behavior of wget(1) when reporting finished downloads:

    100%[==================>] 54,836,682   788KB/s   in 74s

    2013-08-04 12:32:05 (728 KB/s) - 'google-chrome-stable_current_x86_64.rpm' saved [54836682/54836682]
2013-08-04 12:45:24 -05:00
39b782b390 [collegehumor] support urls in the format www.collegehumor.com/e/{video_id} (fixes #1179) 2013-08-04 16:36:48 +02:00
577664c8e8 Add an extractor from muzu.tv (closes #1177) 2013-08-04 11:10:57 +02:00
bba12cec89 Add an extractor for videofy.me (closes #1171)
Also modify find_xpath_attr to accept values with spaces like for id="HQ on"
2013-08-03 22:50:27 +02:00
70c4c03cb8 [arte] add support for downloading from http://liveweb.arte.tv (fixes #1014) 2013-08-03 19:07:04 +02:00
f5791ed136 [arte] Prefer videos without subtitles in the same language (fixes #1173) and fix crash when there's no description 2013-08-03 17:32:29 +02:00
4ec929dc9b use ..utils/clean_html() 2013-08-03 10:29:58 +08:00
fbf189a6ee [myvideo] add support for videos that place the video info inside www.myvideo.de/service/data/video/{id}/config (fixes #616) 2013-08-02 21:09:17 +02:00
09825cb5c0 Add an extractor for Ooyala (closes #833)
Only works for some sites, it doesn't work for videos that use a f4m manifest
2013-08-02 16:53:16 +02:00
ed27d35674 [youtube] don't crash in verbose mode if 'ad3_module' is not defined in age protected videos (fixes #1159) 2013-08-02 14:17:01 +02:00
fd5539eb41 release 2013.08.02 2013-08-02 13:35:13 +02:00
04bca64bde [youtube]: new algo for length 83 (fixes #1164) 2013-08-02 12:38:17 +02:00
03cc7c20c1 [youtube] show which formats are in 3D with "-F" and in the format field 2013-08-02 12:21:28 +02:00
4075311d94 Merge pull request #1163 from xanadu/master
add support for download YouTube 3d format of 3d content
2013-08-02 12:06:34 +02:00
6624a2b07d add an extractor for tv.sohu.com 2013-08-02 17:58:46 +08:00
6d3a7d03e1 fix bug: kankan extractor does not support http://vod.kankan.com/v/70/70309.shtml 2013-08-02 15:26:11 +08:00
95fdc7d69c Merge remote-tracking branch 'upstream/master' 2013-08-01 10:57:12 -07:00
86fe61c8f9 add support for download YouTube 3d format of 3d content 2013-08-01 10:47:48 -07:00
9bb6d2f21d Merge pull request #1161 from meyerd/master
Fix regex error when only subtitled video is available on arte.
2013-08-01 04:49:05 -07:00
e3f4593e76 Fix regex error when only subtitled video is available on arte. 2013-08-01 11:48:17 +02:00
1d043b93cf [youtube] Add support for downloading videos with hlsvp (fixes #1083)
They are downloaded with a m3u8 manifest, they seem to be encrypted, but ffmpeg can handle them.
2013-07-31 23:41:05 +02:00
b15d4f624f Allow to download from m3u8 manifests with ffmpeg
They are detected by the extension of the url.
2013-07-31 22:33:37 +02:00
4aa16a50f5 Log a better error message if ffprobe or avconv are not found (related #1134) 2013-07-31 21:22:08 +02:00
bbcbf4d459 Switch some calls to to_stderr to report_error and report_warning 2013-07-31 21:20:46 +02:00
930ad9eecc release 2013.07.31 2013-07-31 10:55:02 +02:00
b072a9defd YoutubeIE: with age protected videos, add a missing "return" to return the signature decrypted with _decrypt_signature 2013-07-31 10:51:00 +02:00
75952c6e3d YoutubeIE: new algo for length 86 (fixes #1156)
It now uses the same length that the flash player used for age protected videos, but the algorithm is different, so for age protected videos it first tries the old algo.
2013-07-31 10:45:13 +02:00
05afc96b73 Print urls from the batch file with --verbose (related #1155) 2013-07-30 23:11:44 +02:00
fa80026915 Disable way and tf1 tests, the whole videos are served sometimes, so the md5 sum doesn't match. 2013-07-30 11:19:07 +02:00
2bc3de0f28 [worldstarhiphop] Small cleanup
The second check for the Vevo id is not necessary.
2013-07-30 11:10:17 +02:00
99c7bc94af Merge pull request #1148 from JohnyMoSwag/master
[worldstarhiphop] support vevo videos
2013-07-30 11:05:40 +02:00
152c8f349d Merge pull request #1149 from pishposhmcgee/patch-3
[vevo] Modified m_urls regex and video_url
2013-07-30 01:57:37 -07:00
d75654c15e using re.search 2013-07-29 14:39:14 -07:00
0725f584e1 [wat] fix the extraction of the video url (fixes #1103)
Use the direct download link for Android.
2013-07-29 23:38:02 +02:00
8cda9241d1 Add an extractor for kankan.com (closes #1133) 2013-07-29 23:13:12 +02:00
a3124ba49f Modified m_urls regex and video_url
Some videos have a leading slash, some do not
2013-07-29 15:45:20 -05:00
579e2691fe detect vevo embed fix 2013-07-29 12:24:26 -07:00
63f05de10b detect vevo embed 2013-07-29 12:11:57 -07:00
caeefc29eb [vimeo] add an extractor for channels 2013-07-29 13:12:09 +02:00
a3c736def2 [dailymotion] Add an extractor for Dailymotion playlists 2013-07-29 12:07:38 +02:00
58261235f0 Add an extractor for roxwell.com (closes #1044) 2013-07-26 13:00:59 +02:00
da70877a1b release 2013.07.25.2 2013-07-25 22:58:40 +02:00
5c468ca8a8 YoutubeIE: add algo for length 79 (fixes #1126) 2013-07-25 22:50:24 +02:00
aedd6bb97d YoutubeIE: new algo for length 81 (fixes #1127) 2013-07-25 22:06:53 +02:00
733d9cacb8 Merge pull request #1120 from pishposhmcgee/patch-1
[collegehumor] Added an option 'e' to go with 'video' or 'embed'
2013-07-25 01:14:43 -07:00
42f2805e48 [keek] Fix testcase (Broken by accident in 6625f82940) 2013-07-25 10:10:37 +02:00
0ffcb7c6fc release 2013.07.25.1 2013-07-25 09:53:15 +02:00
27669bd11d [ina] Allow I at start of video IDs 2013-07-25 09:52:58 +02:00
6625f82940 [keek] Allow httpS URLs (Fixes #1123) 2013-07-25 09:40:19 +02:00
d0866f0bb4 release 2013.07.25 2013-07-25 09:35:25 +02:00
09eeb75130 Merge remote-tracking branch 'pishposhmcgee/patch-2' 2013-07-25 09:34:56 +02:00
0a99956f71 [ina] Fix URL detection (Fixes #1121) 2013-07-25 09:34:12 +02:00
12ef6aefa8 changed video_url regex
Some older videos contain extra properties such as 'embed' before 'type'.
2013-07-24 21:51:08 -05:00
e93aa81aa6 Added an option 'e' to go with 'video' or 'embed'
Based on links that I've seen, /e/<videoid> also occurs in the wild, and making this substitution yields effective results.
2013-07-24 16:55:28 -05:00
755eb0320e [youtube] use itertools.count instead of a "while True" loop and a manual counter 2013-07-24 22:27:33 +02:00
43ba5456b1 [youtube] add an extractor for the "Watch Later" list 2013-07-24 22:13:39 +02:00
156d5ad6da release 2013.07.24.2 2013-07-24 21:18:41 +02:00
c626a3d9fa Add an extractor for downloading the Youtube favorite videos(closes #127) 2013-07-24 20:45:19 +02:00
b2e8bc1b20 YoutubeIE: Move the code from _real_initialize to a base class
This allows reusing the code in other IEs without having to overwrite some parts.
2013-07-24 20:40:12 +02:00
771822ebb8 YoutubePlaylistIE: break only if there's no entry field in the response
Otherwise the Favorite videos playlist cannot be downloaded completely.
Also break if it reaches the maximum value of the start-index.
2013-07-24 20:14:55 +02:00
eb6a41ba0f ExfmIE: extract Soundcloud songs using SoundcloudIE
Now SoundcloudIE accepts api urls.
2013-07-24 14:39:21 +02:00
7d2392691c [soundcloud]: Some improvements
Extract thumbnails.
Make SoundcloudSetIE a subclass of SoundcloudIE to reuse some code.
Directly extract the file url without downloading an extra page.
2013-07-24 14:15:12 +02:00
c216c1894d release 2013.07.24.1 2013-07-24 13:52:55 +02:00
3e1ad508eb Add Youtube player info for length 87 2013-07-24 12:48:25 +02:00
a052c1d785 Merge pull request #1114 from alexvh/traileraddict_hd
[traileraddict] Obtain hd quality stream if available

Updated md5 checksum of the test video.
2013-07-24 10:52:24 +02:00
16484d4923 [traileraddict]: Support clips urls and more trailer urls 2013-07-24 10:43:44 +02:00
32a09b4382 Merge pull request #1113 from alexvh/master
[traileraddict] Allow all types of trailer URLs
2013-07-24 10:37:52 +02:00
870a7e6156 release 2013.07.24 2013-07-24 10:29:34 +02:00
239e3e0cca YoutubeIE: new algo for length 87 (fixes #1105)
Squashed commit from the pull requests #1107, #1109 and #1110.
2013-07-24 10:20:52 +02:00
b1ca5e3ffa [traileraddict] Obtain hd quality stream if available
No clear method for determining if hd is available so opted to just
check for presence of hd toggle function.
2013-07-24 02:42:32 -04:00
b9a1252c96 [traileraddict] Allow all types of trailer URLs
Valid url regex for traileraddict.com is too strict. Need to allow,
e.g. theatrical-trailer, teaser-trailer, feature-read-band-trailer, etc.
2013-07-24 00:48:11 -04:00
fc492de31d release 2013.07.23.1 2013-07-23 18:37:52 +02:00
a9c0f9bc63 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-23 18:37:09 +02:00
b7cc9f5026 [soundcloud] Support URLs with a slash at the end (Fixes #1104) 2013-07-23 18:35:52 +02:00
252580c561 YoutubeChannelIE: switch ajax query from channel_ajax to c4_browse_ajax
It wasn't detecting when there aren't more videos
2013-07-23 14:58:01 +02:00
acc47c1a3f Mark WatIE and TF1IE as broken (related #1103) 2013-07-23 14:29:30 +02:00
70fa830e4d CollegeHumorIE: support Youtube videos and embed urls (fixes #1094) 2013-07-23 14:29:29 +02:00
a7af0ebaf5 release 2013.07.23 2013-07-23 14:20:52 +02:00
67ae7b4760 Fix BreakIE
Also detect videos that come from Youtube
2013-07-23 11:41:05 +02:00
de48addae2 Fix CollegeHumorIE
Now it downloads the video over http in one file, it no longer downloads in fragments
Added a test and use the methods in InfoExtractor for downloading webpages
2013-07-23 11:14:11 +02:00
ddbfd0f0c5 ComedyCentralIE: support the extended interviews urls (fixes #1079) 2013-07-21 11:04:56 +02:00
d7ae0639b4 [youtube] Add an extractor for Youtube recommended videos (":ytrec" keyword) (closes #476)
The new extractor and YoutubeSubscriptionsIE are subclasses of YoutubeFeedsInfoExtractor, which allows fetching videos from http://www.youtube.com/feed_ajax
2013-07-20 19:33:40 +02:00
6804038d06 Don't try to write the subtitles if it's None 2013-07-20 12:59:47 +02:00
2f799533ae YoutubeIE: don't crash when trying to get automatic captions if the video has standard subtitles. 2013-07-20 12:56:10 +02:00
88ae5991cd YoutubeIE: use the same function for getting the subtitles for the "--write-sub" and "--all-sub" options 2013-07-20 12:56:06 +02:00
5d51a883c2 Use a dictionary for storing the subtitles
The errors while getting the subtitles are reported as warnings, if no subtitles are found return an empty dict.
2013-07-20 12:52:25 +02:00
c4a91be726 Save subtitles using the same code for all the options 2013-07-20 12:52:24 +02:00
0382435990 [exfm] Add IE_* descriptions 2013-07-20 11:26:36 +02:00
b390d85d95 Merge remote-tracking branch 'yasoob/master' 2013-07-20 11:23:56 +02:00
be925dc64c release 2013.07.19 2013-07-19 23:42:29 +02:00
de7a91bfe3 WeiboIE: extract the player urls from a json webpage
Also extract a Sina url that doesn't require following a redirection.
2013-07-19 20:43:44 +02:00
a4358cbabd YoutubeIE: new algo for length 85 (closes #1080), thanks to @patrickslin 2013-07-19 17:12:40 +02:00
177ed935a9 TEDIE: fix the title extraction 2013-07-19 16:13:31 +02:00
c364f15ff1 Add WeiboIE (closes #1039)
It just embeds videos from other sites.
Modified the _VALID_URL of Youku to catch embed urls.
2013-07-19 16:09:14 +02:00
e1f6e61e6a Add an extractor for 56.com (related #1039) 2013-07-19 15:17:34 +02:00
0932300e3a Add SinaIE (related #1039): extractor for video.sina.com.cn 2013-07-18 15:31:50 +02:00
3f40217704 InstagramIE: fix the extraction of the uploader_id and the title
The page title is now 'Instagram', so we build it.
Also extract the description
2013-07-18 13:12:27 +02:00
f631c3311a Hint that --update may need sudo 2013-07-18 12:53:24 +02:00
ad433bb372 release 2013.07.18 2013-07-18 12:41:49 +02:00
3e0b3a1428 Remove the test for signatures of lengths 43,43
It's already covered by the test for length 87
2013-07-18 12:29:09 +02:00
444b116597 YoutubeIE: add algo for length 90 (closes #1064)
Order the cases from higher to lower length.
2013-07-18 12:25:41 +02:00
2aea08eda1 Merge pull request #1068 from MiLk/genalgo-youtube-92
[youtube] Add generator for signature 92
2013-07-18 09:54:56 +02:00
8e5e059d7d forgot to import json 2013-07-18 12:40:56 +05:00
2b1b511f6b removed some unnecessary imports 2013-07-18 12:37:47 +05:00
233ad24ecf corrected a typo and added myself to travis notifications. 2013-07-18 12:37:02 +05:00
c4949c50f9 added test for ex.fm 2013-07-18 12:33:31 +05:00
b6ef402905 added an IE for ex.fm 2013-07-18 12:30:21 +05:00
ccf365475a [youtube] Add generator for signature 92 2013-07-17 17:43:44 +02:00
e1fb245690 Add CondeNastIE
It supports some of the websites of the Condé Nast group: WIRED, GQ, Vogue, Glamour, W Magazine and Vanity Fair.
2013-07-17 14:39:02 +02:00
5a76c6517e YoutubeIE: some encrypted signatures have more than two parts, print the size of all the parts 2013-07-17 12:08:10 +02:00
1bb9568776 release 2013.07.17.1 2013-07-17 11:18:35 +02:00
ecd1c2f7e9 [thisav] add a test for video MD5 2013-07-17 11:18:14 +02:00
466de68801 [thisav] Add IE (Fixes #1056) 2013-07-17 11:16:53 +02:00
88d4111cfa [youtube] Add code for signature 92 (Closes #1060) 2013-07-17 11:06:34 +02:00
51fb64bab1 Mark test_youtube_sig as non-executable (#1066) 2013-07-17 11:04:07 +02:00
be547e1d3b Revert "[youtube] improved decrypt_signature, closes #1060"
This reverts commit fe6fad1242 and closes #1066.
2013-07-17 11:01:40 +02:00
bf85454116 [metacafe] Fix test 2013-07-17 10:50:30 +02:00
5910724b11 [metacafe] New result format 2013-07-17 10:49:49 +02:00
7e24b09da9 [metacafe] Extract description 2013-07-17 10:45:35 +02:00
f085f960e7 [metacafe] Fix uploader detection 2013-07-17 10:45:24 +02:00
f38de77f6e Use unescapeHTML for OpenGraph properties
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
2013-07-17 10:38:23 +02:00
58e7d46d1b Merge remote-tracking branch 'Forever-Young/patch-1' 2013-07-17 09:25:52 +02:00
2a5201638d [youtube] Add sig test for 92 (Thanks to @patrickslin) 2013-07-17 09:23:38 +02:00
fe6fad1242 [youtube] improved decrypt_signature, closes #1060 2013-07-17 10:41:43 +04:00
ec00e1d8a0 [metacafe] Use modern helper methods 2013-07-17 01:35:33 +02:00
de29c4144e Ignore errors in git error handling in verbose mode in Python 3 2013-07-17 01:33:28 +02:00
f3bab0044e Write debugging output to stderr (#1059) 2013-07-17 01:30:34 +02:00
ffd1833b87 release 2013.07.17 2013-07-17 01:14:38 +02:00
896d5b63e8 [metacafe] Add support for AnyClip videos (#1059) 2013-07-17 01:14:30 +02:00
67de24e449 [freesound] Minor improvements 2013-07-15 21:33:45 +02:00
66400c470c Merge pull request #1050 from yasoob/master
Added an IE and test for Freesound.org .
2013-07-15 21:06:51 +02:00
7665010267 added test for freesound.org 2013-07-15 20:17:09 +05:00
5d9b75051a Added an IE for freesound.org 2013-07-15 20:16:44 +05:00
ab2f744b90 GametrailersIE: make it a subclass of MTVIE to reuse most of the extraction process 2013-07-14 14:29:15 +02:00
300fcad8a6 MTVIE: fix xml tags in the media namespace (python2.6) 2013-07-14 14:02:04 +02:00
f7e025958a [mtv]: rework MTVIE and add tests (closes #913)
It uses the same system as ComedyCentralIE to transform rtmp urls into http.
2013-07-14 13:41:46 +02:00
0ab5531363 [livestream] fix import statement 2013-07-14 09:25:51 +02:00
b4444d5ca2 Add LivestreamIE (closes #1042) 2013-07-13 23:58:04 +02:00
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often used in correlation with bits rather than bytes.
2013-07-13 16:42:16 -05:00
b9d3e1635f Strip hash info from URL when making requests (Fixes #1038) 2013-07-13 22:52:12 +02:00
aa6b734e02 [instagram] really fix uploader_id detection (Fixes #1038) 2013-07-13 21:45:33 +02:00
73b57f0ccb [instagram] fix uploader_id detection (Fixes #1038) 2013-07-13 20:40:04 +02:00
3c4e6d8337 Improve OpenGraph property matching 2013-07-13 20:39:47 +02:00
36034aecc2 Merge remote-tracking branch 'jaimeMF/opengraph' 2013-07-13 20:33:23 +02:00
ffca4b5c32 Add CanalplusIE (closes #59 and closes #918) 2013-07-13 13:36:15 +02:00
b0e72bcf34 CriterionIE: simplify some parts and use _html_search_regex 2013-07-13 12:26:05 +02:00
7fd930c0c8 Merge pull request #1036 from yasoob/master
Added an IE and test for Criterion videos (closes #1035).
2013-07-13 12:18:03 +02:00
2e78b2bead YouJizzIE: support videos that define the urls in a playlist page (closes #1037) 2013-07-13 12:07:07 +02:00
44dbe89035 Use re.DOTALL by default when searching OpenGraph properties 2013-07-13 11:29:08 +02:00
2d5a8b5512 added test for criterion.com 2013-07-13 09:18:03 +05:00
159736c1b8 added an IE for criterion.com 2013-07-13 09:17:48 +05:00
46720279c2 InfoExtractor: add some helper methods to extract OpenGraph info 2013-07-12 22:12:04 +02:00
d8269e1dfb Don't try to save the thumbnail if it's None
It means the extractor couldn't find it
2013-07-12 22:11:59 +02:00
cbdbb76665 Use determine_ext when saving the thumbnail
Urls that contain a query produced filenames with wrong extensions
2013-07-12 22:08:49 +02:00
6543f0dca5 BrightcoveIE: Use parse_qs to extract the fields of the query (closes #1032)
Add a compat_urlparse to utils.
2013-07-12 14:53:28 +02:00
232eb88bfe GenericIE: allow matching declarations of the Brightcove parameters that use ' instead of " 2013-07-12 14:52:01 +02:00
a95967f8b7 [ign]: support some country versions and add an extractor for 1up.com
1up.com uses the IGN video system, the extractor is a subclass of IGNIE, it just replaces the video id
2013-07-12 11:39:40 +02:00
2ef648d3d3 Add IGNIE
Only for www.ign.com, it doesn't support country specific versions (like es.ign.com)
2013-07-12 00:03:59 +02:00
33f6830fd5 release 2013.07.12 2013-07-11 23:54:34 +02:00
606d7e67fd YoutubeIE: add algo for length 81 (closes #1026) 2013-07-11 23:47:54 +02:00
fd87ff26b9 release 2013.07.11 2013-07-11 21:04:59 +02:00
85347e1cb6 YoutubeIE: a new algo for length 83 2013-07-11 20:21:45 +02:00
41897817cc GametrailersIE: support multipart videos
Use xml.etree.ElementTree instead of re when possible
2013-07-11 18:24:53 +02:00
45ff2d51d0 [brightcove] add import 2013-07-11 16:31:29 +02:00
5de3ece225 [brightcove] fix on Python 2.6 2013-07-11 16:16:02 +02:00
df50a41289 [arte] Fix on 2.6 2013-07-11 16:12:16 +02:00
59ae56fad5 Add helper function find_xpath_attr 2013-07-11 16:12:08 +02:00
690e872c51 Remove video_result helper method
Calling it was more complex than actually including the type in the video info
2013-07-11 12:12:30 +02:00
81082e046e [ehow] improve minor bits 2013-07-11 12:11:00 +02:00
3fa9550837 Merge remote-tracking branch 'yasoob/master' 2013-07-11 12:02:16 +02:00
b1082f01a6 added test for ehow 2013-07-11 14:30:25 +05:00
f35b84c807 added an IE for Ehow videos 2013-07-11 14:25:14 +05:00
117adb0f0f GenericIE: detect more Brightcove videos
In some sites "class" contains more than BrightcoveExperience
2013-07-11 00:25:38 +02:00
abb285fb1b BrightcoveIE: add support for playlists 2013-07-11 00:04:33 +02:00
a431154706 Set the playlist_index and playlist fields for already resolved video results. 2013-07-10 23:36:30 +02:00
cfe50f04ed GenericIE: Detect videos from Brightcove
Brightcove videos info is usually found in an <object class="BrightcoveExperience"></object> node, this is passed to a new method of BrightcoveIE that builds a url to extract the video.
2013-07-10 17:49:11 +02:00
a7055eb956 YoutubeIE: show a more meaningful error when it finds an rtmpe download (related #343) 2013-07-10 14:35:11 +02:00
0a1be1e997 release 2013.07.10 2013-07-10 11:36:11 +02:00
c93898dae9 YoutubeIE: new algo for length 83 (closes #1017 and closes #1016) 2013-07-10 10:44:04 +02:00
ebdf2af727 GameSpotIE: support more urls and download videos in the best quality 2013-07-09 20:07:52 +02:00
c108eb73cc YoutubeIE: Fix vevo explicit videos (closes #956)
When an age restricted video is detected it simulates accessing the video from www.youtube.com/v/{video_id}
2013-07-09 15:43:44 +02:00
3a1375dacf VeohIE: remove debug logging 2013-07-09 11:11:55 +02:00
41bece30b4 DotsubIE: simplify and extract the upload date
Do not declare variables for fields in the info dictionary.
2013-07-08 22:40:42 +02:00
16ea58cbda Merge pull request #1009 from yasoob/master
Added an IE and test for dotsub.com videos. ( closes #1008 )
2013-07-08 22:21:06 +02:00
99e350d902 Add VeohIE (closes #1006) 2013-07-08 22:02:23 +02:00
13e06d298c added an IE and test for dotsub. 2013-07-09 00:05:52 +05:00
56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes #998)
When an IE is added to the list, it's also added to a dictionary. When an IE is requested it first looks in the dictionary and if there's no instance it will create a new one.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
2013-07-08 15:14:27 +02:00
81f0259b9e YoutubeSubscriptionsIE: raise an error if there's no login information. 2013-07-08 11:24:11 +02:00
fefcb5d314 YoutubeIE: use the new method in the base IE for getting the login info 2013-07-08 11:24:11 +02:00
345b0c9b46 Remove dead code 2013-07-08 02:13:50 +02:00
20c3893f0e Do not redefine variables in list comprehensions 2013-07-08 02:12:20 +02:00
29293c1e09 release 2013.07.08.1 2013-07-08 02:05:22 +02:00
5fe3a3c3fb [archive.org] Add extractor (Fixes #1003) 2013-07-08 02:05:02 +02:00
b04621d155 release 2013.07.08 2013-07-08 01:29:16 +02:00
b227060388 [arte] Always look for the JSON URL (Fixes #1002) 2013-07-08 01:28:19 +02:00
d93e4dcbb7 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-08 01:15:19 +02:00
73e79f2a1b [3sat] Add support (Fixes #1001) 2013-07-08 01:13:55 +02:00
fc79158de2 VimeoIE: authentication support (closes #885) and add a method in the base InfoExtractor to get the login info 2013-07-07 23:24:34 +02:00
7763b04e5f YoutubeIE: extract the thumbnail in the best possible quality 2013-07-07 21:21:15 +02:00
9d7b44b4cc release 2013.07.07.01 2013-07-07 17:13:56 +02:00
897f36d179 [youtube:subscriptions] Use colon for differentiation of shortcuts 2013-07-07 17:13:26 +02:00
94c3637f6d release 2013.07.07 2013-07-07 16:55:06 +02:00
04cc96173c [youtube] Add an extractor for the subscriptions feed (closes #498)
It can be downloaded using the ytsubscriptions keyword.
It needs the login information.
2013-07-07 13:58:23 +02:00
fbaaad49d7 Add BrightcoveIE (closes #832)
It only accepts the urls that are used for embedding the video, it doesn't search in generic webpages to find Brightcove videos
2013-07-05 21:31:50 +02:00
b29f3b250d DailymotionIE: extract thumbnail 2013-07-05 19:39:37 +02:00
fa343954d4 release 2013.07.05 2013-07-05 14:46:24 +02:00
2491f5898e DailymotionIE: simplify the extraction of the title and remove an unused assignment of video_uploader 2013-07-05 14:20:15 +02:00
b27c856fbc Dailymotion: fix the download of the video in the max quality (closes #986) 2013-07-05 14:15:26 +02:00
9941ceb331 ArteTVIE: support emission urls that don't contain the video id
Like http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
2013-07-05 12:56:41 +02:00
c536d38059 release 2013.07.04 2013-07-04 18:07:34 +02:00
8de64cac98 [arte] Fix language selection (Fixes #988) 2013-07-04 18:07:03 +02:00
6d6d286539 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-03 16:36:42 +02:00
5d2eac9eba [auengine] Add tests (Fixes #985) 2013-07-03 16:36:36 +02:00
9826925a20 ArteTVIE: extract the video with the correct language
Some urls from the French version of the page could download the German version.

Also instead of extracting the json url from the webpage, build it to skip the download
2013-07-02 17:34:40 +02:00
24a267b562 TudouIE: extract all the segments of the video and download the best quality (closes #975)
Also simplify a bit the extraction of the id from the url and write directly the title for the test video
2013-07-02 12:38:24 +02:00
d4da3d6116 BlipTVIE: download the video in the best quality (closes #215) 2013-07-02 10:40:23 +02:00
d5a62e4f5f release 2013.07.02 2013-07-02 09:14:09 +02:00
9a82b2389f Do not show bug report for errors that are to be expected (Closes #973) 2013-07-02 08:40:21 +02:00
8dba13f7e8 Squelch git not found exception (#973) 2013-07-02 08:36:20 +02:00
deacef651f Improve formatting 2013-07-02 08:35:39 +02:00
2e1b3afeca README.md: Fix markup and some of the text.
(Originally from Rogério Brito <rbrito@ime.usp.br>)
2013-07-02 07:39:54 +02:00
652e776893 setup: PEP-8 fixes.
Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
d055fe4cb0 setup: cosmetics: Add/remove some whitespace for readability.
This also fixes some long lines.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
131842bb0b setup: Move pseudo-docstring to a proper comment.
A string statement is not a docstring if it doesn't occur right at the top
of modules, functions, class definitions etc.

This patch fixes it.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-07-01 23:17:48 -03:00
59fc531f78 Add InstagramIE (related #904) 2013-07-01 21:08:54 +02:00
5c44c15438 GenericIE: match titles that spread across multiple lines (related #904) 2013-07-01 20:50:50 +02:00
62067cb9b8 Shorten --list-extractor-descriptions to --extractor-descriptions 2013-07-01 18:59:29 +02:00
0f81866329 Add --list-extractor-descriptions (human-readable list of IEs) 2013-07-01 18:52:19 +02:00
2db67bc0f4 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-01 18:21:36 +02:00
7dba9cd039 Sort IEs alphabetically in --list-extractors 2013-07-01 18:21:29 +02:00
75dff0eef7 [youtube]: add YoutubeShowIE (closes #14)
It just extracts the playlists urls for each season
2013-07-01 17:59:28 +02:00
d828f3a550 YoutubeIE: use a negative index when accessing the last element of the format list 2013-07-01 17:19:33 +02:00
bcd6e4bd07 YoutubeIE: extract the correct video id for movie URLs (closes #597) 2013-07-01 16:51:18 +02:00
53936f3d57 Merge remote-tracking branch 'yasoob/master'
Conflicts:
	youtube_dl/extractor/__init__.py
2013-07-01 15:19:45 +02:00
0beb3add18 Separate downloader options 2013-07-01 14:53:25 +02:00
f9bd64c098 [update] Add package manager to error message (#959) 2013-07-01 02:36:49 +02:00
d7f44b5bdb [youtube] Warn if URL is most likely wrong (#969) 2013-07-01 02:29:29 +02:00
48bfb5f238 [instagram] Fix title 2013-06-30 14:07:32 +02:00
97ebe8dcaf StatigramIE: update the title of the test video 2013-06-30 13:57:57 +02:00
d4409747ba TumblrIE: update test
The video (once more) is no longer available
2013-06-30 13:52:20 +02:00
37b6a6617f ArteTvIE: support videos from videos.arte.tv
Each source of videos has a different extraction process, implemented in a separate method of the extractor.
Changed the extension of videos from mp4 to flv.
2013-06-30 13:38:22 +02:00
ca1c9cfe11 release 2013.06.34.4 2013-06-29 20:22:08 +02:00
adeb4d7469 Merge remote-tracking branch 'origin/master' 2013-06-29 20:21:13 +02:00
50587ee8ec [vimeo] fix detection for http://vimeo.com/groups/124584/videos/24973060 2013-06-29 20:20:20 +02:00
8244288dfe WatIE: support videos divided in multiple parts (closes #222 and #659)
The id for the videos is now the full id, not the one in the webpage url.
Also extract more information: description, view_count and upload_date
2013-06-29 18:22:03 +02:00
6ffe72835a [tutv] Fix URL type (for Python 3) 2013-06-29 17:42:15 +02:00
8ba5e990a5 release 2013.06.34.3 2013-06-29 17:30:11 +02:00
9afb1afcc6 [tutv] Add IE (Fixes #965) 2013-06-29 17:29:40 +02:00
0e21093a8f Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-29 16:57:34 +02:00
9c5cd0948f [ted] Fix test checksum 2013-06-29 16:45:56 +02:00
1083705fe8 Update the default output template in the README
It was changed in 08b2ac745a
2013-06-29 16:35:28 +02:00
f3d294617f Document view_count (Closes #963) 2013-06-29 16:32:28 +02:00
de33a30858 Merge pull request #962 from jaimeMF/TF1
Add TF1IE
2013-06-29 07:30:49 -07:00
887a227953 added an IE and test for traileraddict.com 2013-06-29 19:17:27 +05:00
705f6f35bc Move TF1IE to its own file 2013-06-29 15:18:19 +02:00
e648b22dbd Add TF1IE 2013-06-29 15:07:25 +02:00
257a2501fa keep track of the dates and html5player versions of working YT signature algos 2013-06-29 01:05:36 +02:00
99afb3ddd4 Add WatIE 2013-06-28 22:01:47 +02:00
a3c776203f Rewrote error message a bit to clarify 2013-06-28 18:53:31 +02:00
53f350c165 Changed the error message.
I changed the ExtractorError from ```msg = msg + u'; please report this issue on http://yt-dl.org/bug'``` to ```msg = msg + u'; please report this issue on http://yt-dl.org/bug with the complete output by running the same command with --verbose flag'```
Hopefully this will tell the users to report bugs with the complete output.
2013-06-28 18:51:54 +02:00
f46d31f948 Add RingTVIE (Thanks @yasoob) 2013-06-28 18:51:00 +02:00
bf64ff72db Added an IE for gamespot. Gamespot allows downloading, but only for registered users; with this IE no registration is required. 2013-06-28 18:42:45 +02:00
bc2884afc1 Print which IE is being skipped in test_download 2013-06-28 11:20:00 +02:00
023fa8c440 Add function add_default_info_extractors to YoutubeDL
It adds to the list the IEs returned by gen_extractors
2013-06-27 23:51:06 +02:00
427023a1e6 Merge branch 'generate-ie-list' 2013-06-27 22:44:02 +02:00
a924876fed Make sure that IEs only accept their own URLs 2013-06-27 21:25:51 +02:00
3f223f7b2e [tumblr] Fix title 2013-06-27 21:19:42 +02:00
fc2c063e1e Move testcase generator to helper 2013-06-27 21:15:16 +02:00
20db33e299 Make sure SoundcloudIE does not match soundcloud sets 2013-06-27 21:11:23 +02:00
c0109aa497 release 2013.06.34.2 2013-06-27 20:50:57 +02:00
ba7a1de04d Credit @gitprojs for auengine 2013-06-27 20:50:34 +02:00
4269e78a80 Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-27 20:47:03 +02:00
6f5ac90cf3 Move tests to the IE definitions 2013-06-27 20:46:46 +02:00
de282fc217 Merge pull request #954 from gitprojs/generic
Augmented Generic IE
2013-06-27 11:44:46 -07:00
ddbd903576 Tests: Add coding to files 2013-06-27 20:32:02 +02:00
0c56a3f773 [googleplus] move tests 2013-06-27 20:31:27 +02:00
9d069c4778 [infoq] move tests 2013-06-27 20:27:08 +02:00
0d843f796b Remove superfluous name declarations 2013-06-27 20:25:56 +02:00
67f51b3d8c [youku] move tests 2013-06-27 20:25:46 +02:00
5c5de1c79a [eighttracks] move test 2013-06-27 20:22:00 +02:00
0821771466 [steam] move test 2013-06-27 20:20:00 +02:00
83f6f68e79 [metacafe] move tests 2013-06-27 20:18:35 +02:00
27473d18da Made 'video' the default title for generic IE 2013-06-27 19:18:15 +01:00
0c6c096c20 [soundcloud] Move tests 2013-06-27 20:17:21 +02:00
52c8ade4ad Made generic IE handle more cases
Added a possible quote after file, so it can now handle cases like:
'file': 'http://www.a.com/b.mp4'
2013-06-27 19:16:09 +01:00
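The quoting case described in 52c8ade4ad can be sketched roughly as below. The pattern and the `find_file_url` helper are illustrative assumptions, not the regex the generic IE actually uses; they only show how an optional quote around `file` and around the URL can be tolerated.

```python
import re

# Hypothetical pattern: accepts file=URL, file: 'URL', 'file': 'URL' and "file": "URL".
FILE_URL_RE = re.compile(
    r"""["']?file["']?\s*[:=]\s*["']?(?P<url>https?://[^"'\s&]+)""")


def find_file_url(webpage):
    """Return the first URL assigned to a `file` key, or None if there is none."""
    match = FILE_URL_RE.search(webpage)
    return match.group('url') if match else None
```

For the example from the commit message, `find_file_url("'file': 'http://www.a.com/b.mp4'")` yields the bare URL without the surrounding quotes.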
0e853ca4c4 [youtube] Fix tests in 2.x 2013-06-27 19:55:39 +02:00
41beccbab0 Use str every time 2013-06-27 19:43:43 +02:00
2eb88d953f Allow _TESTS attribute for IEs with multiple tests
This also improves the numbering of duplicate tests
2013-06-27 19:13:11 +02:00
1f0483b4b1 Generate the list of IEs automatically
It seems like GenericIE needs to be last, but other than that, the order really does not matter anymore.
To cut down on merge conflicts, generate the list of IEs automatically.
2013-06-27 18:43:32 +02:00
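A minimal sketch of the idea in 1f0483b4b1, assuming extractor classes are recognised by an `IE` name suffix. The placeholder classes and the `gen_extractor_classes` helper are hypothetical; the alphabetical sort is added here only to make the result deterministic, since the commit notes that the order no longer matters apart from GenericIE coming last.

```python
class YoutubeIE(object):
    pass


class VimeoIE(object):
    pass


class GenericIE(object):
    pass


def gen_extractor_classes(namespace):
    """Collect every *IE class in a namespace dict, keeping GenericIE last."""
    classes = [
        klass for name, klass in namespace.items()
        if name.endswith('IE') and isinstance(klass, type) and name != 'GenericIE'
    ]
    # Sorting is an assumption of this sketch, not a requirement from the commit.
    classes.sort(key=lambda klass: klass.__name__)
    # GenericIE matches almost any URL, so it must be tried after all others.
    classes.append(namespace['GenericIE'])
    return classes
```

Called with a module's `globals()`, a helper like this removes the hand-maintained list that caused merge conflicts whenever two branches added an extractor.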
6b47c7f24e Allow moving tests into IE files
Allow adding download tests right in the IE file.
This will cut down on merge conflicts and make it more likely that new IE authors will add tests right away.
2013-06-27 18:28:45 +02:00
d798e1c7a9 [auengine] Rename to official capitalization 2013-06-27 18:19:19 +02:00
3a8736bd74 Merge remote-tracking branch 'gitprojs/master'
Conflicts:
	youtube_dl/extractor/__init__.py
2013-06-27 18:16:41 +02:00
c8c5163618 release 2013.06.34.1 2013-06-27 17:58:58 +02:00
500f3d2432 Merge remote-tracking branch 'origin/HEAD' 2013-06-27 17:58:42 +02:00
ed4a915e08 Add tests and improve for HotNewHipHop 2013-06-27 17:56:48 +02:00
b8f7b1579a Merge remote-tracking branch 'JohnyMoSwag/master' 2013-06-27 17:52:41 +02:00
ed54491c60 fix for detecting youtube embedded videos. 2013-06-27 08:39:32 -07:00
e4decf2750 Updated auengine IE to use compat_urllib* utils 2013-06-27 13:48:28 +01:00
c90f13d106 YoutubeIE: update the docstrings and the error message of _decrypt_signature
Now it doesn't check the size of the two parts of the key.
2013-06-27 14:37:45 +02:00
62008f69c1 Added an IE for auengine.com 2013-06-27 12:58:09 +01:00
e88f5e0b4e release 2013.06.34 2013-06-27 13:02:57 +02:00
769fda3c5a print more encrypted signature info on -v (rel: #948) 2013-06-27 12:54:07 +02:00
23300d7149 a new day, a new s algo - fix #946 2013-06-27 12:24:46 +02:00
f5756f388a Check in signature generator 2013-06-27 11:15:29 +02:00
ee313cdcbf simplify youtube signature generation 2013-06-27 11:15:01 +02:00
8b50fed04b removed print statement 2013-06-26 19:04:05 -07:00
5b66de8859 Added HotNewHipHop IE 2013-06-26 18:38:48 -07:00
e38af9e00c Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-27 01:52:13 +02:00
6b37f0be55 Add a clean-room implementation for youtube signatures 2013-06-27 01:51:10 +02:00
6e5d5f2fc1 Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-27 00:16:02 +02:00
75c9481224 ArteTvIE: rewrite the extract process to support the new site (fixes #875)
The video can be downloaded with rtmp or http, but the best quality format seems to always use rtmp.
Deleted the old methods.
2013-06-27 00:09:51 +02:00
5746f9da99 Add test for youtube signature algorithm 2013-06-27 00:09:25 +02:00
112da0a0ce Simplify FakeYDL 2013-06-27 00:09:05 +02:00
bcd606c0fe ComedycentralIE: Force conversion of the description to unicode (close #941)
When writing to a file it would fail.
2013-06-26 21:38:01 +02:00
ed92bc9f6e [wimp] minor readability improvements (#940) 2013-06-26 18:22:42 +02:00
9b0756f8f2 [vevo] remove unused import 2013-06-26 18:05:01 +02:00
aa0c87391c Add CSpanIE (closes #312) 2013-06-26 17:55:54 +02:00
b1dfdc51b1 added .decode('ascii') 2013-06-26 19:41:55 +05:00
2e32528012 FileDownloader: fixed call to "report_error" of YoutubeDL
It was being called as "error"
2013-06-26 16:32:47 +02:00
f64e7695a1 added b'' to my regex expression in order to solve the error on python 3 2013-06-26 18:46:05 +05:00
5abeaf0650 changed wimp.py according to the changes suggested by jaime 2013-06-26 17:26:59 +05:00
8bcc355972 removed trailing ',' and corrected the title in test 2013-06-26 15:51:25 +05:00
6b4642fae3 added test for wimp.com 2013-06-26 15:40:24 +05:00
d1bd37deac Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-26 15:30:21 +05:00
405ec05cb2 added an IE for wimp.com 2013-06-26 15:25:53 +05:00
52e8e1dc88 Merge pull request #936 from iemejia/master
Added option for vtt WebVTT subtitle format for Youtube
2013-06-26 03:06:06 -07:00
b98a6b2f72 Fixed typo in subtitle format option (from: sbt => sbv) 2013-06-26 11:59:29 +02:00
0ca45b233f Added missing write-auto-sub option in README file 2013-06-26 11:34:38 +02:00
65cceef8f4 Added support for additional vtt subtitle format (WebVTT) in youtube-dl. 2013-06-26 11:28:47 +02:00
b004821fa9 Add the option "--write-auto-sub" to download automatic subtitles from Youtube
Now automatic subtitles are only downloaded if the option is given.
(closes #903)
2013-06-25 23:46:24 +02:00
81b42336ad release 2013.06.33 2013-06-25 22:42:02 +02:00
c6c1974672 Add "--video-password" option (related #889)
Used only for accessing a private video

Restore the error when the account is missing
2013-06-25 22:22:32 +02:00
a545d1d262 Merge pull request #922 from JohnyMoSwag/master
Added embedded youtube detection to WorldstarIE
2013-06-25 22:08:58 +02:00
037fcd0047 JukeboxIE: support more countries 2013-06-25 22:04:44 +02:00
318452bc0c Sort IEs alphabetically 2013-06-25 21:11:57 +02:00
d746cd88c2 Merge remote-tracking branch 'yasoob/master' 2013-06-25 21:09:15 +02:00
9c42603b5a release 2013.06.32 2013-06-25 20:55:47 +02:00
ea93cce4f6 Directly call update_latest 2013-06-25 20:50:54 +02:00
f4daa18152 added test for tudou.com 2013-06-25 22:52:21 +05:00
9caa687d81 Added an IE for tudou 2013-06-25 22:48:08 +05:00
3b58c6fb54 Update latest files on release 2013-06-25 18:48:57 +02:00
5926c10690 release 2013.06.31 2013-06-25 18:40:58 +02:00
df725153d2 Credit mc2avr for JukeboxIE (#924) 2013-06-25 17:57:47 +02:00
d662896090 [googleplus] Adapt to new detail URL format 2013-06-25 17:52:32 +02:00
db241e8645 Add encoding to jukebox IE and simplify it a little bit 2013-06-25 17:16:38 +02:00
ead28ff30a Make upload atomic (#925) 2013-06-25 17:14:25 +02:00
515d7a5e73 Add Jukebox IE 2013-06-25 17:12:35 +02:00
14fbdc9cdd [jukebox] call YoutubeIE if necessary 2013-06-25 16:51:09 +02:00
98bcd2834a improve generic and encrypted signature error messages 2013-06-25 16:47:16 +02:00
f7ab6cbe16 add tests for use_cipher_signature videos (#897) and the ability to test multiple videos per IE 2013-06-25 14:38:00 +02:00
28ef06f7c2 add JukeboxIE 2013-06-25 13:28:59 +02:00
577d02370d release 2013.06.30 2013-06-25 12:28:40 +02:00
50be92c11c Handle video pages without vevo IDs (Fixes #923) 2013-06-25 12:28:17 +02:00
d18596baf4 added Youtube embed detection to WorldstarIE 2013-06-24 18:58:49 -07:00
7ce7e39476 YoutubeIE: Extend decryption of signatures to all videos that have the 's' field in the url_encoded_fmt_stream_map (related #920) 2013-06-24 21:25:12 +02:00
93eb15c573 clean up printing in __init__.py 2013-06-24 15:57:53 +02:00
9f4d83e3b1 release 2013.06.29 2013-06-24 14:51:24 +02:00
1c251cd948 MTVIE: add support for Vevo videos (related #913) 2013-06-24 13:54:19 +02:00
70d1924f8b Add VevoIE 2013-06-24 12:31:41 +02:00
7b4948b05f release 2013.06.28 2013-06-24 11:11:33 +02:00
878b5d9f0d Merge remote-tracking branch 'jaimeMF/youtubedl_class' 2013-06-24 10:48:41 +02:00
2bc1820660 release 2013.06.27 2013-06-24 10:32:08 +02:00
8bf8b5a577 Use the new class in the tests 2013-06-24 10:21:44 +02:00
8222d8de88 Split FileDownloader in two classes: FileDownloader and YoutubeDL
YoutubeDL is the class that coordinates everything
FileDownloader gets a filename and an info dict and downloads the video.
2013-06-24 10:21:43 +02:00
c7253e2e8c [youtube] fix condition always being evaluated to true 2013-06-24 09:42:46 +02:00
d69cf69a6a [youtube] Use mp4 as extension for format 38 (Fixes #892) 2013-06-24 01:22:59 +02:00
d02ecdefab release 2013.06.26 2013-06-24 01:01:53 +02:00
bc857bfce0 Remove includes from setup.py for windows build 2013-06-24 01:01:17 +02:00
f8bf74575a release 2013.06.25 2013-06-24 00:20:36 +02:00
964ac8b584 Fix release script once more 2013-06-24 00:09:57 +02:00
a3522dfddd Merge branch 'master' of github.com:rg3/youtube-dl 2013-06-24 00:09:11 +02:00
d3a8613b6e Improve test skipping functionality 2013-06-24 00:05:02 +02:00
200b388752 Correct comparison test 2013-06-24 00:02:49 +02:00
dabcaf3b06 release 2013.06.24 2013-06-24 00:02:20 +02:00
e646ffe795 Add included files for Windows build 2013-06-24 00:01:41 +02:00
b0dcc3c47f setup.py: include the new extractor module 2013-06-23 23:54:08 +02:00
b07d9c23c5 release 2013.06.23 2013-06-23 23:42:21 +02:00
d71cae62cc allow skipping tests when releasing
(YouTube Subtitles are currently flaky in Germany, especially via IPv6)
2013-06-23 23:41:54 +02:00
633a50cf4b Update Makefile to packaged paths 2013-06-23 23:27:28 +02:00
825e0984e2 [break] adapt to new paths 2013-06-23 22:59:51 +02:00
d1cade5ade Correct module name 2013-06-23 22:53:42 +02:00
190717e31f [justin.tv] Clarify variable content 2013-06-23 22:52:43 +02:00
0824c28c8b Remove mentions of old InfoExtractors module 2013-06-23 22:42:59 +02:00
c59b4aaeef Fix imports and restrict available legacy imports 2013-06-23 22:38:59 +02:00
f9c6cbf002 Move extractor imports and functions into extractor/__init__.py 2013-06-23 22:36:24 +02:00
b8fe71ab86 Remove unused imports from InfoExtractor 2013-06-23 22:34:23 +02:00
cb10cded2a [xhamster] Move into own file 2013-06-23 22:32:44 +02:00
cd8b830292 [Teamcoco] Move into own file 2013-06-23 22:31:50 +02:00
1ac4004f3a [flickr] Move into own file 2013-06-23 22:31:12 +02:00
e17d368ae2 [howcast] Move into own file 2013-06-23 22:30:16 +02:00
27110b0567 [hypem] Move into own file 2013-06-23 22:29:27 +02:00
9fe4de3471 [ina] Move into own file 2013-06-23 22:28:19 +02:00
d26d440e19 [redtube] Simplify 2013-06-23 22:27:34 +02:00
9f5daf0006 [redtube] move into own file 2013-06-23 22:27:16 +02:00
eb1634cbf8 [Vine] move into own file 2013-06-23 22:26:30 +02:00
01c10ca26e [VBox7] move into own file 2013-06-23 22:25:46 +02:00
45aef47281 [Bandcamp] move into own file 2013-06-23 22:24:58 +02:00
ae287755b7 [Tumblr] move into own file 2013-06-23 22:24:07 +02:00
a37f27ae99 [LiveLeak] move into own file 2013-06-23 22:23:19 +02:00
49f5f315fd [Spiegel] move into own file 2013-06-23 22:22:08 +02:00
97d2db017c [myspass] Move into own file and default to mp4 ext 2013-06-23 22:20:45 +02:00
2c64df0399 [keek] move into own file 2013-06-23 22:16:41 +02:00
828400422a [8tracks] Move into own file 2013-06-23 22:15:50 +02:00
c3c77cec30 [youjizz] move into own file 2013-06-23 22:14:22 +02:00
1183b85f50 [pornotube] move into own file 2013-06-23 22:13:32 +02:00
0143dc029c [YouPorn] move into own file 2013-06-23 22:12:14 +02:00
e10e576fed [RBMARadio] move into own file 2013-06-23 22:09:32 +02:00
78af8eb1d1 [ustream] move into its own file 2013-06-23 22:08:28 +02:00
79e93125d0 [justin.tv] move into own file 2013-06-23 22:07:27 +02:00
48db0b1f4a [FunnyOrDie] Remove unused import 2013-06-23 22:07:17 +02:00
8f0578f0fc Move FunnyOrDie into its own file 2013-06-23 22:05:23 +02:00
250f557872 Move WorldStarHipHop into its own file 2013-06-23 22:04:08 +02:00
462dc88b17 Move Steam IE into its own file 2013-06-23 22:02:56 +02:00
570fa151fc Move XNXX into its own file 2013-06-23 22:01:57 +02:00
9c286cfa00 Move Youku IE into its own file 2013-06-23 22:01:02 +02:00
80cbb6ddbb Move MixCloud into its own file 2013-06-23 21:59:15 +02:00
9fd5ce0cbe Move TED IE into its own file 2013-06-23 21:55:53 +02:00
1736dec629 Mark MTV as broken for now (#913) 2013-06-23 21:52:41 +02:00
b8a360837a Fix Statigram test 2013-06-23 21:34:40 +02:00
fc28721960 Add MTV IE file (oops) 2013-06-23 21:34:03 +02:00
51ce3a75c9 Improve error reporting for downloads 2013-06-23 21:33:11 +02:00
335056663a Move MTV IE into its own file 2013-06-23 21:27:38 +02:00
5b286728de Move NBA IE into its own file 2013-06-23 21:18:00 +02:00
291a168bcc Move StanfordOC IE into its own file 2013-06-23 21:16:32 +02:00
fda7d31aa0 Move infoq into its own file 2013-06-23 21:14:19 +02:00
cbf46c737c Move XVideos IE into its own file (and simplify it a bit) 2013-06-23 21:11:47 +02:00
7beb36a529 Move Collegehumor IE into its own file 2013-06-23 21:10:21 +02:00
153697660d Move Escapist into its own file 2013-06-23 21:08:17 +02:00
60a72e8d45 Simplify EscapistIE 2013-06-23 21:06:49 +02:00
426ff04282 Move DepositFiles into its own IE 2013-06-23 21:06:20 +02:00
a50e1b32e4 Add facebook import 2013-06-23 21:00:34 +02:00
9eae41ddef Move Facebook into its own file 2013-06-23 20:59:45 +02:00
aad0d6d5ba Move Soundcloud into its own file 2013-06-23 20:57:44 +02:00
7aca14a1ec Move G+ IE into its own file, and move google search into a more descriptive module 2013-06-23 20:55:15 +02:00
d1596ef439 Add import for google search 2013-06-23 20:51:42 +02:00
ea63e4998b Move comedycentral into its own file 2013-06-23 20:51:04 +02:00
a08dfd27a8 Move MyVideo into its own file 2013-06-23 20:48:32 +02:00
f58848011e Move blip.tv extractors into their own file 2013-06-23 20:44:48 +02:00
934858ad86 Move YahooSearchIE to youtube_dl.extractor.yahoo 2013-06-23 20:41:54 +02:00
3c25b9abae Remove useless headers 2013-06-23 20:35:50 +02:00
3fc03845a1 Move GoogleSearchIE into its own file 2013-06-23 20:32:49 +02:00
9b122384e9 Move GenericIE into its own file 2013-06-23 20:31:45 +02:00
9f4e6bbaeb Move gametrailers IE into its own file 2013-06-23 20:29:56 +02:00
b05654f0e3 Move YoutubeSearchIE to the other youtube IEs 2013-06-23 20:28:15 +02:00
9b3a760bbb [arte] Mark dead code as such 2013-06-23 20:26:35 +02:00
d5822b96b0 Move ARD, Arte, ZDF into their own files 2013-06-23 20:24:07 +02:00
b3d14cbfa7 Move Vimeo into its own file 2013-06-23 20:18:21 +02:00
d6039175e5 Move yahoo into its own file 2013-06-23 20:13:52 +02:00
97d6faaced Move Photobucket into its own file 2013-06-23 20:12:18 +02:00
219b8130df Move DailyMotion into its own file 2013-06-23 20:12:03 +02:00
38cbc40a64 Move Metacafe and Statigram into their own files, and remove absolute import 2013-06-23 20:07:51 +02:00
93d3a642a9 [youtube] remove dead code 2013-06-23 19:59:40 +02:00
c5e8d7af0e Move youtube extractors to youtube_dl.extractor.youtube 2013-06-23 19:58:33 +02:00
d6983cb460 Fix generic class move (add all files) 2013-06-23 19:57:38 +02:00
dd9829292e Improve vevo message 2013-06-23 19:45:42 +02:00
89cb0eb0b6 Use new signature calculation method only if sig is not present 2013-06-23 19:43:18 +02:00
9b5fffb149 added an IE and test for break.com 2013-06-23 22:42:51 +05:00
1f90438025 Merge remote-tracking branch 'jaimeMF/vevo_fix' 2013-06-23 19:42:27 +02:00
a130adb25b [Statigr.am] Correct uploader id 2013-06-23 19:41:28 +02:00
8756c5fe7a Merge remote-tracking branch 'origin/vimeo_passworded_videos' 2013-06-23 19:00:16 +02:00
828dba2983 Improve error reporting 2013-06-23 18:59:01 +02:00
6b3f5a329b Improve Statigr.am IE 2013-06-23 18:58:53 +02:00
63ef586b05 Merge remote-tracking branch 'yasoob/master' 2013-06-23 18:45:50 +02:00
383a6a61b1 Merge pull request #905 from rbrito/manpage-apropos
README: Add brief description for manpages/apropos.
2013-06-23 09:41:59 -07:00
4fdd4e6f6f added test for Statigr 2013-06-23 18:56:26 +05:00
01ba4b80a7 added StatigrIE 2013-06-23 18:02:55 +05:00
de66764e4e added StatigrIE 2013-06-23 17:46:14 +05:00
1037d53988 GenericIE: look for Open Graph info
Only if there is a direct link to the file, don't try if it points to a Flash player
2013-06-23 13:26:49 +02:00
c3ab8f866c Change metavar of "--sub-format" from LANG to FORMAT 2013-06-23 12:59:20 +02:00
94eb2dd1fe README: Add brief description for manpages/apropos.
Trying to mimic the manpage of (GNU) `ls`, we don't conjugate the verb as
"downloads" or something else.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-06-22 19:16:11 -03:00
346b5ce8fd YoutubeIE: report warnings instead of errors if the subtitles are not found (related #901)
For example when downloading a playlist some videos may not have subtitles but the download shouldn't stop.
2013-06-22 14:15:33 +02:00
b37fbb990b Move the decrypting function to a static method 2013-06-22 13:20:06 +02:00
ef75f76f5c Detect more vevo videos 2013-06-22 13:13:40 +02:00
e296100005 Merge pull request #888 from rg3/youtube_playlists_fix_886
YoutubePlaylistIE: try to extract the url of the entries from the media$group dictionary (closes #886)
2013-06-22 03:35:32 -07:00
953dd93a48 YoutubePlaylistIE: don't look into entry['content']['src'], according to the docs this can return live stream urls 2013-06-22 12:32:27 +02:00
e704f4d378 YoutubeIE: If no subtitles language is given, default to English for automatic captions (related #901) 2013-06-22 12:14:24 +02:00
77d0f05f71 YoutubeIE: Detect new Vevo style videos
The url_encoded_fmt_stream_map can be found in the video page, but the signature must be decrypted. We get it from the webpage instead of the `get_video_info` pages because we have only discovered the algorithm for keys with both sub keys of size 43.
2013-06-21 21:51:10 +02:00
50d2376769 Leave out sig if not present (#896) 2013-06-21 01:22:47 +02:00
759d525301 release 2013.06.21 2013-06-21 00:33:44 +02:00
fcfa188548 Show which IEs are slow during release 2013-06-21 00:29:31 +02:00
f4c8bbcfc2 TEDIE: download the best quality video and use the new _search_regex functions
Also extracts the description.
2013-06-20 20:51:20 +02:00
31eead52e7 YoutubePlaylistIE: try to extract the url of the entries from the media$group dictionary
Extracting it from content can return rtsp urls.
2013-06-20 17:23:27 +02:00
038a3a1a61 RBMARadioIE: fix the extraction of the JSON data 2013-06-20 14:37:43 +02:00
587c68b2cd DailymotionIE: fix the extraction of the video uploader and use _search_regex for getting it 2013-06-20 14:15:29 +02:00
377fdf5dde Update the TumblrIE: the video is no longer available 2013-06-20 14:02:21 +02:00
5c67601931 Revert "Fix GooglePlusIE: the video_page url has changed of place"
The old method is working again.

This reverts commit 449d5c910c.
2013-06-20 13:53:04 +02:00
68f54207a3 SteamIE: only verify the age if needed
Also use the _html_search_regex function
2013-06-20 13:43:44 +02:00
bb47437686 Ignore invalid dates (Fixes #894) 2013-06-19 22:13:16 +02:00
213b715893 Merge pull request #887 from anisse/master
Fetch all entries that are in a youtube playlist

Also add a test.
2013-06-19 12:52:44 +02:00
449d5c910c Fix GooglePlusIE: the video_page url has changed of place 2013-06-18 14:22:16 +02:00
0251f9c9c0 add _search_regex to the new IEs 2013-06-17 19:47:44 +02:00
8bc7c3d858 Merge branch 'search_regex' - PR #872 - closes #847 2013-06-17 19:28:18 +02:00
af44c94862 use _search_regex in GenericIE 2013-06-17 19:25:35 +02:00
36ed7177f0 Fix HypemIE test: the song name has been changed 2013-06-16 20:42:28 +02:00
32aa88bcae Add GametrailersIE 2013-06-16 20:34:45 +02:00
51090d636b VimeoIE: allow to download password protected videos 2013-06-15 11:35:14 +02:00
31513ea6b9 Update test_issue_673 in Youtube Lists
Some videos have been removed.
Delete the title check, it's not the purpose of that test.
2013-06-15 11:20:22 +02:00
88cebbd7b8 YoutubePlaylistIE: get *all* videos
For that, we add the parameter safeSearch=none, which asks youtube not to filter
results before sending them to us.

Note: this parameter could be added to YoutubeSearchIE and YoutubeUserIE
as well, but I don't know what the impact would be in terms of unwanted
results. Maybe expose that as a parameter? For a playlist it's different
since the user chose what she put in the playlist.
2013-06-13 23:45:32 +02:00
fb8f7280bc GenericIE: try to find videos from twitter cards info 2013-06-13 08:26:39 +02:00
f380401bbd YoutubeSearchIE: the query is a str, in python 3 it fails if decode is called 2013-06-11 19:15:07 +02:00
9abc6c8b31 Update YahooIE test
The old test video is no longer available.
2013-06-10 19:42:02 +02:00
8cd252f115 Use long rtmpdump options
Note that we accidentally called rtmpdump with -v (--live) instead of -V (--verbose) because we missed this.
2013-06-10 18:14:45 +02:00
53f72b11e5 Allow unsetting the proxy with the --proxy option 2013-06-09 23:43:18 +02:00
ee55fcbe12 switch long info_dict fields checking to md5 2013-06-09 15:03:54 +02:00
78d3442b12 test: extend the reach of info_dict checking
* print the info_dict in a format that is easy to add to tests.json during tests if un-tested fields are detected
* make it possible to put the crc32 in tests.json if the field is too long
* complete the "info_dict" fields in existing tests
* fixed the bugs caught doing this
2013-06-09 14:21:42 +02:00
979a9dd4c4 _html_search_regex with clean_html superpowers 2013-06-09 11:57:13 +02:00
d5979c5d55 do not ask the user to report network errors 2013-06-09 11:55:08 +02:00
8027175600 Set the extractor key in playlists entries
If they were videos the extractor key wasn't being set anywhere else
Closes 877
2013-06-08 12:08:44 +02:00
3054ff0cbe Merge pull request #853 from mc2avr/master
add ZDFIE
2013-06-08 11:44:01 +02:00
cd453d38bb Merge pull request #878 from yasoob/master
Added Vbox7.com InfoExtractor and tests.
2013-06-08 10:54:47 +02:00
f5a290eed9 print "please report this issue on GitHub" on every ExtractorError 2013-06-08 09:56:34 +02:00
ecb3e676a5 Added Vbox7 Infoextractor 2013-06-08 12:44:38 +05:00
8b59a98610 XHamster: Can't see the description anywhere in the UI 2013-06-07 12:47:12 +02:00
8409501206 use search_regex in new IEs 2013-06-07 12:47:12 +02:00
be95cac157 raise exceptions on warnings during tests - and solve a couple of them 2013-06-07 12:46:23 +02:00
476203d025 print WARNINGs during test + minor fix to NBAIE 2013-06-06 15:07:05 +02:00
468e2e926b implement fallbacks and defaults in _search_regex 2013-06-06 14:35:08 +02:00
ac3e9394e7 Implement search_regex from #847 2013-06-06 14:01:44 +02:00
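The helper introduced in ac3e9394e7 and extended with fallbacks and defaults in 468e2e926b can be pictured roughly as follows. This is a standalone sketch under assumptions: the function name, the `ExtractionError` class and the sentinel are hypothetical and are not copied from the project's `_search_regex` method.

```python
import re

_NO_DEFAULT = object()  # sentinel so that None can be passed as a real default


class ExtractionError(Exception):
    """Hypothetical stand-in for the project's extractor error."""


def search_regex(patterns, string, name, default=_NO_DEFAULT, flags=0):
    """Try each pattern in order and return the first captured group found."""
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, string, flags)
        if match:
            groups = [g for g in match.groups() if g is not None]
            # Fall back to the whole match when the pattern has no groups.
            return groups[0] if groups else match.group(0)
    if default is not _NO_DEFAULT:
        return default
    raise ExtractionError('Unable to extract %s' % name)
```

Passing a list gives the fallback behaviour (later patterns are only tried when earlier ones fail), while `default` turns a missing optional field into a value instead of an error.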
868d62a509 style and error handling edits to HypemIE 2013-06-06 12:02:36 +02:00
157b864a01 added HypemIE
rebased, closes PR #871
2013-06-06 12:01:07 +02:00
951b9dfd94 Merge pull request #866 from yasoob/master
Added support for XHamster - closes #841
2013-06-04 10:39:31 -07:00
1142d31164 Merge pull request #863 from davidcl/master
Add some tests to match Justin.tv / Twitch.tv URLs
2013-06-04 10:36:36 -07:00
9131bde941 SpiegelIE: the page layout has changed a bit 2013-06-04 19:31:06 +02:00
1132c10dc2 Merge pull request #864 from jacobian/vimeopro
Fixed an error downloading vimeo pro videos.
2013-06-04 10:15:12 -07:00
c978a96c02 Added test for XHamster.com 2013-06-04 17:33:02 +05:00
71e458d437 Added support for xhamster in infoextractors 2013-06-04 17:30:54 +05:00
57bde0d9c7 Fix the test_all_urls (Import issue) 2013-06-04 13:10:12 +02:00
50b4d25980 Merge within test_all_urls 2013-06-04 13:06:49 +02:00
eda60e8251 VimeoIE: support videos from vimeopro.com 2013-06-04 12:04:54 +02:00
c794cbbb19 Fixed an error downloading vimeo pro videos. 2013-06-03 18:03:59 -05:00
4a76d1dbe5 Add tests for justin.tv and twitch.tv 2013-06-03 22:16:55 +02:00
418f734a58 Merge pull request #854 from rg3/youtube_automatic_captions
YoutubeIE: fallback to automatic captions when subtitles aren't found
2013-06-01 14:18:27 -07:00
dc1c355b72 YoutubeIE: fallback to automatic captions when subtitles aren't found (closes #843)
Also modify test_youtube_subtitles to support running the tests in any order.
2013-05-31 17:03:40 +02:00
1b2b22ed9f BlipTV: accept urls in the format http://a.blip.tv/api.swf#{id} (closes #857)
Tweak the regex so that BlipTV can be before BlipTVUser.
2013-05-28 15:12:39 +02:00
f2cd958c0a add ZDFIE and _download_with_mplayer(mms://,rtsp://) 2013-05-23 21:42:03 +02:00
57adeaea87 release 2013.05.23 2013-05-23 13:37:19 +02:00
8f3f1aef05 Fix HowCast IE 2013-05-23 13:34:33 +02:00
51d2453c7a small tweaks 2013-05-21 16:07:27 +02:00
45014296be Add TeamcocoIE (closes #212) 2013-05-21 14:37:32 +02:00
afef36c950 add support for Flickr videos - closes #261 2013-05-20 23:19:38 +02:00
b31756c18e Python 2 compat fixes for MyVideo.de rtmpdump downloads 2013-05-20 11:57:10 +02:00
f008688520 make rtmpdump inherit the verbose option for debugging 2013-05-20 11:54:21 +02:00
5b68ea215b Merge pull request #842 - myvideo, rtmp support
@dersphere code, from dersphere/plugin.video.myvideo_de.git
rewritten by @mc2avr
released in the Public Domain by the author
ref: https://github.com/rg3/youtube-dl/pull/842
2013-05-20 09:49:58 +02:00
b1d568f0bc HowcastIE: extract thumbnail 2013-05-20 08:39:41 +02:00
17bd1b2f41 VineIE: extract more information and minor style changes 2013-05-20 08:31:03 +02:00
5b0d3cc0cd Add support for Vine - closes #845 2013-05-20 00:33:14 +02:00
d4f76f1674 Add support for Howcast.com - closes #835 2013-05-18 19:17:19 +02:00
340fa21198 UstreamIE: get thumbnail and uploader name 2013-05-18 11:54:18 +02:00
de5d66d431 MyVideoIE: add rtmp support 2013-05-15 23:38:44 +02:00
7bdb17d4d5 Add extra_info argument to extract_info and process_ie_result
It allows updating the info_dicts with other values

(closes #840)
2013-05-14 14:40:40 +02:00
419c64b107 Throw a better error if the protocol is invalid 2013-05-13 19:54:07 +02:00
99a5ae3f8e Simplify generic search IE (Closes #839) 2013-05-13 19:53:52 +02:00
c7563c528b Merge remote-tracking branch 'jaimeMF/SearchIE' 2013-05-13 19:43:35 +02:00
e30e9318da Add base class SearchInfoExtractor for search queries IEs 2013-05-13 14:58:44 +02:00
5c51028d38 release 2013.05.14 2013-05-13 13:50:05 +02:00
c1d58e1c67 Merge pull request #834 from chocolateboy/install_prefix_fix
only install to /etc if PREFIX is /usr or /usr/local
2013-05-13 00:42:24 -07:00
02030ff7fe release 2013.05.13 2013-05-13 09:38:27 +02:00
f45c185fa9 Do not re-encode / to # if / is a platform separator, and correctly handle permission errors (Fixes #831) 2013-05-13 09:20:08 +02:00
1bd96c3a60 Deprecate --only-sub 2013-05-13 09:06:18 +02:00
929f85d851 Remove a print call used for debugging 2013-05-12 20:56:54 +02:00
98d4a4e6bc YoutubeSearchIE: return a playlist (related #838) 2013-05-12 20:53:37 +02:00
fb2f83360c FFmpegPostProcessor: decode stderr first and then get the last line (closes #837) 2013-05-12 19:08:32 +02:00
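The ordering fixed in fb2f83360c matters because, on Python 3, splitting the raw `bytes` output with a `str` separator raises a `TypeError`. A minimal sketch of the decode-then-split order, with a hypothetical helper name and an assumed UTF-8 encoding:

```python
def last_stderr_line(stderr_bytes, encoding='utf-8'):
    """Decode process stderr first, then pick its last non-empty line."""
    text = stderr_bytes.decode(encoding, 'replace')
    return text.strip().split('\n')[-1]
```

Using `'replace'` as the error handler keeps the error message readable even when the tool emits bytes that are not valid in the assumed encoding.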
3c5e7729e1 GoogleSearchIE: change query urls to http://www.google.com/search
The old one was giving HTTP 404 errors
2013-05-12 18:44:56 +02:00
5a853e1423 Fix YahooSearchIE: (closes #300) 2013-05-12 17:49:35 +02:00
2f58b12dad YahooIE: support more videos 2013-05-12 17:05:43 +02:00
59f4fd4dc6 YahooIE: remove old code and accept screen.yahoo.com videos (#300)
Videos require rtmpdump
2013-05-12 14:05:14 +02:00
5738240ee8 only install to /etc if PREFIX is /usr or /usr/local 2013-05-10 23:05:58 +01:00
86fd453ea8 Merge remote-tracking branch 'origin/master' 2013-05-10 09:21:24 +02:00
c83411b9ee Skip bandcamp tests for now - free limit has been exceeded 2013-05-10 09:10:34 +02:00
057c9938a1 Import FileDownloader in test_youtube_subtitles
Fix last commit
2013-05-10 08:37:49 +02:00
9259966132 test_youtube_subtitles: FakeDownloader inherits from FileDownloader 2013-05-10 08:31:30 +02:00
b08980412e Merge pull request #826 from jakeogh/master
Added --get-id option to print video IDs
2013-05-09 16:52:54 -07:00
532a1e0429 release 2013.05.10 2013-05-10 01:45:21 +02:00
2a36c352a0 Retry to disable YT ratelimit to unlock full bandwidth
This is the second attempt: a60b854d90
Sometimes the ratelimit=yes is already in the URL, and doubling it
leads to a 403. This should now work on all videos; at least it works on all
that I could test.

Closes #648
2013-05-09 00:39:10 +02:00
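The failure mode described in 2a36c352a0 (a parameter that is already present gets appended a second time) can be avoided by checking the query string before adding to it. The helper below is a hypothetical sketch for Python 3, not the code from that commit; the parameter name is taken from the commit message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_query_param(url, key, value):
    """Append key=value to the URL's query unless the key is already there."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(existing_key == key for existing_key, _ in query):
        return url  # already present: leave the URL untouched
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `ensure_query_param(url, 'ratelimit', 'yes')` adds the parameter once and returns the URL unchanged on any later call.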
1a2adf3f49 added --get-id option to print video IDs 2013-05-05 22:30:07 -07:00
43b62accbb GoogleSearchIE: rename _download_n_results to _get_n_results 2013-05-05 22:12:41 +02:00
be74864ace Credit @JohnyMoSwag for WorldstarhiphopIE (#730) 2013-05-05 21:56:38 +02:00
0ae456f08a Credit @julienfr112 for Ina IE (#823) 2013-05-05 21:35:50 +02:00
0f75d25991 release 2013.05.07 2013-05-05 21:13:16 +02:00
67129e4a15 release 2013.05.06 2013-05-05 21:01:46 +02:00
dfb9323cf9 Clean up InaIE (Closes #823) 2013-05-05 20:57:19 +02:00
7f5bd09baf Add support to www.ina.fr 2013-05-05 20:54:36 +02:00
02d5eb935f Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/InfoExtractors.py
2013-05-05 20:51:27 +02:00
94ca71b7cc Fix GoogleSearchIE (Fixes #822) 2013-05-05 20:49:57 +02:00
b338f1b154 FileDownloader: Simplify and document 2013-05-05 20:49:42 +02:00
486f0c9476 More callbacks changed to raise ExtractorError 2013-05-05 13:59:25 +02:00
d96680f58d PhotobucketIE: accept new format of urls and add a test 2013-05-05 13:07:00 +02:00
f8602d3242 ArteTvIE: Fix format of upload date 2013-05-05 11:48:47 +02:00
0c021ad171 More callbacks changed to raise ExtractorError 2013-05-04 14:23:16 +02:00
086d7b4500 Merge pull request #802 from joeframbach/master
If path and new_path are the same, then dont delete the file
2013-05-04 03:35:19 -07:00
891629c84a release 2013.05.05 2013-05-04 12:31:17 +02:00
ea6d901e51 Add --no-check-certificate (#814) 2013-05-04 12:22:56 +02:00
4539dd30e6 twitch.tv chapters (#810): print out start and end time 2013-05-04 12:02:18 +02:00
c43e57242e twitch.tv chapters: Include uploader (#810) 2013-05-04 11:44:59 +02:00
db8fd71ca9 twitch.tv chapters: Use API for title and other metadata 2013-05-04 11:42:44 +02:00
f4f316881d Improve Twitch.tv chapter support (#810) 2013-05-04 11:27:39 +02:00
0e16f09474 Work on twitch.tv chapters (#810) 2013-05-04 10:36:37 +02:00
09dd418f53 Experimentally whitelist Escapist test 2013-05-04 09:11:38 +02:00
decd1d1737 raise ExtractorError instead of calling back 2013-05-04 08:38:28 +02:00
180e689f7e Simplify WorldStarHipHop 2013-05-04 08:06:56 +02:00
7da5556ac2 Better fix for getting source url's 2013-05-04 08:04:28 +02:00
f23a03a89b updated regular experssion for possible future updates to source url 2013-05-04 07:59:33 +02:00
84e4682f0e Always use HTTPS for youtube (Fixes #691) 2013-05-04 07:49:25 +02:00
1f99511210 release 2013.05.04 2013-05-04 07:12:33 +02:00
0d94f2474c Work around a Python bug on Windows with UTF-8 configuration (#820) 2013-05-04 07:09:50 +02:00
480b6c1e8b Fix comedycentral: newest 2013-05-04 02:53:26 +02:00
95464f14d1 Credit @yasoob for IE 2013-05-03 20:08:16 +02:00
c34407d16c Simplify RedTube 2013-05-03 20:07:35 +02:00
5e34d2ebbf Moved redtube info extractor to the end 2013-05-03 23:57:16 +06:00
815dd2ffa8 Redtube test now works
I just did a little makeover by changing redtube tests. Now they are passed.
2013-05-03 23:51:27 +06:00
ecd5fb49c5 added redtube.com in InfoExtractors (2nd pull request with the required amindments)
added redtube.com in InfoExtractors (2nd pull request with the required amindments). Now this script can also download redtube.com videos
2013-05-03 22:44:34 +06:00
b86174e7a3 added test for redtube.com
I just added the test for redtube.com
2013-05-03 22:40:56 +06:00
2e2038dc35 TEDIE: report the correct talk title when a link with the language code is given 2013-05-02 18:28:07 +02:00
46bfb42258 InfoExtractors: use _download_webpage in more IEs
IEs without tests are intact.
2013-05-02 18:18:27 +02:00
feecf22511 InfoExtractors: fix some regular expressions where dots weren't escaped 2013-05-02 13:39:56 +02:00
4c4f15eb78 Merge pull request #815 from JohnyMoSwag/master
Update for new source links on worldstarhiphop.com
2013-05-02 13:23:32 +02:00
104ccdb8b4 TumblrIE: fix title matching 2013-05-02 13:12:41 +02:00
6ccff79594 Small update for additon of new video source links 2013-05-01 20:30:14 -07:00
aed523ecc1 Add BandcampIE (closes #568) 2013-05-01 15:55:46 +02:00
d496a75d0a release 2013.05.01 2013-05-01 14:07:23 +02:00
5c01dd1e73 Merge remote-tracking branch 'origin/master' 2013-05-01 14:05:02 +02:00
11d9224e3b add --write-thumbnail option to download thumbnail (Suggested by `) 2013-05-01 14:04:33 +02:00
34c29ba1d7 Add test for SoundcloudSet 2013-04-30 21:23:38 +02:00
6cd657f9f2 release 2013.04.31 2013-04-30 19:50:20 +02:00
4ae9e55822 Correctly clear the line before writing a new status line 2013-04-30 19:42:58 +02:00
8749b71273 Fix FakeDownloaders 2013-04-30 19:42:13 +02:00
dbc50fdf82 Fix help for --proxy 2013-04-30 18:27:54 +02:00
b1d2ef9255 release 2013.04.30 2013-04-30 18:00:56 +02:00
5fb16555af --proxy option 2013-04-30 17:57:13 +02:00
ba7c775a04 Remove a commented line I forgot.
[ci skip]
2013-04-30 14:21:46 +02:00
fe348844d9 SoundcloudSetIE: Use upload_date in the unified format (fixes #812) 2013-04-29 23:57:36 +02:00
767e00277f Use report_warning when a not working IE will be uses 2013-04-28 17:12:07 +02:00
6ce533a220 release 2013.04.28 2013-04-28 16:32:05 +02:00
08b2ac745a Default to --title (Fixes #499) 2013-04-28 16:26:11 +02:00
46a127eecb Fix print_notes 2013-04-28 16:21:29 +02:00
fc63faf070 release 2013.04.27 2013-04-28 15:53:14 +02:00
9665577802 Adapt tests to changes in youtube's "Most Popular" channel 2013-04-28 15:50:29 +02:00
434aca5b14 Automatically set HTTPS proxy if given (Fixes #805) 2013-04-28 15:41:05 +02:00
e31852aba9 Document the video selection using the upload date 2013-04-28 12:02:30 +02:00
37254abc36 Allow to use relative dates in the format (now|today)[+-][0-9](day|week|month|year)(s)? (Closes #137)
Also fix DateRange not accepting ranges of one day.
2013-04-28 11:39:37 +02:00
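Note on 37254abc36: the relative date format named in the subject line, `(now|today)[+-][0-9](day|week|month|year)(s)?`, can be illustrated with a small parser. This is a sketch under stated assumptions, not the implementation from the commit: the function name `parse_relative_date` is hypothetical, multi-digit amounts are accepted, and months and years are approximated as 30 and 365 days.

```python
import datetime
import re

# Pattern from the commit subject; "[0-9]+" (more than one digit) is an assumption.
_RELATIVE = re.compile(r'^(now|today)([+-])([0-9]+)(day|week|month|year)s?$')

# Assumption: months and years are approximated with fixed day counts.
_DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30, 'year': 365}


def parse_relative_date(spec, today=None):
    """Turn a string such as 'today-1week' into a datetime.date."""
    if today is None:
        today = datetime.date.today()
    match = _RELATIVE.match(spec)
    if match is None:
        raise ValueError('not a relative date: %r' % spec)
    _, sign, amount, unit = match.groups()
    days = int(amount) * _DAYS_PER_UNIT[unit]
    if sign == '-':
        days = -days
    return today + datetime.timedelta(days=days)
```

Passing `today` explicitly keeps the function deterministic, for example `parse_relative_date('now+2days', datetime.date(2013, 4, 28))`.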
a11ea50319 Re-enable Dailymotion (tests pass) 2013-04-27 21:53:21 +02:00
81df121dd3 Merge branch 'master' of github.com:rg3/youtube-dl 2013-04-27 20:26:42 +02:00
50f6412eb8 Rename soundcloud to soundcloud:set 2013-04-27 20:12:46 +02:00
bf50b0383e Fix some IEs that didn't return the uploade_date in the YYYYMMDD format
Create a function unified_strdate in utils.py to fix these problems
2013-04-27 15:14:20 +02:00
bd55852517 Allow to select videos to download by their upload dates (related #137)
Only absolute dates.
2013-04-27 14:01:55 +02:00
4c9f7a9988 SteamIE: accept urls with agecheck 2013-04-27 11:03:34 +02:00
aba8df23ed YoutubePlaylistIE: don't crash with empty lists (related #808)
The playlist_title wasn't initialized.
2013-04-27 10:41:52 +02:00
3820df0106 Merge pull request #801 from expleo/add_referer_support 2013-04-26 19:34:32 +02:00
e74c504f91 Dont delete source file when source file and post-processed file are the same 2013-04-24 21:59:10 +00:00
fa70605db2 IEs: clean __init__ methods
They are not needed
2013-04-24 23:05:43 +02:00
0d173446ff InfoExtractors: use report_download_webpage in _request_webpage
Allows to show the warning when falling back on GenericIE
2013-04-24 22:11:57 +02:00
320e26a0af Clean duplicate method report_download_webpage in InfoExtractors 2013-04-24 22:02:20 +02:00
a3d689cfb3 Fix InfoQ 2013-04-24 21:16:10 +02:00
59cc5d9380 Updated README 2013-04-24 14:12:33 +02:00
28535652ab Adds support for passing a referer. 2013-04-24 13:56:04 +02:00
7b670a4483 YouTube: Fall back to <meta> description if video is rated (Fixes #800) 2013-04-23 13:54:17 +02:00
69fc019f26 YoutubeIE when no description is found use an empty unicode string (closes #800) 2013-04-23 12:24:08 +02:00
613bf66939 More calls to trouble changed to report_error 2013-04-23 11:31:37 +02:00
9edb0916f4 Disable colored messages in Windows (related #794) 2013-04-23 11:09:22 +02:00
f4b659f782 Document order of preference for format selection (closes #798) 2013-04-23 10:33:54 +02:00
c70446c7df Merge branch 'master' of github.com:rg3/youtube-dl 2013-04-22 23:15:15 +02:00
c76cb6d548 Correct indentation 2013-04-22 23:15:05 +02:00
71f37e90ef Merge pull request #797 from AI0867/patch-1
Use standard unit symbols in format_bytes
2013-04-22 14:13:52 -07:00
75b5c590a8 Do not read configuration files if explicit arguments are given by a host program (#792) 2013-04-22 23:05:14 +02:00
4469666780 Merge pull request #792 from fp7/master
Parameters as arguments to main
2013-04-22 13:44:05 -07:00
c15e024141 TumblrIE
I haven't found many videos to test, so it may not work for all.
2013-04-22 21:27:27 +02:00
8cb94542f4 release 2013.04.22 2013-04-22 20:01:56 +02:00
c681a03918 Fix --list-formats (Closes #799) 2013-04-22 19:51:56 +02:00
30f2999962 Added parenthesis for explicity 2013-04-22 10:15:58 +02:00
74e3452b9e Add playlist and playlist_index to the help string for the output option
Also split the help string in different lines to make editing easier.
2013-04-22 10:06:07 +02:00
9e1cf0c200 SteamIE returns a playlist
With the game name as title.
2013-04-21 22:05:21 +02:00
e11eb11906 Allow to download videos with age check from Steam
Also move method report_age_confirmation to the base IE class.
2013-04-21 21:56:13 +02:00
c04bca6f60 release 2013.04.21 2013-04-21 12:52:45 +02:00
b0936ef423 Use standard unit symbols in format_bytes 2013-04-21 02:38:37 +03:00
41a6eb949a Clean duplicate method report_extraction in InfoExtractors
A lot of IEs had implemented the method in the same way.
2013-04-20 21:12:29 +02:00
f17ce13a92 Write the method to_screen in InfoExtractor (related #608)
Except the ones in youtube subtypes (user, channels ..) all calls to _downloader.to_screen has been changed.
The calls not prefixed with the IE name hasn't been touched.
2013-04-20 20:55:40 +02:00
8c416ad29a Remove calls to _downloader.download in Youtube searchs
Instead, return the urls of the videos.
2013-04-20 19:22:45 +02:00
c72938240e Get the title of Youtube playlists 2013-04-20 18:57:05 +02:00
e905b6f80e TEDIE can now return a playlist 2013-04-20 13:31:21 +02:00
6de8f1afb7 Allows to specify which IE should be used for extracting info for a result of type url 2013-04-20 12:58:35 +02:00
9341212642 Create a function in InfoExtractors that returns the InfoExtractor class with the given name 2013-04-20 12:42:57 +02:00
f7a9721e16 Fix some metacafe videos, closes #562 2013-04-20 12:06:58 +02:00
089e843b0f Use _download_webpage in MetacafeIE 2013-04-20 11:40:05 +02:00
c8056d866a Add myself to travis notifications 2013-04-20 11:17:03 +02:00
49da66e459 The test video for subtitles has added a new language 2013-04-20 10:39:02 +02:00
fb6c319904 Add tests for YoutubeChannelIE
- tests for identifying channel urls
- test retrieval of paginated channel
- test retrieval of autogenerated channel
2013-04-19 18:11:05 -04:00
5a8d13199c Fix YoutubeChannelIE
- urls with query parameters now match
- fixes regex for identifying videos
- fixes pagination
2013-04-19 18:05:35 -04:00
dce9027045 Merge branch 'extract_info_rewrite' 2013-04-19 21:57:08 +02:00
feba604e92 Fix playlists with size 50i ∀ i∉ℕ (Closes #782) 2013-04-18 07:28:43 +02:00
d22f65413a release 2013.04.18 2013-04-18 06:29:32 +02:00
0599ef8c08 Limit titles to 200 characters (Closes #789) 2013-04-18 06:27:11 +02:00
bfdf469295 Fix FunnyOrDie extraction for a special video (#789) 2013-04-18 06:21:46 +02:00
32c96387c1 Fix facebook IE 2013-04-18 04:41:48 +02:00
c8c5443bb5 Revert "disable YT ratelimit; this should enable to max out the connection bandwidth"
Although cool, that seems to break a lot of youtube videos.

This reverts commit a60b854d90.
2013-04-17 23:22:25 +02:00
a60b854d90 disable YT ratelimit; this should enable to max out the connection bandwidth 2013-04-17 19:48:35 +02:00
b8ad4f02a2 Arguments as parameter to function _real_main so it can be used programmatically 2013-04-16 19:26:48 +02:00
d281274bf2 Add a playlist_index key to the info_dict, can be used in the output template 2013-04-16 15:13:29 +02:00
b625bc2c31 release 2013.04.11 2013-04-11 18:42:57 +02:00
f4381ab88a Fix keek title extraction 2013-04-11 18:39:13 +02:00
744435f2a4 Show whole diff in error cases 2013-04-11 18:38:43 +02:00
855703e55e Option to dump intermediate pages 2013-04-11 18:31:35 +02:00
927c8c4924 Use download_webpage in youtube IE 2013-04-11 18:18:15 +02:00
0ba994e9e3 Skip ARD test as it requires rtmpdump 2013-04-11 17:20:17 +02:00
af9ad45cd4 Re-enable Stanford OC test 2013-04-11 17:20:05 +02:00
e0fee250c3 Fix default for variable-size autonumbering 2013-04-11 17:07:55 +02:00
72ca05016d Merge remote-tracking branch 'sagittarian/vimeo-no-desc' 2013-04-11 10:56:01 +02:00
844d1f9fa1 Removed overly verbose options and arguments (Should be obvious from the previous lines) 2013-04-11 10:54:37 +02:00
213c31ae16 Added option --autonumber-size:
Specifies the number of digits in %(autonumber)s when it is present in output filename template or --autonumber option is given
2013-04-11 10:53:57 +02:00
04f3d551a0 Merge remote-tracking branch 'sagittarian/resolve-symlinks' 2013-04-11 10:51:13 +02:00
e8600d69fd Credit @catch22 for ARD IE 2013-04-11 10:48:37 +02:00
b03d65c237 Minor improvements for ARD IE 2013-04-11 10:47:21 +02:00
8743974189 Resolve the symlink if __main__.py is invoke as a symlink. 2013-04-11 08:02:17 +03:00
dc36bc9434 Fix bug when the vimeo description is empty on Python 2.x. 2013-04-11 07:27:04 +03:00
bce878a7c1 Implement the playlist/start options in FileDownloader
It makes it available for all the InfoExtractors
2013-04-10 14:32:03 +02:00
532d797824 In MetacafeIE return a url if YoutubeIE should do the job 2013-04-10 00:06:03 +02:00
146c12a2da Change the order for extracting/downloading
Now it gets a video info and directly downloads it, the it pass to the next video founded.
2013-04-10 00:05:04 +02:00
d39919c03e Add progress counter for playlists
Closes #276
2013-04-09 13:45:52 +02:00
df2dedeefb added ARD InfoExtractor (german state television) 2013-04-07 15:23:48 +02:00
adb029ed81 added --playpath/-y support to RTMP downloads (via 'play_path' entry in 'info_dict') 2013-04-07 15:17:36 +02:00
43ff1a347d Change rg3.github.com to rg3.github.io almost everywhere 2013-04-06 10:46:17 +02:00
14294236bf Merge branch 'master' into extract_info_rewrite 2013-04-05 12:39:51 +02:00
c2b293ba30 release 2013.04.03 2013-04-03 19:43:53 +02:00
37cd9f522f Restore youtube-dl (update) binary (#770) 2013-04-01 23:43:20 +02:00
f33154cd39 Merge pull request #764 from jaimeMF/subtitles_not_found
Fix crash when subtitles are not found
2013-03-31 19:02:18 -07:00
bafeed9f5d Don't crash in FileDownloader if subtitles couldn't be found and errors are ignored 2013-03-31 12:21:35 +02:00
ef767f9fd5 Fix crash when subtitles are not found and the option --all-subs is given 2013-03-31 12:19:13 +02:00
bc97f6d60c Use report_error in subtitles error handling 2013-03-31 12:10:12 +02:00
90a99c1b5e retry on UnavailableVideoError 2013-03-31 03:29:34 +02:00
f375d4b7de import all IEs when testing to resemble more closely the real env 2013-03-31 03:12:28 +02:00
fa41fbd318 don't catch YT user URLs in YoutubePlaylistIE (fix #754, fix #763) 2013-03-31 03:02:49 +02:00
6a205c8876 More fixes on subtitles errors handling 2013-03-30 14:17:12 +01:00
0fb3756409 Fix crash when subtitles are not found 2013-03-30 14:11:33 +01:00
fbbdf475b1 Different feed file name 2013-03-29 21:44:11 +01:00
c238be3e3a Correct feed title 2013-03-29 21:41:20 +01:00
1bf2801e6a release 2013.03.29 2013-03-29 21:22:57 +01:00
c9c8402093 Merge pull request #758 from jaimeMF/atom-feed
Add an Atom feed generator in devscripts
2013-03-29 12:50:20 -07:00
6060788083 Write a new feed each time, reading from versions.json 2013-03-29 19:42:33 +01:00
e3700fc9e4 Merge pull request #736 from rg3/retry
Exception stacking and test retry
2013-03-29 09:01:27 -07:00
b693216d8d Merge pull request #752 from dodo/master
SoundcloudSetIE
2013-03-29 08:40:22 -07:00
46b9d8295d Merge pull request #730 by @JohnyMoSwag
Support for Worldstarhiphop.com
2013-03-29 16:14:49 +01:00
7decf8951c fix FunnyOrDieIE, MyVideoIE, TEDIE 2013-03-29 15:59:13 +01:00
1f46c15262 fix SpiegelIE 2013-03-29 15:31:38 +01:00
0cd358676c Rebased, fixed and extended LiveLeak.com support
close #757 - close #761
2013-03-29 15:13:24 +01:00
43113d92cc Update InfoExtractors.py 2013-03-29 14:23:09 +01:00
7eab8dc750 Pass the playlist info_dict to process_info
the playlist value can be used in the output template
2013-03-29 12:32:42 +01:00
44e939514e Added test for WorldStarHipHop 2013-03-28 20:05:28 -07:00
95506f1235 Merge remote-tracking branch 'jaimeMF/color_error_messages' 2013-03-29 00:25:48 +01:00
a91556fd74 Add a note on MaxDownloadsReached (#732, thanks to CBGoodBuddy) 2013-03-29 00:20:13 +01:00
1447f728b5 Merge branch 'master' of github.com:rg3/youtube-dl 2013-03-29 00:06:48 +01:00
d2c690828a Add title and id to playlist results
Not all IE give both. They are not used yet.
2013-03-28 13:39:00 +01:00
cfa90f4adc Merge branch 'master' into extract_info_rewrite 2013-03-28 13:20:33 +01:00
898280a056 use sys.stdout.buffer only on Python3 2013-03-28 13:13:03 +01:00
59b4a2f0e4 Merge pull request #762 from jynnantonix/master
Use sys.stdout.buffer when writing to standard out
2013-03-28 05:11:51 -07:00
1ee9778405 Use sys.stdout.buffer instead of sys.stdout
sys.stdout defaults to text mode, we need to use the underlying buffer
instead when writing binary data.

Signed-off-by: Chirantan Ekbote <chirantan.ekbote@gmail.com>
2013-03-27 15:57:11 -04:00
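Note on 1ee9778405 and 898280a056: the first commit writes binary data through `sys.stdout.buffer` because `sys.stdout` is a text stream on Python 3, and the follow-up restricts that to Python 3. One possible way to express both in a single place is sketched below. The helper names `binary_stdout` and `write_bytes` are hypothetical and are not taken from the diff.

```python
import sys


def binary_stdout():
    """Return a stream that accepts bytes.

    On Python 3, sys.stdout is a text stream that normally exposes the
    underlying binary stream as .buffer; on Python 2 there is no such
    attribute and sys.stdout itself accepts byte strings.
    """
    return getattr(sys.stdout, 'buffer', sys.stdout)


def write_bytes(data, stream=None):
    """Write raw bytes to stream (standard output by default) and flush."""
    if stream is None:
        stream = binary_stdout()
    stream.write(data)
    stream.flush()
```

Using `getattr` with a fallback avoids an explicit version check, at the cost of also falling back when `sys.stdout` has been replaced by an object without a `buffer` attribute.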
db74c11d2b Add an Atom feed generator in devscripts 2013-03-26 18:13:52 +01:00
5011cded16 SoundcloudSetIE
info extractor for soundcloud sets
2013-03-24 02:24:07 +01:00
f10b2a9c14 fix KeekIE 2013-03-20 12:13:52 +01:00
5cb3c0b319 Merge pull request #699 by @iemejia
Removed innecesary function to convert subtitles, improved use of the youtube api
2013-03-20 11:35:55 +01:00
b9fc428494 add '--write-srt' and '--srt-lang' aliases for backwards compatibility 2013-03-20 11:29:07 +01:00
c0ba104674 Fixed typo in error message when no subtitles were available. 2013-03-20 08:41:54 +01:00
2a4093eaf3 Added new option '--list-subs' to show the available subtitle languages 2013-03-20 08:41:54 +01:00
9e62bc4439 Added new option '--sub-format' to choose the format of the subtitles to downloade (defaut=srt) 2013-03-20 08:41:54 +01:00
553d097442 Refactor subtitle options from srt to the more generic 'sub'.
In order to be more consistent with different subtitle formats.
From:
* --write-srt to --write-sub
* --only-srt to --only-sub
* --all-srt to --all-subs
* --srt-lang to --sub-lang'

Refactored also all the mentions of srt for sub in all the source code.
2013-03-20 08:41:53 +01:00
ae608b8076 Added new option '--all-srt' to download all the subtitles of a video.
Only works in youtube for the moment.
2013-03-20 08:41:53 +01:00
c397187061 Spiegel: Support hash at end of URL 2013-03-16 23:52:17 +01:00
e32b06e977 Spiegel IE 2013-03-12 01:08:54 +01:00
8c42c506cd Add configuration to -v output 2013-03-12 00:10:05 +01:00
8cc83b8dbe Bubble up all the stack of exceptions and retry download tests on timeout errors 2013-03-09 10:05:43 +01:00
51af426d89 forgot to fix this. 2013-03-08 22:52:17 -08:00
08ec0af7c6 catch fatal error 2013-03-08 22:48:05 -08:00
3b221c5406 removed str used for other project. 2013-03-08 22:39:45 -08:00
3d3423574d Fix Unicode handling GenericIE (Fixes #734) 2013-03-08 20:47:06 +01:00
e5edd51de4 Clear up error messages (#734) 2013-03-08 20:12:05 +01:00
64c78d50cc working - worldstarhiphop IE
Support for WorldStarHipHop
2013-03-07 16:27:21 -08:00
b3bcca0844 clean up 2013-03-07 15:39:17 -08:00
61e40c88a9 fixed typo 2013-03-06 21:14:46 -08:00
40634747f7 Support for WorldStarHipHop.com 2013-03-06 21:09:55 -08:00
c2e21f2f0d Merge pull request #728 from timdoug/fix-escapist-extension
Escapist videos are acutally .mp4, not .flv
2013-03-06 10:26:18 -08:00
47dcd621c0 Escapist videos are acutally .mp4, not .flv 2013-03-06 12:46:45 -05:00
a0d6fe7b92 When a redirect is found return the new url using the new style 2013-03-05 22:33:32 +01:00
c9fa1cbab6 More trouble calls changed in InfoExtractors.py
The calls with the message starting with 'WARNING' have been changed to report_warning instead of report_error
2013-03-05 21:13:17 +01:00
8a38a194fb Add auxiliary methods to InfoExtractor to set the '_type' key and use them for some playlist IEs 2013-03-05 20:55:48 +01:00
6ac7f082c4 extract_info now expects ie.extract to return a list in the format proposed in issue 608.
Each element should have a '_type' key specifying if it's a video, an url or a playlist.
`extract_info` will process each element to get the full info
2013-03-05 20:14:32 +01:00
f6e6da9525 Use extract_info in BlipTV User and Youtube Channel 2013-03-05 12:26:18 +01:00
597cc8a455 Use extract_info in YoutubePlaylist and YoutubeSearch 2013-03-05 11:58:01 +01:00
3370abd509 Merge branch 'master' into extract_info_rewrite 2013-03-04 22:25:46 +01:00
631f73978c Add a method for extracting info from a list of urls 2013-03-04 22:16:42 +01:00
e5f30ade10 Use report_error in InfoExtractors.py
Some calls haven't been changed
2013-03-04 15:56:14 +01:00
6622d22c79 Use report_error in FileDownloader.py 2013-03-04 11:47:58 +01:00
4e1582f372 Use red color when printing error messages 2013-03-04 11:27:25 +01:00
967897fd22 Fix Python 3 errors with rmtp downloads 2013-03-03 22:38:38 +01:00
f918ec7ea2 Clarify rate limit documentation (Closes #723) 2013-03-03 22:35:26 +01:00
a2ae43a55f Remove changed playlist test (#661) 2013-03-03 22:19:19 +01:00
7ae153ee9c Remove tweetreel - it has shut down 2013-03-03 22:15:06 +01:00
f7b567ff84 Use proper urlparse functions and simplify a bit 2013-03-03 22:09:44 +01:00
f2e237adc8 Merge remote-tracking branch 'jcarlosgarciasegovia/master' 2013-03-03 22:04:06 +01:00
2e5457be1d Use report_warning in InfoExtractors 2013-03-02 11:24:07 +01:00
7f9d41a55e Allow downloading http://blip.tv/play/ embeded URLs 2013-03-01 10:22:16 +00:00
8207626bbe Use color when printing warning messages 2013-02-28 22:07:29 +01:00
df8db1aa21 Create extract_info method 2013-02-26 23:33:58 +01:00
691db5ba02 Don't be too clever (Fixes Python 3) 2013-02-26 22:03:43 +01:00
acb8752f80 fix tests in Python3, and make them parallelizable 2013-02-26 22:03:33 +01:00
679790eee1 Do not user upper-case for non-constants 2013-02-26 20:03:19 +01:00
6bf48bd866 Merge remote-tracking branch 'origin/API_YT_playlists' 2013-02-26 19:58:04 +01:00
790d4fcbe1 Merge pull request #715 from joksnet/no_video_results
[YT Search] No results if items is not in response
2013-02-26 10:43:35 -08:00
89de9eb125 Modified Youtube video/playlist matching; fixes #668; fixes #585 2013-02-26 19:06:41 +01:00
6324fd1d74 Switch YTPlaylistIE to API (relevant: #586); fixes #651; fixes #673; fixes #661 2013-02-26 19:06:28 +01:00
9e07cf2955 [YT Search] No results if items is not in response
When a query results of 0 items, the key items is not present in the
api_response dictionary, raising a KeyError.

Intead, look for the key and call trouble if it's not present.
2013-02-26 18:06:43 +01:00
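Note on 9e07cf2955: the commit body describes replacing an unguarded `api_response['items']` lookup, which raises `KeyError` for a query with zero results, with an explicit key check that reports the problem. A minimal sketch of that pattern follows. The function name `get_search_items`, the `report_error` callback and the message text are assumptions for illustration; the commit itself calls the downloader's `trouble` method.

```python
def get_search_items(api_response, report_error):
    """Return the search results, or an empty list if the API sent none.

    report_error is a hypothetical callback standing in for the
    downloader's error reporting.
    """
    if 'items' not in api_response:
        # A query with zero results omits the key entirely.
        report_error('no video results')
        return []
    return api_response['items']
```

A caller can pass any callable as `report_error`, for instance `errors.append` in a test, to observe that the empty case is reported rather than raised.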
f03b88b3fb Merge remote-tracking branch 'joksnet/not_keep_video_message' 2013-02-25 00:35:12 +01:00
97d0365f49 release 2013.02.25 2013-02-25 00:28:19 +01:00
12887875a2 Fix typo 2013-02-25 00:22:55 +01:00
450e709972 Formalize URL creation (prepare for some cleanup in blip.tv:users) 2013-02-24 23:23:50 +01:00
9befce2b8c Merge remote-tracking branch 'joksnet/ytsearch_decode_request' 2013-02-24 23:14:34 +01:00
cb99797798 Test TED thumbnail 2013-02-24 01:01:20 +01:00
f82b28146a Merge remote-tracking branch 'jaimeMF/TED' 2013-02-24 00:59:22 +01:00
4dc72b830c Merge remote-tracking branch 'jaimeMF/Steam' 2013-02-24 00:59:03 +01:00
ea05129ebd release 2013.02.22 2013-02-24 00:47:08 +01:00
35d217133f Message for delete video it's not an error.
When using youtube-dl from another python script with the quiet option
on, and a post procesor for extract the audio. The message of deleting
video shows in the first script logs (as it goes to stderr).

There is no way to keep this quiet as it's treated as an error, even if,
for me, it's not.
2013-02-23 22:52:52 +01:00
d1b7a24354 Decode the data requested to the api in utf-8. 2013-02-23 22:47:22 +01:00
c85538dba1 TED: get thumbnails 2013-02-23 17:27:49 +01:00
60bd48b175 Steam: get thumbnails 2013-02-23 16:48:15 +01:00
4be0aa3539 release 2012.02.22 2013-02-22 16:41:36 +01:00
f636c34481 Stop early in nosetests (in release script) 2013-02-22 16:40:19 +01:00
3bf79c752e Print *all* release notes 2013-02-22 00:36:23 +01:00
cdb130b09a Added new option '--only-srt' to download only the subtitles of a video
Improved option '--srt-lang'
 - it shows the argument in case of missing subtitles
 - added language suffix for non-english languages (e.g. video.it.srt)
2013-02-21 22:12:36 +01:00
2e5d60b7db Removed conversion from youtube closed caption format to srt since youtube api supports the 'srt' format 2013-02-21 20:51:35 +01:00
8271226a55 Fix --match-title and --reject-title decoding (Closes #690) 2013-02-21 17:09:39 +01:00
1013186a17 Also check for JSLoader of JWSPlayer (thanks to @maximeg, Closes #685) 2013-02-21 16:56:48 +01:00
7c038b3c32 Import HTTPErrorProcessor from the correct module (Closes #696) 2013-02-21 16:49:05 +01:00
c8cd8e5f55 release 2013.02.19 2013-02-19 00:06:04 +01:00
471cf47796 include bash completion and manpage in PyPi dist 2013-02-18 23:56:13 +01:00
202 changed files with 18870 additions and 5440 deletions

8
.gitignore vendored

@@ -18,3 +18,11 @@ youtube-dl.tar.gz
cover/
updates_key.pem
*.egg-info
*.srt
*.sbv
*.vtt
*.flv
*.mp4
*.part
test/testdata
.tox


@@ -3,11 +3,16 @@ python:
- "2.6"
- "2.7"
- "3.3"
before_install:
- sudo apt-get update -qq
- sudo apt-get install -qq rtmpdump
script: nosetests test --verbose
notifications:
email:
- filippo.valsorda@gmail.com
- phihag@phihag.de
- jaime.marquinez.ferrandiz+travis@gmail.com
- yasoob.khld@gmail.com
# irc:
# channels:
# - "irc.freenode.org#youtube-dl"


@@ -1,3 +1,5 @@
include README.md
include test/*.py
include test/*.json
include test/*.json
include youtube-dl.bash-completion
include youtube-dl.1


@@ -9,9 +9,19 @@ cleanall: clean
PREFIX=/usr/local
BINDIR=$(PREFIX)/bin
MANDIR=$(PREFIX)/man
SYSCONFDIR=/etc
PYTHON=/usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion
install -d $(DESTDIR)$(BINDIR)
install -m 755 youtube-dl $(DESTDIR)$(BINDIR)
@@ -30,15 +40,15 @@ tar: youtube-dl.tar.gz
pypi-files: youtube-dl.bash-completion README.txt youtube-dl.1
youtube-dl: youtube_dl/*.py
zip --quiet youtube-dl youtube_dl/*.py
youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
zip --quiet youtube-dl youtube_dl/*.py youtube_dl/*/*.py
zip --quiet --junk-paths youtube-dl youtube_dl/__main__.py
echo '#!$(PYTHON)' > youtube-dl
cat youtube-dl.zip >> youtube-dl
rm youtube-dl.zip
chmod a+x youtube-dl
README.md: youtube_dl/*.py
README.md: youtube_dl/*.py youtube_dl/*/*.py
COLUMNS=80 python -m youtube_dl --help | python devscripts/make_readme.py
README.txt: README.md
@@ -47,7 +57,7 @@ README.txt: README.md
youtube-dl.1: README.md
pandoc -s -f markdown -t man README.md -o youtube-dl.1
youtube-dl.bash-completion: youtube_dl/*.py devscripts/bash-completion.in
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
python devscripts/bash-completion.py
bash-completion: youtube-dl.bash-completion
@@ -61,6 +71,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \
--exclude '__pycache' \
--exclude '.git' \
--exclude 'testdata' \
-- \
bin devscripts test youtube_dl \
CHANGELOG LICENSE README.md README.txt \

271
README.md

@@ -1,7 +1,7 @@
% YOUTUBE-DL(1)
# NAME
youtube-dl
youtube-dl - download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** [OPTIONS] URL [URL...]
@@ -14,113 +14,171 @@ your Unix box, on Windows or on Mac OS X. It is released to the public domain,
which means you can modify it, redistribute it or use it however you like.
# OPTIONS
-h, --help print this help text and exit
--version print program version and exit
-U, --update update this program to latest version
-i, --ignore-errors continue on download errors
-r, --rate-limit LIMIT download rate limit (e.g. 50k or 44.6m)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k) (default
is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
from an initial value of SIZE.
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--list-extractors List all supported extractors and the URLs they
would handle
-h, --help print this help text and exit
--version print program version and exit
-U, --update update this program to latest version. Make sure
that you have sufficient permissions (run with
sudo if needed)
-i, --ignore-errors continue on download errors, for example to to
skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error occurs
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access
is restricted to one domain
--list-extractors List all supported extractors and the URLs they
would handle
--extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation.
--cache-dir DIR Location in the filesystem where youtube-dl can
store downloaded information permanently. By
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
--playlist-end NUMBER playlist video to end at (default is last)
--match-title REGEX download only matching titles (regex or caseless
sub-string)
--reject-title REGEX skip download for matching titles (regex or
caseless sub-string)
--max-downloads NUMBER Abort after downloading NUMBER files
--min-filesize SIZE Do not download any videos smaller than SIZE (e.g.
50k or 44.6m)
--max-filesize SIZE Do not download any videos larger than SIZE (e.g.
50k or 44.6m)
--playlist-start NUMBER playlist video to start at (default is 1)
--playlist-end NUMBER playlist video to end at (default is last)
--match-title REGEX download only matching titles (regex or caseless
sub-string)
--reject-title REGEX skip download for matching titles (regex or
caseless sub-string)
--max-downloads NUMBER Abort after downloading NUMBER files
--min-filesize SIZE Do not download any videos smaller than SIZE
(e.g. 50k or 44.6m)
--max-filesize SIZE Do not download any videos larger than SIZE (e.g.
50k or 44.6m)
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options:
-r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
from an initial value of SIZE.
## Filesystem Options:
-t, --title use title in file name
--id use video ID in file name
-l, --literal [deprecated] alias of --title
-A, --auto-number number downloaded files starting from 00000
-o, --output TEMPLATE output filename template. Use %(title)s to get the
title, %(uploader)s for the uploader name,
%(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename
extension, %(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id
and %% for a literal percent. Use - to output to
stdout. Can also be used to download to a different
directory, for example with -o '/my/downloads/%(upl
oader)s/%(title)s-%(id)s.%(ext)s' .
--restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin)
-w, --no-overwrites do not overwrite files
-c, --continue resume partially downloaded files
--no-continue do not resume partially downloaded files (restart
from beginning)
--cookies FILE file to read cookies from and dump cookie jar in
--no-part do not use .part files
--no-mtime do not use the Last-modified header to set the file
modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
-t, --title use title in file name (default)
--id use only video ID in file name
-l, --literal [deprecated] alias of --title
-A, --auto-number number downloaded files starting from 00000
-o, --output TEMPLATE output filename template. Use %(title)s to get
the title, %(uploader)s for the uploader name,
%(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename
extension, %(format)s for the format description
(like "22 - 1280x720" or "HD"), %(format_id)s for
the unique id of the format (like YouTube's
itags: "137"), %(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id,
%(playlist)s for the playlist the video is in,
%(playlist_index)s for the position in the
playlist and %% for a literal percent. Use - to
output to stdout. Can also be used to download to
a different directory, for example with -o '/my/d
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or
--auto-number option is given
--restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin)
-w, --no-overwrites do not overwrite files
-c, --continue force resume of partially downloaded files. By
default, youtube-dl will resume downloads if
possible.
--no-continue do not resume partially downloaded files (restart
from beginning)
--cookies FILE file to read cookies from and dump cookie jar in
--no-part do not use .part files
--no-mtime do not use the Last-modified header to set the
file modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options:
-q, --quiet activates quiet mode
-s, --simulate do not download the video and do not write anything
to disk
--skip-download do not download the video
-g, --get-url simulate, quiet but print URL
-e, --get-title simulate, quiet but print title
--get-thumbnail simulate, quiet but print thumbnail URL
--get-description simulate, quiet but print video description
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
--newline output progress bar as new lines
--no-progress do not print progress bar
--console-title display progress in console titlebar
-v, --verbose print various debugging information
-q, --quiet activates quiet mode
-s, --simulate do not download the video and do not write
anything to disk
--skip-download do not download the video
-g, --get-url simulate, quiet but print URL
-e, --get-title simulate, quiet but print title
--get-id simulate, quiet but print id
--get-thumbnail simulate, quiet but print thumbnail URL
--get-description simulate, quiet but print video description
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
-j, --dump-json simulate, quiet but print JSON information
--newline output progress bar as new lines
--no-progress do not print progress bar
--console-title display progress in console titlebar
-v, --verbose print various debugging information
--dump-intermediate-pages print downloaded pages to debug problems (very
verbose)
--write-pages Write downloaded pages to files in the current
directory
## Video Format Options:
-f, --format FORMAT video format code
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific one is
requested
--max-quality FORMAT highest quality format to download
-F, --list-formats list all available formats (currently youtube only)
--write-srt write video closed captions to a .srt file
(currently youtube only)
--srt-lang LANG language of the closed captions to download
(optional) use IETF language tags like 'en'
-f, --format FORMAT video format code, specify the order of
preference using slashes: "-f 22/17/18". "-f mp4"
and "-f flv" are also supported
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific one
is requested
--max-quality FORMAT highest quality format to download
-F, --list-formats list all available formats (currently youtube
only)
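The slash-separated preference accepted by `-f` can be pictured with a short Python sketch. This is only an illustration, not youtube-dl's actual selection code: the `pick_format` helper, the dictionary layout and the assumption that the available formats are listed from worst to best are all hypothetical.

```python
def pick_format(preference, available):
    """Return the first preferred format that is available, or None.

    `preference` is a string such as "22/17/18" or "mp4".
    `available` is assumed (for this sketch) to be sorted worst to best.
    """
    for wanted in preference.split('/'):
        # A token may name either a format id ("22") or an extension ("mp4").
        matches = [fmt for fmt in available
                   if wanted in (fmt['format_id'], fmt['ext'])]
        if matches:
            # With several matches (extension lookup), take the best one.
            return matches[-1]
    return None


# Made-up format list for demonstration purposes.
available = [
    {'format_id': '17', 'ext': '3gp'},
    {'format_id': '18', 'ext': 'mp4'},
    {'format_id': '22', 'ext': 'mp4'},
]

print(pick_format('37/18', available))
```

Here `37` is not offered, so the sketch falls through to `18`; a request for `webm` would return `None`.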
## Subtitle Options:
--write-sub write subtitle file
--write-auto-sub write automatic subtitle file (youtube only)
--all-subs downloads all the available subtitles of the
video
--list-subs lists all available subtitles for the video
--sub-format FORMAT subtitle format (default=srt) ([sbv/vtt] youtube
only)
--sub-lang LANGS languages of the subtitles to download (optional)
separated by commas, use IETF language tags like
'en,pt'
## Authentication Options:
-u, --username USERNAME account username
-p, --password PASSWORD account password
-n, --netrc use .netrc authentication data
-u, --username USERNAME account username
-p, --password PASSWORD account password
-n, --netrc use .netrc authentication data
--video-password PASSWORD video password (vimeo only)
## Post-processing Options:
-x, --extract-audio convert video files to audio-only files (requires
ffmpeg or avconv and ffprobe or avprobe)
--audio-format FORMAT "best", "aac", "vorbis", "mp3", "m4a", "opus", or
"wav"; best by default
--audio-quality QUALITY ffmpeg/avconv audio quality specification, insert a
value between 0 (better) and 9 (worse) for VBR or a
specific bitrate like 128K (default 5)
--recode-video FORMAT Encode the video to another format if necessary
(currently supported: mp4|flv|ogg|webm)
-k, --keep-video keeps the video file on disk after the post-
processing; the video is erased by default
--no-post-overwrites do not overwrite post-processed files; the post-
processed files are overwritten by default
-x, --extract-audio convert video files to audio-only files (requires
ffmpeg or avconv and ffprobe or avprobe)
--audio-format FORMAT "best", "aac", "vorbis", "mp3", "m4a", "opus", or
"wav"; best by default
--audio-quality QUALITY ffmpeg/avconv audio quality specification, insert
a value between 0 (better) and 9 (worse) for VBR
or a specific bitrate like 128K (default 5)
--recode-video FORMAT Encode the video to another format if necessary
(currently supported: mp4|flv|ogg|webm)
-k, --keep-video keeps the video file on disk after the post-
processing; the video is erased by default
--no-post-overwrites do not overwrite post-processed files; the post-
processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4
videos)
--add-metadata add metadata to the files
# CONFIGURATION
@ -138,8 +196,10 @@ The `-o` option allows users to indicate a template for the output file names. T
- `ext`: The sequence will be replaced by the appropriate extension (like flv or mp4).
- `epoch`: The sequence will be replaced by the Unix epoch when creating the file.
- `autonumber`: The sequence will be replaced by a five-digit number that will be increased with each download, starting at zero.
- `playlist`: The name or the id of the playlist that contains the video.
- `playlist_index`: The index of the video in the playlist, a five-digit number.
The current default template is `%(id)s.%(ext)s`, but that will be switched to `%(title)s-%(id)s.%(ext)s` (which can be requested with `-t` at the moment).
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or & in the filename, for example when transferring the downloaded file to a Windows system or passing the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
@ -148,15 +208,28 @@ In some cases, you don't want special characters such as 中, spaces, or &, such
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
youtube-dl_test_video_.mp4 # A simple file name
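The `%(name)s` sequences used by `-o` follow Python's %-style string formatting with a dictionary. The sketch below shows how such a template expands; the `expand_template` helper and the metadata values are made up for illustration, and the zero-padding only mimics the five-digit `autonumber` described above.

```python
# Hypothetical metadata for one video; the keys mirror the template
# fields from the README, the values are invented.
info = {
    'id': 'BaW_jenozKc',
    'title': 'youtube-dl test video',
    'ext': 'mp4',
    'autonumber': 0,
}


def expand_template(template, info, autonumber_size=5):
    """Expand an output template with Python %-formatting."""
    fields = dict(info)
    # Zero-pad the counter to the requested number of digits.
    fields['autonumber'] = '%0*d' % (autonumber_size, info.get('autonumber', 0))
    return template % fields


default_name = expand_template('%(title)s-%(id)s.%(ext)s', info)
numbered_name = expand_template('%(autonumber)s-%(id)s.%(ext)s', info)
# "%%" produces a literal percent sign.
literal_percent = expand_template('100%%-%(id)s.%(ext)s', info)

print(default_name)
print(numbered_name)
print(literal_percent)
```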
# VIDEO SELECTION
Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
- Absolute dates: Dates in the format `YYYYMMDD`.
- Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`
Examples:
$ youtube-dl --dateafter now-6months #will only download the videos uploaded in the last 6 months
$ youtube-dl --date 19700101 #will only download the videos uploaded on January 1, 1970
$ youtube-dl --dateafter 20000101 --datebefore 20100101 #will only download the videos uploaded between 2000 and 2010
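A rough Python sketch of how the two date formats could be interpreted follows. It is not the code youtube-dl uses: `parse_date` and `in_range` are hypothetical helpers, a month is approximated as 30 days and a year as 365 days, multi-digit amounts are accepted, and both range bounds are assumed to be inclusive.

```python
import datetime
import re

# Approximate unit lengths in days (an assumption of this sketch).
DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30, 'year': 365}
RELATIVE = re.compile(r'^(now|today)(?:([+-])(\d+)(day|week|month|year)s?)?$')


def parse_date(spec, today=None):
    """Turn 'YYYYMMDD' or a relative spec such as 'now-6months' into a date."""
    today = today or datetime.date.today()
    match = RELATIVE.match(spec)
    if match is None:
        # Not relative, so it must be an absolute YYYYMMDD date.
        return datetime.datetime.strptime(spec, '%Y%m%d').date()
    _, sign, amount, unit = match.groups()
    if sign is None:
        return today
    delta = datetime.timedelta(days=int(amount) * DAYS_PER_UNIT[unit])
    return today + delta if sign == '+' else today - delta


def in_range(upload_date, after=None, before=None, today=None):
    """Check an upload date (YYYYMMDD) against optional inclusive bounds."""
    day = parse_date(upload_date)
    if after is not None and day < parse_date(after, today):
        return False
    if before is not None and day > parse_date(before, today):
        return False
    return True


reference = datetime.date(2013, 11, 25)
print(parse_date('now-6months', today=reference))
print(in_range('20050615', after='20000101', before='20100101'))
```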
# FAQ
### Can you please put the -b option back?
Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the -b option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you''re interested in. In that case, simply request it with the -f option and youtube-dl will try to download it.
Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the `-b` option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you're interested in. In that case, simply request it with the `-f` option and youtube-dl will try to download it.
### I get HTTP error 402 when trying to download a video. What's this?
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We''re [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering providing a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a web browser to the YouTube URL, solving the CAPTCHA, and restarting youtube-dl.
### I have downloaded a video but how can I play it?


@ -1,14 +1,18 @@
__youtube-dl()
__youtube_dl()
{
local cur prev opts
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
opts="{{flags}}"
keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater"
if [[ ${cur} == * ]] ; then
if [[ ${cur} =~ : ]]; then
COMPREPLY=( $(compgen -W "${keywords}" -- ${cur}) )
return 0
elif [[ ${cur} == * ]] ; then
COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
return 0
fi
}
complete -F __youtube-dl youtube-dl
complete -F __youtube_dl youtube-dl

405
devscripts/buildserver.py Normal file

@ -0,0 +1,405 @@
#!/usr/bin/python3
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import argparse
import ctypes
import functools
import sys
import threading
import traceback
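import os.path
import shutil
import subprocess
import tempfile
import urllib.parse as urlparse  # the handler calls urlparse.urlparse / urlparse.parse_qs
import winreg as _winreg  # Python 3 name of the Python 2 _winreg module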
class BuildHTTPServer(ThreadingMixIn, HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
#args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(win_service_handler, stop_event))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='localhost:8142',
help='Bind to host:port (default %(default)s)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
#==============================================================================
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
pythonVersion = kwargs.pop('python', '2.7')
try:
key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
try:
self.pythonPath, _ = _winreg.QueryValueEx(key, '')
finally:
_winreg.CloseKey(key)
except Exception:
raise BuildError('No such Python version: %s' % pythonVersion)
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class YoutubeDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile']
def __init__(self, **kwargs):
if self.repoName != 'youtube-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(YoutubeDLBuilder, self).__init__(**kwargs)
def build(self):
try:
subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(YoutubeDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
actionDict = { 'build': Builder, 'download': Builder } # They're the same, no more caching.
def do_GET(self):
path = urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = str(e).encode('UTF-8')  # unicode() does not exist on Python 3
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
except HTTPError as e:
self.send_response(e.code, str(e))
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
#==============================================================================
if __name__ == '__main__':
main()
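For reference, the request URLs handled by `do_GET` (for example the `/build/rg3/youtube-dl/youtube-dl.exe?rev=...` URL fetched by the release script) can be split up as in the following Python 3 sketch. `parse_build_request` is a hypothetical standalone helper written for illustration, and the revision value is just a sample commit id.

```python
from urllib.parse import urlparse, parse_qs


def parse_build_request(raw_path):
    """Split a request path into (action, path segments, query parameters)."""
    parsed = urlparse(raw_path)
    # Keep only the first value of every query parameter.
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    # The first path component is the action, the rest identifies the file.
    action, _, rest = parsed.path.strip('/').partition('/')
    segments = rest.split('/') if rest else []
    return action, segments, params


action, segments, params = parse_build_request(
    '/build/rg3/youtube-dl/youtube-dl.exe?rev=00b350d209')
print(action, segments, params)
```

The first two segments correspond to the user and repository name read by `GITInfoBuilder`; the remaining ones form the path of the file to download.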

39
devscripts/check-porn.py Normal file

@ -0,0 +1,39 @@
#!/usr/bin/env python
"""
This script employs a VERY basic heuristic ('porn' in webpage.lower()) to check
if we are not 'age_limit' tagging some porn site
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.utils import compat_urllib_request
for test in get_testcases():
try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()
except Exception:
print('\nFail: {0}'.format(test['name']))
continue
webpage = webpage.decode('utf8', 'replace')
if 'porn' in webpage.lower() and ('info_dict' not in test
or 'age_limit' not in test['info_dict']
or test['info_dict']['age_limit'] != 18):
print('\nPotential missing age_limit check: {0}'.format(test['name']))
elif 'porn' not in webpage.lower() and ('info_dict' in test and
'age_limit' in test['info_dict'] and
test['info_dict']['age_limit'] == 18):
print('\nPotential false negative: {0}'.format(test['name']))
else:
sys.stdout.write('.')
sys.stdout.flush()
print()


@ -3,31 +3,40 @@
import json
import sys
import hashlib
import urllib.request
import os.path
if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]
with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
f.write(version)
versions_info = json.load(open('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']
del versions_info['signature']
new_version = {}
filenames = {'bin': 'youtube-dl', 'exe': 'youtube-dl.exe', 'tar': 'youtube-dl-%s.tar.gz' % version}
filenames = {
'bin': 'youtube-dl',
'exe': 'youtube-dl.exe',
'tar': 'youtube-dl-%s.tar.gz' % version}
build_dir = os.path.join('..', '..', 'build', version)
for key, filename in filenames.items():
print('Downloading and checksumming %s...' %filename)
url = 'http://youtube-dl.org/downloads/%s/%s' % (version, filename)
data = urllib.request.urlopen(url).read()
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
fn = os.path.join(build_dir, filename)
with open(fn, 'rb') as f:
data = f.read()
if not data:
raise ValueError('File %s is empty!' % fn)
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
versions_info['versions'][version] = new_version
versions_info['latest'] = version
json.dump(versions_info, open('update/versions.json', 'w'), indent=4, sort_keys=True)
with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)
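The checksum step above reads each release file fully into memory. A chunked variant might look like the sketch below; `sha256_of_file` is a hypothetical helper, and the temporary file only stands in for a real release artifact.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks and refuse empty files."""
    digest = hashlib.sha256()
    size = 0
    with open(path, 'rb') as f:
        # iter() with a sentinel stops once read() returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
            size += len(chunk)
    if size == 0:
        raise ValueError('File %s is empty!' % path)
    return digest.hexdigest()


# Demonstration with a throwaway file instead of a real build.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'youtube-dl')
    with open(sample_path, 'wb') as sample:
        sample.write(b'hello')
    checksum = sha256_of_file(sample_path)

print(checksum)
```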


@ -0,0 +1,56 @@
#!/usr/bin/env python3
import datetime
import textwrap
import json
atom_template=textwrap.dedent("""\
<?xml version='1.0' encoding='utf-8'?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
<atom:title>youtube-dl releases</atom:title>
<atom:id>youtube-dl-updates-feed</atom:id>
<atom:updated>@TIMESTAMP@</atom:updated>
@ENTRIES@
</atom:feed>""")
entry_template=textwrap.dedent("""
<atom:entry>
<atom:id>youtube-dl-@VERSION@</atom:id>
<atom:title>New version @VERSION@</atom:title>
<atom:link href="http://rg3.github.io/youtube-dl" />
<atom:content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Downloads available at <a href="https://yt-dl.org/downloads/@VERSION@/">https://yt-dl.org/downloads/@VERSION@/</a>
</div>
</atom:content>
<atom:author>
<atom:name>The youtube-dl maintainers</atom:name>
</atom:author>
<atom:updated>@TIMESTAMP@</atom:updated>
</atom:entry>
""")
now = datetime.datetime.now()
now_iso = now.isoformat()
atom_template = atom_template.replace('@TIMESTAMP@',now_iso)
entries=[]
versions_info = json.load(open('update/versions.json'))
versions = list(versions_info['versions'].keys())
versions.sort()
for v in versions:
entry = entry_template.replace('@TIMESTAMP@',v.replace('.','-'))
entry = entry.replace('@VERSION@',v)
entries.append(entry)
entries_str = textwrap.indent(''.join(entries), '\t')
atom_template = atom_template.replace('@ENTRIES@', entries_str)
with open('update/releases.atom','w',encoding='utf-8') as atom_file:
atom_file.write(atom_template)
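Atom (RFC 4287) expects `atom:updated` to contain an RFC 3339 date-time including a time zone. Replacing the dots in a version string yields only a date, and a version such as `2013.11.25.3` would become `2013-11-25-3`. One possible way to derive a full timestamp is sketched below; `version_to_timestamp` is a hypothetical helper, and using midnight UTC is an assumption, since the version string carries no time of day.

```python
import datetime


def version_to_timestamp(version):
    """Map a version like '2013.11.25.3' to an RFC 3339 timestamp."""
    # Only the first three components encode the release date.
    year, month, day = (int(part) for part in version.split('.')[:3])
    moment = datetime.datetime(year, month, day, tzinfo=datetime.timezone.utc)
    return moment.isoformat()


print(version_to_timestamp('2013.11.25.3'))
```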


@ -0,0 +1,34 @@
#!/usr/bin/env python3
import sys
import os
import textwrap
# We must be able to import youtube_dl
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import youtube_dl
def main():
with open('supportedsites.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC)
if not ie.working():
ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html))
template = template.replace('@SITES@', textwrap.indent('\n'.join(ie_htmls), '\t'))
with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
sitesf.write(template)
if __name__ == '__main__':
main()


@ -14,6 +14,12 @@
set -e
skip_tests=false
if [ "$1" = '--skip-test' ]; then
skip_tests=true
shift
fi
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
@ -22,7 +28,11 @@ if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit
/bin/echo -e "\n### First of all, testing..."
make cleanall
nosetests --with-coverage --cover-package=youtube_dl --cover-html test || exit 1
if $skip_tests ; then
echo 'SKIPPING TESTS'
else
nosetests --verbose --with-coverage --cover-package=youtube_dl --cover-html test --stop || exit 1
fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
@ -45,8 +55,8 @@ git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make youtube-dl youtube-dl.tar.gz
wget "http://jeromelaheurte.net:8142/download/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe || \
wget "http://jeromelaheurte.net:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
read -p "VM running? (y/n) " -n 1
wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
mkdir -p "build/$version"
mv youtube-dl youtube-dl.exe "build/$version"
mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@ -57,9 +67,11 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to youtube-dl.org..."
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@youtube-dl.org:html/downloads/
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
@ -69,15 +81,13 @@ ROOT=$(pwd)
ORIGIN_URL=$(git config --get remote.origin.url)
cd build/gh-pages
"$ROOT/devscripts/gh-pages/add-version.py" $version
"$ROOT/devscripts/gh-pages/update-feed.py"
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git show HEAD
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)


@ -40,7 +40,7 @@ raw_input()
filename = sys.argv[0]
UPDATE_URL = "http://rg3.github.com/youtube-dl/update/"
UPDATE_URL = "http://rg3.github.io/youtube-dl/update/"
VERSION_URL = UPDATE_URL + 'LATEST_VERSION'
JSON_URL = UPDATE_URL + 'versions.json'
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)


@ -2,17 +2,21 @@
# -*- coding: utf-8 -*-
from __future__ import print_function
import pkg_resources
import sys
try:
from setuptools import setup
setuptools_available = True
except ImportError:
from distutils.core import setup
setuptools_available = False
try:
# This will create an exe that needs Microsoft Visual C++ 2008
# Redistributable Package
import py2exe
"""This will create an exe that needs Microsoft Visual C++ 2008 Redistributable Package"""
except ImportError:
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
print("Cannot import py2exe", file=sys.stderr)
@ -23,15 +27,17 @@ py2exe_options = {
"compressed": 1,
"optimize": 2,
"dist_dir": '.',
"dll_excludes": ['w9xpopen.exe']
"dll_excludes": ['w9xpopen.exe'],
}
py2exe_console = [{
"script": "./youtube_dl/__main__.py",
"dest_base": "youtube-dl",
}]
py2exe_params = {
'console': py2exe_console,
'options': { "py2exe": py2exe_options },
'options': {"py2exe": py2exe_options},
'zipfile': None
}
@ -39,31 +45,39 @@ if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params
else:
params = {
'scripts': ['bin/youtube-dl'],
'data_files': [('etc/bash_completion.d', ['youtube-dl.bash-completion']), # Installing system-wide would require sudo...
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1/', ['youtube-dl.1'])]
'data_files': [ # Installing system-wide would require sudo...
('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1', ['youtube-dl.1'])
]
}
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}
else:
params['scripts'] = ['bin/youtube-dl']
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(), 'youtube_dl/version.py', 'exec'))
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
setup(
name = 'youtube_dl',
version = __version__,
description = 'YouTube video downloader',
long_description = 'Small command-line program to download videos from YouTube.com and other video sites.',
url = 'https://github.com/rg3/youtube-dl',
author = 'Ricardo Garcia',
maintainer = 'Philipp Hagemeister',
maintainer_email = 'phihag@phihag.de',
packages = ['youtube_dl'],
name='youtube_dl',
version=__version__,
description='YouTube video downloader',
long_description='Small command-line program to download videos from'
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
packages=['youtube_dl', 'youtube_dl.extractor'],
# Provokes warning on most systems (why?!)
#test_suite = 'nose.collector',
#test_requires = ['nosetest'],
# test_suite = 'nose.collector',
# test_requires = ['nosetest'],
classifiers = [
classifiers=[
"Topic :: Multimedia :: Video",
"Development Status :: 5 - Production/Stable",
"Environment :: Console",

0
test/__init__.py Normal file

85
test/helper.py Normal file

@ -0,0 +1,85 @@
import errno
import io
import hashlib
import json
import os.path
import re
import types
import sys
import youtube_dl.extractor
from youtube_dl import YoutubeDL
from youtube_dl.utils import preferredencoding
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
def report_warning(message):
'''
Print the message to stderr, prefixed with 'WARNING:'.
If stderr is a tty, the 'WARNING:' prefix is colored.
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header = u'WARNING:'
output = u'%s %s\n' % (_msg_header, message)
if 'b' in getattr(sys.stderr, 'mode', '') or sys.version_info[0] < 3:
output = output.encode(preferredencoding())
sys.stderr.write(output)
class FakeYDL(YoutubeDL):
def __init__(self, override=None):
# Different instances of the downloader can't share the same dictionary:
# some tests set the "sublang" parameter, which would break the md5 checks.
params = get_params(override=override)
super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
raise Exception(s)
def download(self, x):
self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
if t:
t['name'] = type(ie).__name__[:-len('IE')]
yield t
for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')]
yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
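
Note on `expect_warning` above: it swaps the instance's `report_warning` for a wrapper that drops messages matching a regex and forwards everything else to the original bound method. The following is a minimal standalone sketch of that pattern, not the project's code; the `Reporter` class and its `warnings` list are hypothetical stand-ins for `FakeYDL`.

```python
import re
import types


class Reporter(object):
    """Hypothetical stand-in for FakeYDL; it only records warnings."""

    def __init__(self):
        self.warnings = []

    def report_warning(self, message):
        self.warnings.append(message)

    def expect_warning(self, regex):
        # Keep a reference to the current bound method before replacing it.
        old_report_warning = self.report_warning

        def report_warning(self, message):
            # Silence the expected warning, forward anything else.
            if re.match(regex, message):
                return
            old_report_warning(message)

        # Bind the wrapper to this instance only; the class is untouched.
        self.report_warning = types.MethodType(report_warning, self)


reporter = Reporter()
reporter.expect_warning(r"video doesn't have subtitles")
reporter.report_warning("video doesn't have subtitles")
reporter.report_warning("something else went wrong")
```

Because the wrapper is bound to one instance, other instances keep the default behaviour, which fits the comment in `FakeYDL.__init__` about not sharing state between downloaders.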


@ -29,6 +29,7 @@
"simulate": false,
"skip_download": false,
"subtitleslang": null,
"subtitlesformat": "srt",
"test": true,
"updatetime": true,
"usenetrc": false,
@ -36,5 +37,7 @@
"verbose": true,
"writedescription": false,
"writeinfojson": true,
"writesubtitles": false
}
"writesubtitles": false,
"allsubtitles": false,
"listssubtitles": false
}

145
test/test_YoutubeDL.py Normal file

@ -0,0 +1,145 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
class YDL(FakeYDL):
def __init__(self, *args, **kwargs):
super(YDL, self).__init__(*args, **kwargs)
self.downloaded_info_dicts = []
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
def to_screen(self, msg):
self.msgs.append(msg)
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 460},
{u'ext': u'mp4', u'height': 460},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'webm')
# Different resolution => download best quality (mp4)
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'mp4', u'height': 1080},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'mp4')
# No prefer_free_formats => keep original formats order
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'flv', u'height': 720},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'flv')
def test_format_limit(self):
formats = [
{u'format_id': u'meh', u'url': u'http://example.com/meh'},
{u'format_id': u'good', u'url': u'http://example.com/good'},
{u'format_id': u'great', u'url': u'http://example.com/great'},
{u'format_id': u'excellent', u'url': u'http://example.com/exc'},
]
info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'}
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
ydl = YDL({'format_limit': 'good'})
assert ydl.params['format_limit'] == 'good'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'good')
ydl = YDL({'format_limit': 'great', 'format': 'all'})
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts[0][u'format_id'], u'meh')
self.assertEqual(ydl.downloaded_info_dicts[1][u'format_id'], u'good')
self.assertEqual(ydl.downloaded_info_dicts[2][u'format_id'], u'great')
self.assertTrue('3' in ydl.msgs[0])
ydl = YDL()
ydl.params['format_limit'] = 'excellent'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
def test_format_selection(self):
formats = [
{u'format_id': u'35', u'ext': u'mp4'},
{u'format_id': u'45', u'ext': u'webm'},
{u'format_id': u'47', u'ext': u'webm'},
{u'format_id': u'2', u'ext': u'flv'},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl = YDL({'format': u'20/47'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'20/71/worst'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'2')
ydl = YDL({'format': u'webm/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'3gp/40/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
def test_add_extra_info(self):
test_dict = {
'extractor': 'Foo',
}
extra_info = {
'extractor': 'Bar',
'playlist': 'funny videos',
}
YDL.add_extra_info(test_dict, extra_info)
self.assertEqual(test_dict['extractor'], 'Foo')
self.assertEqual(test_dict['playlist'], 'funny videos')
if __name__ == '__main__':
unittest.main()
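
Note on `test_format_selection` above: the assertions imply that a format string is a slash-separated list of alternatives tried left to right, that `worst` and the default pick the first and last entries of the list, and that an extension name picks the last format with that extension. The sketch below is a simplified selector that satisfies those assertions. It is an assumption made for illustration, not the implementation in `YoutubeDL`; `select_format` is a hypothetical name.

```python
def select_format(format_spec, formats):
    """Return the first alternative in format_spec that matches, else None.

    Assumes `formats` is non-empty and ordered from worst to best.
    """
    for choice in format_spec.split('/'):
        if choice == 'best':
            return formats[-1]
        if choice == 'worst':
            return formats[0]
        # An extension name selects the last (assumed best) format with it.
        by_ext = [f for f in formats if f.get('ext') == choice]
        if by_ext:
            return by_ext[-1]
        # Otherwise treat the alternative as an exact format_id.
        by_id = [f for f in formats if f.get('format_id') == choice]
        if by_id:
            return by_id[-1]
    return None


formats = [
    {'format_id': '35', 'ext': 'mp4'},
    {'format_id': '45', 'ext': 'webm'},
    {'format_id': '47', 'ext': 'webm'},
    {'format_id': '2', 'ext': 'flv'},
]
```

With this list, `select_format('20/71/worst', formats)` falls through the two unknown ids and returns the first entry, which mirrors the expectation in the test.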


@ -0,0 +1,54 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = os.path.splitext(filename)[0] + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()


@ -1,27 +1,111 @@
#!/usr/bin/env python
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.InfoExtractors import YoutubeIE, YoutubePlaylistIE
class TestAllURLsMatching(unittest.TestCase):
def setUp(self):
self.ies = gen_extractors()
def matching_ies(self, url):
return [ie.IE_NAME for ie in self.ies if ie.suitable(url) and ie.IE_NAME != 'generic']
def assertMatch(self, url, ie_list):
self.assertEqual(self.matching_ies(url), ie_list)
def test_youtube_playlist_matching(self):
self.assertTrue(YoutubePlaylistIE().suitable(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE().suitable(u'PL63F0C78739B09958'))
self.assertFalse(YoutubePlaylistIE().suitable(u'PLtS2H6bU1M'))
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist(u'PL63F0C78739B09958')
assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
def test_youtube_matching(self):
self.assertTrue(YoutubeIE().suitable(u'PLtS2H6bU1M'))
self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
def test_youtube_channel_matching(self):
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watch_later'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions'])
self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended'])
self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites'])
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"www.justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"www.twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.justin.tv/vanillatv/"))
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv/"))
def test_justintv_videoid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv/b/328087483"))
def test_justin_tv_chapterid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
def test_youtube_extract(self):
self.assertEqual(YoutubeIE()._extract_id('http://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc'), 'BaW_jenozKc')
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_no_duplicates(self):
ies = gen_extractors()
for tc in get_testcases():
url = tc['url']
for ie in ies:
if type(ie).__name__ in ['GenericIE', tc['name'] + 'IE']:
self.assertTrue(ie.suitable(url), '%s should match URL %r' % (type(ie).__name__, url))
else:
self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
def test_keywords(self):
self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
self.assertMatch(':colbertreport', ['ComedyCentralShows'])
self.assertMatch(':cr', ['ComedyCentralShows'])
if __name__ == '__main__':
unittest.main()
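
Note on `matching_ies` above: it asks every extractor whether it is `suitable` for a URL and ignores the catch-all generic extractor, so each test can assert the exact list of specific matches. A minimal sketch of that idea follows, under the assumption that `suitable` is a regex match against a `_VALID_URL` class attribute. The classes, `IE_NAME` values, and URL patterns are hypothetical and do not come from the project.

```python
import re


class SketchIE(object):
    """Hypothetical base class: suitability is a regex match on the URL."""
    IE_NAME = None
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class VideoIE(SketchIE):
    IE_NAME = 'sketch:video'
    _VALID_URL = r'https?://example\.com/watch/(?P<id>\w+)$'


class PlaylistIE(SketchIE):
    IE_NAME = 'sketch:playlist'
    _VALID_URL = r'https?://example\.com/playlist/(?P<id>\w+)$'


class GenericIE(SketchIE):
    IE_NAME = 'generic'
    _VALID_URL = r'.*'


ALL_IES = [VideoIE, PlaylistIE, GenericIE]


def matching_ies(url, ies=ALL_IES):
    # The generic extractor matches everything, so leave it out of the result.
    return [ie.IE_NAME for ie in ies
            if ie.suitable(url) and ie.IE_NAME != 'generic']
```

Asserting on the whole list, as `assertMatch` does, also catches the case where two specific extractors claim the same URL.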


@ -1,125 +1,175 @@
#!/usr/bin/env python
import errno
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
get_testcases,
try_rm,
md5,
report_warning
)
import hashlib
import io
import os
import json
import unittest
import sys
import hashlib
import socket
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL
from youtube_dl.utils import (
compat_str,
compat_urllib_error,
compat_HTTPError,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
from youtube_dl.extractor import get_info_extractor
import youtube_dl.FileDownloader
import youtube_dl.InfoExtractors
from youtube_dl.utils import *
RETRIES = 3
DEF_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tests.json')
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
class FileDownloader(youtube_dl.FileDownloader):
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
self.processed_info_dicts = []
return youtube_dl.FileDownloader.__init__(self, *args, **kwargs)
super(YoutubeDL, self).__init__(*args, **kwargs)
def report_warning(self, message):
# Don't accept warnings during tests
raise ExtractorError(message)
def process_info(self, info_dict):
self.processed_info_dicts.append(info_dict)
return youtube_dl.FileDownloader.process_info(self, info_dict)
return super(YoutubeDL, self).process_info(info_dict)
def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
with io.open(DEF_FILE, encoding='utf-8') as deff:
defs = json.load(deff)
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
defs = get_testcases()
class TestDownload(unittest.TestCase):
maxDiff = None
def setUp(self):
self.parameters = parameters
self.defs = defs
### Dynamically generate tests
def generator(test_case):
def test_template(self):
ie = getattr(youtube_dl.InfoExtractors, test_case['name'] + 'IE')
if not ie._WORKING:
print('Skipping: IE marked as not _WORKING')
return
if 'playlist' not in test_case and not test_case['file']:
print('Skipping: No output file specified')
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
if 'playlist' not in test_case:
info_dict = test_case.get('info_dict', {})
if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
print_skipping('The output file cannot be known, the "file" '
'key is missing or the info_dict is incomplete')
return
if 'skip' in test_case:
print('Skipping: {0}'.format(test_case['skip']))
print_skipping(test_case['skip'])
return
for other_ie in other_ies:
if not other_ie.working():
print_skipping(u'test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = self.parameters.copy()
params.update(test_case.get('params', {}))
params = get_params(test_case.get('params', {}))
fd = FileDownloader(params)
fd.add_info_extractor(ie())
for ien in test_case.get('add_ie', []):
fd.add_info_extractor(getattr(youtube_dl.InfoExtractors, ien + 'IE')())
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
finished_hook_called = set()
def _hook(status):
if status['status'] == 'finished':
finished_hook_called.add(status['filename'])
fd.add_progress_hook(_hook)
ydl.fd.add_progress_hook(_hook)
def get_tc_filename(tc):
return tc.get('file') or ydl.prepare_filename(tc.get('info_dict', {}))
test_cases = test_case.get('playlist', [test_case])
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
def try_rm_tcs_files():
for tc in test_cases:
tc_filename = get_tc_filename(tc)
try_rm(tc_filename)
try_rm(tc_filename + '.part')
try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files()
try:
fd.download([test_case['url']])
try_num = 1
while True:
try:
ydl.download([test_case['url']])
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise
if try_num == RETRIES:
report_warning(u'Failed due to network errors, skipping...')
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
try_num += 1
else:
break
for tc in test_cases:
tc_filename = get_tc_filename(tc)
if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc['file']), msg='Missing file ' + tc['file'])
self.assertTrue(tc['file'] in finished_hook_called)
self.assertTrue(os.path.exists(tc['file'] + '.info.json'))
self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc_filename in finished_hook_called)
info_json_fn = os.path.splitext(tc_filename)[0] + '.info.json'
self.assertTrue(os.path.exists(info_json_fn))
if 'md5' in tc:
md5_for_file = _file_md5(tc['file'])
md5_for_file = _file_md5(tc_filename)
self.assertEqual(md5_for_file, tc['md5'])
with io.open(tc['file'] + '.info.json', encoding='utf-8') as infof:
with io.open(info_json_fn, encoding='utf-8') as infof:
info_dict = json.load(infof)
for (info_field, value) in tc.get('info_dict', {}).items():
self.assertEqual(value, info_dict.get(info_field))
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(info_dict.get(info_field))
else:
got = info_dict.get(info_field)
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=2) + u'\n')
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
finally:
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm_tcs_files()
return test_template
### And add them to TestDownload
for test_case in defs:
for n, test_case in enumerate(defs):
test_method = generator(test_case)
test_method.__name__ = "test_{0}".format(test_case["name"])
tname = 'test_' + str(test_case['name'])
i = 1
while hasattr(TestDownload, tname):
tname = 'test_' + str(test_case['name']) + '_' + str(i)
i += 1
test_method.__name__ = tname
setattr(TestDownload, test_method.__name__, test_method)
del test_method
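
Note on the naming loop above: since one extractor can now contribute several test cases, the loop appends `_1`, `_2`, ... until the method name is free on `TestDownload`. The sketch below shows the same technique in isolation; `TestGenerated`, `make_test`, and the `CASES` list are hypothetical and only stand in for `TestDownload`, `generator`, and the real test cases.

```python
import unittest


class TestGenerated(unittest.TestCase):
    """Empty container; the test methods are attached below."""


def make_test(case):
    # Build one test method that closes over its own case.
    def test_template(self):
        self.assertTrue(case['url'].startswith('http'))
    return test_template


# Hypothetical cases; two of them share a name on purpose.
CASES = [
    {'name': 'Foo', 'url': 'http://example.com/1'},
    {'name': 'Foo', 'url': 'http://example.com/2'},
    {'name': 'Bar', 'url': 'http://example.com/3'},
]

for case in CASES:
    tname = 'test_' + case['name']
    i = 1
    # Add a numeric suffix until the name is not taken yet.
    while hasattr(TestGenerated, tname):
        tname = 'test_' + case['name'] + '_' + str(i)
        i += 1
    method = make_test(case)
    method.__name__ = tname
    setattr(TestGenerated, tname, method)

generated_names = sorted(
    name for name in dir(TestGenerated) if name.startswith('test_'))
```

Without the suffix, the second `Foo` case would silently overwrite the first one and never run.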

115
test/test_playlists.py Normal file

@ -0,0 +1,115 @@
#!/usr/bin/env python
# encoding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import (
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE,
LivestreamIE,
NHLVideocenterIE,
BambuserChannelIE,
BandcampAlbumIE
)
class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
def test_dailymotion_playlist(self):
dl = FakeYDL()
ie = DailymotionPlaylistIE(dl)
result = ie.extract('http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'SPORT')
self.assertTrue(len(result['entries']) > 20)
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
def test_vimeo_channel(self):
dl = FakeYDL()
ie = VimeoChannelIE(dl)
result = ie.extract('http://vimeo.com/channels/tributes')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Vimeo Tributes')
self.assertTrue(len(result['entries']) > 24)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'5124905')
self.assertTrue(len(result['entries']) >= 11)
def test_soundcloud_set(self):
dl = FakeYDL()
ie = SoundcloudSetIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'The Royal Concept EP')
self.assertTrue(len(result['entries']) >= 6)
def test_soundcloud_user(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
def test_bambuser_channel(self):
dl = FakeYDL()
ie = BambuserChannelIE(dl)
result = ie.extract('http://bambuser.com/channel/pixelversity')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'pixelversity')
self.assertTrue(len(result['entries']) >= 60)
def test_bandcamp_album(self):
dl = FakeYDL()
ie = BandcampAlbumIE(dl)
result = ie.extract('http://mpallante.bandcamp.com/album/nightmare-night-ep')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Nightmare Night EP')
self.assertTrue(len(result['entries']) >= 4)
if __name__ == '__main__':
unittest.main()

210
test/test_subtitles.py Normal file

@ -0,0 +1,210 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, md5
from youtube_dl.extractor import (
YoutubeIE,
DailymotionIE,
TEDIE,
)
class BaseTestSubtitles(unittest.TestCase):
url = None
IE = None
def setUp(self):
self.DL = FakeYDL()
self.ie = self.IE(self.DL)
def getInfoDict(self):
info_dict = self.ie.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict['subtitles']
class TestYoutubeSubtitles(BaseTestSubtitles):
url = 'QRS8MkLhQmM'
IE = YoutubeIE
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestDailymotionSubtitles(BaseTestSubtitles):
url = 'http://www.dailymotion.com/video/xczg00'
IE = DailymotionIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '2154f31ff9b9f89a0aa671537559c21d')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '7616cbc6df20ec2c1204083c83871cf6')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 28)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()


@ -1,19 +1,32 @@
#!/usr/bin/env python
# Various small unit tests
import sys
import unittest
# coding: utf-8
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import timeconvert
from youtube_dl.utils import sanitize_filename
from youtube_dl.utils import unescapeHTML
from youtube_dl.utils import orderedSet
from youtube_dl.utils import (
timeconvert,
sanitize_filename,
unescapeHTML,
orderedSet,
DateRange,
unified_strdate,
find_xpath_attr,
get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
shell_quote,
encodeFilename,
)
if sys.version_info < (3, 0):
_compat_str = lambda b: b.decode('unicode-escape')
@ -95,6 +108,74 @@ class TestUtil(unittest.TestCase):
def test_unescape_html(self):
self.assertEqual(unescapeHTML(_compat_str('%20;')), _compat_str('%20;'))
def test_daterange(self):
_20century = DateRange("19000101","20000101")
self.assertFalse("17890714" in _20century)
_ac = DateRange("00010101")
self.assertTrue("19690721" in _ac)
_firstmilenium = DateRange(end="10000101")
self.assertTrue("07110427" in _firstmilenium)
def test_unified_dates(self):
self.assertEqual(unified_strdate('December 21, 2010'), '20101221')
self.assertEqual(unified_strdate('8/7/2009'), '20090708')
self.assertEqual(unified_strdate('Dec 14, 2012'), '20121214')
self.assertEqual(unified_strdate('2012/10/11 01:56:38 +0000'), '20121011')
def test_find_xpath_attr(self):
testxml = u'''<root>
<node/>
<node x="a"/>
<node x="a" y="c" />
<node x="b" y="d" />
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None)
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
def test_meta_parser(self):
testhtml = u'''
<head>
<meta name="description" content="foo &amp; bar">
<meta content='Plato' name='author'/>
</head>
'''
get_meta = lambda name: get_meta_content(name, testhtml)
self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename(u'ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), u"""ffmpeg -i 'ñ€ß'"'"'.mp4'""")
if __name__ == '__main__':
unittest.main()
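
Note on `test_shell_quote` above: the `'"'"'` sequence in the expected string is the usual POSIX idiom for a single quote inside a single-quoted word (close the quote, emit a double-quoted `'`, reopen the quote). The implementation of `shell_quote` is not part of this hunk, so the sketch below does not reproduce it. Assuming Python 3, it only shows that the standard library's `shlex.quote` yields the same string for the test's arguments; `quote_args` is a helper name made up for this illustration.

```python
import shlex


def quote_args(args):
    """Join arguments into one shell-safe command line.

    Illustrative helper only; this is NOT youtube-dl's shell_quote.
    """
    return ' '.join(shlex.quote(a) for a in args)


# Same arguments as in test_shell_quote (the file name contains a single quote).
args = ['ffmpeg', '-i', "ñ€ß'.mp4"]
quoted = quote_args(args)
print(quoted)

# Splitting the quoted command line again recovers the original arguments.
roundtrip = shlex.split(quoted)
```

Plain words such as `ffmpeg` and `-i` are left unquoted, which matches the expectation in the test.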


@ -0,0 +1,79 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, try_rm
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) #assertIn only added in python 2.7
#remove the first occurrence, there could be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()
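
Note on `test_info_json` above: its comments describe a multiset check. Annotations arrive unsorted, some have no `TEXT` child, and two annotations may carry the same text, so every match removes one entry from a copy of the expected list. A minimal offline sketch of that check follows. The sample XML and the `unmatched_texts` helper are invented for illustration; only the element names (`document`, `annotations`, `annotation`, `TEXT`) and the `type` attribute come from the test. Unlike the test, the sketch skips unexpected texts instead of failing on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations document; the real file is written by youtube-dl.
SAMPLE_XML = '''<document>
  <annotations>
    <annotation type="text"><TEXT>Note</TEXT></annotation>
    <annotation type="highlight"/>
    <annotation type="text"><TEXT>Title</TEXT></annotation>
    <annotation type="text"><TEXT>Note</TEXT></annotation>
  </annotations>
</document>'''


def unmatched_texts(xml_string, expected_texts):
    """Return the expected texts that no text annotation accounted for."""
    remaining = list(expected_texts)  # copy, so duplicates are counted
    root = ET.fromstring(xml_string)
    for annotation in root.find('annotations').findall('annotation'):
        if annotation.get('type') != 'text':
            continue  # only text annotations have a TEXT child
        text = annotation.find('TEXT').text
        if text in remaining:
            remaining.remove(text)  # removes the first occurrence only
    return remaining


all_found = unmatched_texts(SAMPLE_XML, ['Note', 'Note', 'Title'])
one_missing = unmatched_texts(SAMPLE_XML, ['Note', 'Label'])
print(all_found, one_missing)
```

Copying the expected list before removing entries is what lets the same text be required twice.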


@ -1,40 +1,36 @@
#!/usr/bin/env python
# coding: utf-8
import json
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params
import youtube_dl.FileDownloader
import youtube_dl.InfoExtractors
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
import io
import json
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class FileDownloader(youtube_dl.FileDownloader):
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
youtube_dl.FileDownloader.__init__(self, *args, **kwargs)
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
params = json.load(pf)
params['writeinfojson'] = True
params['skip_download'] = True
params['writedescription'] = True
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
INFO_JSON_FILE = TEST_ID + '.info.json'
DESCRIPTION_FILE = TEST_ID + '.mp4.description'
EXPECTED_DESCRIPTION = u'''test chars: "'/\ä↭𝕐
@ -42,16 +38,17 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
ie = youtube_dl.InfoExtractors.YoutubeIE()
fd = FileDownloader(params)
fd.add_info_extractor(ie)
fd.download([TEST_ID])
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(INFO_JSON_FILE))
with io.open(INFO_JSON_FILE, 'r', encoding='utf-8') as jsonf:
jd = json.load(jsonf)


@ -1,73 +1,111 @@
#!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.InfoExtractors import YoutubeUserIE,YoutubePlaylistIE
from youtube_dl.utils import *
from test.helper import FakeYDL
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
class FakeDownloader(object):
def __init__(self):
self.result = []
self.params = parameters
def to_screen(self, s):
print(s)
def trouble(self, s):
raise Exception(s)
def download(self, x):
self.result.append(x)
class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
def test_youtube_playlist(self):
DL = FakeDownloader()
IE = YoutubePlaylistIE(DL)
IE.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(DL.result, [
['http://www.youtube.com/watch?v=bV9L5Ht9LgY'],
['http://www.youtube.com/watch?v=FXxLjLQi3Fg'],
['http://www.youtube.com/watch?v=tU3Bgo5qJZE']
])
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'ytdl test PL')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('PLBB231211A4F62143')
self.assertTrue(len(result['entries']) > 25)
def test_youtube_playlist_long(self):
DL = FakeDownloader()
IE = YoutubePlaylistIE(DL)
IE.extract('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
self.assertTrue(len(DL.result) >= 799)
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
self.assertIsPlaylist(result)
self.assertTrue(len(result['entries']) >= 799)
def test_youtube_playlist_with_deleted(self):
#651
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertFalse('pElCt5oNDuI' in ytie_results)
self.assertFalse('KdPEApIVdWM' in ytie_results)
def test_youtube_playlist_empty(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx')
self.assertIsPlaylist(result)
self.assertEqual(len(result['entries']), 0)
def test_youtube_course(self):
DL = FakeDownloader()
IE = YoutubePlaylistIE(DL)
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
IE.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
self.assertEqual(DL.result[0], ['http://www.youtube.com/watch?v=j9WZyLZCBzs'])
self.assertEqual(len(DL.result), 25)
self.assertEqual(DL.result[-1], ['http://www.youtube.com/watch?v=rYefUsYuEp0'])
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = result['entries']
self.assertEqual(YoutubeIE()._extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE()._extract_id(entries[-1]['url']), 'rYefUsYuEp0')
def test_youtube_channel(self):
# I give up, please find a channel that does paginate and test this like test_youtube_playlist_long
pass # TODO
dl = FakeYDL()
ie = YoutubeChannelIE(dl)
#test paginated channel
result = ie.extract('https://www.youtube.com/channel/UCKfVa3S1e4PHvxWcwyMMg8w')
self.assertTrue(len(result['entries']) > 90)
#test autogenerated channel
result = ie.extract('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
self.assertTrue(len(result['entries']) >= 18)
def test_youtube_user(self):
DL = FakeDownloader()
IE = YoutubeUserIE(DL)
IE.extract('https://www.youtube.com/user/TheLinuxFoundation')
self.assertTrue(len(DL.result) >= 320)
dl = FakeYDL()
ie = YoutubeUserIE(dl)
result = ie.extract('https://www.youtube.com/user/TheLinuxFoundation')
self.assertTrue(len(result['entries']) >= 320)
def test_youtube_safe_search(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl')
self.assertEqual(len(result['entries']), 2)
def test_youtube_show(self):
dl = FakeYDL()
ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 3)
if __name__ == '__main__':
unittest.main()


@ -0,0 +1,81 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import re
import string
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/swfbin/watch_as3-vflg5GhxU.swf',
u'swf',
82,
u':/.-,+*)=\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBAzyxw>utsrqponmlkjihgfedcba987654321'
),
]
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
test_id = m.group(1)
def test_func(self):
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ie = YoutubeIE()
if stype == 'js':
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
for test_spec in _TESTS:
make_tfunc(*test_spec)
if __name__ == '__main__':
unittest.main()
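
Note on the file above: `make_tfunc` builds one test method per entry in `_TESTS` and attaches it to the test case with `setattr`, so each player file is reported as a separately named test. The same pattern is reduced to a self-contained sketch below. The `_CASES` table and the string reversal are stand-ins invented here; they are not the signature algorithms that the real test downloads and parses.

```python
import string
import unittest

# Hypothetical cases: (test id, input length, expected output).
_CASES = [
    ('reverse5', 5, '43210'),
    ('reverse8', 8, '76543210'),
]


class TestGenerated(unittest.TestCase):
    pass


def make_tfunc(test_id, sig_length, expected):
    def test_func(self):
        # As in the real test, the input is a prefix of string.printable.
        src_sig = string.printable[:sig_length]
        self.assertEqual(src_sig[::-1], expected)

    # Give each generated method a unique, descriptive name.
    test_func.__name__ = 'test_generated_' + test_id
    setattr(TestGenerated, test_func.__name__, test_func)


for case in _CASES:
    make_tfunc(*case)

# Run the generated tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGenerated)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Wrapping the inner function in `make_tfunc` gives each test its own closure over `sig_length` and `expected`, which a bare loop with a nested `def` would not.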


@ -1,57 +0,0 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.InfoExtractors import YoutubeIE
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class FakeDownloader(object):
def __init__(self):
self.result = []
self.params = parameters
def to_screen(self, s):
print(s)
def trouble(self, s):
raise Exception(s)
def download(self, x):
self.result.append(x)
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def test_youtube_subtitles(self):
DL = FakeDownloader()
DL.params['writesubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
self.assertEqual(md5(info_dict[0]['subtitles']), 'c3228550d59116f3c29fba370b55d033')
def test_youtube_subtitles_it(self):
DL = FakeDownloader()
DL.params['writesubtitles'] = True
DL.params['subtitleslang'] = 'it'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
self.assertEqual(md5(info_dict[0]['subtitles']), '132a88a0daf8e1520f393eb58f1f646a')
if __name__ == '__main__':
unittest.main()


@ -1,308 +0,0 @@
[
{
"name": "Youtube",
"url": "http://www.youtube.com/watch?v=BaW_jenozKc",
"file": "BaW_jenozKc.mp4",
"info_dict": {
"title": "youtube-dl test video \"'/\\ä↭𝕐",
"uploader": "Philipp Hagemeister",
"uploader_id": "phihag",
"upload_date": "20121002",
"description": "test chars: \"'/\\ä↭𝕐\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de ."
}
},
{
"name": "Dailymotion",
"md5": "392c4b85a60a90dc4792da41ce3144eb",
"url": "http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech",
"file": "x33vw9.mp4"
},
{
"name": "Metacafe",
"add_ie": ["Youtube"],
"url": "http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/",
"file": "_aUehQsCQtM.flv"
},
{
"name": "BlipTV",
"md5": "b2d849efcf7ee18917e4b4d9ff37cafe",
"url": "http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352",
"file": "5779306.m4v"
},
{
"name": "XVideos",
"md5": "1d0c835822f0a71a7bf011855db929d0",
"url": "http://www.xvideos.com/video939581/funny_porns_by_s_-1",
"file": "939581.flv"
},
{
"name": "YouPorn",
"md5": "c37ddbaaa39058c76a7e86c6813423c1",
"url": "http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/",
"file": "505835.mp4"
},
{
"name": "Pornotube",
"md5": "374dd6dcedd24234453b295209aa69b6",
"url": "http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing",
"file": "1689755.flv"
},
{
"name": "YouJizz",
"md5": "07e15fa469ba384c7693fd246905547c",
"url": "http://www.youjizz.com/videos/zeichentrick-1-2189178.html",
"file": "2189178.flv"
},
{
"name": "Vimeo",
"md5": "8879b6cc097e987f02484baf890129e5",
"url": "http://vimeo.com/56015672",
"file": "56015672.mp4",
"info_dict": {
"title": "youtube-dl test video - ★ \" ' 幸 / \\ ä ↭ 𝕐",
"uploader": "Filippo Valsorda",
"uploader_id": "user7108434",
"upload_date": "20121220",
"description": "This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: ★ \" ' 幸 / \\ ä ↭ 𝕐"
}
},
{
"name": "Soundcloud",
"md5": "ebef0a451b909710ed1d7787dddbf0d7",
"url": "http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy",
"file": "62986583.mp3"
},
{
"name": "StanfordOpenClassroom",
"md5": "544a9468546059d4e80d76265b0443b8",
"url": "http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=PracticalUnix&video=intro-environment&speed=100",
"file": "PracticalUnix_intro-environment.mp4",
"skip": "Currently offline"
},
{
"name": "XNXX",
"md5": "0831677e2b4761795f68d417e0b7b445",
"url": "http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_",
"file": "1135332.flv"
},
{
"name": "Youku",
"url": "http://v.youku.com/v_show/id_XNDgyMDQ2NTQw.html",
"file": "XNDgyMDQ2NTQw_part00.flv",
"md5": "ffe3f2e435663dc2d1eea34faeff5b5b",
"params": { "test": false }
},
{
"name": "NBA",
"url": "http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html",
"file": "0021200253-okc-bkn-recap.nba.mp4",
"md5": "c0edcfc37607344e2ff8f13c378c88a4"
},
{
"name": "JustinTV",
"url": "http://www.twitch.tv/thegamedevhub/b/296128360",
"file": "296128360.flv",
"md5": "ecaa8a790c22a40770901460af191c9a"
},
{
"name": "MyVideo",
"url": "http://www.myvideo.de/watch/8229274/bowling_fail_or_win",
"file": "8229274.flv",
"md5": "2d2753e8130479ba2cb7e0a37002053e"
},
{
"name": "Escapist",
"url": "http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate",
"file": "6618-Breaking-Down-Baldurs-Gate.flv",
"md5": "c6793dbda81388f4264c1ba18684a74d",
"skip": "Fails with timeout on Travis"
},
{
"name": "GooglePlus",
"url": "https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH",
"file": "ZButuJc6CtH.flv"
},
{
"name": "FunnyOrDie",
"url": "http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version",
"file": "0732f586d7.mp4",
"md5": "f647e9e90064b53b6e046e75d0241fbd"
},
{
"name": "TweetReel",
"url": "http://tweetreel.com/?77smq",
"file": "77smq.mov",
"md5": "56b4d9ca9de467920f3f99a6d91255d6",
"info_dict": {
"uploader": "itszero",
"uploader_id": "itszero",
"upload_date": "20091225",
"description": "Installing Gentoo Linux on Powerbook G4, it turns out the sleep indicator becomes HDD activity indicator :D"
}
},
{
"name": "Steam",
"url": "http://store.steampowered.com/video/105600/",
"playlist": [
{
"file": "81300.flv",
"md5": "f870007cee7065d7c76b88f0a45ecc07",
"info_dict": {
"title": "Terraria 1.1 Trailer"
}
},
{
"file": "80859.flv",
"md5": "61aaf31a5c5c3041afb58fb83cbb5751",
"info_dict": {
"title": "Terraria Trailer"
}
}
]
},
{
"name": "Ustream",
"url": "http://www.ustream.tv/recorded/20274954",
"file": "20274954.flv",
"md5": "088f151799e8f572f84eb62f17d73e5c",
"info_dict": {
"title": "Young Americans for Liberty February 7, 2012 2:28 AM"
}
},
{
"name": "InfoQ",
"url": "http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
"file": "12-jan-pythonthings.mp4",
"info_dict": {
"title": "A Few of My Favorite [Python] Things"
},
"params": {
"skip_download": true
}
},
{
"name": "ComedyCentral",
"url": "http://www.thedailyshow.com/watch/thu-december-13-2012/kristen-stewart",
"file": "422212.mp4",
"md5": "4e2f5cb088a83cd8cdb7756132f9739d",
"info_dict": {
"title": "thedailyshow-kristen-stewart part 1"
}
},
{
"name": "RBMARadio",
"url": "http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011",
"file": "ford-lopatin-live-at-primavera-sound-2011.mp3",
"md5": "6bc6f9bcb18994b4c983bc3bf4384d95",
"info_dict": {
"title": "Live at Primavera Sound 2011",
"description": "Joel Ford and Daniel \u2019Oneohtrix Point Never\u2019 Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
"uploader": "Ford & Lopatin",
"uploader_id": "ford-lopatin",
"location": "Spain"
}
},
{
"name": "Facebook",
"url": "https://www.facebook.com/photo.php?v=120708114770723",
"file": "120708114770723.mp4",
"md5": "48975a41ccc4b7a581abd68651c1a5a8",
"info_dict": {
"title": "PEOPLE ARE AWESOME 2013",
"duration": 279
}
},
{
"name": "EightTracks",
"url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
"playlist": [
{
"file": "11885610.m4a",
"md5": "96ce57f24389fc8734ce47f4c1abcc55",
"info_dict": {
"title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885608.m4a",
"md5": "4ab26f05c1f7291ea460a3920be8021f",
"info_dict": {
"title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
}
},
{
"file": "11885679.m4a",
"md5": "d30b5b5f74217410f4689605c35d1fd7",
"info_dict": {
"title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad"
}
},
{
"file": "11885680.m4a",
"md5": "4eb0a669317cd725f6bbd336a29f923a",
"info_dict": {
"title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad"
}
},
{
"file": "11885682.m4a",
"md5": "1893e872e263a2705558d1d319ad19e8",
"info_dict": {
"title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad"
}
},
{
"file": "11885683.m4a",
"md5": "b673c46f47a216ab1741ae8836af5899",
"info_dict": {
"title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad"
}
},
{
"file": "11885684.m4a",
"md5": "1d74534e95df54986da7f5abf7d842b7",
"info_dict": {
"title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad"
}
},
{
"file": "11885685.m4a",
"md5": "f081f47af8f6ae782ed131d38b9cd1c0",
"info_dict": {
"title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad"
}
}
]
},
{
"name": "Keek",
"url": "http://www.keek.com/ytdl/keeks/NODfbab",
"file": "NODfbab.mp4",
"md5": "9b0636f8c0f7614afa4ea5e4c6e57e83",
"info_dict": {
"title": "test chars: \"'/\\ä<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de ."
}
},
{
"name": "TED",
"url": "http://www.ted.com/talks/dan_dennett_on_our_consciousness.html",
"file": "102.mp4",
"md5": "7bc087e71d16f18f9b8ab9fa62a8a031",
"info_dict": {
"title": "Dan Dennett: The illusion of consciousness"
}
},
{
"name": "MySpass",
"url": "http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/",
"file": "11741.mp4",
"md5": "0b49f4844a068f8b33f4b7c88405862b",
"info_dict": {
"title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2"
}
}
]

tox.ini Normal file

@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

Binary file not shown.


@ -1,151 +1,109 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import math
import io
import os
import re
import socket
import subprocess
import sys
import time
import traceback
if os.name == 'nt':
import ctypes
from .utils import *
from .utils import (
compat_urllib_error,
compat_urllib_request,
ContentTooShortError,
determine_ext,
encodeFilename,
format_bytes,
sanitize_open,
timeconvert,
)
class FileDownloader(object):
"""File Downloader class.
File downloader objects are the ones responsible for downloading the
actual video file and writing it to disk if the user has requested
it, among some other tasks. In most cases there should be one per
program. As, given a video URL, the downloader doesn't know how to
extract all the needed information, task that InfoExtractors do, it
has to pass the URL to one of them.
For this, file downloader objects have a method that allows
InfoExtractors to be registered in a given order. When it is passed
a URL, the file downloader handles it to the first InfoExtractor it
finds that reports being able to handle it. The InfoExtractor extracts
all the information about the video or videos the URL refers to, and
asks the FileDownloader to process the video information, possibly
downloading the video.
actual video file and writing it to disk.
File downloaders accept a lot of parameters. In order not to saturate
the object constructor with arguments, it receives a dictionary of
options instead. These options are available through the params
attribute for the InfoExtractors to use. The FileDownloader also
registers itself as the downloader in charge for the InfoExtractors
that are added to it, so this is a "mutual registration".
options instead.
Available options:
username: Username for authentication purposes.
password: Password for authentication purposes.
usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.
forceurl: Force printing final URL.
forcetitle: Force printing title.
forcethumbnail: Force printing thumbnail URL.
forcedescription: Force printing description.
forcefilename: Force printing final filename.
simulate: Do not download the video files.
format: Video format code.
format_limit: Highest quality format to try.
outtmpl: Template for output names.
restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors.
ratelimit: Download speed limit, in bytes/sec.
nooverwrites: Prevent overwriting files.
retries: Number of times to retry for HTTP error 5xx
buffersize: Size of download buffer in bytes.
noresizebuffer: Do not automatically resize the download buffer.
continuedl: Try to continue downloads if possible.
noprogress: Do not print the progress bar.
playliststart: Playlist item to start at.
playlistend: Playlist item to end at.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logtostderr: Log messages to stderr instead of stdout.
consoletitle: Display progress in console window's titlebar.
nopart: Do not use temporary .part files.
updatetime: Use the Last-modified header to set output file timestamps.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writesubtitles: Write the video subtitles to a .srt file
subtitleslang: Language of the subtitles to download
test: Download only first bytes to test the downloader.
keepvideo: Keep the video file after post-processing
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
"""
params = None
_ies = []
_pps = []
_download_retcode = None
_num_downloads = None
_screen_file = None
def __init__(self, params):
def __init__(self, ydl, params):
"""Create a FileDownloader object with the given options."""
self._ies = []
self._pps = []
self.ydl = ydl
self._progress_hooks = []
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
self.params = params
if '%(stitle)s' in self.params['outtmpl']:
self.to_stderr(u'WARNING: %(stitle)s is deprecated. Use the %(title)s and the --restrict-filenames flag(which also secures %(uploader)s et al) instead.')
@staticmethod
def format_bytes(bytes):
if bytes is None:
return 'N/A'
if type(bytes) is str:
bytes = float(bytes)
if bytes == 0.0:
exponent = 0
def format_seconds(seconds):
(mins, secs) = divmod(seconds, 60)
(hours, mins) = divmod(mins, 60)
if hours > 99:
return '--:--:--'
if hours == 0:
return '%02d:%02d' % (mins, secs)
else:
exponent = int(math.log(bytes, 1024.0))
suffix = 'bkMGTPEZY'[exponent]
converted = float(bytes) / float(1024 ** exponent)
return '%.2f%s' % (converted, suffix)
return '%02d:%02d:%02d' % (hours, mins, secs)
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
return None
return float(byte_counter) / float(data_len) * 100.0
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
if total is None:
return '--:--'
return None
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return '--:--'
return None
rate = float(current) / dif
eta = int((float(total) - float(current)) / rate)
(eta_mins, eta_secs) = divmod(eta, 60)
if eta_mins > 99:
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return '%02d:%02d' % (eta_mins, eta_secs)
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
return None
return float(bytes) / dif
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
return '%10s' % ('%s/s' % format_bytes(speed))
@staticmethod
def best_block_size(elapsed_time, bytes):
@ -170,69 +128,23 @@ class FileDownloader(object):
multiplier = 1024.0 ** 'bkmgtpezy'.index(matchobj.group(2).lower())
return int(round(number * multiplier))
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
ie.set_downloader(self)
def add_post_processor(self, pp):
"""Add a PostProcessor object to the end of the chain."""
self._pps.append(pp)
pp.set_downloader(self)
def to_screen(self, message, skip_eol=False):
"""Print message to stdout if not in quiet mode."""
assert type(message) == type(u'')
if not self.params.get('quiet', False):
terminator = [u'\n', u''][skip_eol]
output = message + terminator
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding(), 'ignore')
self._screen_file.write(output)
self._screen_file.flush()
def to_screen(self, *args, **kargs):
self.ydl.to_screen(*args, **kargs)
def to_stderr(self, message):
"""Print message to stderr."""
assert type(message) == type(u'')
output = message + u'\n'
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding())
sys.stderr.write(output)
self.ydl.to_screen(message)
def to_cons_title(self, message):
"""Set console/terminal window title to message."""
if not self.params.get('consoletitle', False):
return
if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
self.to_screen('\033]0;%s\007' % message, skip_eol=True)
def to_console_title(self, message):
self.ydl.to_console_title(message)
def fixed_template(self):
"""Checks if the output template is fixed."""
return (re.search(u'(?u)%\\(.+?\\)s', self.params['outtmpl']) is None)
def trouble(self, *args, **kargs):
self.ydl.trouble(*args, **kargs)
def trouble(self, message=None, tb=None):
"""Determine action to take when a download problem appears.
def report_warning(self, *args, **kargs):
self.ydl.report_warning(*args, **kargs)
Depending on if the downloader has been configured to ignore
download errors or not, this method may throw an exception or
not when errors are found, after printing the message.
tb, if given, is additional traceback information.
"""
if message is not None:
self.to_stderr(message)
if self.params.get('verbose'):
if tb is None:
tb_data = traceback.format_list(traceback.extract_stack())
tb = u''.join(tb_data)
self.to_stderr(tb)
if not self.params.get('ignoreerrors', False):
raise DownloadError(message)
self._download_retcode = 1
def report_error(self, *args, **kargs):
self.ydl.report_error(*args, **kargs)
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
@ -264,8 +176,8 @@ class FileDownloader(object):
if old_filename == new_filename:
return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err:
self.trouble(u'ERROR: unable to rename file')
except (IOError, OSError):
self.report_error(u'unable to rename file')
def try_utime(self, filename, last_modified_hdr):
"""Try to set the last-modified time of the given file."""
@ -279,39 +191,40 @@ class FileDownloader(object):
filetime = timeconvert(timestr)
if filetime is None:
return filetime
# Ignore obviously invalid dates
if filetime == 0:
return
try:
os.utime(filename, (time.time(), filetime))
except:
pass
return filetime
def report_writedescription(self, descfn):
""" Report that the description file is being written """
self.to_screen(u'[info] Writing video description to: ' + descfn)
def report_writesubtitles(self, srtfn):
""" Report that the subtitles file is being written """
self.to_screen(u'[info] Writing video subtitles to: ' + srtfn)
def report_writeinfojson(self, infofn):
""" Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_destination(self, filename):
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
if eta is not None:
eta_str = self.format_eta(eta)
else:
eta_str = 'Unknown ETA'
if percent is not None:
percent_str = self.format_percent(percent)
else:
percent_str = 'Unknown %'
speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str))
else:
self.to_screen(u'\r[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
self.to_cons_title(u'youtube-dl - %s of %s at %s ETA %s' %
self.to_screen(u'\r%s[download] %s of %s at %s ETA %s' %
(clear_line, percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
self.to_console_title(u'youtube-dl - %s of %s at %s ETA %s' %
(percent_str.strip(), data_len_str.strip(), speed_str.strip(), eta_str.strip()))
def report_resuming_byte(self, resume_len):
@ -326,283 +239,148 @@ class FileDownloader(object):
"""Report file has already been fully downloaded."""
try:
self.to_screen(u'[download] %s has already been downloaded' % file_name)
except (UnicodeEncodeError) as err:
except UnicodeEncodeError:
self.to_screen(u'[download] The file has already been downloaded')
def report_unable_to_resume(self):
"""Report it was impossible to resume download."""
self.to_screen(u'[download] Unable to resume')
def report_finish(self):
def report_finish(self, data_len_str, tot_time):
"""Report download finished."""
if self.params.get('noprogress', False):
self.to_screen(u'[download] Download completed')
else:
self.to_screen(u'')
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
self.to_screen(u'\r%s[download] 100%% of %s in %s' %
(clear_line, data_len_str, self.format_seconds(tot_time)))
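Because the old and new lines of the ETA helpers are interleaved in this diff, the following standalone sketch shows how the new split between "calculate" and "format" appears to work. It is reconstructed from the lines above as free functions rather than static methods, so treat it as an illustration, not the exact committed code.

```python
def format_seconds(seconds):
    # Split a duration into hours, minutes and seconds.
    (mins, secs) = divmod(seconds, 60)
    (hours, mins) = divmod(mins, 60)
    if hours > 99:
        return '--:--:--'
    if hours == 0:
        return '%02d:%02d' % (mins, secs)
    return '%02d:%02d:%02d' % (hours, mins, secs)


def calc_eta(start, now, total, current):
    # Return the estimated remaining seconds, or None if it cannot be known.
    if total is None:
        return None
    dif = now - start
    if current == 0 or dif < 0.001:  # One millisecond
        return None
    rate = float(current) / dif
    return int((float(total) - float(current)) / rate)


def format_eta(eta):
    # Formatting is now separate, so callers can reuse the raw number.
    if eta is None:
        return '--:--'
    return format_seconds(eta)


print(format_eta(calc_eta(0.0, 10.0, 1000, 250)))
```

Keeping the numeric value separate from its string form is what lets `report_progress` and the progress hooks share the same `eta` and `speed` values.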
def increment_downloads(self):
"""Increment the ordinal that assigns a number to each file."""
self._num_downloads += 1
def prepare_filename(self, info_dict):
"""Generate the output filename."""
try:
template_dict = dict(info_dict)
template_dict['epoch'] = int(time.time())
template_dict['autonumber'] = u'%05d' % self._num_downloads
sanitize = lambda k,v: sanitize_filename(
u'NA' if v is None else compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k==u'id'))
template_dict = dict((k, sanitize(k, v)) for k,v in template_dict.items())
filename = self.params['outtmpl'] % template_dict
return filename
except (ValueError, KeyError) as err:
self.trouble(u'ERROR: invalid system charset or erroneous output template')
return None
def _match_entry(self, info_dict):
""" Returns None iff the file should be downloaded """
title = info_dict['title']
matchtitle = self.params.get('matchtitle', False)
if matchtitle:
matchtitle = matchtitle.decode('utf8')
if not re.search(matchtitle, title, re.IGNORECASE):
return u'[download] "' + title + '" title did not match pattern "' + matchtitle + '"'
rejecttitle = self.params.get('rejecttitle', False)
if rejecttitle:
rejecttitle = rejecttitle.decode('utf8')
if re.search(rejecttitle, title, re.IGNORECASE):
return u'"' + title + '" title matched reject pattern "' + rejecttitle + '"'
return None
def process_info(self, info_dict):
"""Process a single dictionary returned by an InfoExtractor."""
# Keep for backwards compatibility
info_dict['stitle'] = info_dict['title']
if not 'format' in info_dict:
info_dict['format'] = info_dict['ext']
reason = self._match_entry(info_dict)
if reason is not None:
self.to_screen(u'[download] ' + reason)
return
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads > int(max_downloads):
raise MaxDownloadsReached()
filename = self.prepare_filename(info_dict)
# Forced printings
if self.params.get('forcetitle', False):
compat_print(info_dict['title'])
if self.params.get('forceurl', False):
compat_print(info_dict['url'])
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict:
compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict:
compat_print(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
compat_print(filename)
if self.params.get('forceformat', False):
compat_print(info_dict['format'])
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
return
if filename is None:
return
try:
dn = os.path.dirname(encodeFilename(filename))
if dn != '' and not os.path.exists(dn): # dn is already encoded
os.makedirs(dn)
except (OSError, IOError) as err:
self.trouble(u'ERROR: unable to create directory ' + compat_str(err))
return
if self.params.get('writedescription', False):
try:
descfn = filename + u'.description'
self.report_writedescription(descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(info_dict['description'])
except (OSError, IOError):
self.trouble(u'ERROR: Cannot write description file ' + descfn)
return
if self.params.get('writesubtitles', False) and 'subtitles' in info_dict and info_dict['subtitles']:
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
try:
srtfn = filename.rsplit('.', 1)[0] + u'.srt'
self.report_writesubtitles(srtfn)
with io.open(encodeFilename(srtfn), 'w', encoding='utf-8') as srtfile:
srtfile.write(info_dict['subtitles'])
except (OSError, IOError):
self.trouble(u'ERROR: Cannot write subtitles file ' + descfn)
return
if self.params.get('writeinfojson', False):
infofn = filename + u'.info.json'
self.report_writeinfojson(infofn)
try:
json_info_dict = dict((k, v) for k,v in info_dict.items() if not k in ['urlhandle'])
write_json_file(json_info_dict, encodeFilename(infofn))
except (OSError, IOError):
self.trouble(u'ERROR: Cannot write metadata to JSON file ' + infofn)
return
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
success = True
else:
try:
success = self._do_download(filename, info_dict)
except (OSError, IOError) as err:
raise UnavailableVideoError()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.trouble(u'ERROR: unable to download video data: %s' % str(err))
return
except (ContentTooShortError, ) as err:
self.trouble(u'ERROR: content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
if success:
try:
self.post_process(filename, info_dict)
except (PostProcessingError) as err:
self.trouble(u'ERROR: postprocessing: %s' % str(err))
return
def download(self, url_list):
"""Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template():
raise SameFileError(self.params['outtmpl'])
for url in url_list:
suitable_found = False
for ie in self._ies:
# Go to next InfoExtractor if not suitable
if not ie.suitable(url):
continue
# Warn if the _WORKING attribute is False
if not ie.working():
self.to_stderr(u'WARNING: the program functionality for this site has been marked as broken, '
u'and will probably not work. If you want to go on, use the -i option.')
# Suitable InfoExtractor found
suitable_found = True
# Extract information from URL and process it
try:
videos = ie.extract(url)
except ExtractorError as de: # An error we somewhat expected
self.trouble(u'ERROR: ' + compat_str(de), de.format_traceback())
break
except Exception as e:
if self.params.get('ignoreerrors', False):
self.trouble(u'ERROR: ' + compat_str(e), tb=compat_str(traceback.format_exc()))
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live):
def run_rtmpdump(args):
start = time.time()
resume_percent = None
resume_downloaded_data_len = None
proc = subprocess.Popen(args, stderr=subprocess.PIPE)
cursor_in_new_line = True
proc_stderr_closed = False
while not proc_stderr_closed:
# read line from stderr
line = u''
while True:
char = proc.stderr.read(1)
if not char:
proc_stderr_closed = True
break
else:
raise
if char in [b'\r', b'\n']:
break
line += char.decode('ascii', 'replace')
if not line:
# proc_stderr_closed is True
continue
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
if mobj:
downloaded_data_len = int(float(mobj.group(1))*1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
eta = self.calc_eta(start, time.time(), 100-resume_percent, percent-resume_percent)
speed = self.calc_speed(start, time.time(), downloaded_data_len-resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
data_len_str = u'~' + format_bytes(data_len)
self.report_progress(percent, data_len_str, speed, eta)
cursor_in_new_line = False
self._hook_progress({
'downloaded_bytes': downloaded_data_len,
'total_bytes': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
elif self.params.get('verbose', False):
if not cursor_in_new_line:
self.to_screen(u'')
cursor_in_new_line = True
self.to_screen(u'[rtmpdump] '+line)
proc.wait()
if not cursor_in_new_line:
self.to_screen(u'')
return proc.returncode
if len(videos or []) > 1 and self.fixed_template():
raise SameFileError(self.params['outtmpl'])
for video in videos or []:
video['extractor'] = ie.IE_NAME
try:
self.increment_downloads()
self.process_info(video)
except UnavailableVideoError:
self.trouble(u'\nERROR: unable to download video')
# Suitable InfoExtractor had been found; go to next URL
break
if not suitable_found:
self.trouble(u'ERROR: no suitable InfoExtractor: %s' % url)
return self._download_retcode
def post_process(self, filename, ie_info):
"""Run all the postprocessors on the given file."""
info = dict(ie_info)
info['filepath'] = filename
keep_video = None
for pp in self._pps:
try:
keep_video_wish,new_info = pp.run(info)
if keep_video_wish is not None:
if keep_video_wish:
keep_video = keep_video_wish
elif keep_video is None:
# No clear decision yet, let IE decide
keep_video = keep_video_wish
except PostProcessingError as e:
self.to_stderr(u'ERROR: ' + e.msg)
if keep_video is False and not self.params.get('keepvideo', False):
try:
self.to_stderr(u'Deleting original file %s (pass -k to keep)' % filename)
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.to_stderr(u'WARNING: Unable to remove downloaded video file')
def _download_with_rtmpdump(self, filename, url, player_url, page_url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first
try:
subprocess.call(['rtmpdump', '-h'], stdout=(file(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
subprocess.call(['rtmpdump', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.trouble(u'ERROR: RTMP download detected but "rtmpdump" could not be run')
self.report_error(u'RTMP download detected but "rtmpdump" could not be run')
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
# the connection was interrupted and resuming appears to be
# possible. This is part of rtmpdump's normal usage, AFAIK.
basic_args = ['rtmpdump', '-q', '-r', url, '-o', tmpfilename]
basic_args = ['rtmpdump', '--verbose', '-r', url, '-o', tmpfilename]
if player_url is not None:
basic_args += ['-W', player_url]
basic_args += ['--swfVfy', player_url]
if page_url is not None:
basic_args += ['--pageUrl', page_url]
args = basic_args + [[], ['-e', '-k', '1']][self.params.get('continuedl', False)]
if play_path is not None:
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
if live:
basic_args += ['--live']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if sys.platform == 'win32' and sys.version_info < (3, 0):
# Windows subprocess module does not actually support Unicode
# on Python 2.x
# See http://stackoverflow.com/a/9951851/35070
subprocess_encoding = sys.getfilesystemencoding()
args = [a.encode(subprocess_encoding, 'ignore') for a in args]
else:
subprocess_encoding = None
if self.params.get('verbose', False):
if subprocess_encoding:
str_args = [
a.decode(subprocess_encoding) if isinstance(a, bytes) else a
for a in args]
else:
str_args = args
try:
import pipes
shell_quote = lambda args: ' '.join(map(pipes.quote, args))
shell_quote = lambda args: ' '.join(map(pipes.quote, str_args))
except ImportError:
shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args)
while retval == 2 or retval == 1:
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(str_args))
retval = run_rtmpdump(args)
while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
self.to_screen(u'[rtmpdump] %s bytes' % prevsize)
time.sleep(5.0) # This seems to be needed
retval = subprocess.call(basic_args + ['-e'] + [[], ['-k', '1']][retval == 1])
retval = run_rtmpdump(basic_args + ['-e'] + [[], ['-k', '1']][retval == 1])
cursize = os.path.getsize(encodeFilename(tmpfilename))
if prevsize == cursize and retval == 1:
break
# Some rtmp streams seem to abort after ~ 99.8%. Don't complain for those
if prevsize == cursize and retval == 2 and cursize > 1024:
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
self.to_screen(u'[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0
break
if retval == 0:
if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.to_screen(u'[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
@ -612,9 +390,75 @@ class FileDownloader(object):
})
return True
else:
self.trouble(u'\nERROR: rtmpdump exited with code %d' % retval)
self.to_stderr(u"\n")
self.report_error(u'rtmpdump exited with code %d' % retval)
return False
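The progress parsing done inside `run_rtmpdump` can be tried in isolation. The sketch below reuses the regular expression from the diff; the function name `parse_rtmpdump_progress` and the sample line are made up for illustration, and the sample is only assumed to resemble what rtmpdump prints on stderr in `--verbose` mode.

```python
import re

# Same pattern as in run_rtmpdump above.
PROGRESS_RE = re.compile(
    r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)')


def parse_rtmpdump_progress(line):
    """Return (downloaded_bytes, percent, estimated_total) or None."""
    mobj = PROGRESS_RE.search(line)
    if mobj is None:
        return None
    downloaded = int(float(mobj.group(1)) * 1024)
    percent = float(mobj.group(2))
    total = None
    if percent > 0:
        # rtmpdump reports no total size, so it is extrapolated
        # from the percentage, as the diff does for data_len.
        total = int(downloaded * 100 / percent)
    return downloaded, percent, total


print(parse_rtmpdump_progress('1000.000 kB / 10.00 sec (50.0%)'))
```

The extrapolated total is only an estimate, which is why the diff prefixes the displayed size with `~`.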
def _download_with_mplayer(self, filename, url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
args = ['mplayer', '-really-quiet', '-vo', 'null', '-vc', 'dummy', '-dumpstream', '-dumpfile', tmpfilename, url]
# Check for mplayer first
try:
subprocess.call(['mplayer', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error(u'MMS or RTSP download detected but "%s" could not be run' % args[0] )
return False
# Download using mplayer.
retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize))
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
return True
else:
self.to_stderr(u"\n")
self.report_error(u'mplayer exited with code %d' % retval)
return False
def _download_m3u8_with_ffmpeg(self, filename, url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
args = ['-y', '-i', url, '-f', 'mp4', '-c', 'copy',
'-bsf:a', 'aac_adtstoasc', tmpfilename]
for program in ['avconv', 'ffmpeg']:
try:
subprocess.call([program, '-version'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
break
except (OSError, IOError):
pass
else:
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found')
cmd = [program] + args
retval = subprocess.call(cmd)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize))
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
return True
else:
self.to_stderr(u"\n")
self.report_error(u'ffmpeg exited with code %d' % retval)
return False
def _do_download(self, filename, info_dict):
url = info_dict['url']
@ -624,6 +468,7 @@ class FileDownloader(object):
self._hook_progress({
'filename': filename,
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
})
return True
@ -631,7 +476,18 @@ class FileDownloader(object):
if url.startswith('rtmp'):
return self._download_with_rtmpdump(filename, url,
info_dict.get('player_url', None),
info_dict.get('page_url', None))
info_dict.get('page_url', None),
info_dict.get('play_path', None),
info_dict.get('tc_url', None),
info_dict.get('rtmp_live', False))
# Attempt to download using mplayer
if url.startswith('mms') or url.startswith('rtsp'):
return self._download_with_mplayer(filename, url)
# m3u8 manifests are downloaded with ffmpeg
if determine_ext(url) == u'm3u8':
return self._download_m3u8_with_ffmpeg(filename, url)
tmpfilename = self.temp_name(filename)
stream = None
@ -712,7 +568,7 @@ class FileDownloader(object):
self.report_retry(count, retries)
if count > retries:
self.trouble(u'ERROR: giving up after %s retries' % retries)
self.report_error(u'giving up after %s retries' % retries)
return False
data_len = data.info().get('Content-length', None)
@ -727,7 +583,7 @@ class FileDownloader(object):
self.to_screen(u'\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
data_len_str = self.format_bytes(data_len)
data_len_str = format_bytes(data_len)
byte_counter = 0 + resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
@ -748,24 +604,25 @@ class FileDownloader(object):
filename = self.undo_temp_name(tmpfilename)
self.report_destination(filename)
except (OSError, IOError) as err:
self.trouble(u'ERROR: unable to open for writing: %s' % str(err))
self.report_error(u'unable to open for writing: %s' % str(err))
return False
try:
stream.write(data_block)
except (IOError, OSError) as err:
self.trouble(u'\nERROR: unable to write data: %s' % str(err))
self.to_stderr(u"\n")
self.report_error(u'unable to write data: %s' % str(err))
return False
if not self.params.get('noresizebuffer', False):
block_size = self.best_block_size(after - before, len(data_block))
# Progress message
speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
eta = percent = None
else:
percent_str = self.calc_percent(byte_counter, data_len)
eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent_str, data_len_str, speed_str, eta_str)
percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({
'downloaded_bytes': byte_counter,
@ -773,16 +630,19 @@ class FileDownloader(object):
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
# Apply rate limit
self.slow_down(start, byte_counter - resume_len)
if stream is None:
self.trouble(u'\nERROR: Did not get any data blocks')
self.to_stderr(u"\n")
self.report_error(u'Did not get any data blocks')
return False
stream.close()
self.report_finish()
self.report_finish(data_len_str, (time.time() - start))
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
@ -814,6 +674,8 @@ class FileDownloader(object):
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.
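A hook that consumes the new `eta` and `speed` keys might look like the sketch below. Only the key names come from the docstring above; the function name `describe_progress` and the sample dictionaries are hypothetical, and how a hook gets registered on the downloader is outside this excerpt.

```python
def describe_progress(status):
    """Return a one-line summary for a progress-hook status dict."""
    if status.get('status') == 'finished':
        return 'finished: %s' % status.get('filename')
    eta = status.get('eta')
    speed = status.get('speed')
    # Both values are documented as None when unknown.
    eta_str = 'unknown ETA' if eta is None else '%ds left' % eta
    if speed is None:
        speed_str = 'unknown speed'
    else:
        speed_str = '%.1f KiB/s' % (speed / 1024.0)
    return '%s bytes, %s, %s' % (
        status.get('downloaded_bytes'), speed_str, eta_str)


print(describe_progress({
    'status': 'downloading',
    'downloaded_bytes': 2048,
    'total_bytes': None,
    'eta': None,
    'speed': None,
}))
```

Checking for `None` before formatting matters here, since both keys are allowed to be unknown during a download.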

File diff suppressed because it is too large


@ -1,14 +1,16 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import os
import subprocess
import sys
import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object):
@ -76,17 +78,28 @@ class FFmpegPostProcessor(PostProcessor):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
return dict((program, executable(program)) for program in programs)
def run_ffmpeg(self, path, out_path, opts):
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
if not self._exes['ffmpeg'] and not self._exes['avconv']:
raise FFmpegPostProcessorError(u'ffmpeg or avconv not found. Please install one.')
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y', '-i', encodeFilename(path)]
files_cmd = []
for path in input_paths:
files_cmd.extend(['-i', encodeFilename(path)])
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y'] + files_cmd
+ opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate()
if p.returncode != 0:
stderr = stderr.decode('utf-8', 'replace')
msg = stderr.strip().split('\n')[-1]
raise FFmpegPostProcessorError(msg.decode('utf-8', 'replace'))
raise FFmpegPostProcessorError(msg)
def run_ffmpeg(self, path, out_path, opts):
self.run_ffmpeg_multiple_files([path], out_path, opts)
def _ffmpeg_filename_argument(self, fn):
# ffmpeg broke --, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details
@ -104,7 +117,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
self._nopostoverwrites = nopostoverwrites
def get_audio_codec(self, path):
if not self._exes['ffprobe'] and not self._exes['avprobe']: return None
if not self._exes['ffprobe'] and not self._exes['avprobe']:
raise PostProcessingError(u'ffprobe or avprobe not found. Please install one.')
try:
cmd = [self._exes['avprobe'] or self._exes['ffprobe'], '-show_streams', encodeFilename(self._ffmpeg_filename_argument(path))]
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE)
@ -132,7 +146,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
FFmpegPostProcessor.run_ffmpeg(self, path, out_path, opts)
except FFmpegPostProcessorError as err:
raise AudioConversionError(err.message)
raise AudioConversionError(err.msg)
def run(self, information):
path = information['filepath']
@ -172,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:
if int(self._preferredquality) < 10:
# The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@ -188,6 +203,11 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
prefix, sep, ext = path.rpartition(u'.') # not os.path.splitext, since the latter does not work on unicode in all setups
new_path = prefix + sep + extension
# If we download foo.mp3 and convert it to... foo.mp3, then don't delete foo.mp3, silly.
if new_path == path:
self._nopostoverwrites = True
try:
if self._nopostoverwrites and os.path.exists(encodeFilename(new_path)):
self._downloader.to_screen(u'[youtube] Post-process file %s exists, skipping' % new_path)
@ -197,7 +217,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
except:
etype,e,tb = sys.exc_info()
if isinstance(e, AudioConversionError):
msg = u'audio conversion failed: ' + e.message
msg = u'audio conversion failed: ' + e.msg
else:
msg = u'error running ' + (self._exes['avconv'] and 'avconv' or 'ffmpeg')
raise PostProcessingError(msg)
@ -207,10 +227,10 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
os.utime(encodeFilename(new_path), (time.time(), information['filetime']))
except:
self._downloader.to_stderr(u'WARNING: Cannot update utime of audio file')
self._downloader.report_warning(u'Cannot update utime of audio file')
information['filepath'] = new_path
return False,information
return self._nopostoverwrites,information
class FFmpegVideoConvertor(FFmpegPostProcessor):
def __init__(self, downloader=None,preferedformat=None):
@ -230,3 +250,262 @@ class FFmpegVideoConvertor(FFmpegPostProcessor):
information['format'] = self._preferedformat
information['ext'] = self._preferedformat
return False,information
class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
# See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
_lang_map = {
'aa': 'aar',
'ab': 'abk',
'ae': 'ave',
'af': 'afr',
'ak': 'aka',
'am': 'amh',
'an': 'arg',
'ar': 'ara',
'as': 'asm',
'av': 'ava',
'ay': 'aym',
'az': 'aze',
'ba': 'bak',
'be': 'bel',
'bg': 'bul',
'bh': 'bih',
'bi': 'bis',
'bm': 'bam',
'bn': 'ben',
'bo': 'bod',
'br': 'bre',
'bs': 'bos',
'ca': 'cat',
'ce': 'che',
'ch': 'cha',
'co': 'cos',
'cr': 'cre',
'cs': 'ces',
'cu': 'chu',
'cv': 'chv',
'cy': 'cym',
'da': 'dan',
'de': 'deu',
'dv': 'div',
'dz': 'dzo',
'ee': 'ewe',
'el': 'ell',
'en': 'eng',
'eo': 'epo',
'es': 'spa',
'et': 'est',
'eu': 'eus',
'fa': 'fas',
'ff': 'ful',
'fi': 'fin',
'fj': 'fij',
'fo': 'fao',
'fr': 'fra',
'fy': 'fry',
'ga': 'gle',
'gd': 'gla',
'gl': 'glg',
'gn': 'grn',
'gu': 'guj',
'gv': 'glv',
'ha': 'hau',
'he': 'heb',
'hi': 'hin',
'ho': 'hmo',
'hr': 'hrv',
'ht': 'hat',
'hu': 'hun',
'hy': 'hye',
'hz': 'her',
'ia': 'ina',
'id': 'ind',
'ie': 'ile',
'ig': 'ibo',
'ii': 'iii',
'ik': 'ipk',
'io': 'ido',
'is': 'isl',
'it': 'ita',
'iu': 'iku',
'ja': 'jpn',
'jv': 'jav',
'ka': 'kat',
'kg': 'kon',
'ki': 'kik',
'kj': 'kua',
'kk': 'kaz',
'kl': 'kal',
'km': 'khm',
'kn': 'kan',
'ko': 'kor',
'kr': 'kau',
'ks': 'kas',
'ku': 'kur',
'kv': 'kom',
'kw': 'cor',
'ky': 'kir',
'la': 'lat',
'lb': 'ltz',
'lg': 'lug',
'li': 'lim',
'ln': 'lin',
'lo': 'lao',
'lt': 'lit',
'lu': 'lub',
'lv': 'lav',
'mg': 'mlg',
'mh': 'mah',
'mi': 'mri',
'mk': 'mkd',
'ml': 'mal',
'mn': 'mon',
'mr': 'mar',
'ms': 'msa',
'mt': 'mlt',
'my': 'mya',
'na': 'nau',
'nb': 'nob',
'nd': 'nde',
'ne': 'nep',
'ng': 'ndo',
'nl': 'nld',
'nn': 'nno',
'no': 'nor',
'nr': 'nbl',
'nv': 'nav',
'ny': 'nya',
'oc': 'oci',
'oj': 'oji',
'om': 'orm',
'or': 'ori',
'os': 'oss',
'pa': 'pan',
'pi': 'pli',
'pl': 'pol',
'ps': 'pus',
'pt': 'por',
'qu': 'que',
'rm': 'roh',
'rn': 'run',
'ro': 'ron',
'ru': 'rus',
'rw': 'kin',
'sa': 'san',
'sc': 'srd',
'sd': 'snd',
'se': 'sme',
'sg': 'sag',
'si': 'sin',
'sk': 'slk',
'sl': 'slv',
'sm': 'smo',
'sn': 'sna',
'so': 'som',
'sq': 'sqi',
'sr': 'srp',
'ss': 'ssw',
'st': 'sot',
'su': 'sun',
'sv': 'swe',
'sw': 'swa',
'ta': 'tam',
'te': 'tel',
'tg': 'tgk',
'th': 'tha',
'ti': 'tir',
'tk': 'tuk',
'tl': 'tgl',
'tn': 'tsn',
'to': 'ton',
'tr': 'tur',
'ts': 'tso',
'tt': 'tat',
'tw': 'twi',
'ty': 'tah',
'ug': 'uig',
'uk': 'ukr',
'ur': 'urd',
'uz': 'uzb',
've': 'ven',
'vi': 'vie',
'vo': 'vol',
'wa': 'wln',
'wo': 'wol',
'xh': 'xho',
'yi': 'yid',
'yo': 'yor',
'za': 'zha',
'zh': 'zho',
'zu': 'zul',
}
def __init__(self, downloader=None, subtitlesformat='srt'):
super(FFmpegEmbedSubtitlePP, self).__init__(downloader)
self._subformat = subtitlesformat
@classmethod
def _conver_lang_code(cls, code):
"""Convert language code from ISO 639-1 to ISO 639-2/T"""
return cls._lang_map.get(code[:2])
def run(self, information):
if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information
if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
opts = ['-map', '0:0', '-map', '0:1', '-c:v', 'copy', '-c:a', 'copy']
for (i, lang) in enumerate(sub_langs):
opts.extend(['-map', '%d:0' % (i+1), '-c:s:%d' % i, 'mov_text'])
lang_code = self._conver_lang_code(lang)
if lang_code is not None:
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
opts.extend(['-f', 'mp4'])
temp_filename = filename + u'.temp'
self._downloader.to_screen(u'[ffmpeg] Embedding subtitles in \'%s\'' % filename)
self.run_ffmpeg_multiple_files(input_files, temp_filename, opts)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s=%s' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info
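
The option list that `FFmpegEmbedSubtitlePP.run` hands to ffmpeg can be looked at in isolation. What follows is a minimal sketch, not the project's code: `build_embed_opts` and the trimmed `LANG_MAP` are hypothetical names introduced here, and only the option-building logic mirrors the class above.

```python
# Hypothetical stand-alone sketch of the option building in FFmpegEmbedSubtitlePP.run.
# LANG_MAP is a trimmed copy of the ISO 639-1 -> ISO 639-2/T table shown above.
LANG_MAP = {'en': 'eng', 'de': 'deu', 'pt': 'por'}


def build_embed_opts(sub_langs, lang_map=LANG_MAP):
    """Return ffmpeg options that copy video/audio and add one mov_text track per language."""
    # Input 0 is the video file: keep its first two streams untouched.
    opts = ['-map', '0:0', '-map', '0:1', '-c:v', 'copy', '-c:a', 'copy']
    for i, lang in enumerate(sub_langs):
        # Input i+1 is the i-th subtitle file; encode it as mov_text for mp4.
        opts.extend(['-map', '%d:0' % (i + 1), '-c:s:%d' % i, 'mov_text'])
        # Only the first two characters are looked up, so 'pt-BR' maps like 'pt'.
        lang_code = lang_map.get(lang[:2])
        if lang_code is not None:
            opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
    opts.extend(['-f', 'mp4'])
    return opts


print(build_embed_opts(['en', 'pt-BR', 'xx']))
```

A language that is missing from the table still gets its subtitle track; it just carries no `language` metadata.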

youtube_dl/YoutubeDL.py (1009 lines changed)

File diff suppressed because it is too large.

@ -1,9 +1,6 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import with_statement
from __future__ import absolute_import
__authors__ = (
'Ricardo Garcia Gonzalez',
'Danny Colligan',
@ -24,29 +21,65 @@ __authors__ = (
'Jaime Marquínez Ferrándiz',
'Jeff Crouse',
'Osama Khalid',
)
'Michael Walter',
'M. Yasoob Ullah Khalid',
'Julien Fraichard',
'Johny Mo Swag',
'Axel Noack',
'Albert Kim',
'Pierre Rudloff',
'Huarong Huo',
'Ismael Mejía',
'Steffan \'Ruirize\' James',
'Andras Elso',
'Jelle van der Waa',
'Marcin Cieślak',
'Anton Larionov',
'Takuya Tsuchida',
)
__license__ = 'Public Domain'
import codecs
import getpass
import optparse
import os
import random
import re
import shlex
import socket
import subprocess
import sys
import warnings
import platform
from .utils import *
from .utils import (
compat_print,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
MaxDownloadsReached,
preferredencoding,
SameFileError,
std_headers,
write_string,
)
from .update import update_self
from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .version import __version__
from .FileDownloader import *
from .InfoExtractors import gen_extractors
from .PostProcessor import *
from .YoutubeDL import YoutubeDL
from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts():
def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes):
try:
optionf = open(filename_bytes)
@ -76,6 +109,9 @@ def parseOpts():
return "".join(opts)
def _comma_separated_values_options_callback(option, opt_str, value, parser):
setattr(parser.values, option.dest, value.split(','))
def _find_term_columns():
columns = os.environ.get('COLUMNS', None)
if columns:
@ -89,6 +125,16 @@ def parseOpts():
pass
return None
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try:
i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>'
except ValueError:
pass
return opts
max_width = 80
max_help_position = 80
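
The masking done by `_hide_login_info` can be tried outside of `parseOpts`. The snippet below is a variant sketch under stated assumptions, not the code from the diff: the function name `hide_login_info` is hypothetical, and the extra `IndexError` handling is an addition of this sketch. Like the version in the diff, it masks only the first occurrence of each option and does not recognise the `--password=VALUE` spelling.

```python
def hide_login_info(opts):
    """Return a copy of opts with the value after each credential option masked."""
    opts = list(opts)  # work on a copy so the caller's list is never mutated
    for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
        try:
            i = opts.index(private_opt)  # first occurrence only, as in the diff
            opts[i + 1] = '<PRIVATE>'
        except (ValueError, IndexError):
            # ValueError: the option is absent.
            # IndexError: the option is the last item (handled only in this sketch).
            pass
    return opts


argv = ['-u', 'bob', '-p', 'secret', 'URL']
print(hide_login_info(argv))
```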
@ -113,6 +159,8 @@ def parseOpts():
selection = optparse.OptionGroup(parser, 'Video Selection')
authentication = optparse.OptionGroup(parser, 'Authentication Options')
video_format = optparse.OptionGroup(parser, 'Video Format Options')
subtitles = optparse.OptionGroup(parser, 'Subtitle Options')
downloader = optparse.OptionGroup(parser, 'Download Options')
postproc = optparse.OptionGroup(parser, 'Post-processing Options')
filesystem = optparse.OptionGroup(parser, 'Filesystem Options')
verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
@ -122,27 +170,35 @@ def parseOpts():
general.add_option('-v', '--version',
action='version', help='print program version and exit')
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version')
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
general.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='download rate limit (e.g. 50k or 44.6m)')
general.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
general.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024")
general.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
general.add_option('--user-agent',
dest='user_agent', help='specify a custom user agent', metavar='UA')
general.add_option('--referer',
dest='referer', help='specify a custom referer, use if the video access is restricted to one domain',
metavar='REF', default=None)
general.add_option('--list-extractors',
action='store_true', dest='list_extractors',
help='List all supported extractors and the URLs they would handle', default=False)
general.add_option('--test', action='store_true', dest='test', default=False, help=optparse.SUPPRESS_HELP)
general.add_option('--extractor-descriptions',
action='store_true', dest='list_extractor_descriptions',
help='Output descriptions of all supported extractors', default=False)
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
selection.add_option('--playlist-start',
dest='playliststart', metavar='NUMBER', help='playlist video to start at (default is %default)', default=1)
@ -150,9 +206,21 @@ def parseOpts():
dest='playlistend', metavar='NUMBER', help='playlist video to end at (default is last)', default=-1)
selection.add_option('--match-title', dest='matchtitle', metavar='REGEX',help='download only matching titles (regex or caseless sub-string)')
selection.add_option('--reject-title', dest='rejecttitle', metavar='REGEX',help='skip download for matching titles (regex or caseless sub-string)')
selection.add_option('--max-downloads', metavar='NUMBER', dest='max_downloads', help='Abort after downloading NUMBER files', default=None)
selection.add_option('--max-downloads', metavar='NUMBER',
dest='max_downloads', type=int, default=None,
help='Abort after downloading NUMBER files')
selection.add_option('--min-filesize', metavar='SIZE', dest='min_filesize', help="Do not download any videos smaller than SIZE (e.g. 50k or 44.6m)", default=None)
selection.add_option('--max-filesize', metavar='SIZE', dest='max_filesize', help="Do not download any videos larger than SIZE (e.g. 50k or 44.6m)", default=None)
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username',
@ -161,10 +229,13 @@ def parseOpts():
dest='password', metavar='PASSWORD', help='account password')
authentication.add_option('-n', '--netrc',
action='store_true', dest='usenetrc', help='use .netrc authentication data', default=False)
authentication.add_option('--video-password',
dest='videopassword', metavar='PASSWORD', help='video password (vimeo only)')
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', help='video format code')
action='store', dest='format', metavar='FORMAT', default='best',
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@ -173,12 +244,37 @@ def parseOpts():
action='store', dest='format_limit', metavar='FORMAT', help='highest quality format to download')
video_format.add_option('-F', '--list-formats',
action='store_true', dest='listformats', help='list all available formats (currently youtube only)')
video_format.add_option('--write-srt',
subtitles.add_option('--write-sub', '--write-srt',
action='store_true', dest='writesubtitles',
help='write video closed captions to a .srt file (currently youtube only)', default=False)
video_format.add_option('--srt-lang',
action='store', dest='subtitleslang', metavar='LANG',
help='language of the closed captions to download (optional) use IETF language tags like \'en\'')
help='write subtitle file', default=False)
subtitles.add_option('--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub',
help='write automatic subtitle file (youtube only)', default=False)
subtitles.add_option('--all-subs',
action='store_true', dest='allsubtitles',
help='downloads all the available subtitles of the video', default=False)
subtitles.add_option('--list-subs',
action='store_true', dest='listsubtitles',
help='lists all available subtitles for the video', default=False)
subtitles.add_option('--sub-format',
action='store', dest='subtitlesformat', metavar='FORMAT',
help='subtitle format (default=srt) ([sbv/vtt] youtube only)', default='srt')
subtitles.add_option('--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
downloader.add_option('--test', action='store_true', dest='test', default=False, help=optparse.SUPPRESS_HELP)
verbosity.add_option('-q', '--quiet',
action='store_true', dest='quiet', help='activates quiet mode', default=False)
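
The `--sub-lang` option above relies on an `optparse` callback to turn a comma-separated value into a list. A minimal, self-contained sketch of that mechanism follows; the parser built here is a throwaway one for illustration and the function name `comma_separated_values_callback` is hypothetical, though its body mirrors the callback in the diff.

```python
import optparse


def comma_separated_values_callback(option, opt_str, value, parser):
    # Store the split list under the option's dest, as the callback in the diff does.
    setattr(parser.values, option.dest, value.split(','))


parser = optparse.OptionParser()
parser.add_option('--sub-lang', '--sub-langs', '--srt-lang',
                  action='callback', dest='subtitleslangs', metavar='LANGS',
                  type='string', default=[],
                  callback=comma_separated_values_callback)

opts, args = parser.parse_args(['--sub-lang', 'en,pt', 'URL'])
print(opts.subtitleslangs, args)
```

Because all three spellings share one `dest`, the old `--srt-lang` flag keeps working and fills the same list.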
@ -190,6 +286,8 @@ def parseOpts():
action='store_true', dest='geturl', help='simulate, quiet but print URL', default=False)
verbosity.add_option('-e', '--get-title',
action='store_true', dest='gettitle', help='simulate, quiet but print title', default=False)
verbosity.add_option('--get-id',
action='store_true', dest='getid', help='simulate, quiet but print id', default=False)
verbosity.add_option('--get-thumbnail',
action='store_true', dest='getthumbnail',
help='simulate, quiet but print thumbnail URL', default=False)
@ -202,6 +300,9 @@ def parseOpts():
verbosity.add_option('--get-format',
action='store_true', dest='getformat',
help='simulate, quiet but print output format', default=False)
verbosity.add_option('-j', '--dump-json',
action='store_true', dest='dumpjson',
help='simulate, quiet but print JSON information', default=False)
verbosity.add_option('--newline',
action='store_true', dest='progress_with_newline', help='output progress bar as new lines', default=False)
verbosity.add_option('--no-progress',
@ -211,18 +312,43 @@ def parseOpts():
help='display progress in console titlebar', default=False)
verbosity.add_option('-v', '--verbose',
action='store_true', dest='verbose', help='print various debugging information', default=False)
verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems (very verbose)')
verbosity.add_option('--write-pages',
action='store_true', dest='write_pages', default=False,
help='Write downloaded pages to files in the current directory')
verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP)
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name', default=False)
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
filesystem.add_option('--id',
action='store_true', dest='useid', help='use video ID in file name', default=False)
action='store_true', dest='useid', help='use only video ID in file name', default=False)
filesystem.add_option('-l', '--literal',
action='store_true', dest='usetitle', help='[deprecated] alias of --title', default=False)
filesystem.add_option('-A', '--auto-number',
action='store_true', dest='autonumber',
help='number downloaded files starting from 00000', default=False)
filesystem.add_option('-o', '--output',
dest='outtmpl', metavar='TEMPLATE', help='output filename template. Use %(title)s to get the title, %(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, %(autonumber)s to get an automatically incremented number, %(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), %(extractor)s for the provider (youtube, metacafe, etc), %(id)s for the video id and %% for a literal percent. Use - to output to stdout. Can also be used to download to a different directory, for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .')
dest='outtmpl', metavar='TEMPLATE',
help=('output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD"), '
'%(format_id)s for the unique id of the format (like Youtube\'s itags: "137"), '
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id, %(playlist)s for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist and %% for a literal percent. '
'Use - to output to stdout. Can also be used to download to a different directory, '
'for example with -o \'/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s\' .'))
filesystem.add_option('--autonumber-size',
dest='autonumber_size', metavar='NUMBER',
help='Specifies the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given')
filesystem.add_option('--restrict-filenames',
action='store_true', dest='restrictfilenames',
help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
@ -231,7 +357,7 @@ def parseOpts():
filesystem.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
filesystem.add_option('-c', '--continue',
action='store_true', dest='continue_dl', help='resume partially downloaded files', default=True)
action='store_true', dest='continue_dl', help='force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.', default=True)
filesystem.add_option('--no-continue',
action='store_false', dest='continue_dl',
help='do not resume partially downloaded files (restart from beginning)')
@ -248,6 +374,12 @@ def parseOpts():
filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotation file', default=False)
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
postproc.add_option('-x', '--extract-audio', action='store_true', dest='extractaudio', default=False,
@ -262,49 +394,67 @@ def parseOpts():
help='keeps the video file on disk after the post-processing; the video is erased by default')
postproc.add_option('--no-post-overwrites', action='store_true', dest='nopostoverwrites', default=False,
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general)
parser.add_option_group(selection)
parser.add_option_group(downloader)
parser.add_option_group(filesystem)
parser.add_option_group(verbosity)
parser.add_option_group(video_format)
parser.add_option_group(subtitles)
parser.add_option_group(authentication)
parser.add_option_group(postproc)
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConf = os.path.join(xdg_config_home, 'youtube-dl.conf')
if overrideArguments is not None:
opts, args = parser.parse_args(overrideArguments)
if opts.verbose:
write_string(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
else:
userConf = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
argv = _readOptions('/etc/youtube-dl.conf') + _readOptions(userConf) + sys.argv[1:]
opts, args = parser.parse_args(argv)
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:]
argv = systemConf + userConf + commandLineConf
opts, args = parser.parse_args(argv)
if opts.verbose:
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')
write_string(u'[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n')
write_string(u'[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n')
return parser, opts, args
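
The user configuration lookup in the new `parseOpts` prefers `youtube-dl/config` under the config directory and falls back to the older `youtube-dl.conf`. A compact sketch of that choice is given below. The helper name `user_config_path` and its injectable `isfile` parameter are assumptions made here so the logic can be exercised without touching the filesystem; they are not part of the diff.

```python
import os


def user_config_path(environ, home, isfile=os.path.isfile):
    """Pick the per-user config file: new-style path first, legacy path as fallback."""
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config, as in the diff.
    base = environ.get('XDG_CONFIG_HOME') or os.path.join(home, '.config')
    preferred = os.path.join(base, 'youtube-dl', 'config')
    if isfile(preferred):
        return preferred
    # The legacy location is returned even if it does not exist.
    return os.path.join(base, 'youtube-dl.conf')


print(user_config_path(os.environ, os.path.expanduser('~')))
```

The options read from this file are then placed between the system configuration and the command-line arguments, so for ordinary store-type options a value given on the command line is parsed last and wins.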
def _real_main():
parser, opts, args = parseOpts()
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/rg3/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
parser, opts, args = parseOpts(argv)
# Open appropriate CookieJar
if opts.cookiefile is None:
jar = compat_cookiejar.CookieJar()
else:
try:
jar = compat_cookiejar.MozillaCookieJar(opts.cookiefile)
if os.access(opts.cookiefile, os.R_OK):
jar.load()
except (IOError, OSError) as err:
if opts.verbose:
traceback.print_exc()
sys.stderr.write(u'ERROR: unable to open cookie file\n')
sys.exit(101)
# Set user agent
if opts.user_agent is not None:
std_headers['User-Agent'] = opts.user_agent
# Set referer
if opts.referer is not None:
std_headers['Referer'] = opts.referer
# Dump user agent
if opts.dump_user_agent:
print(std_headers['User-Agent'])
compat_print(std_headers['User-Agent'])
sys.exit(0)
# Batch file verification
@ -318,34 +468,43 @@ def _real_main():
batchurls = batchfd.readlines()
batchurls = [x.strip() for x in batchurls]
batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
if opts.verbose:
write_string(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
except IOError:
sys.exit(u'ERROR: batch file could not be read')
all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls]
# General configuration
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
extractors = gen_extractors()
if opts.list_extractors:
for ie in extractors:
print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
matchedUrls = [url for url in all_urls if ie.suitable(url)]
all_urls = [url for url in all_urls if url not in matchedUrls]
for mu in matchedUrls:
print(u' ' + mu)
compat_print(u' ' + mu)
sys.exit(0)
if opts.list_extractor_descriptions:
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all')
desc += u' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc)
sys.exit(0)
# Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error(u'using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None:
parser.error(u'account username missing')
parser.error(u' account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error(u'using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid:
@ -370,7 +529,7 @@ def _real_main():
if opts.retries is not None:
try:
opts.retries = int(opts.retries)
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid retry count specified')
if opts.buffersize is not None:
numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
@ -381,13 +540,13 @@ def _real_main():
opts.playliststart = int(opts.playliststart)
if opts.playliststart <= 0:
raise ValueError(u'Playlist start must be positive')
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid playlist start number specified')
try:
opts.playlistend = int(opts.playlistend)
if opts.playlistend != -1 and (opts.playlistend <= 0 or opts.playlistend < opts.playliststart):
raise ValueError(u'Playlist end must be greater than playlist start')
except (TypeError, ValueError) as err:
except (TypeError, ValueError):
parser.error(u'invalid playlist end number specified')
if opts.extractaudio:
if opts.audioformat not in ['best', 'aac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
@ -399,6 +558,15 @@ def _real_main():
if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg']:
parser.error(u'invalid video recode format specified')
if opts.date is not None:
date = DateRange.day(opts.date)
else:
date = DateRange(opts.dateafter, opts.datebefore)
# --all-sub automatically sets --write-sub if --write-auto-sub is not given
# this was the old behaviour if only --all-sub was given.
if opts.allsubtitles and (opts.writeautomaticsub == False):
opts.writesubtitles = True
if sys.version_info < (3,):
# In Python 2, sys.argv is a bytestring (also note http://bugs.python.org/issue2128 for Windows systems)
@ -411,25 +579,33 @@ def _real_main():
or (opts.usetitle and u'%(title)s-%(id)s.%(ext)s')
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(id)s.%(ext)s')
# File downloader
fd = FileDownloader({
or u'%(title)s-%(id)s.%(ext)s')
if '%(ext)s' not in outtmpl and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "%%(ext)s" instead of %r' %
determine_ext(outtmpl, u''))
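
The output template is chosen by a chain of `or` expressions, and this hunk changes the final fallback from `%(id)s.%(ext)s` to `%(title)s-%(id)s.%(ext)s`. The sketch below reproduces only the branches visible in the hunk. The function `choose_outtmpl` is hypothetical, and the first branch (an explicit `-o` template) is an assumption, since that part of the expression lies outside the displayed context.

```python
def choose_outtmpl(outtmpl=None, usetitle=False, useid=False, autonumber=False):
    """Return the filename template; each flag short-circuits the ones after it."""
    return (outtmpl  # assumed first branch: an explicit -o template wins
            or (usetitle and u'%(title)s-%(id)s.%(ext)s')
            or (useid and u'%(id)s.%(ext)s')
            or (autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
            or u'%(title)s-%(id)s.%(ext)s')  # new default after this change


print(choose_outtmpl())
print(choose_outtmpl(useid=True))
```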
ydl_opts = {
'usenetrc': opts.usenetrc,
'username': opts.username,
'password': opts.password,
'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
'videopassword': opts.videopassword,
'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
'forceurl': opts.geturl,
'forcetitle': opts.gettitle,
'forceid': opts.getid,
'forcethumbnail': opts.getthumbnail,
'forcedescription': opts.getdescription,
'forcefilename': opts.getfilename,
'forceformat': opts.getformat,
'forcejson': opts.dumpjson,
'simulate': opts.simulate,
'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
'format': opts.format,
'format_limit': opts.format_limit,
'listformats': opts.listformats,
'outtmpl': outtmpl,
'autonumber_size': opts.autonumber_size,
'restrictfilenames': opts.restrictfilenames,
'ignoreerrors': opts.ignoreerrors,
'ratelimit': opts.ratelimit,
@ -442,77 +618,79 @@ def _real_main():
'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle,
'nopart': opts.nopart,
'updatetime': opts.updatetime,
'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles,
'subtitleslang': opts.subtitleslang,
'matchtitle': opts.matchtitle,
'rejecttitle': opts.rejecttitle,
'writeautomaticsub': opts.writeautomaticsub,
'allsubtitles': opts.allsubtitles,
'listsubtitles': opts.listsubtitles,
'subtitlesformat': opts.subtitlesformat,
'subtitleslangs': opts.subtitleslangs,
'matchtitle': decodeOption(opts.matchtitle),
'rejecttitle': decodeOption(opts.rejecttitle),
'max_downloads': opts.max_downloads,
'prefer_free_formats': opts.prefer_free_formats,
'verbose': opts.verbose,
'dump_intermediate_pages': opts.dump_intermediate_pages,
'write_pages': opts.write_pages,
'test': opts.test,
'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize
})
'max_filesize': opts.max_filesize,
'daterange': date,
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
'cookiefile': opts.cookiefile,
'nocheckcertificate': opts.no_check_certificate,
}
with YoutubeDL(ydl_opts) as ydl:
ydl.print_debug_header()
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
ydl.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
if opts.embedsubtitles:
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
if len(all_urls) < 1:
if not opts.update_self:
parser.error(u'you must provide at least one URL')
else:
sys.exit()
if opts.verbose:
fd.to_screen(u'[debug] youtube-dl version ' + __version__)
try:
sp = subprocess.Popen(['git', 'rev-parse', '--short', 'HEAD'], stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
fd.to_screen(u'[debug] Git HEAD: ' + out)
except:
pass
fd.to_screen(u'[debug] Python version %s - %s' %(platform.python_version(), platform.platform()))
fd.to_screen(u'[debug] Proxy map: ' + str(proxy_handler.proxies))
for extractor in extractors:
fd.add_info_extractor(extractor)
# PostProcessors
if opts.extractaudio:
fd.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
fd.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
# Update version
if opts.update_self:
update_self(fd.to_screen, opts.verbose, sys.argv[0])
# Maybe do nothing
if len(all_urls) < 1:
if not opts.update_self:
parser.error(u'you must provide at least one URL')
else:
sys.exit()
try:
retcode = fd.download(all_urls)
except MaxDownloadsReached:
fd.to_screen(u'--max-download limit reached, aborting.')
retcode = 101
# Dump cookie jar if requested
if opts.cookiefile is not None:
try:
jar.save()
except (IOError, OSError) as err:
sys.exit(u'ERROR: unable to save cookie jar')
retcode = ydl.download(all_urls)
except MaxDownloadsReached:
ydl.to_screen(u'--max-download limit reached, aborting.')
retcode = 101
sys.exit(retcode)
def main():
def main(argv=None):
try:
_real_main()
_real_main(argv)
except DownloadError:
sys.exit(1)
except SameFileError:

@ -9,7 +9,8 @@ import sys
if __package__ is None and not hasattr(sys, "frozen"):
# direct call of __main__.py
import os.path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
path = os.path.realpath(os.path.abspath(__file__))
sys.path.append(os.path.dirname(os.path.dirname(path)))
import youtube_dl

202
youtube_dl/aes.py Normal file
@ -0,0 +1,202 @@
__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
import base64
from math import ceil
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
def aes_ctr_decrypt(data, key, counter):
"""
Decrypt with aes in counter mode
@param {int[]} data cipher
@param {int[]} key 16/24/32-Byte cipher key
@param {instance} counter Instance whose next_value function (@returns {int[]} 16-Byte block)
returns the next counter block
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
decrypted_data=[]
for i in range(block_count):
counter_block = counter.next_value()
block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
block += [0]*(BLOCK_SIZE_BYTES - len(block))
cipher_counter_block = aes_encrypt(counter_block, expanded_key)
decrypted_data += xor(block, cipher_counter_block)
decrypted_data = decrypted_data[:len(data)]
return decrypted_data
def key_expansion(data):
"""
Generate key schedule
@param {int[]} data 16/24/32-Byte cipher key
@returns {int[]} 176/208/240-Byte expanded key
"""
data = data[:] # copy
rcon_iteration = 1
key_size_bytes = len(data)
expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES
while len(data) < expanded_key_size_bytes:
temp = data[-4:]
temp = key_schedule_core(temp, rcon_iteration)
rcon_iteration += 1
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
if key_size_bytes == 32:
temp = data[-4:]
temp = sub_bytes(temp)
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
data = data[:expanded_key_size_bytes]
return data
def aes_encrypt(data, expanded_key):
"""
Encrypt one block with aes
@param {int[]} data 16-Byte state
@param {int[]} expanded_key 176/208/240-Byte expanded key
@returns {int[]} 16-Byte cipher
"""
rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
for i in range(1, rounds+1):
data = sub_bytes(data)
data = shift_rows(data)
if i != rounds:
data = mix_columns(data)
data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
return data
def aes_decrypt_text(data, password, key_size_bytes):
"""
Decrypt text
- The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter
- The cipher key is retrieved by encrypting the first 16 Bytes of 'password'
with the first 'key_size_bytes' Bytes from 'password' (padded with 0's if necessary)
- Mode of operation is 'counter'
@param {str} data Base64 encoded string
@param {str,unicode} password Password (will be encoded with utf-8)
@param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit or 32 for 256-Bit
@returns {str} Decrypted data
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(base64.b64decode(data))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0]*(key_size_bytes - len(password))
key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * (key_size_bytes // BLOCK_SIZE_BYTES)
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
__value = nonce + [0]*(BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):
temp = self.__value
self.__value = inc(self.__value)
return temp
decrypted_data = aes_ctr_decrypt(cipher, key, Counter())
plaintext = intlist_to_bytes(decrypted_data)
return plaintext
RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
MIX_COLUMN_MATRIX = ((2,3,1,1),
(1,2,3,1),
(1,1,2,3),
(3,1,1,2))
def sub_bytes(data):
return [SBOX[x] for x in data]
def rotate(data):
return data[1:] + [data[0]]
def key_schedule_core(data, rcon_iteration):
data = rotate(data)
data = sub_bytes(data)
data[0] = data[0] ^ RCON[rcon_iteration]
return data
def xor(data1, data2):
return [x^y for x, y in zip(data1, data2)]
def mix_column(data):
data_mixed = []
for row in range(4):
mixed = 0
for column in range(4):
addend = data[column]
if MIX_COLUMN_MATRIX[row][column] in (2,3):
addend <<= 1
if addend > 0xff:
addend &= 0xff
addend ^= 0x1b
if MIX_COLUMN_MATRIX[row][column] == 3:
addend ^= data[column]
mixed ^= addend & 0xff
data_mixed.append(mixed)
return data_mixed
def mix_columns(data):
data_mixed = []
for i in range(4):
column = data[i*4 : (i+1)*4]
data_mixed += mix_column(column)
return data_mixed
def shift_rows(data):
data_shifted = []
for column in range(4):
for row in range(4):
data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
return data_shifted
def inc(data):
data = data[:] # copy
for i in range(len(data)-1,-1,-1):
if data[i] == 255:
data[i] = 0
else:
data[i] = data[i] + 1
break
return data
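The counter handling used by aes_ctr_decrypt and aes_decrypt_text above can be sketched in isolation. This is a minimal illustration, assuming only the conventions visible in the file: blocks are lists of ints in the range 0..255 and the counter is incremented as a big-endian number. The names `inc_counter` and `xor_bytes` and the sample nonce are hypothetical, not part of aes.py.

```python
def inc_counter(block):
    """Return a copy of a big-endian byte list incremented by one."""
    block = block[:]  # copy so the caller's list is left untouched
    for i in range(len(block) - 1, -1, -1):
        if block[i] == 255:
            block[i] = 0  # overflow: carry into the next more significant byte
        else:
            block[i] += 1
            break
    return block


def xor_bytes(a, b):
    """XOR two byte lists element by element (truncates to the shorter one)."""
    return [x ^ y for x, y in zip(a, b)]


# Made-up 8-byte nonce followed by an 8-byte zero counter, as in aes_decrypt_text
nonce = [0, 1, 2, 3, 4, 5, 6, 7]
counter = nonce + [0] * 8
second = inc_counter(counter)         # only the last byte changes
wrapped = inc_counter([0, 255, 255])  # the carry propagates to the first byte
```

In counter mode each of these counter blocks is encrypted and XORed with one 16-byte chunk of the ciphertext, which is why decryption only needs the block encryption routine.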

@ -0,0 +1,211 @@
from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE
from .anitube import AnitubeIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .cnn import CNNIE
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
from .cspan import CSpanIE
from .d8 import D8IE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .escapist import EscapistIE
from .exfm import ExfmIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
France2IE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gamespot import GameSpotIE
from .gametrailers import GametrailersIE
from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .hypem import HypemIE
from .ign import IGNIE, OneUPIE
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .kankan import KankanIE
from .keezmovies import KeezMoviesIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE
from .mofosex import MofosexIE
from .mtv import MTVIE
from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE
from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .pornhub import PornHubIE
from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .sina import SinaIE
from .slashdot import SlashdotIE
from .slideshare import SlideshareIE
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import (
SouthParkStudiosIE,
SouthparkDeIE,
)
from .space import SpaceIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tf1 import TF1IE
from .thisav import ThisAVIE
from .toutv import TouTvIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .tube8 import Tube8IE
from .tudou import TudouIE
from .tumblr import TumblrIE
from .tutv import TutvIE
from .tvp import TvpIE
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vevo import VevoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
from .xvideos import XVideosIE
from .xtube import XTubeIE
from .yahoo import YahooIE, YahooSearchIE
from .youjizz import YouJizzIE
from .youku import YoukuIE
from .youporn import YouPornIE
from .youtube import (
YoutubeIE,
YoutubePlaylistIE,
YoutubeSearchIE,
YoutubeSearchDateIE,
YoutubeUserIE,
YoutubeChannelIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
)
from .zdf import ZDFIE
_ALL_CLASSES = [
klass
for name, klass in globals().items()
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE)
def gen_extractors():
"""Return a list containing one instance of every supported extractor.
The order matters: the first extractor that matches is the one handling the URL.
"""
return [klass() for klass in _ALL_CLASSES]
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
return globals()[ie_name+'IE']
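The registry above relies on a naming convention (every class name ending in `IE`) and on keeping the catch-all GenericIE last. The sketch below shows that idea with stand-in classes. `BaseIE`, `ExampleIE`, `_PREFIX`, `ordered_classes` and `first_match` are hypothetical names for illustration, and the explicit `sorted()` call is an assumption added here to make the order deterministic; the module itself iterates over `globals()` directly.

```python
class BaseIE(object):
    """Stand-in for an extractor base class (illustrative only)."""
    _PREFIX = ''

    @classmethod
    def suitable(cls, url):
        # Real extractors match a regular expression; a prefix is enough here
        return url.startswith(cls._PREFIX)


class ExampleIE(BaseIE):
    _PREFIX = 'http://example.com/'


class GenericIE(BaseIE):
    _PREFIX = 'http'  # matches almost any URL, so it has to be tried last


def ordered_classes(namespace):
    """Collect every *IE class from a name -> class mapping, GenericIE last."""
    classes = [
        klass for name, klass in sorted(namespace.items())
        if name.endswith('IE') and name != 'GenericIE'
    ]
    classes.append(namespace['GenericIE'])
    return classes


def first_match(classes, url):
    """Return the first class whose suitable() accepts the URL, else None."""
    for klass in classes:
        if klass.suitable(url):
            return klass
    return None


namespace = {'GenericIE': GenericIE, 'ExampleIE': ExampleIE}
classes = ordered_classes(namespace)
```

If GenericIE were not moved to the end, it could shadow a more specific extractor, because the first match wins.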

@ -0,0 +1,86 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
ExtractorError,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.mp4',
u'md5': u'72954ea10bc979ab5e2eb288b21425a0',
u'info_dict': {
u"description": u"One Piece 606",
u"title": u"One Piece 606"
}
}
def _real_extract(self, url):
try:
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, u'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, u'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError(u'Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + u'://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id)
formats = []
for format_id in ('normal', 'hq'):
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, u'video file URL',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
})
if not formats:
raise ExtractorError(u'Cannot find any video format!')
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'formats': formats,
'title': video_title,
'description': video_description
}
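The redirect handling in AddAnimeIE boils down to a small piece of arithmetic: evaluate `a + b * c` from the challenge page, add the length of the host name, and send the result back as `jschl_answer`. The following is a rough standalone sketch using the Python 3 standard library instead of the compat helpers. The function name `solve_challenge`, the sample page snippet, the action path and the `vc` value are all made up for illustration.

```python
import re
from urllib.parse import urlencode, urlparse


def solve_challenge(page, url, action, vc):
    """Build the confirmation URL for an 'a + b * c' style challenge."""
    m = re.search(r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);', page)
    if m is None:
        raise ValueError('Cannot find redirect math task')
    a, b, c = (int(g) for g in m.groups())
    parsed = urlparse(url)
    # The expected answer is the arithmetic result plus the host name length
    answer = a + b * c + len(parsed.netloc)
    query = urlencode([('jschl_vc', vc), ('jschl_answer', str(answer))])
    return '%s://%s%s?%s' % (parsed.scheme, parsed.netloc, action, query)


# Made-up inputs; the action path and vc token are placeholders
page = 'a.value = 12+3*4;'
confirm_url = solve_challenge(
    page, 'http://www.add-anime.net/watch_video.php?v=X',
    '/cdn-cgi/l/chk_jschl', 'abc')
```

With these inputs the arithmetic gives 12 + 3 * 4 = 24, and the 17-character host name brings the answer to 41.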

@ -0,0 +1,55 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
class AnitubeIE(InfoExtractor):
IE_NAME = u'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.anitube.se/video/36621',
u'md5': u'59d0eeae28ea0bc8c05e7af429998d43',
u'file': u'36621.mp4',
u'info_dict': {
u'id': u'36621',
u'ext': u'mp4',
u'title': u'Recorder to Randoseru 01',
},
u'skip': u'Blocked in the US',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
key = self._html_search_regex(r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)',
webpage, u'key')
webpage_config = self._download_webpage('http://www.anitube.se/nuevo/econfig.php?key=%s' % key,
key)
config_xml = xml.etree.ElementTree.fromstring(webpage_config.encode('utf-8'))
video_title = config_xml.find('title').text
formats = []
video_url = config_xml.find('file')
if video_url is not None:
formats.append({
'format_id': 'sd',
'url': video_url.text,
})
video_url = config_xml.find('filehd')
if video_url is not None:
formats.append({
'format_id': 'hd',
'url': video_url.text,
})
return {
'id': video_id,
'title': video_title,
'formats': formats
}
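The format collection in AnitubeIE can be tried against a small document. This is a sketch under assumptions: the root element name `config`, the sample URLs and the helper name `extract_formats` are invented here; only the `title`, `file` and `filehd` element names come from the extractor.

```python
import xml.etree.ElementTree as ET


def extract_formats(config_text):
    """Return (title, formats) from a player config document."""
    config = ET.fromstring(config_text)
    formats = []
    # 'file' holds the standard definition URL, 'filehd' the optional HD one
    for tag, format_id in (('file', 'sd'), ('filehd', 'hd')):
        node = config.find(tag)
        if node is not None:
            formats.append({'format_id': format_id, 'url': node.text})
    return config.find('title').text, formats


# Made-up sample documents
both = ('<config><title>Sample</title>'
        '<file>http://example.com/sd.mp4</file>'
        '<filehd>http://example.com/hd.mp4</filehd></config>')
sd_only = ('<config><title>Sample</title>'
           '<file>http://example.com/sd.mp4</file></config>')

title, formats = extract_formats(both)
_, sd_formats = extract_formats(sd_only)
```

Appending the HD entry last keeps the list ordered from worst to best quality, which matches how the other extractors in this change order their formats.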

@ -0,0 +1,138 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
class AppleTrailersIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TEST = {
u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
u"playlist": [
{
u"file": u"manofsteel-trailer4.mov",
u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": {
u"duration": 111,
u"title": u"Trailer 4",
u"upload_date": u"20130523",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer3.mov",
u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": {
u"duration": 182,
u"title": u"Trailer 3",
u"upload_date": u"20130417",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer.mov",
u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": {
u"duration": 148,
u"title": u"Trailer",
u"upload_date": u"20121212",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-teaser.mov",
u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": {
u"duration": 93,
u"title": u"Teaser",
u"upload_date": u"20120721",
u"uploader_id": u"wb",
},
}
]
}
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' characters in the onClick attributes are not escaped, so the snippet
# cannot be parsed with xml.etree.ElementTree.fromstring as-is,
# e.g. http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>'
doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
runtime = trailer_info['runtime']
m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
duration = None
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = []
for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
})
formats = sorted(formats, key=lambda f: (f['height'], f['width']))
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'duration': duration,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader_id': uploader_id,
'user_agent': 'QuickTime compatible (youtube-dl)',
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = formats[-1]['ext']
playlist.append(info)
return {
'_type': 'playlist',
'id': movie,
'entries': playlist,
}
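Two of the derived fields in AppleTrailersIE, the per-trailer id and the duration, are simple string transformations that can be checked on their own. The sketch below reuses the expressions from the extractor. The helper names `trailer_id` and `parse_runtime` are hypothetical, and the runtime strings are assumed to look like `M:SS`.

```python
import re


def trailer_id(movie, title):
    """Join the movie slug with the title stripped of non-alphanumerics."""
    return movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()


def parse_runtime(runtime):
    """Convert an 'M:SS' string to seconds, or None if it does not match."""
    m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
    if m is None:
        return None
    return 60 * int(m.group('minutes')) + int(m.group('seconds'))


video_id = trailer_id('manofsteel', 'Trailer 4')
duration = parse_runtime('1:51')
```

These values line up with the first playlist entry of the test case, whose file is `manofsteel-trailer4.mov` and whose duration is 111 seconds.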

@ -0,0 +1,68 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_strdate,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'(?:https?://)?(?:www\.)?archive.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
_TEST = {
u"url": u"http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect",
u'file': u'XD300-23_68HighlightsAResearchCntAugHumanIntellect.ogv',
u'md5': u'8af1d4cf447933ed3c7f4871162602db',
u'info_dict': {
u"title": u"1968 Demo - FJCC Conference Presentation Reel #1",
u"description": u"Reel 1 of 3: Also known as the \"Mother of All Demos\", Doug Engelbart's presentation at the Fall Joint Computer Conference in San Francisco, December 9, 1968 titled \"A Research Center for Augmenting Human Intellect.\" For this presentation, Doug and his team astonished the audience by not only relating their research, but demonstrating it live. This was the debut of the mouse, interactive computing, hypermedia, computer supported software engineering, video teleconferencing, etc. See also <a href=\"http://dougengelbart.org/firsts/dougs-1968-demo.html\" rel=\"nofollow\">Doug's 1968 Demo page</a> for more background, highlights, links, and the detailed paper published in this conference proceedings. Filmed on 3 reels: Reel 1 | <a href=\"http://www.archive.org/details/XD300-24_68HighlightsAResearchCntAugHumanIntellect\" rel=\"nofollow\">Reel 2</a> | <a href=\"http://www.archive.org/details/XD300-25_68HighlightsAResearchCntAugHumanIntellect\" rel=\"nofollow\">Reel 3</a>",
u"upload_date": u"19681210",
u"uploader": u"SRI International"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = url + (u'&' if u'?' in url else u'?') + u'output=json'
json_data = self._download_webpage(json_url, video_id)
data = json.loads(json_data)
title = data['metadata']['title'][0]
description = data['metadata']['description'][0]
uploader = data['metadata']['creator'][0]
upload_date = unified_strdate(data['metadata']['date'][0])
formats = [{
'format': fdata['format'],
'url': 'http://' + data['server'] + data['dir'] + fn,
'file_size': int(fdata['size']),
}
for fn,fdata in data['files'].items()
if 'Video' in fdata['format']]
formats.sort(key=lambda fdata: fdata['file_size'])
for f in formats:
f['ext'] = determine_ext(f['url'])
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
}
thumbnail = data.get('misc', {}).get('image')
if thumbnail:
info['thumbnail'] = thumbnail
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
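The format selection in ArchiveOrgIE keeps only entries whose format name contains `Video` and sorts them by size, so the largest file ends up last and is picked as the default. A minimal sketch with invented metadata follows. The file names, format names, sizes and the `base_url` value are placeholders, and `video_formats` is a hypothetical helper name.

```python
def video_formats(files, base_url):
    """Keep video files only and order them from smallest to largest."""
    formats = [
        {
            'format': fdata['format'],
            'url': base_url + fn,
            'file_size': int(fdata['size']),  # sizes arrive as strings
        }
        for fn, fdata in files.items()
        if 'Video' in fdata['format']
    ]
    formats.sort(key=lambda f: f['file_size'])
    return formats


# Made-up metadata in the shape the extractor reads from data['files']
files = {
    '/a.mp4': {'format': 'MPEG4 Video', 'size': '1200'},
    '/a.png': {'format': 'Thumbnail', 'size': '10'},
    '/a.ogv': {'format': 'Ogg Video', 'size': '300'},
}
formats = video_formats(files, 'http://example.com/items')
best = formats[-1]
```

Sorting by file size is only a proxy for quality, but it needs no knowledge of the individual format names.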

@ -0,0 +1,54 @@
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class ARDIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TITLE = r'<h1(?: class="boxTopHeadline")?>(?P<title>.*)</h1>'
_MEDIA_STREAM = r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)'
_TEST = {
u'url': u'http://www.ardmediathek.de/das-erste/tagesschau-in-100-sek?documentId=14077640',
u'file': u'14077640.mp4',
u'md5': u'6ca8824255460c787376353f9e20bbd8',
u'info_dict': {
u"title": u"11.04.2013 09:23 Uhr - Tagesschau in 100 Sekunden"
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
# determine video id from url
m = re.match(self._VALID_URL, url)
numid = re.search(r'documentId=([0-9]+)', url)
if numid:
video_id = numid.group(1)
else:
video_id = m.group('video_id')
# determine title and media streams from webpage
html = self._download_webpage(url, video_id)
title = re.search(self._TITLE, html).group('title')
streams = [mo.groupdict() for mo in re.finditer(self._MEDIA_STREAM, html)]
if not streams:
assert '"fsk"' in html
raise ExtractorError(u'This video is only available after 8:00 pm')
# choose default media type and highest quality for now
stream = max([s for s in streams if int(s["media_type"]) == 0],
key=lambda s: int(s["quality"]))
# there are two possibilities: RTMP stream or HTTP download
info = {'id': video_id, 'title': title, 'ext': 'mp4'}
if stream['rtmp_url']:
self.to_screen(u'RTMP download detected')
assert stream['video_url'].startswith('mp4:')
info["url"] = stream["rtmp_url"]
info["play_path"] = stream['video_url']
else:
assert stream["video_url"].endswith('.mp4')
info["url"] = stream["video_url"]
return [info]
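The stream selection in ARDIE can be reproduced on a small page fragment: collect every `addMediaStream` call, keep the default media type, and take the entry with the highest quality number. The regular expression below is the one from the extractor. The HTML fragment and its URLs are made up for illustration.

```python
import re

MEDIA_STREAM = (
    r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), '
    r'"(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)'
)

# Made-up page fragment with two default-type streams and one other type
html = '''
mediaCollection.addMediaStream(0, 1, "", "http://example.com/low.mp4", "default");
mediaCollection.addMediaStream(0, 2, "rtmp://example.com/app", "mp4:high", "default");
mediaCollection.addMediaStream(1, 3, "", "http://example.com/other.mp4", "default");
'''

streams = [m.groupdict() for m in re.finditer(MEDIA_STREAM, html)]

# Only media type 0 is considered; quality is compared numerically
best = max((s for s in streams if int(s['media_type']) == 0),
           key=lambda s: int(s['quality']))

# A non-empty rtmp_url means the video_url is an RTMP play path
is_rtmp = bool(best['rtmp_url'])
```

Converting `quality` with `int()` matters because the captured groups are strings, and a string comparison would rank `'10'` below `'9'`.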

@ -0,0 +1,262 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
determine_ext,
get_element_by_id,
compat_str,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
IE_NAME = u'arte.tv'
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
# def extractLiveStream(self, url):
# video_lang = url.split('/')[-4]
# info = self.grep_webpage(
# url,
# r'src="(.*?/videothek_js.*?\.js)',
# 0,
# [
# (1, 'url', u'Invalid URL: %s' % url)
# ]
# )
# http_host = url.split('/')[2]
# next_url = 'http://%s%s' % (http_host, compat_urllib_parse.unquote(info.get('url')))
# info = self.grep_webpage(
# next_url,
# r'(s_artestras_scst_geoFRDE_' + video_lang + '.*?)\'.*?' +
# '(http://.*?\.swf).*?' +
# '(rtmp://.*?)\'',
# re.DOTALL,
# [
# (1, 'path', u'could not extract video path: %s' % url),
# (2, 'player', u'could not extract video player: %s' % url),
# (3, 'url', u'could not extract video url: %s' % url)
# ]
# )
# video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
lang = mobj.group('lang')
return self._extract_video(url, id, lang)
mobj = re.match(self._LIVEWEB_URL, url)
if mobj is not None:
name = mobj.group('name')
lang = mobj.group('lang')
return self._extract_liveweb(url, name, lang)
if re.search(self._LIVE_URL, url) is not None:
raise ExtractorError(u'Arte live streams are not yet supported, sorry')
# self.extractLiveStream(url)
# return
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml = self._download_webpage(ref_xml_url, video_id, note=u'Downloading metadata')
ref_xml_doc = xml.etree.ElementTree.fromstring(ref_xml)
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config_xml = self._download_webpage(config_xml_url, video_id, note=u'Downloading configuration')
video_urls = list(re.finditer(r'<url quality="(?P<quality>.*?)">(?P<url>.*?)</url>', config_xml))
def _key(m):
quality = m.group('quality')
if quality == 'hd':
return 2
else:
return 1
# We pick the best quality
video_urls = sorted(video_urls, key=_key)
video_url = list(video_urls)[-1].group('url')
title = self._html_search_regex(r'<name>(.*?)</name>', config_xml, 'title')
thumbnail = self._html_search_regex(r'<firstThumbnailUrl>(.*?)</firstThumbnailUrl>',
config_xml, 'thumbnail')
return {'id': video_id,
'title': title,
'thumbnail': thumbnail,
'url': video_url,
'ext': 'flv',
}
def _extract_liveweb(self, url, name, lang):
"""Extract from http://liveweb.arte.tv/"""
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'eventId=(\d+?)("|&)', webpage, u'event id')
config_xml = self._download_webpage('http://download.liveweb.arte.tv/o21/liveweb/events/event-%s.xml' % video_id,
video_id, u'Downloading information')
config_doc = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
event_doc = config_doc.find('event')
url_node = event_doc.find('video').find('urlHd')
if url_node is None:
url_node = event_doc.find('urlSd')
return {'id': video_id,
'title': event_doc.find('name%s' % lang.capitalize()).text,
'url': url_node.text.replace('MP4', 'mp4'),
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id; it can be, for example, AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
all_formats = player_info['VSR'].values()
# Some formats use the m3u8 protocol
all_formats = list(filter(lambda f: f.get('videoFormat') != 'M3U8', all_formats))
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, all_formats)
formats = list(formats) # in python3 filter returns an iterator
if not formats:
# Some videos are only available in the 'Originalversion'
# they aren't tagged as being in French or German
if all(f['versionCode'] == 'VO' for f in all_formats):
formats = all_formats
else:
raise ExtractorError(u'The formats list is empty')
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
def sort_key(f):
return ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
def sort_key(f):
return (
# Sort first by quality
int(f.get('height',-1)),
int(f.get('bitrate',-1)),
# The original version with subtitles has lower relevance
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
# The version with subtitles for the deaf and hard of hearing (sourds/malentendants) also has lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
)
formats = sorted(formats, key=sort_key)
def _format(format_info):
quality = ''
height = format_info.get('height')
if height is not None:
quality = compat_str(height)
bitrate = format_info.get('bitrate')
if bitrate is not None:
quality += '-%d' % bitrate
if format_info.get('versionCode') is not None:
format_id = u'%s-%s' % (quality, format_info['versionCode'])
else:
format_id = quality
info = {
'format_id': format_id,
'format_note': format_info.get('versionLibelle'),
'width': format_info.get('width'),
'height': height,
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)
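The language filter used by `_match_lang` above can be tried in isolation. This is a minimal sketch, assuming the same two regular expressions; the helper name `matches_language` and the sample version codes are illustrative, not taken from Arte's API.

```python
import re


def matches_language(version_code, lang):
    """Return True if a format's versionCode fits the language of the URL.

    Hypothetical standalone variant of _match_lang: 'fr' maps to the
    letter F and 'de' maps to the letter A.
    """
    if version_code is None:
        # Formats without a version code are always accepted
        return True
    letter = {'fr': 'F', 'de': 'A'}[lang]
    patterns = [r'VO?%s' % letter, r'VO?.-ST%s' % letter]
    return any(re.match(p, version_code) for p in patterns)


# Sample codes are assumptions chosen to exercise both patterns
for code in ['VF', 'VOA-STF', 'VA', 'VO', None]:
    print(code, matches_language(code, 'fr'))
```

A bare `VO` code matches neither language, which is why the extractor falls back to accepting all formats when every one of them is tagged `VO`.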

@ -0,0 +1,49 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
ExtractorError,
)
class AUEngineIE(InfoExtractor):
_TEST = {
u'url': u'http://auengine.com/embed.php?file=lfvlytY6&w=650&h=370',
u'file': u'lfvlytY6.mp4',
u'md5': u'48972bdbcf1a3a2f5533e62425b41d4f',
u'info_dict': {
u"title": u"[Commie]The Legend of the Legendary Heroes - 03 - Replication Eye (Alpha Stigma)[F9410F5A]"
}
}
_VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed.php\?.*?file=([^&]+).*?'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>',
webpage, u'title')
title = title.strip()
links = re.findall(r'\s(?:file|url):\s*["\']([^\'"]+)["\']', webpage)
links = map(compat_urllib_parse.unquote, links)
thumbnail = None
video_url = None
for link in links:
if link.endswith('.png'):
thumbnail = link
elif '/videos/' in link:
video_url = link
if not video_url:
raise ExtractorError(u'Could not find video URL')
ext = u'.' + determine_ext(video_url)
if ext == title[-len(ext):]:
title = title[:-len(ext)]
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
}
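The title cleanup at the end of `_real_extract` above removes a trailing file extension from the page title. A minimal sketch of that step, assuming a hypothetical helper name `strip_ext_suffix` and made-up titles:

```python
def strip_ext_suffix(title, ext):
    """Drop a trailing '.<ext>' from title, if present (illustrative helper)."""
    suffix = u'.' + ext
    if title.endswith(suffix):
        return title[:-len(suffix)]
    return title


# Titles are invented for the example
print(strip_ext_suffix(u'Some Episode - 03.mp4', u'mp4'))
print(strip_ext_suffix(u'Some Episode - 03', u'mp4'))
```

Using `endswith` keeps the comparison and the slice tied to the same suffix, so a title without the extension is returned unchanged.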

@ -0,0 +1,86 @@
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
)
class BambuserIE(InfoExtractor):
IE_NAME = u'bambuser'
_VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
_API_KEY = '005f64509e19a868399060af746a00aa'
_TEST = {
u'url': u'http://bambuser.com/v/4050584',
# MD5 seems to be flaky, see https://travis-ci.org/rg3/youtube-dl/jobs/14051016#L388
#u'md5': u'fba8f7693e48fd4e8641b3fd5539a641',
u'info_dict': {
u'id': u'4050584',
u'ext': u'flv',
u'title': u'Education engineering days - lightning talks',
u'duration': 3741,
u'uploader': u'pixelversity',
u'uploader_id': u'344706',
},
u'params': {
# The server doesn't respect the 'Range' header, so the test would download the whole video,
# which caused the Travis builds to fail: https://travis-ci.org/rg3/youtube-dl/jobs/14493845#L59
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_url = ('http://player-c.api.bambuser.com/getVideo.json?'
'&api_key=%s&vid=%s' % (self._API_KEY, video_id))
info_json = self._download_webpage(info_url, video_id)
info = json.loads(info_json)['result']
return {
'id': video_id,
'title': info['title'],
'url': info['url'],
'thumbnail': info.get('preview'),
'duration': int(info['length']),
'view_count': int(info['views_total']),
'uploader': info['username'],
'uploader_id': info['uid'],
}
class BambuserChannelIE(InfoExtractor):
IE_NAME = u'bambuser:channel'
_VALID_URL = r'http://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
urls = []
last_id = ''
for i in itertools.count(1):
req_url = ('http://bambuser.com/xhr-api/index.php?username={user}'
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = compat_urllib_request.Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
info_json = self._download_webpage(req, user,
u'Downloading page %d' % i)
results = json.loads(info_json)['result']
if len(results) == 0:
break
last_id = results[-1]['vid']
urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
return {
'_type': 'playlist',
'title': user,
'entries': urls,
}
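The channel extractor above pages through results with a cursor: each request asks for entries older than the last `vid` seen, and the loop stops on an empty page. The sketch below reproduces that loop against an in-memory stand-in; `collect_all`, `make_fake_backend` and the data are assumptions for illustration and do not call the real Bambuser API.

```python
def collect_all(fetch_page, step=50):
    """Collect every item using 'older than last id' cursor pagination.

    fetch_page is a hypothetical callable (older_than, limit) -> list of
    dicts, each carrying a 'vid' key, newest first.
    """
    items = []
    last_id = ''
    while True:
        results = fetch_page(last_id, step)
        if not results:
            break
        items.extend(results)
        # The last item of the page becomes the cursor for the next request
        last_id = results[-1]['vid']
    return items


def make_fake_backend(total):
    """Build an in-memory backend holding `total` fake videos, newest first."""
    videos = [{'vid': str(n)} for n in range(total, 0, -1)]

    def fetch_page(older_than, limit):
        if older_than == '':
            start = 0
        else:
            start = next(i for i, v in enumerate(videos)
                         if v['vid'] == older_than) + 1
        return videos[start:start + limit]

    return fetch_page


all_videos = collect_all(make_fake_backend(120), step=50)
print(len(all_videos))
```

A cursor avoids skipped or repeated entries when new broadcasts are added while the listing is being fetched, which offset-based paging cannot guarantee.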

@ -0,0 +1,129 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urlparse,
ExtractorError,
)
class BandcampIE(InfoExtractor):
IE_NAME = u'Bandcamp'
_VALID_URL = r'http://.*?\.bandcamp\.com/track/(?P<title>.*)'
_TESTS = [{
u'url': u'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
u'file': u'1812978515.mp3',
u'md5': u'cdeb30cdae1921719a3cbcab696ef53c',
u'info_dict': {
u"title": u"youtube-dl test song \"'/\\\u00e4\u21ad"
},
u'skip': u'There is a limit of 200 free downloads / month for the test song'
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
# We get the link to the free download page
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if m_download is None:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)
for d in data:
formats = [{
'format_id': format_id,
'url': format_url,
'ext': format_id.partition('-')[0]
} for format_id, format_url in sorted(d['file'].items())]
return {
'id': compat_str(d['id']),
'title': d['title'],
'formats': formats,
}
else:
raise ExtractorError(u'No free songs found')
download_link = m_download.group(1)
id = re.search(r'var TralbumData = {(.*?)id: (?P<id>\d*?)$',
webpage, re.MULTILINE|re.DOTALL).group('id')
download_webpage = self._download_webpage(download_link, id,
'Downloading free downloads page')
# We get the dictionary of the track from some JavaScript code
info = re.search(r'items: (.*?),$',
download_webpage, re.MULTILINE).group(1)
info = json.loads(info)[0]
# We pick mp3-320 for now, until format selection can be easily implemented.
mp3_info = info[u'downloads'][u'mp3-320']
# If we try to use this url it says the link has expired
initial_url = mp3_info[u'url']
re_url = r'(?P<server>http://(.*?)\.bandcamp\.com)/download/track\?enc=mp3-320&fsig=(?P<fsig>.*?)&id=(?P<id>.*?)&ts=(?P<ts>.*)$'
m_url = re.match(re_url, initial_url)
# We build the url we will use to get the final track url
# This url is built in Bandcamp in the script download_bunde_*.js
request_url = '%s/statdownload/track?enc=mp3-320&fsig=%s&id=%s&ts=%s&.rand=665028774616&.vrs=1' % (m_url.group('server'), m_url.group('fsig'), id, m_url.group('ts'))
final_url_webpage = self._download_webpage(request_url, id, 'Requesting download url')
# If we could correctly generate the .rand field the url would be
# in the "download_url" key
final_url = re.search(r'"retry_url":"(.*?)"', final_url_webpage).group(1)
track_info = {'id':id,
'title' : info[u'title'],
'ext' : 'mp3',
'url' : final_url,
'thumbnail' : info[u'thumb_url'],
'uploader' : info[u'artist']
}
return [track_info]
class BandcampAlbumIE(InfoExtractor):
IE_NAME = u'Bandcamp:album'
_VALID_URL = r'http://.*?\.bandcamp\.com/album/(?P<title>.*)'
_TEST = {
u'url': u'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
u'playlist': [
{
u'file': u'1353101989.mp3',
u'md5': u'39bc1eded3476e927c724321ddf116cf',
u'info_dict': {
u'title': u'Intro',
}
},
{
u'file': u'38097443.mp3',
u'md5': u'1a2c32e2691474643e912cc6cd4bffaa',
u'info_dict': {
u'title': u'Kero One - Keep It Alive (Blazo remix)',
}
},
],
u'params': {
u'playlistend': 2
},
u'skip': u'Bandcamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
if not tracks_paths:
raise ExtractorError(u'The page doesn\'t contain any track')
entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths]
title = self._search_regex(r'album_title : "(.*?)"', webpage, u'title')
return {
'_type': 'playlist',
'title': title,
'entries': entries,
}

@ -0,0 +1,193 @@
import datetime
import json
import os
import re
import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_parse_qs,
compat_str,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_urllib_request,
ExtractorError,
unescapeHTML,
)
class BlipTVIE(InfoExtractor):
"""Information extractor for blip.tv"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(.+)$'
_URL_EXT = r'^.*\.([a-z0-9]+)$'
IE_NAME = u'blip.tv'
_TEST = {
u'url': u'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
u'file': u'5779306.m4v',
u'md5': u'80baf1ec5c3d2019037c1c707d676b9f',
u'info_dict': {
u"upload_date": u"20111205",
u"description": u"md5:9bc31f227219cde65e47eeec8d2dc596",
u"uploader": u"Comic Book Resources - CBR TV",
u"title": u"CBR EXCLUSIVE: \"Gotham City Imposters\" Bats VS Jokerz Short 3"
}
}
def report_direct_download(self, title):
"""Report information extraction."""
self.to_screen(u'%s: Direct download detected' % title)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
# See https://github.com/rg3/youtube-dl/issues/857
api_mobj = re.match(r'http://a\.blip\.tv/api\.swf#(?P<video_id>[\d\w]+)', url)
if api_mobj is not None:
url = 'http://blip.tv/play/g_%s' % api_mobj.group('video_id')
urlp = compat_urllib_parse_urlparse(url)
if urlp.path.startswith('/play/'):
request = compat_urllib_request.Request(url)
response = compat_urllib_request.urlopen(request)
redirecturl = response.geturl()
rurlp = compat_urllib_parse_urlparse(redirecturl)
file_id = compat_parse_qs(rurlp.fragment)['file'][0].rpartition('/')[2]
url = 'http://blip.tv/a/a-' + file_id
return self._real_extract(url)
if '?' in url:
cchar = '&'
else:
cchar = '?'
json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
request = compat_urllib_request.Request(json_url)
request.add_header('User-Agent', 'iTunes/10.6.1')
self.report_extraction(mobj.group(1))
info = None
try:
urlh = compat_urllib_request.urlopen(request)
if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
basename = url.split('/')[-1]
title,ext = os.path.splitext(basename)
title = title.decode('UTF-8')
ext = ext.replace('.', '')
self.report_direct_download(title)
info = {
'id': title,
'url': url,
'uploader': None,
'upload_date': None,
'title': title,
'ext': ext,
'urlhandle': urlh
}
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'ERROR: unable to download video info webpage: %s' % compat_str(err))
if info is None: # Regular URL
try:
json_code_bytes = urlh.read()
json_code = json_code_bytes.decode('utf-8')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to read video info webpage: %s' % compat_str(err))
try:
json_data = json.loads(json_code)
if 'Post' in json_data:
data = json_data['Post']
else:
data = json_data
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
if 'additionalMedia' in data:
formats = sorted(data['additionalMedia'], key=lambda f: int(f['media_height']))
best_format = formats[-1]
video_url = best_format['url']
else:
video_url = data['media']['url']
umobj = re.match(self._URL_EXT, video_url)
if umobj is None:
raise ValueError('Can not determine filename extension')
ext = umobj.group(1)
info = {
'id': compat_str(data['item_id']),
'url': video_url,
'uploader': data['display_name'],
'upload_date': upload_date,
'title': data['title'],
'ext': ext,
'format': data['media']['mimeType'],
'thumbnail': data['thumbnailUrl'],
'description': data['description'],
'player_url': data['embedUrl'],
'user_agent': 'iTunes/10.6.1',
}
except (ValueError,KeyError) as err:
raise ExtractorError(u'Unable to parse video information: %s' % repr(err))
return [info]
class BlipTVUserIE(InfoExtractor):
"""Information Extractor for blip.tv users."""
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?blip\.tv/)|bliptvuser:)([^/]+)/*$'
_PAGE_SIZE = 12
IE_NAME = u'blip.tv:user'
def _real_extract(self, url):
# Extract username
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
username = mobj.group(1)
page_base = 'http://m.blip.tv/pr/show_get_full_episode_list?users_id=%s&lite=0&esi=1'
page = self._download_webpage(url, username, u'Downloading user page')
mobj = re.search(r'data-users-id="([^"]+)"', page)
page_base = page_base % mobj.group(1)
# Download video ids using BlipTV Ajax calls. Result size per
# query is limited (currently to 12 videos) so we need to query
# page by page until there are no video ids - it means we got
# all of them.
video_ids = []
pagenum = 1
while True:
url = page_base + "&page=" + str(pagenum)
page = self._download_webpage(url, username,
u'Downloading video ids from page %d' % pagenum)
# Extract video identifiers
ids_in_page = []
for mobj in re.finditer(r'href="/([^"]+)"', page):
if mobj.group(1) not in ids_in_page:
ids_in_page.append(unescapeHTML(mobj.group(1)))
video_ids.extend(ids_in_page)
# A little optimization - if current page is not
# "full", i.e. does not contain PAGE_SIZE video ids, then
# we can assume that this page is the last one - there
# are no more ids on further pages - no need to query
# again.
if len(ids_in_page) < self._PAGE_SIZE:
break
pagenum += 1
urls = [u'http://blip.tv/%s' % video_id for video_id in video_ids]
url_entries = [self.url_result(vurl, 'BlipTV') for vurl in urls]
return [self.playlist_result(url_entries, playlist_title = username)]
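The "short page means last page" rule described in the comments of `BlipTVUserIE` can be sketched on its own. Everything here is hypothetical scaffolding: `collect_ids` mirrors the loop, and `make_fake_fetcher` replaces the network call so the number of requests can be observed.

```python
def collect_ids(fetch_page, page_size=12):
    """Collect ids page by page, stopping at the first page that is not full.

    fetch_page is a hypothetical callable: 1-based page number -> list of ids.
    """
    ids = []
    pagenum = 1
    while True:
        ids_in_page = fetch_page(pagenum)
        ids.extend(ids_in_page)
        # A page with fewer than page_size ids is assumed to be the last one
        if len(ids_in_page) < page_size:
            break
        pagenum += 1
    return ids


def make_fake_fetcher(all_ids, page_size=12):
    """Return a fake fetch_page plus the list recording requested pages."""
    calls = []

    def fetch_page(pagenum):
        calls.append(pagenum)
        start = (pagenum - 1) * page_size
        return all_ids[start:start + page_size]

    return fetch_page, calls


fetch_page, calls = make_fake_fetcher(['video-%d' % n for n in range(30)])
ids = collect_ids(fetch_page)
print(len(ids), calls)
```

When the total is an exact multiple of the page size, the loop still issues one extra request that returns an empty page before it stops.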

@ -0,0 +1,27 @@
import re
from .common import InfoExtractor
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?)\.html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
u'file': u'12bzhqZTqQHmmlA8I-i0NpzJgcG5NNYX.mp4',
u'info_dict': {
u'title': u'Shah\'s Presentation on Foreign-Exchange Strategies',
u'description': u'md5:abc86e5236f9f0e4866c59ad36736686',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
ooyala_url = self._og_search_video_url(webpage)
return self.url_result(ooyala_url, ie='Ooyala')

@ -0,0 +1,38 @@
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class BreakIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?break\.com/video/([^/]+)'
_TEST = {
u'url': u'http://www.break.com/video/when-girls-act-like-guys-2468056',
u'file': u'2468056.mp4',
u'md5': u'a3513fb1547fba4fb6cfac1bffc6c46b',
u'info_dict': {
u"title": u"When Girls Act Like D-Bags"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1).split("-")[-1]
embed_url = 'http://www.break.com/embed/%s' % video_id
webpage = self._download_webpage(embed_url, video_id)
info_json = self._search_regex(r'var embedVars = ({.*?});', webpage,
u'info json', flags=re.DOTALL)
info = json.loads(info_json)
video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
final_url = video_url + '?' + info['AuthToken']
return [{
'id': video_id,
'url': final_url,
'ext': determine_ext(final_url),
'title': info['contentName'],
'thumbnail': info['thumbUri'],
}]

@ -0,0 +1,177 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
find_xpath_attr,
compat_urlparse,
compat_str,
compat_urllib_request,
ExtractorError,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
_TESTS = [
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'8eccab865181d29ec2958f32a6a754f5',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
{
# From http://mashable.com/2013/10/26/thermoelectric-bracelet-lets-you-control-your-body-temperature/
u'url': u'http://c.brightcove.com/services/viewer/federated_f9?&playerID=1265504713001&publisherID=AQ%7E%7E%2CAAABBzUwv1E%7E%2CxP-xFHVUstiMFlNYfvF4G9yFnNaqCw_9&videoID=2750934548001',
u'info_dict': {
u'id': u'2750934548001',
u'ext': u'mp4',
u'title': u'This Bracelet Acts as a Personal Thermostat',
u'description': u'md5:547b78c64f4112766ccf4e151c20b6a0',
u'uploader': u'Mashable',
},
},
]
@classmethod
def _build_brighcove_url(cls, object_str):
"""
Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace(u'<--', u'<!--')
object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'],
'playerID': find_xpath_attr(object_doc, './param', 'name', 'playerID').attrib['value'],
}
def find_param(name):
node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None:
return node.attrib['value']
return None
playerKey = find_param('playerKey')
# Not all pages define this value
if playerKey is not None:
params['playerKey'] = playerKey
# The three fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID')
if videoPlayer is not None:
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')
if linkBase is not None:
params['linkBaseURL'] = linkBase
data = compat_urllib_parse.urlencode(params)
return cls._FEDERATED_URL_TEMPLATE % data
@classmethod
def _extract_brightcove_url(cls, webpage):
"""Try to extract the brightcove url from the webpage, returns None
if it can't be found
"""
m_brightcove = re.search(
r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>',
webpage, re.DOTALL)
if m_brightcove is not None:
return cls._build_brighcove_url(m_brightcove.group())
else:
return None
def _real_extract(self, url):
# Change the 'videoId' field (and its variants) to '@videoPlayer'
url = re.sub(r'(?<=[?&])(videoI(d|D)|bctid)', '%40videoPlayer', url)
# Change bckey (used by bcove.me urls) to playerKey
url = re.sub(r'(?<=[?&])bckey', 'playerKey', url)
mobj = re.match(self._VALID_URL, url)
query_str = mobj.group('query')
query = compat_urlparse.parse_qs(query_str)
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
return self._get_video_info(videoPlayer[0], query_str, query)
else:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
def _get_video_info(self, video_id, query_str, query):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
req = compat_urllib_request.Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
req.add_header('Referer', linkBase[0])
webpage = self._download_webpage(req, video_id)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, u'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
raise ExtractorError(u'Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id=playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
info = {
'id': compat_str(video_info['id']),
'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
info['formats'] = [{
'url': rend['defaultURL'],
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
} for rend in renditions]
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info
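The first fix-up in `_build_brighcove_url` rewrites unclosed `<param ...>` tags so that the embed snippet parses as XML. The sketch below applies the same substitution to an invented snippet; `close_param_tags` is a hypothetical helper name and the attribute values are placeholders.

```python
import re
import xml.etree.ElementTree as ET


def close_param_tags(object_str):
    """Turn '<param name=".." value="..">' into a self-closing tag."""
    return re.sub(r'(<param name="[^"]+" value="[^"]+")>',
                  lambda m: m.group(1) + '/>', object_str)


# Invented embed code: the first <param> is left unclosed, as on some pages
snippet = ('<object class="BrightcoveExperience" id="myExperience">'
           '<param name="playerID" value="123">'
           '<param name="@videoPlayer" value="456"/>'
           '</object>')

doc = ET.fromstring(close_param_tags(snippet))
params = dict((p.attrib['name'], p.attrib['value'])
              for p in doc.findall('./param'))
print(params)
```

Tags that are already self-closing are left alone, because the pattern requires `>` directly after the closing quote of `value`.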

@ -0,0 +1,36 @@
# coding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class C56IE(InfoExtractor):
_VALID_URL = r'https?://((www|player)\.)?56\.com/(.+?/)?(v_|(play_album.+-))(?P<textid>.+?)\.(html|swf)'
IE_NAME = u'56.com'
_TEST = {
u'url': u'http://www.56.com/u39/v_OTM0NDA3MTY.html',
u'file': u'93440716.flv',
u'md5': u'e59995ac63d0457783ea05f93f12a866',
u'info_dict': {
u'title': u'网事知多少 第32期车怒',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
text_id = mobj.group('textid')
info_page = self._download_webpage('http://vxml.56.com/json/%s/' % text_id,
text_id, u'Downloading video info')
info = json.loads(info_page)['info']
best_format = sorted(info['rfiles'], key=lambda f: int(f['filesize']))[-1]
video_url = best_format['url']
return {'id': info['vid'],
'title': info['Subject'],
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': info.get('bimg') or info.get('img'),
}

@ -0,0 +1,37 @@
# coding: utf-8
import re
from .common import InfoExtractor
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
_TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
u'file': u'12163.mp4',
u'md5': u'060158428b650f896c542dfbb3d6487f',
u'info_dict': {
u'title': u'Terrasses du Numérique'
}
}
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group('id')
# We need to set the voir field for getting the file name
url = 'http://www.canalc2.tv/video.asp?idVideo=%s&voir=oui' % video_id
webpage = self._download_webpage(url, video_id)
file_name = self._search_regex(
r"so\.addVariable\('file','(.*?)'\);",
webpage, 'file name')
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title')
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

@ -0,0 +1,55 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import unified_strdate
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = u'canalplus.fr'
_TEST = {
u'url': u'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
u'file': u'922470.flv',
u'info_dict': {
u'title': u'Zapping - 26/08/13',
u'description': u'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
u'upload_date': u'20130826',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'videoId = "(\d+)";', webpage, u'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
info_page = self._download_webpage(info_url,video_id,
u'Downloading video info')
self.report_extraction(video_id)
doc = xml.etree.ElementTree.fromstring(info_page.encode('utf-8'))
video_info = [video for video in doc if video.find('ID').text == video_id][0]
infos = video_info.find('INFOS')
media = video_info.find('MEDIA')
formats = [media.find('VIDEOS/%s' % format)
for format in ['BAS_DEBIT', 'HAUT_DEBIT', 'HD']]
video_url = [format.text for format in formats if format is not None][-1]
return {'id': video_id,
'title': u'%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'url': video_url,
'ext': 'flv',
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
}

@ -0,0 +1,84 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'md5': u'f9bb7ede54d1229c9846e197b4737e06',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
}
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'md5': u'9509ee44dcaa7c1068604817c19a9e50',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if not video_description:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
url = self._html_search_regex(r'\'streamer\': \'(?P<url>[^\']+)\'', playerdata, u'url')
sd_file = self._html_search_regex(r'\'file\': \'(?P<sd_file>[^\']+)\'', playerdata, u'sd_file')
hd_file = self._html_search_regex(r'\'?file\'?: "(?P<hd_file>[^"]+)"', playerdata, u'hd_file')
video_thumbnail = self._html_search_regex(r'\'image\': \'(?P<thumbnail>[^\']+)\'', playerdata, u'thumbnail', fatal=False)
formats = [
{
'url': url,
'play_path': 'mp4:' + sd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': url,
'play_path': 'mp4:' + hd_file,
'rtmp_live': True, # workaround
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
return {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}

@ -0,0 +1,53 @@
import re
import time
import xml.etree.ElementTree
from .common import InfoExtractor
class ClipfishIE(InfoExtractor):
IE_NAME = u'clipfish'
_VALID_URL = r'^https?://(?:www\.)?clipfish\.de/.*?/video/(?P<id>[0-9]+)/'
_TEST = {
u'url': u'http://www.clipfish.de/special/supertalent/video/4028320/supertalent-2013-ivana-opacak-singt-nobodys-perfect/',
u'file': u'4028320.f4v',
u'md5': u'5e38bda8c329fbfb42be0386a3f5a382',
u'info_dict': {
u'title': u'Supertalent 2013: Ivana Opacak singt Nobody\'s Perfect',
u'duration': 399,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
info_url = ('http://www.clipfish.de/devxml/videoinfo/%s?ts=%d' %
(video_id, int(time.time())))
info_xml = self._download_webpage(
info_url, video_id, note=u'Downloading info page')
doc = xml.etree.ElementTree.fromstring(info_xml)
title = doc.find('title').text
video_url = doc.find('filename').text
thumbnail = doc.find('imageurl').text
duration_str = doc.find('duration').text
m = re.match(
r'^(?P<hours>[0-9]+):(?P<minutes>[0-9]{2}):(?P<seconds>[0-9]{2}):(?P<ms>[0-9]*)$',
duration_str)
if m:
duration = (
(int(m.group('hours')) * 60 * 60) +
(int(m.group('minutes')) * 60) +
(int(m.group('seconds')))
)
else:
duration = None
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'duration': duration,
}
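
The duration handling in ClipfishIE above can be tried in isolation. This is a minimal sketch, not part of the commit: the helper name `parse_clipfish_duration` and the sample string `'0:06:39:00'` are assumptions chosen to agree with the `duration` of 399 in the test case; only the regular expression is taken from the extractor.

```python
import re

# Same pattern as in ClipfishIE above: hours:minutes:seconds:fraction
_DURATION_RE = re.compile(
    r'^(?P<hours>[0-9]+):(?P<minutes>[0-9]{2}):(?P<seconds>[0-9]{2}):(?P<ms>[0-9]*)$')


def parse_clipfish_duration(duration_str):
    """Return the duration in whole seconds, or None if the string does not match.

    Hypothetical helper; the extractor performs the same steps inline.
    """
    m = _DURATION_RE.match(duration_str)
    if m is None:
        return None
    # The trailing fractional field is matched but deliberately ignored.
    return (int(m.group('hours')) * 60 * 60
            + int(m.group('minutes')) * 60
            + int(m.group('seconds')))


print(parse_clipfish_duration('0:06:39:00'))  # -> 399
print(parse_clipfish_duration('unknown'))     # -> None
```

Returning None for an unparsable string mirrors the `else` branch of the extractor, so a malformed duration does not abort the extraction.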

@ -0,0 +1,58 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://((edition|www)\.)?cnn\.com/video/(data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))'''
_TESTS = [{
u'url': u'http://edition.cnn.com/video/?/video/sports/2013/06/09/nadal-1-on-1.cnn',
u'file': u'sports_2013_06_09_nadal-1-on-1.cnn.mp4',
u'md5': u'3e6121ea48df7e2259fe73a0628605c4',
u'info_dict': {
u'title': u'Nadal wins 8th French Open title',
u'description': u'World Sport\'s Amanda Davies chats with 2013 French Open champion Rafael Nadal.',
},
},
{
u"url": u"http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
u"file": u"us_2013_08_21_sot-student-gives-epic-speech.georgia-institute-of-technology.mp4",
u"md5": u"b5cc60c60a3477d185af8f19a2a26f4e",
u"info_dict": {
u"title": "Student's epic speech stuns new freshmen",
u"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\""
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = u'http://cnn.com/video/data/3.0/%s/index.xml' % path
info_xml = self._download_webpage(info_url, page_title)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
formats = []
for f in info.findall('files/file'):
mf = re.match(r'(\d+)x(\d+)(?:_(.*)k)?',f.attrib['bitrate'])
if mf is not None:
formats.append((int(mf.group(1)), int(mf.group(2)), int(mf.group(3) or 0), f.text))
formats = sorted(formats)
(_,_,_, video_path) = formats[-1]
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % video_path
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
return {'id': info.attrib['id'],
'title': info.find('headline').text,
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'description': info.find('description').text,
}
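
CNNIE above picks the best file by sorting `(width, height, bitrate, path)` tuples and taking the last one. The sketch below isolates that idea under a few assumptions: `pick_best_path` and the sample `bitrate` strings are hypothetical, the bitrate group is narrowed to digits so that `int()` cannot fail, and an empty candidate list returns None, whereas the extractor would raise an IndexError in that case.

```python
import re


def pick_best_path(files):
    """files: iterable of (bitrate_attr, path) pairs; returns the best path or None.

    Hypothetical helper illustrating the tuple-sort selection used by CNNIE.
    """
    candidates = []
    for bitrate_attr, path in files:
        m = re.match(r'(\d+)x(\d+)(?:_(\d+)k)?', bitrate_attr)
        if m is None:
            continue  # entries without a WIDTHxHEIGHT prefix are skipped
        candidates.append(
            (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0), path))
    if not candidates:
        return None
    # Tuples compare element by element: width, then height, then bitrate.
    return max(candidates)[3]


sample = [
    ('640x360_900k', '/small.mp4'),    # assumed attribute format
    ('1280x720_3500k', '/big.mp4'),
    ('audio', '/audio.mp3'),
]
print(pick_best_path(sample))  # -> /big.mp4
```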

@ -0,0 +1,82 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
determine_ext,
ExtractorError,
)
class CollegeHumorIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
_TESTS = [{
u'url': u'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
u'file': u'6902724.mp4',
u'md5': u'1264c12ad95dca142a9f0bf7968105a0',
u'info_dict': {
u'title': u'Comic-Con Cosplay Catastrophe',
u'description': u'Fans get creative this year at San Diego. Too creative. And yes, that\'s really Joss Whedon.',
},
},
{
u'url': u'http://www.collegehumor.com/video/3505939/font-conference',
u'file': u'3505939.mp4',
u'md5': u'c51ca16b82bb456a4397987791a835f5',
u'info_dict': {
u'title': u'Font Conference',
u'description': u'This video wasn\'t long enough, so we made it double-spaced.',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('videoid')
info = {
'id': video_id,
'uploader': None,
'upload_date': None,
}
self.report_extraction(video_id)
xmlUrl = 'http://www.collegehumor.com/moogaloop/video/' + video_id
mdoc = self._download_xml(xmlUrl, video_id,
u'Downloading info XML',
u'Unable to download video info XML')
try:
videoNode = mdoc.findall('./video')[0]
youtubeIdNode = videoNode.find('./youtubeID')
if youtubeIdNode is not None:
return self.url_result(youtubeIdNode.text, 'Youtube')
info['description'] = videoNode.findall('./description')[0].text
info['title'] = videoNode.findall('./caption')[0].text
info['thumbnail'] = videoNode.findall('./thumbnail')[0].text
next_url = videoNode.findall('./file')[0].text
except IndexError:
raise ExtractorError(u'Invalid metadata XML file')
if next_url.endswith(u'manifest.f4m'):
manifest_url = next_url + '?hdcore=2.10.3'
adoc = self._download_xml(manifest_url, video_id,
u'Downloading XML manifest',
u'Unable to download video info XML')
try:
video_id = adoc.findall('./{http://ns.adobe.com/f4m/1.0}id')[0].text
except IndexError:
raise ExtractorError(u'Invalid manifest file')
url_pr = compat_urllib_parse_urlparse(info['thumbnail'])
info['url'] = url_pr.scheme + '://' + url_pr.netloc + video_id[:-2].replace('.csmil','').replace(',','')
info['ext'] = 'mp4'
else:
# Old-style direct links
info['url'] = next_url
info['ext'] = determine_ext(info['url'])
return info

@ -0,0 +1,218 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from .mtv import MTVIE, _media_xml_tag
from ..utils import (
compat_str,
compat_urllib_parse,
ExtractorError,
unified_strdate,
)
class ComedyCentralIE(MTVIE):
_VALID_URL = r'http://www\.comedycentral\.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
_FEED_URL = u'http://comedycentral.com/feeds/mrss/'
_TEST = {
u'url': u'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
u'md5': u'4167875aae411f903b751a21f357f1ee',
u'info_dict': {
u'id': u'cef0cbb3-e776-4bc9-b62e-8016deccb354',
u'ext': u'mp4',
u'title': u'Uncensored - Greg Fitzsimmons - Too Good of a Mother',
u'description': u'After a certain point, breastfeeding becomes c**kblocking.',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
return itemdoc.find(search_path).attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
mgid = self._search_regex(r'data-mgid="(?P<mgid>mgid:.*?)"',
webpage, u'mgid')
return self._get_videos_info(mgid)
class ComedyCentralShowsIE(InfoExtractor):
IE_DESC = u'The Daily Show / Colbert Report'
# urls can be abbreviations like :thedailyshow or :colbert
# urls for episodes like:
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r"""^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport)
|(https?://)?(www\.)?
(?P<showname>thedailyshow|colbertnation)\.com/
(full-episodes/(?P<episode>.*)|
(?P<clip>
(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*)))|
(?P<interview>
extended-interviews/(?P<interID>[0-9]+)/playlist_tds_extended_(?P<interview_title>.*?)/.*?)))
$"""
_TEST = {
u'url': u'http://www.thedailyshow.com/watch/thu-december-13-2012/kristen-stewart',
u'file': u'422212.mp4',
u'md5': u'4e2f5cb088a83cd8cdb7756132f9739d',
u'info_dict': {
u"upload_date": u"20121214",
u"description": u"Kristen Stewart",
u"uploader": u"thedailyshow",
u"title": u"thedailyshow-kristen-stewart part 1"
}
}
_available_formats = ['3500', '2200', '1700', '1200', '750', '400']
_video_extensions = {
'3500': 'mp4',
'2200': 'mp4',
'1700': 'mp4',
'1200': 'mp4',
'750': 'mp4',
'400': 'mp4',
}
_video_dimensions = {
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
if mobj.group('shortname'):
if mobj.group('shortname') in ('tds', 'thedailyshow'):
url = u'http://www.thedailyshow.com/full-episodes/'
else:
url = u'http://www.colbertnation.com/full-episodes/'
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
assert mobj is not None
if mobj.group('clip'):
if mobj.group('showname') == 'thedailyshow':
epTitle = mobj.group('tdstitle')
else:
epTitle = mobj.group('cntitle')
dlNewest = False
elif mobj.group('interview'):
epTitle = mobj.group('interview_title')
dlNewest = False
else:
dlNewest = not mobj.group('episode')
if dlNewest:
epTitle = mobj.group('showname')
else:
epTitle = mobj.group('episode')
self.report_extraction(epTitle)
webpage,htmlHandle = self._download_webpage_handle(url, epTitle)
if dlNewest:
url = htmlHandle.geturl()
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError(u'Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError(u'Redirected URL is still not specific: ' + url)
epTitle = mobj.group('episode')
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
# The Colbert Report embeds the information in a data-mgid attribute
# without a URL prefix, so extract that alternate reference
# and then add the URL prefix manually.
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video).*?:.*?)"', webpage)
if len(altMovieParams) == 0:
raise ExtractorError(u'unable to find Flash URL in webpage ' + url)
else:
mMovieParams = [("http://media.mtvnservices.com/" + altMovieParams[0], altMovieParams[0])]
uri = mMovieParams[0][1]
indexUrl = 'http://shadow.comedycentral.com/feeds/video_player/mrss/?' + compat_urllib_parse.urlencode({'uri': uri})
indexXml = self._download_webpage(indexUrl, epTitle,
u'Downloading show index',
u'unable to download episode index')
results = []
idoc = xml.etree.ElementTree.fromstring(indexXml)
itemEls = idoc.findall('.//item')
for partNum,itemEl in enumerate(itemEls):
mediaId = itemEl.findall('./guid')[0].text
shortMediaId = mediaId.split(':')[-1]
showId = mediaId.split(':')[-2].replace('.com', '')
officialTitle = itemEl.findall('./title')[0].text
officialDate = unified_strdate(itemEl.findall('./pubDate')[0].text)
configUrl = ('http://www.comedycentral.com/global/feeds/entertainment/media/mediaGenEntertainment.jhtml?' +
compat_urllib_parse.urlencode({'uri': mediaId}))
configXml = self._download_webpage(configUrl, epTitle,
u'Downloading configuration for %s' % shortMediaId)
cdoc = xml.etree.ElementTree.fromstring(configXml)
turls = []
for rendition in cdoc.findall('.//rendition'):
finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
turls.append(finfo)
if len(turls) == 0:
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = {
'id': shortMediaId,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'thumbnail': None,
'description': compat_str(officialTitle),
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info)
return results

@ -0,0 +1,440 @@
import base64
import os
import re
import socket
import sys
import netrc
import xml.etree.ElementTree
from ..utils import (
compat_http_client,
compat_urllib_error,
compat_str,
clean_html,
compiled_regex_type,
ExtractorError,
RegexNotFoundError,
sanitize_filename,
unescapeHTML,
)
class InfoExtractor(object):
"""Information Extractor class.
Information extractors are the classes that, given a URL, extract
information about the video (or videos) the URL refers to. This
information includes the real video URL, the video title, author and
others. The information is stored in a dictionary which is then
passed to the FileDownloader. The FileDownloader processes this
information possibly downloading the video to the file system, among
other possible outcomes.
The dictionaries must include the following fields:
id: Video identifier.
url: Final video URL.
title: Video title, unescaped.
ext: Video filename extension.
Instead of url and ext, formats can also be specified.
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
thumbnails: A list of dictionaries (with the entries "resolution" and
"url") for the varying thumbnails
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
upload_date: Video upload date (YYYYMMDD).
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
player_url: SWF Player URL (used for rtmpdump).
subtitles: The subtitle file contents as a dictionary in the format
{language: subtitles}.
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height,
and format_note fields if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* format_note Additional info about the format
("3D" or "DASH video")
* width Width of the video, if known
* height Height of the video, if known
* abr Average audio bitrate in KBit/s
* acodec Name of the audio codec in use
* vbr Average video bitrate in KBit/s
* vcodec Name of the video codec in use
* filesize The number of bytes, if known in advance
webpage_url: The URL of the video webpage; passing it to youtube-dl
should produce the same result again. (It will be set
by YoutubeDL if it's missing)
Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
Probably, they should also be added to the list of extractors.
_real_extract() must return a *list* of information dictionaries as
described above.
Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests.
"""
_ready = False
_downloader = None
_WORKING = True
def __init__(self, downloader=None):
"""Constructor. Receives an optional downloader."""
self._ready = False
self.set_downloader(downloader)
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
# This does not use has/getattr intentionally - we want to know whether
# we have cached the regexp for *this* class, whereas getattr would also
# match the superclass
if '_VALID_URL_RE' not in cls.__dict__:
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
return cls._VALID_URL_RE.match(url) is not None
@classmethod
def working(cls):
"""Getter method for _WORKING."""
return cls._WORKING
def initialize(self):
"""Initializes an instance (authentication, etc)."""
if not self._ready:
self._real_initialize()
self._ready = True
def extract(self, url):
"""Extracts URL information and returns it in a list of dicts."""
self.initialize()
return self._real_extract(url)
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
self._downloader = downloader
def _real_initialize(self):
"""Real initialization process. Redefine in subclasses."""
pass
def _real_extract(self, url):
"""Real extraction process. Redefine in subclasses."""
pass
@classmethod
def ie_key(cls):
"""A string for getting the InfoExtractor with get_info_extractor"""
return cls.__name__[:-2]
@property
def IE_NAME(self):
return type(self).__name__[:-2]
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None):
""" Returns the response handle """
if note is None:
self.report_download_webpage(video_id)
elif note is not False:
self.to_screen(u'%s: %s' % (video_id, note))
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if errnote is None:
errnote = u'Unable to download webpage'
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
urlh = self._request_webpage(url_or_request, video_id, note, errnote)
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
if m:
encoding = m.group(1)
else:
m = re.search(br'<meta[^>]+charset=[\'"]?([^\'")]+)[ /\'">]',
webpage_bytes[:1024])
if m:
encoding = m.group(1).decode('ascii')
else:
encoding = 'utf-8'
if self._downloader.params.get('dump_intermediate_pages', False):
try:
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
self.to_screen(u'Dumping request to ' + url)
dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump)
if self._downloader.params.get('write_pages', False):
try:
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace')
return (content, urlh)
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None):
""" Returns the data of the page as a string """
return self._download_webpage_handle(url_or_request, video_id, note, errnote)[0]
def _download_xml(self, url_or_request, video_id, note=u'Downloading XML', errnote=u'Unable to download XML'):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(url_or_request, video_id, note, errnote)
return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
def to_screen(self, msg):
"""Print msg to screen, prefixing it with '[ie_name]'"""
self._downloader.to_screen(u'[%s] %s' % (self.IE_NAME, msg))
def report_extraction(self, id_or_name):
"""Report information extraction."""
self.to_screen(u'%s: Extracting information' % id_or_name)
def report_download_webpage(self, video_id):
"""Report webpage download."""
self.to_screen(u'%s: Downloading webpage' % video_id)
def report_age_confirmation(self):
"""Report attempt to confirm age."""
self.to_screen(u'Confirming age')
def report_login(self):
"""Report attempt to log in."""
self.to_screen(u'Logging in')
#Methods for following #608
def url_result(self, url, ie=None, video_id=None):
"""Returns a url that points to a page that should be processed"""
#TODO: ie should be the class used for getting the info
video_info = {'_type': 'url',
'url': url,
'ie_key': ie}
if video_id is not None:
video_info['id'] = video_id
return video_info
def playlist_result(self, entries, playlist_id=None, playlist_title=None):
"""Returns a playlist"""
video_info = {'_type': 'playlist',
'entries': entries}
if playlist_id:
video_info['id'] = playlist_id
if playlist_title:
video_info['title'] = playlist_title
return video_info
def _search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
"""
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
In case of failure return a default value or raise a WARNING or a
RegexNotFoundError, depending on fatal, specifying the field name.
"""
if isinstance(pattern, (str, compat_str, compiled_regex_type)):
mobj = re.search(pattern, string, flags)
else:
for p in pattern:
mobj = re.search(p, string, flags)
if mobj: break
if sys.stderr.isatty() and os.name != 'nt':
_name = u'\033[0;34m%s\033[0m' % name
else:
_name = name
if mobj:
# return the first matching group
return next(g for g in mobj.groups() if g is not None)
elif default is not None:
return default
elif fatal:
raise RegexNotFoundError(u'Unable to extract %s' % _name)
else:
self._downloader.report_warning(u'unable to extract %s; '
u'please report this issue on http://yt-dl.org/bug' % _name)
return None
def _html_search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
"""
res = self._search_regex(pattern, string, name, default, fatal, flags)
if res:
return clean_html(res).strip()
else:
return res
def _get_login_info(self):
"""
Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value
If there's no info available, return (None, None)
"""
if self._downloader is None:
return (None, None)
username = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(u'parsing .netrc: %s' % compat_str(err))
return (username, password)
# Helper functions for extracting OpenGraph info
@staticmethod
def _og_regexes(prop):
content_re = r'content=(?:"([^>]+?)"|\'(.+?)\')'
property_re = r'property=[\'"]og:%s[\'"]' % re.escape(prop)
template = r'<meta[^>]+?%s[^>]+?%s'
return [
template % (property_re, content_re),
template % (content_re, property_re),
]
def _og_search_property(self, prop, html, name=None, **kargs):
if name is None:
name = 'OpenGraph %s' % prop
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
if escaped is None:
return None
return unescapeHTML(escaped)
def _og_search_thumbnail(self, html, **kargs):
return self._og_search_property('image', html, u'thumbnail url', fatal=False, **kargs)
def _og_search_description(self, html, **kargs):
return self._og_search_property('description', html, fatal=False, **kargs)
def _og_search_title(self, html, **kargs):
return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
regexes = self._og_regexes('video')
if secure: regexes = self._og_regexes('video:secure_url') + regexes
return self._html_search_regex(regexes, html, name, **kargs)
def _html_search_meta(self, name, html, display_name=None):
if display_name is None:
display_name = name
return self._html_search_regex(
r'''(?ix)<meta(?=[^>]+(?:name|property)=["\']%s["\'])
[^>]+content=["\']([^"\']+)["\']''' % re.escape(name),
html, display_name, fatal=False)
def _dc_search_uploader(self, html):
return self._html_search_meta('dc.creator', html, 'uploader')
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
def _media_rating_search(self, html):
# See http://www.tjg-designs.com/WP/metadata-code-examples-adding-metadata-to-your-web-pages/
rating = self._html_search_meta('rating', html)
if not rating:
return None
RATING_TABLE = {
'safe for kids': 0,
'general': 8,
'14 years': 14,
'mature': 17,
'restricted': 19,
}
return RATING_TABLE.get(rating.lower(), None)
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search query extractors.
They accept URLs in the format _SEARCH_KEY(|all|[1-9][0-9]*):{query}
Instances should define _SEARCH_KEY and _MAX_RESULTS.
"""
@classmethod
def _make_valid_url(cls):
return r'%s(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\s\S]+)' % cls._SEARCH_KEY
@classmethod
def suitable(cls, url):
return re.match(cls._make_valid_url(), url) is not None
def _real_extract(self, query):
mobj = re.match(self._make_valid_url(), query)
if mobj is None:
raise ExtractorError(u'Invalid search query "%s"' % query)
prefix = mobj.group('prefix')
query = mobj.group('query')
if prefix == '':
return self._get_n_results(query, 1)
elif prefix == 'all':
return self._get_n_results(query, self._MAX_RESULTS)
else:
n = int(prefix)
if n <= 0:
raise ExtractorError(u'invalid download number %s for query "%s"' % (n, query))
elif n > self._MAX_RESULTS:
self._downloader.report_warning(u'%s returns max %i results (you requested %i)' % (self._SEARCH_KEY, self._MAX_RESULTS, n))
n = self._MAX_RESULTS
return self._get_n_results(query, n)
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by subclasses")
@property
def SEARCH_KEY(self):
return self._SEARCH_KEY
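
The prefix handling in `SearchInfoExtractor._real_extract` above can be sketched as a standalone function. The name `parse_search_query` and the search key `examplesearch` are hypothetical, and `ValueError` stands in for `ExtractorError`; the pattern is the one built by `_make_valid_url`, with the key passed through `re.escape` as an extra precaution.

```python
import re


def parse_search_query(search_key, query, max_results):
    """Split 'KEY[N|all]:terms' into (terms, number_of_results)."""
    pattern = (r'%s(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\s\S]+)'
               % re.escape(search_key))
    mobj = re.match(pattern, query)
    if mobj is None:
        raise ValueError('Invalid search query "%s"' % query)
    prefix = mobj.group('prefix')
    terms = mobj.group('query')
    if prefix == '':
        return terms, 1            # no number given: first result only
    if prefix == 'all':
        return terms, max_results  # 'all' means the extractor's maximum
    # Requests above the maximum are clamped (the extractor also warns).
    return terms, min(int(prefix), max_results)


print(parse_search_query('examplesearch', 'examplesearch5:cute cats', 50))
# -> ('cute cats', 5)
```

Because the pattern only admits `[1-9][0-9]*`, a prefix of `0` never matches, so the `n <= 0` branch in the class above should only be reachable if the pattern is changed.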

@ -0,0 +1,106 @@
# coding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
orderedSet,
compat_urllib_parse_urlparse,
compat_urlparse,
)
class CondeNastIE(InfoExtractor):
"""
Condé Nast is a media group; some of its sites use a custom HTML5 player
that works the same way on all of them.
"""
# The keys are the supported sites and the values are the names to be shown
# to the user and in the extractor description.
_SITES = {'wired': u'WIRED',
'gq': u'GQ',
'vogue': u'Vogue',
'glamour': u'Glamour',
'wmagazine': u'W Magazine',
'vanityfair': u'Vanity Fair',
}
_VALID_URL = r'http://(video|www)\.(?P<site>%s)\.com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
IE_DESC = u'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
_TEST = {
u'url': u'http://video.wired.com/watch/3d-printed-speakers-lit-with-led',
u'file': u'5171b343c2b4c00dd0c1ccb3.mp4',
u'md5': u'1921f713ed48aabd715691f774c451f7',
u'info_dict': {
u'title': u'3D Printed Speakers Lit With LED',
u'description': u'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
}
}
def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, u'series title', flags=re.DOTALL)
url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]',
webpage, flags=re.DOTALL)
paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage):
description = self._html_search_regex([r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, u'description',
fatal=False, flags=re.DOTALL)
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
u'player params', flags=re.DOTALL)
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, u'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, u'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, u'target')
data = compat_urllib_parse.urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]',
webpage, u'base info url',
default='http://player.cnevids.com/player/loader.js?')
info_url = base_info_url + data
info_page = self._download_webpage(info_url, video_id,
u'Downloading video info')
video_info = self._search_regex(r'var video = ({.+?});', info_page, u'video info')
video_info = json.loads(video_info)
def _formats_sort_key(f):
type_ord = 1 if f['type'] == 'video/mp4' else 0
quality_ord = 1 if f['quality'] == 'high' else 0
return (quality_ord, type_ord)
best_format = sorted(video_info['sources'][0], key=_formats_sort_key)[-1]
return {'id': video_id,
'url': best_format['src'],
'ext': best_format['type'].split('/')[-1],
'title': video_info['title'],
'thumbnail': video_info['poster_frame'],
'description': description,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
url_type = mobj.group('type')
id = mobj.group('id')
self.to_screen(u'Extracting from %s with the Condé Nast extractor' % self._SITES[site])
webpage = self._download_webpage(url, id)
if url_type == 'series':
return self._extract_series(url, webpage)
else:
return self._extract_video(webpage)
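
The format choice in `_extract_video` above ranks sources by a `(quality, type)` key and takes the last element after sorting. A minimal sketch of that ranking follows; the `sources` list is invented sample data, and `pick_best_source` is a hypothetical wrapper around the same sort key.

```python
def formats_sort_key(f):
    # Quality is compared first, so a "high" WebM outranks a "low" MP4;
    # among equal qualities, MP4 is preferred.
    type_ord = 1 if f['type'] == 'video/mp4' else 0
    quality_ord = 1 if f['quality'] == 'high' else 0
    return (quality_ord, type_ord)


def pick_best_source(sources):
    """Return the highest-ranked source dict (the last one after sorting)."""
    return sorted(sources, key=formats_sort_key)[-1]


# Invented sample data shaped like the entries the sort key expects.
sources = [
    {'type': 'video/webm', 'quality': 'high', 'src': 'http://cdn.example/high.webm'},
    {'type': 'video/mp4', 'quality': 'low', 'src': 'http://cdn.example/low.mp4'},
    {'type': 'video/mp4', 'quality': 'high', 'src': 'http://cdn.example/high.mp4'},
]
best = pick_best_source(sources)
print(best['src'], best['type'].split('/')[-1])
# -> http://cdn.example/high.mp4 mp4
```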

@ -0,0 +1,40 @@
# -*- coding: utf-8 -*-
import re
from .common import InfoExtractor
from ..utils import determine_ext
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(\d*)-.+'
_TEST = {
u'url': u'http://www.criterion.com/films/184-le-samourai',
u'file': u'184.mp4',
u'md5': u'bc51beba55685509883a9a7830919ec3',
u'info_dict': {
u"title": u"Le Samouraï",
u"description" : u'md5:a2b4b116326558149bef81f76dcbb93f',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(r'so.addVariable\("videoURL", "(.+?)"\)\;',
webpage, 'video url')
title = self._html_search_regex(r'<meta content="(.+?)" property="og:title" />',
webpage, 'video title')
description = self._html_search_regex(r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {'id': video_id,
'url' : final_url,
'title': title,
'ext': determine_ext(final_url),
'description': description,
'thumbnail': thumbnail,
}

@ -0,0 +1,51 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
)
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://www\.c-spanvideo\.org/program/(.*)'
_TEST = {
u'url': u'http://www.c-spanvideo.org/program/HolderonV',
u'file': u'315139.flv',
u'md5': u'74a623266956f69e4df0068ab6c80fe4',
u'info_dict': {
u"title": u"Attorney General Eric Holder on Voting Rights Act Decision"
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
prog_name = mobj.group(1)
webpage = self._download_webpage(url, prog_name)
video_id = self._search_regex(r'programid=(.*?)&', webpage, 'video id')
data = compat_urllib_parse.urlencode({'programid': video_id,
'dynamic':'1'})
info_url = 'http://www.c-spanvideo.org/common/services/flashXml.php?' + data
video_info = self._download_webpage(info_url, video_id, u'Downloading video info')
self.report_extraction(video_id)
title = self._html_search_regex(r'<string name="title">(.*?)</string>',
video_info, 'title')
description = self._html_search_regex(r'<meta (?:property="og:|name=")description" content="(.*?)"',
webpage, 'description',
flags=re.MULTILINE|re.DOTALL)
url = self._search_regex(r'<string name="URL">(.*?)</string>',
video_info, 'video url')
url = url.replace('$(protocol)', 'rtmp').replace('$(port)', '443')
path = self._search_regex(r'<string name="path">(.*?)</string>',
video_info, 'rtmp play path')
return {'id': video_id,
'title': title,
'ext': 'flv',
'url': url,
'play_path': path,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage),
}

@ -0,0 +1,22 @@
# encoding: utf-8
from .canalplus import CanalplusIE
class D8IE(CanalplusIE):
_VALID_URL = r'https?://www\.d8\.tv/.*?/(?P<path>.*)'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/d8/%s'
IE_NAME = u'd8.tv'
_TEST = {
u'url': u'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
u'file': u'966289.flv',
u'info_dict': {
u'title': u'Campagne intime - Documentaire exceptionnel',
u'description': u'md5:d2643b799fb190846ae09c61e59a859f',
u'upload_date': u'20131108',
},
u'params': {
# rtmp
u'skip_download': True,
},
}

@ -0,0 +1,228 @@
import re
import json
import itertools
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
ExtractorError,
)
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
# A second add_header('Cookie', ...) call would replace the first value,
# so both cookies are sent in a single header
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion'
_FORMATS = [
(u'stream_h264_ld_url', u'ld'),
(u'stream_h264_url', u'standard'),
(u'stream_h264_hq_url', u'hq'),
(u'stream_h264_hd_url', u'hd'),
(u'stream_h264_hd1080_url', u'hd1080'),
]
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
# age-restricted video
{
u'url': u'http://www.dailymotion.com/video/xyh2zz_leanna-decker-cyber-girl-of-the-year-desires-nude-playboy-plus_redband',
u'file': u'xyh2zz.mp4',
u'md5': u'0d667a7b9cebecc3c89ee93099c4159d',
u'info_dict': {
u'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
u'uploader': 'HotWaves1012',
u'age_limit': 18,
}
}
]
def _real_extract(self, url):
# Extract the video id from the URL
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1).split('_')[0].split('?')[0]
url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
webpage, 'video uploader', fatal=False)
age_limit = self._rta_search(webpage)
video_upload_date = None
mobj = re.search(r'<div class="[^"]*uploaded_cont[^"]*" title="[^"]*">([0-9]{2})-([0-9]{2})-([0-9]{4})</div>', webpage)
if mobj is not None:
video_upload_date = mobj.group(3) + mobj.group(2) + mobj.group(1)
embed_url = 'http://www.dailymotion.com/embed/video/%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = int(m_size.group(1)), int(m_size.group(2))
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL')
# subtitles
video_subtitles = self.extract_subtitles(video_id, webpage)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, webpage)
return
return {
'id': video_id,
'formats': formats,
'uploader': video_uploader,
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
}
def _get_available_subtitles(self, video_id, webpage):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], l['url']) for l in info['list'])
return sub_lang_list
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _extract_entries(self, id):
video_ids = []
for pagenum in itertools.count(1):
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'row video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}
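
The `_build_request` helper disables the family filter through the `Cookie` header. As a minimal sketch of why multiple cookies are best sent in one header value, the snippet below uses Python 3's `urllib.request` directly (an assumption; the extractor itself goes through `compat_urllib_request`) with the placeholder host `example.invalid`. A second `add_header` call with the same header name replaces the earlier value rather than appending to it.

```python
from urllib.request import Request

# Two calls with the same header name: the second replaces the first.
overwritten = Request('http://example.invalid/video/x1')
overwritten.add_header('Cookie', 'family_filter=off')
overwritten.add_header('Cookie', 'ff=off')

# One call carrying both cookies, separated by "; ".
combined = Request('http://example.invalid/video/x1')
combined.add_header('Cookie', 'family_filter=off; ff=off')

print(overwritten.get_header('Cookie'))
print(combined.get_header('Cookie'))
```

Only the `combined` request carries both cookie values.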

@ -0,0 +1,74 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
_TEST = {
u'url': u'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
u'file': u'52554690.mp4',
u'info_dict': {
u'title': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'description': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'upload_date': u'20130831',
u'duration': 3868,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(r'<link rel="video_src" href=".+?vid=(.+?)"',
webpage, u'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
info_xml = self._download_webpage(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
u'Downloading video info')
urls_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
self.to_screen(u'%s: Getting video urls' % video_id)
formats = []
for format_el in urls.findall('result/output_list/output_list'):
profile = format_el.attrib['profile']
format_query = compat_urllib_parse.urlencode({
'vid': full_id,
'profile': profile,
})
url_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note=False)
url_doc = xml.etree.ElementTree.fromstring(url_xml.encode('utf-8'))
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format_id': profile,
})
info = {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': info.find('CONTENTS').text,
'duration': int(info.find('DURATION').text),
'upload_date': info.find('REGDTTM').text[:8],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

@ -0,0 +1,39 @@
import re
import json
from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
IE_NAME = 'defense.gouv.fr'
_VALID_URL = (r'http://.*?\.defense\.gouv\.fr/layout/set/'
r'ligthboxvideo/base-de-medias/webtv/(.*)')
_TEST = {
u'url': (u'http://www.defense.gouv.fr/layout/set/ligthboxvideo/'
u'base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1'),
u'file': u'11213.mp4',
u'md5': u'75bba6124da7e63d2d60b5244ec9430c',
"info_dict": {
"title": "attaque-chimique-syrienne-du-21-aout-2013-1"
}
}
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r"flashvars.pvg_id=\"(\d+)\";",
webpage, 'ID')
json_url = ('http://static.videos.gouv.fr/brightcovehub/export/json/'
+ video_id)
info = self._download_webpage(json_url, title,
'Downloading JSON config')
video_url = json.loads(info)['renditions'][0]['url']
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

@ -0,0 +1,60 @@
import re
import os
import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
ExtractorError,
)
class DepositFilesIE(InfoExtractor):
"""Information extractor for depositfiles.com"""
_VALID_URL = r'(?:http://)?(?:\w+\.)?depositfiles\.com/(?:../(?#locale))?files/(.+)'
def _real_extract(self, url):
file_id = url.split('/')[-1]
# Rebuild the URL in the English locale
url = 'http://depositfiles.com/en/files/' + file_id
# Retrieve file webpage with 'Free download' button pressed
free_download_indication = {'gateway_result' : '1'}
request = compat_urllib_request.Request(url, compat_urllib_parse.urlencode(free_download_indication))
try:
self.report_download_webpage(file_id)
webpage = compat_urllib_request.urlopen(request).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to retrieve file webpage: %s' % compat_str(err))
# Search for the real file URL
mobj = re.search(r'<form action="(http://fileshare.+?)"', webpage)
if (mobj is None) or (mobj.group(1) is None):
# Try to figure out reason of the error.
mobj = re.search(r'<strong>(Attention.*?)</strong>', webpage, re.DOTALL)
if (mobj is not None) and (mobj.group(1) is not None):
restriction_message = re.sub(r'\s+', ' ', mobj.group(1)).strip()
raise ExtractorError(u'%s' % restriction_message)
else:
raise ExtractorError(u'Unable to extract download URL from: %s' % url)
file_url = mobj.group(1)
file_extension = os.path.splitext(file_url)[1][1:]
# Search for file title
file_title = self._search_regex(r'<b title="(.*?)">', webpage, u'title')
return [{
'id': file_id.decode('utf-8'),
'url': file_url.decode('utf-8'),
'uploader': None,
'upload_date': None,
'title': file_title,
'ext': file_extension.decode('utf-8'),
}]

@ -0,0 +1,41 @@
import re
import json
import time
from .common import InfoExtractor
class DotsubIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?dotsub\.com/view/([^/]+)'
_TEST = {
u'url': u'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
u'file': u'aed3b8b2-1889-4df5-ae63-ad85f5572f27.flv',
u'md5': u'0914d4d69605090f623b7ac329fea66e',
u'info_dict': {
u"title": u"Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary",
u"uploader": u"4v4l0n42",
u'description': u'Pyramids of Waste (2010) also known as "The lightbulb conspiracy" is a documentary about how our economic system based on consumerism and planned obsolescence is breaking our planet down.\r\n\r\nSolutions to this can be found at:\r\nhttp://robotswillstealyourjob.com\r\nhttp://www.federicopistono.org\r\n\r\nhttp://opensourceecology.org\r\nhttp://thezeitgeistmovement.com',
u'thumbnail': u'http://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
u'upload_date': u'20101213',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
info_url = "https://dotsub.com/api/media/%s/metadata" %(video_id)
webpage = self._download_webpage(info_url, video_id)
info = json.loads(webpage)
date = time.gmtime(info['dateCreated']/1000) # The timestamp is in milliseconds
return [{
'id': video_id,
'url': info['mediaURI'],
'ext': 'flv',
'title': info['title'],
'thumbnail': info['screenshotURI'],
'description': info['description'],
'uploader': info['user'],
'view_count': info['numberOfViews'],
'upload_date': u'%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday),
}]
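
The `upload_date` conversion in `DotsubIE._real_extract` can be tried in isolation. The following is a minimal sketch, assuming the metadata field holds milliseconds since the Unix epoch as the code comment states; the helper name `upload_date_from_millis` is hypothetical and not part of the extractor.

```python
import time


def upload_date_from_millis(timestamp_ms):
    """Convert a millisecond Unix timestamp to a YYYYMMDD string (UTC)."""
    # time.gmtime expects seconds, so scale the value down first.
    date = time.gmtime(timestamp_ms / 1000.0)
    return u'%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday)


print(upload_date_from_millis(0))
print(upload_date_from_millis(86400000))
```

Using `time.gmtime` rather than `time.localtime` keeps the result independent of the machine's time zone.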

@ -0,0 +1,85 @@
# coding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_strdate,
)
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/index\.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
u'file': u'36983.webm',
u'md5': u'57c97d0469d71cf874f6815aa2b7c944',
u'info_dict': {
u"title": u"Kaffeeland Schweiz",
u"description": u"Über 80 Kaffeeröstereien liefern in der Schweiz das Getränk, in das das Land so vernarrt ist: Mehr als 1000 Tassen trinkt ein Schweizer pro Jahr. SCHWEIZWEIT nimmt die Kaffeekultur unter die...",
u"uploader": u"3sat",
u"upload_date": u"20130622"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
details_xml = self._download_webpage(details_url, video_id, note=u'Downloading video details')
details_doc = xml.etree.ElementTree.fromstring(details_xml.encode('utf-8'))
thumbnail_els = details_doc.findall('.//teaserimage')
thumbnails = [{
'width': te.attrib['key'].partition('x')[0],
'height': te.attrib['key'].partition('x')[2],
'url': te.text,
} for te in thumbnail_els]
information_el = details_doc.find('.//information')
video_title = information_el.find('./title').text
video_description = information_el.find('./detail').text
details_el = details_doc.find('.//details')
video_uploader = details_el.find('./channel').text
upload_date = unified_strdate(details_el.find('./airtime').text)
format_els = details_doc.findall('.//formitaet')
formats = [{
'format_id': fe.attrib['basetype'],
'width': int(fe.find('./width').text),
'height': int(fe.find('./height').text),
'url': fe.find('./url').text,
'ext': determine_ext(fe.find('./url').text),
'filesize': int(fe.find('./filesize').text),
'video_bitrate': int(fe.find('./videoBitrate').text),
'3sat_qualityname': fe.find('./quality').text,
} for fe in format_els
if not fe.find('./url').text.startswith('http://www.metafilegenerator.de/')]
def _sortkey(format):
qidx = ['low', 'med', 'high', 'veryhigh'].index(format['3sat_qualityname'])
# Formats sort ascending and the last entry is used, so HTTP must rank above RTMP
prefer_http = 0 if 'rtmp' in format['url'] else 1
return (qidx, prefer_http, format['video_bitrate'])
formats.sort(key=_sortkey)
info = {
'_type': 'video',
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'thumbnails': thumbnails,
'thumbnail': thumbnails[-1]['url'],
'uploader': video_uploader,
'upload_date': upload_date,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
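
The ordering used by `_sortkey` can be illustrated on its own. This is a sketch under assumptions: the format entries and the `quality` key are hypothetical stand-ins for the extractor's `3sat_qualityname` field, and it assumes that HTTP should win over RTMP when quality is equal. Tuples compare element by element, so quality decides first, then protocol, then bitrate, and the best entry ends up last.

```python
QUALITY_ORDER = ['low', 'med', 'high', 'veryhigh']


def sort_key(fmt):
    """Rank by quality name, then HTTP over RTMP, then bitrate."""
    quality_index = QUALITY_ORDER.index(fmt['quality'])
    is_http = 0 if fmt['url'].startswith('rtmp') else 1
    return (quality_index, is_http, fmt['video_bitrate'])


# Hypothetical format entries, for illustration only.
formats = [
    {'quality': 'veryhigh', 'url': 'rtmp://example.invalid/a', 'video_bitrate': 1500},
    {'quality': 'low', 'url': 'http://example.invalid/b.mp4', 'video_bitrate': 300},
    {'quality': 'veryhigh', 'url': 'http://example.invalid/c.webm', 'video_bitrate': 1500},
]
formats.sort(key=sort_key)
best = formats[-1]
print(best['url'])
```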

@ -0,0 +1,37 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config_xml = self._download_webpage(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
video_url = config.find('file').text
return {
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,
}

@ -0,0 +1,46 @@
import re
from ..utils import (
compat_urllib_parse,
determine_ext
)
from .common import InfoExtractor
class EHowIE(InfoExtractor):
IE_NAME = u'eHow'
_VALID_URL = r'(?:https?://)?(?:www\.)?ehow\.com/[^/_?]*_(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://www.ehow.com/video_12245069_hardwood-flooring-basics.html',
u'file': u'12245069.flv',
u'md5': u'9809b4e3f115ae2088440bcb4efbf371',
u'info_dict': {
u"title": u"Hardwood Flooring Basics",
u"description": u"Hardwood flooring may be time consuming, but its ultimately a pretty straightforward concept. Learn about hardwood flooring basics with help from a hardware flooring business owner in this free video...",
u"uploader": u"Erick Nathan"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'(?:file|source)=(http[^\'"&]*)',
webpage, u'video URL')
final_url = compat_urllib_parse.unquote(video_url)
uploader = self._search_regex(r'<meta name="uploader" content="(.+?)" />',
webpage, u'uploader')
title = self._og_search_title(webpage).replace(' | eHow', '')
ext = determine_ext(final_url)
return {
'_type': 'video',
'id': video_id,
'url': final_url,
'ext': ext,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'uploader': uploader,
}

@ -0,0 +1,119 @@
import json
import random
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
_VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
u"name": u"EightTracks",
u"url": u"http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
u"playlist": [
{
u"file": u"11885610.m4a",
u"md5": u"96ce57f24389fc8734ce47f4c1abcc55",
u"info_dict": {
u"title": u"youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885608.m4a",
u"md5": u"4ab26f05c1f7291ea460a3920be8021f",
u"info_dict": {
u"title": u"youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885679.m4a",
u"md5": u"d30b5b5f74217410f4689605c35d1fd7",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885680.m4a",
u"md5": u"4eb0a669317cd725f6bbd336a29f923a",
u"info_dict": {
u"title": u"youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885682.m4a",
u"md5": u"1893e872e263a2705558d1d319ad19e8",
u"info_dict": {
u"title": u"PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885683.m4a",
u"md5": u"b673c46f47a216ab1741ae8836af5899",
u"info_dict": {
u"title": u"PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885684.m4a",
u"md5": u"1d74534e95df54986da7f5abf7d842b7",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
},
{
u"file": u"11885685.m4a",
u"md5": u"f081f47af8f6ae782ed131d38b9cd1c0",
u"info_dict": {
u"title": u"phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
u"uploader_id": u"ytdl"
}
}
]
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
json_like = self._search_regex(r"PAGE.mix = (.*?);\n", webpage, u'trax information', flags=re.DOTALL)
data = json.loads(json_like)
session = str(random.randint(0, 1000000000))
mix_id = data['id']
track_count = data['tracks_count']
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url
res = []
for i in range(track_count):
api_json = self._download_webpage(next_url, playlist_id,
note=u'Downloading song information %s/%s' % (str(i+1), track_count),
errnote=u'Failed to download song information')
api_data = json.loads(api_json)
track_data = api_data[u'set']['track']
info = {
'id': track_data['id'],
'url': track_data['track_file_stream_url'],
'title': track_data['performer'] + u' - ' + track_data['name'],
'raw_title': track_data['name'],
'uploader_id': data['user']['login'],
'ext': 'm4a',
}
res.append(info)
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id'])
return res

@ -0,0 +1,37 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from .brightcove import BrightcoveIE
from ..utils import ExtractorError
class EitbIE(InfoExtractor):
IE_NAME = u'eitb.tv'
_VALID_URL = r'https?://www\.eitb\.tv/(eu/bideoa|es/video)/[^/]+/(?P<playlist_id>\d+)/(?P<chapter_id>\d+)'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/2677100210001/2743577154001/lasa-y-zabala-30-anos/',
u'md5': u'edf4436247185adee3ea18ce64c47998',
u'info_dict': {
u'id': u'2743577154001',
u'ext': u'mp4',
u'title': u'60 minutos (Lasa y Zabala, 30 años)',
# All videos from eitb have this description in the brightcove info
u'description': u'.',
u'uploader': u'Euskal Telebista',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
chapter_id = mobj.group('chapter_id')
webpage = self._download_webpage(url, chapter_id)
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is None:
raise ExtractorError(u'Could not extract the Brightcove url')
# The BrightcoveExperience object doesn't contain the video id, so we set
# it manually
bc_url += '&%40videoPlayer={0}'.format(chapter_id)
return self.url_result(bc_url, BrightcoveIE.ie_key())

@ -0,0 +1,84 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
ExtractorError,
)
class EscapistIE(InfoExtractor):
_VALID_URL = r'^https?://?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<episode>[^/?]+)[/?]?.*$'
_TEST = {
u'url': u'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
u'file': u'6618-Breaking-Down-Baldurs-Gate.mp4',
u'md5': u'ab3a706c681efca53f0a35f1415cf0d1',
u'info_dict': {
u"description": u"Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
u"uploader": u"the-escapist-presents",
u"title": u"Breaking Down Baldur's Gate"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
showName = mobj.group('showname')
videoId = mobj.group('episode')
self.report_extraction(videoId)
webpage = self._download_webpage(url, videoId)
videoDesc = self._html_search_regex(
r'<meta name="description" content="([^"]*)"',
webpage, u'description', fatal=False)
playerUrl = self._og_search_video_url(webpage, name=u'player URL')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*)"',
webpage, u'title').split(' : ')[-1]
configUrl = self._search_regex('config=(.*)$', playerUrl, u'config URL')
configUrl = compat_urllib_parse.unquote(configUrl)
formats = []
def _add_format(name, cfgurl):
configJSON = self._download_webpage(
cfgurl, videoId,
u'Downloading ' + name + ' configuration',
u'Unable to download ' + name + ' configuration')
# Technically, it's JavaScript, not JSON
configJSON = configJSON.replace("'", '"')
try:
config = json.loads(configJSON)
except (ValueError,) as err:
raise ExtractorError(u'Invalid JSON in configuration file: ' + compat_str(err))
playlist = config['playlist']
formats.append({
'url': playlist[1]['url'],
'format_id': name,
})
_add_format(u'normal', configUrl)
hq_url = (configUrl +
('&hq=1' if '?' in configUrl else '?hq=1'))
try:
_add_format(u'hq', hq_url)
except ExtractorError:
pass # That's fine, we'll just use normal quality
return {
'id': videoId,
'formats': formats,
'uploader': showName,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': videoDesc,
'player_url': playerUrl,
}
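
The high-quality configuration URL in `EscapistIE` is derived by appending an `hq=1` query parameter to the normal one. A minimal sketch of that step follows, assuming the only requirement is to pick `?` or `&` depending on whether a query string is already present; the helper name `add_hq_flag` and the `example.invalid` URLs are hypothetical.

```python
def add_hq_flag(config_url):
    """Append hq=1, using '&' when the URL already has a query string."""
    separator = '&' if '?' in config_url else '?'
    return config_url + separator + 'hq=1'


print(add_hq_flag('http://example.invalid/config'))
print(add_hq_flag('http://example.invalid/config?id=6618'))
```

Choosing only the separator conditionally keeps the base URL from being repeated in either branch.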

@ -0,0 +1,56 @@
import re
import json
from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = u'exfm'
IE_DESC = u'ex.fm'
_VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{
u'url': u'http://ex.fm/song/eh359',
u'file': u'44216187.mp3',
u'md5': u'e45513df5631e6d760970b14cc0c11e7',
u'info_dict': {
u"title": u"Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive",
u"uploader": u"deadjournalist",
u'upload_date': u'20120424',
u'description': u'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
},
u'note': u'Soundcloud song',
u'skip': u'The site is down too often',
},
{
u'url': u'http://ex.fm/song/wddt8',
u'file': u'wddt8.mp3',
u'md5': u'966bd70741ac5b8570d8e45bfaed3643',
u'info_dict': {
u'title': u'Safe and Sound',
u'uploader': u'Capital Cities',
},
u'skip': u'The site is down too often',
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group(1)
info_url = "http://ex.fm/api/v3/song/%s" %(song_id)
webpage = self._download_webpage(info_url, song_id)
info = json.loads(webpage)
song_url = info['song']['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream',''), 'Soundcloud')
return [{
'id': song_id,
'url': song_url,
'ext': 'mp3',
'title': info['song']['title'],
'thumbnail': info['song']['image']['large'],
'uploader': info['song']['artist'],
'view_count': info['song']['loved_count'],
}]

@ -0,0 +1,50 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
class ExtremeTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
u'file': u'652431.mp4',
u'md5': u'1fb9228f5e3332ec8c057d6ac36f33e0',
u'info_dict': {
u"title": u"Music Video 14 british euro brit european cumshots swallow",
u"uploader": u"unknown",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, u'title')
uploader = self._html_search_regex(r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, u'uploader', fatal=False)
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
return {
'id': video_id,
'title': video_title,
'uploader': uploader,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

@ -0,0 +1,132 @@
import json
import re
import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
ExtractorError,
)
class FacebookIE(InfoExtractor):
"""Information Extractor for Facebook"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
_LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook'
IE_NAME = u'facebook'
_TEST = {
u'url': u'https://www.facebook.com/photo.php?v=120708114770723',
u'file': u'120708114770723.mp4',
u'md5': u'48975a41ccc4b7a581abd68651c1a5a8',
u'info_dict': {
u"duration": 279,
u"title": u"PEOPLE ARE AWESOME 2013"
}
}
def report_login(self):
"""Report attempt to log in."""
self.to_screen(u'Logging in')
def _login(self):
(useremail, password) = self._get_login_info()
if useremail is None:
return
login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
self.report_login()
login_page = self._download_webpage(login_page_req, None, note=False,
errnote=u'Unable to download login page')
lsd = self._search_regex(r'"lsd":"(\w*?)"', login_page, u'lsd')
lgnrnd = self._search_regex(r'name="lgnrnd" value="([^"]*?)"', login_page, u'lgnrnd')
login_form = {
'email': useremail,
'pass': password,
'lsd': lsd,
'lgnrnd': lgnrnd,
'next': 'http://facebook.com/home.php',
'default_persistent': '0',
'legacy_return': '1',
'timezone': '-60',
'trynum': '1',
}
request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
login_results = compat_urllib_request.urlopen(request).read()
if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
self._downloader.report_warning(u'unable to log in: bad username/password, or exceeded login rate limit (~3/min). Check credentials or wait.')
return
check_form = {
'fb_dtsg': self._search_regex(r'"fb_dtsg":"(.*?)"', login_results, u'fb_dtsg'),
'nh': self._search_regex(r'name="nh" value="(\w*?)"', login_results, u'nh'),
'name_action_selected': 'dont_save',
'submit[Continue]': self._search_regex(r'<input value="(.*?)" name="submit\[Continue\]"', login_results, u'continue'),
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, compat_urllib_parse.urlencode(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = compat_urllib_request.urlopen(check_req).read()
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
self._downloader.report_warning(u'Unable to confirm login, you have to log in with your browser and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
return
def _real_initialize(self):
self._login()
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('ID')
url = 'https://www.facebook.com/video/video.php?v=%s' % video_id
webpage = self._download_webpage(url, video_id)
BEFORE = '{swf.addParam(param[0], param[1]);});\n'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
u'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
else:
raise ExtractorError(u'Cannot parse data')
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params'])
params = json.loads(params_raw)
video_data = params['video_data'][0]
video_url = video_data.get('hd_src')
if not video_url:
video_url = video_data['sd_src']
if not video_url:
raise ExtractorError(u'Cannot find video URL')
video_duration = int(video_data['video_duration'])
thumbnail = video_data['thumbnail_src']
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, u'title')
info = {
'id': video_id,
'title': video_title,
'url': video_url,
'ext': 'mp4',
'duration': video_duration,
'thumbnail': thumbnail,
}
return [info]


@ -0,0 +1,58 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+)\.html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr = self._html_search_regex(r'<p class="Content Copy">(.*?)</p>', webpage, u'description')
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': descr,
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
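
The format loop in FazIE can be tried in isolation. What follows is a minimal Python 3 sketch, assuming a config document shaped the way the extractor expects (an ENCODINGS element with optional LOW, HIGH and HQ children, each holding a FILENAME). The sample XML, the example.invalid URLs and the helper name collect_formats are made up for illustration and are not part of youtube-dl.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real config XML served by faz.net may differ.
SAMPLE_CONFIG = """<CONFIG>
  <ENCODINGS>
    <LOW><FILENAME>http://example.invalid/video_low.mp4</FILENAME></LOW>
    <HQ><FILENAME>http://example.invalid/video_hq.mp4</FILENAME></HQ>
  </ENCODINGS>
</CONFIG>"""


def collect_formats(config_xml):
    """Return one format dict per encoding present, lowest quality first."""
    config = ET.fromstring(config_xml)
    encodings = config.find('ENCODINGS')
    formats = []
    for code in ['LOW', 'HIGH', 'HQ']:
        encoding = encodings.find(code)
        if encoding is None:
            continue  # this quality is not offered for the video
        url = encoding.find('FILENAME').text
        formats.append({
            'url': url,
            'ext': url.rpartition('.')[2],  # naive stand-in for determine_ext
            'format_id': code.lower(),
        })
    return formats
```

Because the list is built from lowest to highest quality, the last entry is the best encoding that was found, which is what the extractor's info.update(formats[-1]) line relies on.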


@ -0,0 +1,78 @@
import re
import random
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
u'file': u'00011.flv',
u'info_dict': {
u'title': u'Folge 1 vom 10. April 2007',
u'description': u'md5:fb4818139c7cfe6907d4b83412a6864f',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
u'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
videos.append({
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/inline-video/postecke\.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
u'md5': u'262f0adbac80317412f7e57b4808e5c4',
u'info_dict': {
u"title": u"Postecke 120"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_id = '%04d' % episode
video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
video_title = 'Postecke %d' % episode
return {
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': video_title,
}


@ -0,0 +1,58 @@
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unescapeHTML,
)
class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4',
u'md5': u'6fdc01adbc89d72fc9c4f15b4a4ba87b',
u'info_dict': {
u"description": u"Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.",
u"uploader_id": u"forestwander-nature-pictures",
u"title": u"Dark Hollow Waterfalls"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_uploader_id = mobj.group('uploader_id')
webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
secret = self._search_regex(r"photo_secret: '(\w+)'", webpage, u'secret')
first_url = 'https://secure.flickr.com/apps/video/video_mtl_xml.gne?v=x&photo_id=' + video_id + '&secret=' + secret + '&bitrate=700&target=_self'
first_xml = self._download_webpage(first_url, video_id, 'Downloading first data webpage')
node_id = self._html_search_regex(r'<Item id="id">(\d+-\d+)</Item>',
first_xml, u'node_id')
second_url = 'https://secure.flickr.com/video_playlist.gne?node_id=' + node_id + '&tech=flash&mode=playlist&bitrate=700&secret=' + secret + '&rd=video.yahoo.com&noad=1'
second_xml = self._download_webpage(second_url, video_id, 'Downloading second data webpage')
self.report_extraction(video_id)
mobj = re.search(r'<STREAM APP="(.+?)" FULLPATH="(.+?)"', second_xml)
if mobj is None:
raise ExtractorError(u'Unable to extract video url')
video_url = mobj.group(1) + unescapeHTML(mobj.group(2))
return [{
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': video_uploader_id,
}]


@ -0,0 +1,129 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
def _extract_video(self, video_id):
xml_desc = self._download_webpage(
'http://www.francetvinfo.fr/appftv/webservices/video/'
'getInfosOeuvre.php?id-diffusion='
+ video_id, video_id, 'Downloading XML config')
info = xml.etree.ElementTree.fromstring(xml_desc.encode('utf-8'))
manifest_url = info.find('videos/video/url').text
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
thumbnail_path = info.find('image').text
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
# Can't use tests, videos expire in 7 days
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r'data-diffusion="(\d+)"', webpage, 'ID')
return self._extract_video(video_id)
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
u'file': u'84981923.mp4',
u'info_dict': {
u'title': u'Soir 3',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
return self._extract_video(video_id)
class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr'
_VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
},
u'params': {
# It uses Dailymotion
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % name)
info_json = self._download_webpage(info_url, name)
info = json.loads(info_json)
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'],
ie='Dailymotion')


@ -0,0 +1,36 @@
import re
from .common import InfoExtractor
from ..utils import determine_ext
class FreesoundIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
_TEST = {
u'url': u'http://www.freesound.org/people/miklovan/sounds/194503/',
u'file': u'194503.mp3',
u'md5': u'12280ceb42c81f19a515c745eae07650',
u'info_dict': {
u"title": u"gulls in the city.wav",
u"uploader" : u"miklovan",
u'description': u'the sounds of seagulls in the city',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
webpage, 'music title', flags=re.DOTALL)
music_url = self._og_search_property('audio', webpage, 'music url')
description = self._html_search_regex(r'<div id="sound_description">(.*?)</div>',
webpage, 'description', fatal=False, flags=re.DOTALL)
return [{
'id': music_id,
'title': title,
'url': music_url,
'uploader': self._og_search_property('audio:artist', webpage, 'music uploader'),
'ext': determine_ext(music_url),
'description': description,
}]


@ -0,0 +1,35 @@
import re
from .common import InfoExtractor
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?funnyordie\.com/videos/(?P<id>[0-9a-f]+)/.*$'
_TEST = {
u'url': u'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
u'file': u'0732f586d7.mp4',
u'md5': u'f647e9e90064b53b6e046e75d0241fbd',
u'info_dict': {
u"description": u"Lyrics changed to match the video. Spoken cameo by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a concept by Dustin McLean (DustFilms.com). Performed, edited, and written by David A. Scott.",
u"title": u"Heart-Shaped Box: Literal Video Version"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
info = {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
return [info]


@ -0,0 +1,38 @@
import re
from .common import InfoExtractor
class GamekingsIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
_TEST = {
u"url": u"http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/",
u'file': u'20130811.mp4',
# MD5 is flaky, seems to change regularly
#u'md5': u'2f32b1f7b80fdc5cb616efb4f387f8a3',
u'info_dict': {
u"title": u"Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review",
u"description": u"Melle en Steven hebben voor de review een week in de rechtbank doorbracht met Phoenix Wright: Ace Attorney - Dual Destinies.",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
video_url = self._og_search_video_url(webpage)
video = re.search(r'[0-9]+', video_url)
video_id = video.group(0)
# TODO: add medium format
video_url = video_url.replace(video_id, 'large/' + video_id)
return {
'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}


@ -0,0 +1,59 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
)
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": {
u"title": u"Arma 3 - Community Guide: SITREP I",
u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
data_video = json.loads(unescapeHTML(data_video_json))
# Transform the manifest URL into links to the MP4 files,
# which are the ones served to mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
info = {
'id': data_video['guid'],
'title': compat_urllib_parse.unquote(data_video['title']),
'formats': formats,
'description': get_meta_content('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
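
The manifest-to-MP4 transformation in GameSpotIE is easier to follow on a concrete value. The sketch below is a Python 3 approximation of those steps under stated assumptions: the manifest URL in the usage note is invented for illustration (real GameSpot manifests may be laid out differently), and expand_qualities is a hypothetical helper name, not something youtube-dl provides.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches a comma-separated quality list such as ",600,1000,1800,"
QUALITIES_RE = r'((,\d+)+,?)'


def expand_qualities(f4m_url, base='http://video.gamespotcdn.com/'):
    """Return (quality, url) pairs derived from an f4m manifest URL."""
    path = urlparse(f4m_url).path
    m = re.search(QUALITIES_RE, path)
    if m is None:
        raise ValueError('no quality list found in %r' % path)
    qualities = m.group(1).strip(',').split(',')
    # Drop the first path component, as the extractor does.
    http_path = path[1:].split('/', 1)[1]
    # Replace the quality list with a placeholder and strip the manifest suffix.
    template = re.sub(QUALITIES_RE, '%s', http_path)
    template = template.replace('.csmil/manifest.f4m', '')
    template = urljoin(base, template)
    return [(q, template % q) for q in qualities]
```

For example, with the made-up input 'http://example.invalid/z/d5/2013/08/01/video_,600,1000,1800,.mp4.csmil/manifest.f4m' the helper yields one URL per listed quality, each of the form http://video.gamespotcdn.com/d5/2013/08/01/video_<quality>.mp4.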


@ -0,0 +1,36 @@
import re
from .mtv import MTVIE, _media_xml_tag
class GametrailersIE(MTVIE):
"""
Gametrailers uses the same video system as MTVIE; only the feed URL,
the location of the URI and the way thumbnails are obtained differ.
"""
_VALID_URL = r'http://www.gametrailers.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
_TEST = {
u'url': u'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
u'file': u'70e9a5d7-cf25-4a10-9104-6f3e7342ae0d.mp4',
u'md5': u'4c8e67681a0ea7ec241e8c09b3ea8cf7',
u'info_dict': {
u'title': u'E3 2013: Debut Trailer',
u'description': u'Faith is back! Check out the World Premiere trailer for Mirror\'s Edge 2 straight from the EA Press Conference at E3 2013!',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
_FEED_URL = 'http://www.gametrailers.com/feeds/mrss'
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
return itemdoc.find(search_path).attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
mgid = self._search_regex([r'data-video="(?P<mgid>mgid:.*?)"',
r'data-contentId=\'(?P<mgid>mgid:.*?)\''],
webpage, u'mgid')
return self._get_videos_info(mgid)


@ -0,0 +1,251 @@
# encoding: utf-8
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
ExtractorError,
smuggle_url,
unescapeHTML,
)
from .brightcove import BrightcoveIE
class GenericIE(InfoExtractor):
IE_DESC = u'Generic downloader that works on some sites'
_VALID_URL = r'.*'
IE_NAME = u'generic'
_TESTS = [
{
u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
u'file': u'13601338388002.mp4',
u'md5': u'6e15c93721d7ec9e9ca3fdbf07982cfd',
u'info_dict': {
u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep"
}
},
# embedded vimeo video
{
u'add_ie': ['Vimeo'],
u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'file': u'22444065.mp4',
u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': {
u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter",
}
},
# bandcamp page with custom domain
{
u'add_ie': ['Bandcamp'],
u'url': u'http://bronyrock.com/track/the-pony-mash',
u'file': u'3235767654.mp3',
u'info_dict': {
u'title': u'The Pony Mash',
u'uploader': u'M_Pallante',
},
u'skip': u'There is a limit of 200 free downloads / month for the test song',
},
# embedded brightcove video
# it also tests brightcove videos that need to set the 'Referer' in the
# http requests
{
u'add_ie': ['Brightcove'],
u'url': u'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
u'info_dict': {
u'id': u'2765128793001',
u'ext': u'mp4',
u'title': u'Le cours de bourse : lanalyse technique',
u'description': u'md5:7e9ad046e968cb2d1114004aba466fd9',
u'uploader': u'BFM BUSINESS',
},
u'params': {
u'skip_download': True,
},
},
]
def report_download_webpage(self, video_id):
"""Report webpage download."""
if not self._downloader.params.get('test', False):
self._downloader.report_warning(u'Falling back on generic information extractor.')
super(GenericIE, self).report_download_webpage(video_id)
def report_following_redirect(self, new_url):
"""Report that a redirect is being followed."""
self._downloader.to_screen(u'[redirect] Following redirect to %s' % new_url)
def _test_redirect(self, url):
"""Check whether the URL redirects (e.g. a URL shortener); if so, return the new URL."""
class HeadRequest(compat_urllib_request.Request):
def get_method(self):
return "HEAD"
class HEADRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
"""
Subclass the HTTPRedirectHandler to make it use our
HeadRequest also on the redirected URL
"""
def redirect_request(self, req, fp, code, msg, headers, newurl):
if code in (301, 302, 303, 307):
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
return HeadRequest(newurl,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True)
else:
raise compat_urllib_error.HTTPError(req.get_full_url(), code, msg, headers, fp)
class HTTPMethodFallback(compat_urllib_request.BaseHandler):
"""
Fallback to GET if HEAD is not allowed (405 HTTP error)
"""
def http_error_405(self, req, fp, code, msg, headers):
fp.read()
fp.close()
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
return self.parent.open(compat_urllib_request.Request(req.get_full_url(),
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True))
# Build our opener
opener = compat_urllib_request.OpenerDirector()
for handler in [compat_urllib_request.HTTPHandler, compat_urllib_request.HTTPDefaultErrorHandler,
HTTPMethodFallback, HEADRedirectHandler,
compat_urllib_request.HTTPErrorProcessor, compat_urllib_request.HTTPSHandler]:
opener.add_handler(handler())
response = opener.open(HeadRequest(url))
if response is None:
raise ExtractorError(u'Invalid URL protocol')
new_url = response.geturl()
if url == new_url:
return False
self.report_following_redirect(new_url)
return new_url
def _real_extract(self, url):
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
try:
new_url = self._test_redirect(url)
if new_url:
return [self.url_result(new_url)]
except compat_urllib_error.HTTPError:
# This may be a stupid server that doesn't like HEAD, our UA, or so
pass
video_id = url.split('/')[-1]
try:
webpage = self._download_webpage(url, video_id)
except ValueError:
# since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here
raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
video_title = self._html_search_regex(r'<title>(.*)</title>',
webpage, u'video title', default=u'video', flags=re.DOTALL)
# Look for BrightCove:
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is not None:
self.to_screen(u'Brightcove video detected.')
return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Look for embedded YouTube player
matches = re.findall(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube.com/embed/.+?)\1', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
burl = unescapeHTML(mobj.group(1))
# Don't set the extractor because it can be a track url or an album
return self.url_result(burl)
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
# Broaden the search a little bit
mobj = re.search(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)
if mobj is None:
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http[^\'"]*)', webpage)
if mobj is None:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
if mobj is None:
# We look for Open Graph info:
# We have to match any number of spaces between elements, as some sites try to align them (e.g. statigr.am)
m_video_type = re.search(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
# HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
# Strip the file extension to obtain the video ID:
video_id = os.path.splitext(video_id)[0]
# video uploader is domain name
video_uploader = self._search_regex(r'(?:https?://)?([^/]*)/.*',
url, u'video uploader')
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'upload_date': None,
'title': video_title,
}
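
The fallback chain at the end of GenericIE._real_extract amounts to "try a list of patterns in order and take the first hit". A minimal Python 3 sketch of that idea follows. The patterns are copied from the extractor, but the Open Graph step is left out because it needs two lookups, and find_media_url is a hypothetical helper name introduced here for illustration only.

```python
import re
from urllib.parse import urljoin

# (pattern, flags) pairs, tried in order, from most to least specific.
CANDIDATES = [
    # JW Player in SWFObject
    (r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', 0),
    # Generic file= / source= parameters
    (r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', 0),
    # JW Player JS loader
    (r'[^A-Za-z0-9]?file["\']?:\s*["\'](http[^\'"]*)', 0),
    # Twitter cards
    (r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', 0),
    # HTML5 video
    (r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', re.DOTALL),
]


def find_media_url(page_url, webpage):
    """Return the first media URL found, resolved against page_url, or None."""
    for pattern, flags in CANDIDATES:
        m = re.search(pattern, webpage, flags)
        if m is not None:
            # Relative URLs are resolved against the page, as the extractor does.
            return urljoin(page_url, m.group(1))
    return None
```

Keeping the patterns in a list makes the priority order explicit. The extractor instead spells the chain out as nested "if mobj is None" checks, which behaves the same way for these patterns.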


@ -0,0 +1,98 @@
# coding: utf-8
import datetime
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class GooglePlusIE(InfoExtractor):
IE_DESC = u'Google Plus'
_VALID_URL = r'(?:https://)?plus\.google\.com/(?:[^/]+/)*?posts/(\w+)'
IE_NAME = u'plus.google'
_TEST = {
u"url": u"https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH",
u"file": u"ZButuJc6CtH.flv",
u"info_dict": {
u"upload_date": u"20120613",
u"uploader": u"井上ヨシマサ",
u"title": u"嘆きの天使 降臨"
}
}
def _real_extract(self, url):
# Extract id from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
post_url = mobj.group(0)
video_id = mobj.group(1)
video_extension = 'flv'
# Step 1, Retrieve post webpage to extract further information
webpage = self._download_webpage(post_url, video_id, u'Downloading entry webpage')
self.report_extraction(video_id)
# Extract upload date
upload_date = self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
if upload_date:
# Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")
upload_date = upload_date.strftime('%Y%m%d')
# Extract uploader
uploader = self._html_search_regex(r'rel\="author".*?>(.*?)</a>',
webpage, u'uploader', fatal=False)
# Extract title
# Get the first line for title
video_title = self._html_search_regex(r'<meta name\=\"Description\" content\=\"(.*?)[\n<"]',
webpage, 'title', default=u'NA')
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, u'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page
webpage = self._download_webpage(video_page, video_id, u'Downloading video page')
# Extract video links of all sizes from the video page
pattern = r'\d+,\d+,(\d+),"(http\://redirector\.googlevideo\.com.*?)"'
mobj = re.findall(pattern, webpage)
if len(mobj) == 0:
raise ExtractorError(u'Unable to extract video links')
# Sort by resolution (numerically; the captured values are strings)
links = sorted(mobj, key=lambda link: int(link[0]))
# Choose the last element of the sort, i.e. the highest resolution
video_url = links[-1]
# Only keep the URL. The resolution part of the tuple is no longer needed
video_url = video_url[-1]
# Decode escape sequences such as \u0026
try:
video_url = video_url.decode("unicode_escape")
except AttributeError: # Python 3
video_url = bytes(video_url, 'ascii').decode('unicode-escape')
return [{
'id': video_id,
'url': video_url,
'uploader': uploader,
'upload_date': upload_date,
'title': video_title,
'ext': video_extension,
}]
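
The link selection in GooglePlusIE hinges on two details: the captured size is a string, so it has to be compared numerically, and the URL still contains escapes such as \u0026. The Python 3 sketch below illustrates both on an invented page fragment. The fragment and the helper name best_video_url are assumptions for illustration, and treating the third number as the size follows the extractor's own comments rather than any documented format.

```python
import re

PATTERN = r'\d+,\d+,(\d+),"(http://redirector\.googlevideo\.com.*?)"'


def best_video_url(page):
    """Return the decoded URL of the entry with the largest size value."""
    matches = re.findall(PATTERN, page)
    if not matches:
        raise ValueError('no video links found')
    # Compare sizes as integers: as strings, '720' would sort after '1080'.
    _size, url = max(matches, key=lambda match: int(match[0]))
    # Turn escapes such as \u0026 into the characters they stand for.
    return url.encode('ascii').decode('unicode_escape')
```

Using max with an integer key avoids the lexicographic ordering that a plain sorted() call applies to tuples of strings.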


@ -0,0 +1,39 @@
import itertools
import re
from .common import SearchInfoExtractor
from ..utils import (
compat_urllib_parse,
)
class GoogleSearchIE(SearchInfoExtractor):
IE_DESC = u'Google Video search'
_MORE_PAGES_INDICATOR = r'id="pnnext" class="pn"'
_MAX_RESULTS = 1000
IE_NAME = u'video.google:search'
_SEARCH_KEY = 'gvsearch'
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
res = {
'_type': 'playlist',
'id': query,
'entries': []
}
for pagenum in itertools.count(1):
result_url = u'http://www.google.com/search?tbm=vid&q=%s&start=%s&hl=en' % (compat_urllib_parse.quote_plus(query), pagenum*10)
webpage = self._download_webpage(result_url, u'gvsearch:' + query,
note='Downloading result page ' + str(pagenum))
for mobj in re.finditer(r'<h3 class="r"><a href="([^"]+)"', webpage):
e = {
'_type': 'url',
'url': mobj.group(1)
}
res['entries'].append(e)
if (pagenum * 10 > n) or not re.search(self._MORE_PAGES_INDICATOR, webpage):
return res


@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(.+?)-.+'
_TEST = {
u'url': u'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
u'file': u'mmbzyhkgny.mp3',
u'md5': u'6783a58491b47b92c7c1af5a77d4cbee',
u'info_dict': {
u'title': u"Obama: 'Beyond The Afghan Theater, We Only Target Al Qaeda' on May 23, 2013",
u'description': u'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
u'duration': 11,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
json_url = "http://www.hark.com/clips/%s.json" %(video_id)
info_json = self._download_webpage(json_url, video_id)
info = json.loads(info_json)
final_url = info['url']
return {'id': video_id,
'url' : final_url,
'title': info['name'],
'ext': determine_ext(final_url),
'description': info['description'],
'thumbnail': info['image_original'],
'duration': info['duration'],
}


@ -0,0 +1,44 @@
import re
import base64

from .common import InfoExtractor


class HotNewHipHopIE(InfoExtractor):
    _VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
    _TEST = {
        u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html",
        u'file': u'1435540.mp3',
        u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
        u'info_dict': {
            u"title": u"Freddie Gibbs - Lay It Down"
        }
    }

    def _real_extract(self, url):
        m = re.match(self._VALID_URL, url)
        video_id = m.group('id')

        webpage_src = self._download_webpage(url, video_id)

        video_url_base64 = self._search_regex(r'data-path="(.*?)"',
            webpage_src, u'video URL', fatal=False)

        # Without a data-path attribute, fall back to the embedded video URL
        if video_url_base64 is None:
            video_url = self._search_regex(r'"contentUrl" content="(.*?)"', webpage_src,
                u'video URL')
            return self.url_result(video_url, ie='Youtube')

        video_url = base64.b64decode(video_url_base64).decode('utf-8')

        video_title = self._html_search_regex(r"<title>(.*)</title>",
            webpage_src, u'title')

        results = [{
            'id': video_id,
            'url': video_url,
            'title': video_title,
            'thumbnail': self._og_search_thumbnail(webpage_src),
            'ext': 'mp3',
        }]
        return results
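
Reviewer note: the media URL is stored base64-encoded in a `data-path` attribute, and a missing attribute triggers the fallback branch. A minimal sketch of the decoding step follows, assuming the attribute holds plain base64 of a UTF-8 URL as the code above implies. `decode_data_path` is a hypothetical name.

```python
import base64
import re


def decode_data_path(html):
    """Return the decoded data-path URL, or None if the attribute is absent."""
    m = re.search(r'data-path="(.*?)"', html)
    if m is None:
        return None
    return base64.b64decode(m.group(1)).decode('utf-8')
```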


@ -0,0 +1,45 @@
import re

from .common import InfoExtractor


class HowcastIE(InfoExtractor):
    _VALID_URL = r'(?:https?://)?(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
    _TEST = {
        u'url': u'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
        u'file': u'390161.mp4',
        u'md5': u'8b743df908c42f60cf6496586c7f12c3',
        u'info_dict': {
            u"description": u"The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here's the proper way to tie a square knot.",
            u"title": u"How to Tie a Square Knot Properly"
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)

        video_id = mobj.group('id')
        webpage = self._download_webpage(url, video_id)

        self.report_extraction(video_id)

        video_url = self._search_regex(r'\'?file\'?: "(http://mobile-media\.howcast\.com/[0-9]+\.mp4)',
            webpage, u'video URL')

        video_title = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') property=\'og:title\'',
            webpage, u'title')

        video_description = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') name=\'description\'',
            webpage, u'description', fatal=False)

        thumbnail = self._html_search_regex(r'<meta content=\'(.+?)\' property=\'og:image\'',
            webpage, u'thumbnail', fatal=False)

        return [{
            'id': video_id,
            'url': video_url,
            'ext': 'mp4',
            'title': video_title,
            'description': video_description,
            'thumbnail': thumbnail,
        }]
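
Reviewer note: the title and description patterns use two alternative capture groups, one per quoting style, so the caller has to pick whichever group matched. The sketch below shows one way to do that with the standard library only. `first_group` is a hypothetical helper, and this is an assumption about how the search helper resolves such patterns, not a copy of it.

```python
import re

TITLE_RE = r'<meta content=(?:"([^"]+)"|\'([^\']+)\') property=\'og:title\''


def first_group(pattern, text):
    """Return the first capture group that took part in the match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return next(g for g in m.groups() if g is not None)
```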


@ -0,0 +1,71 @@
import json
import re
import time

from .common import InfoExtractor
from ..utils import (
    compat_str,
    compat_urllib_parse,
    compat_urllib_request,

    ExtractorError,
)


class HypemIE(InfoExtractor):
    """Information Extractor for hypem"""
    _VALID_URL = r'(?:http://)?(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
    _TEST = {
        u'url': u'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
        u'file': u'1v6ga.mp3',
        u'md5': u'b9cc91b5af8995e9f0c1cee04c575828',
        u'info_dict': {
            u"title": u"Tame"
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        if mobj is None:
            raise ExtractorError(u'Invalid URL: %s' % url)
        track_id = mobj.group(1)

        data = {'ax': 1, 'ts': time.time()}
        data_encoded = compat_urllib_parse.urlencode(data)
        complete_url = url + "?" + data_encoded
        request = compat_urllib_request.Request(complete_url)
        response, urlh = self._download_webpage_handle(request, track_id, u'Downloading webpage with the url')
        cookie = urlh.headers.get('Set-Cookie', '')

        self.report_extraction(track_id)

        html_tracks = self._html_search_regex(r'<script type="application/json" id="displayList-data">(.*?)</script>',
            response, u'tracks', flags=re.MULTILINE|re.DOTALL).strip()
        try:
            track_list = json.loads(html_tracks)
            track = track_list[u'tracks'][0]
        except ValueError:
            raise ExtractorError(u'Hypemachine contained invalid JSON.')

        key = track[u"key"]
        track_id = track[u"id"]
        artist = track[u"artist"]
        title = track[u"song"]

        serve_url = "http://hypem.com/serve/source/%s/%s" % (compat_str(track_id), compat_str(key))
        request = compat_urllib_request.Request(serve_url, "", {'Content-Type': 'application/json'})
        request.add_header('cookie', cookie)
        song_data_json = self._download_webpage(request, track_id, u'Downloading metadata')
        try:
            song_data = json.loads(song_data_json)
        except ValueError:
            raise ExtractorError(u'Hypemachine contained invalid JSON.')
        final_url = song_data[u"url"]

        return [{
            'id': track_id,
            'url': final_url,
            'ext': "mp3",
            'title': title,
            'artist': artist,
        }]
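
Reviewer note: the extractor makes two requests. The first appends a query string to the track page URL, and the second sends an empty body to the serve URL while replaying the cookie from the first response. The Python 3 sketch below builds both requests without sending them. `build_page_url` and `build_serve_request` are hypothetical names, and the `now` parameter is added here only to make the timestamp testable.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request


def build_page_url(url, now=None):
    """Append the ax/ts query parameters used for the first request."""
    data = {'ax': 1, 'ts': time.time() if now is None else now}
    return url + '?' + urlencode(data)


def build_serve_request(track_id, key, cookie):
    """Build the metadata request; the empty body makes urllib issue a POST."""
    serve_url = 'http://hypem.com/serve/source/%s/%s' % (track_id, key)
    request = Request(serve_url, data=b'',
                      headers={'Content-Type': 'application/json'})
    request.add_header('cookie', cookie)
    return request
```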

youtube_dl/extractor/ign.py Normal file

@ -0,0 +1,129 @@
import re
import json

from .common import InfoExtractor
from ..utils import (
    determine_ext,
)


class IGNIE(InfoExtractor):
    """
    Extractor for some of the IGN sites, like www.ign.com, es.ign.com de.ign.com.
    Some videos of it.ign.com are also supported
    """

    _VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
    IE_NAME = u'ign.com'

    _CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
    _DESCRIPTION_RE = [r'<span class="page-object-description">(.+?)</span>',
                       r'id="my_show_video">.*?<p>(.*?)</p>',
                       ]

    _TESTS = [
        {
            u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
            u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
            u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
            u'info_dict': {
                u'title': u'The Last of Us Review',
                u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
            }
        },
        {
            u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
            u'playlist': [
                {
                    u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
                    u'info_dict': {
                        u'title': u'GTA 5 Video Review',
                        u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
                    },
                },
                {
                    u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
                    u'info_dict': {
                        u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
                        u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
                    },
                },
            ],
            u'params': {
                u'skip_download': True,
            },
        },
    ]

    def _find_video_id(self, webpage):
        res_id = [r'data-video-id="(.+?)"',
                  r'<object id="vid_(.+?)"',
                  r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
                  ]
        return self._search_regex(res_id, webpage, 'video id')

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        name_or_id = mobj.group('name_or_id')
        page_type = mobj.group('type')
        webpage = self._download_webpage(url, name_or_id)
        if page_type == 'articles':
            video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
            return self.url_result(video_url, ie='IGN')
        elif page_type != 'video':
            # Raw string so that \. reaches the regex engine unchanged
            multiple_urls = re.findall(
                r'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
                webpage)
            if multiple_urls:
                return [self.url_result(u, ie='IGN') for u in multiple_urls]

        video_id = self._find_video_id(webpage)
        result = self._get_video_info(video_id)
        description = self._html_search_regex(self._DESCRIPTION_RE,
                                              webpage, 'video description',
                                              flags=re.DOTALL)
        result['description'] = description
        return result

    def _get_video_info(self, video_id):
        config_url = self._CONFIG_URL_TEMPLATE % video_id
        config = json.loads(self._download_webpage(config_url, video_id,
                                                   u'Downloading video info'))
        media = config['playlist']['media']
        video_url = media['url']

        return {'id': media['metadata']['videoId'],
                'url': video_url,
                'ext': determine_ext(video_url),
                'title': media['metadata']['title'],
                'thumbnail': media['poster'][0]['url'].replace('{size}', 'grande'),
                }


class OneUPIE(IGNIE):
    """Extractor for 1up.com, it uses the ign videos system."""

    _VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
    IE_NAME = '1up.com'

    _DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'

    _TEST = {
        u'url': u'http://gamevideos.1up.com/video/id/34976',
        u'file': u'34976.mp4',
        u'md5': u'68a54ce4ebc772e4b71e3123d413163d',
        u'info_dict': {
            u'title': u'Sniper Elite V2 - Trailer',
            u'description': u'md5:5d289b722f5a6d940ca3136e9dae89cf',
        }
    }

    # Override IGN tests
    _TESTS = []

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        id = mobj.group('name_or_id')
        result = super(OneUPIE, self)._real_extract(url)
        result['id'] = id
        return result
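
Reviewer note: `_find_video_id` passes a list of patterns and relies on the first one that matches. The sketch below reproduces that "try patterns in order" idea with the standard library and then fills the config URL template. `find_video_id` and `config_url` are hypothetical names, and returning `group(1)` is a simplification assumed here, not necessarily what the shared search helper does.

```python
import re

RES_ID = [
    r'data-video-id="(.+?)"',
    r'<object id="vid_(.+?)"',
    r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
]
CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'


def find_video_id(webpage, patterns=RES_ID):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        m = re.search(pattern, webpage)
        if m:
            return m.group(1)
    return None


def config_url(video_id):
    """Fill the config URL template with a video id."""
    return CONFIG_URL_TEMPLATE % video_id
```

Pattern order matters: a page carrying both a `data-video-id` attribute and an `<object>` tag resolves to the attribute value.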


@ -0,0 +1,39 @@
import re

from .common import InfoExtractor


class InaIE(InfoExtractor):
    """Information Extractor for Ina.fr"""
    _VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I?[A-F0-9]+)/.*'
    _TEST = {
        u'url': u'www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
        u'file': u'I12055569.mp4',
        u'md5': u'a667021bf2b41f8dc6049479d9bb38a3',
        u'info_dict': {
            u"title": u"Fran\u00e7ois Hollande \"Je crois que c'est clair\""
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)

        video_id = mobj.group('id')
        mrss_url = 'http://player.ina.fr/notices/%s.mrss' % video_id
        video_extension = 'mp4'
        webpage = self._download_webpage(mrss_url, video_id)

        self.report_extraction(video_id)

        video_url = self._html_search_regex(r'<media:player url="(?P<mp4url>http://mp4.ina.fr/[^"]+\.mp4)',
            webpage, u'video URL')

        video_title = self._search_regex(r'<title><!\[CDATA\[(?P<titre>.*?)]]></title>',
            webpage, u'title')

        return [{
            'id': video_id,
            'url': video_url,
            'ext': video_extension,
            'title': video_title,
        }]
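
Reviewer note: the Ina extractor does not parse the HTML page; it builds an MRSS feed URL from the id and reads the title out of a CDATA section. A small sketch of those two steps follows, using the patterns from the file above. `mrss_url_for` and `parse_title` are hypothetical helper names.

```python
import re

VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I?[A-F0-9]+)/.*'


def mrss_url_for(url):
    """Map a video page URL to its MRSS feed URL."""
    video_id = re.match(VALID_URL, url).group('id')
    return 'http://player.ina.fr/notices/%s.mrss' % video_id


def parse_title(mrss):
    """Extract the CDATA-wrapped title from an MRSS document, or None."""
    m = re.search(r'<title><!\[CDATA\[(?P<titre>.*?)]]></title>', mrss)
    return m.group('titre') if m else None
```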


@ -0,0 +1,62 @@
import base64
import re

from .common import InfoExtractor
from ..utils import (
    compat_urllib_parse,

    ExtractorError,
)


class InfoQIE(InfoExtractor):
    _VALID_URL = r'^(?:https?://)?(?:www\.)?infoq\.com/[^/]+/[^/]+$'
    _TEST = {
        u"name": u"InfoQ",
        u"url": u"http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
        u"file": u"12-jan-pythonthings.mp4",
        u"info_dict": {
            u"description": u"Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
            u"title": u"A Few of My Favorite [Python] Things"
        },
        u"params": {
            u"skip_download": True
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)

        webpage = self._download_webpage(url, video_id=url)
        self.report_extraction(url)

        # Extract video URL
        mobj = re.search(r"jsclassref ?= ?'([^']*)'", webpage)
        if mobj is None:
            raise ExtractorError(u'Unable to extract video url')
        real_id = compat_urllib_parse.unquote(base64.b64decode(mobj.group(1).encode('ascii')).decode('utf-8'))
        video_url = 'rtmpe://video.infoq.com/cfx/st/' + real_id

        # Extract title
        video_title = self._search_regex(r'contentTitle = "(.*?)";',
            webpage, u'title')

        # Extract description
        video_description = self._html_search_regex(r'<meta name="description" content="(.*)"(?:\s*/)?>',
            webpage, u'description', fatal=False)

        video_filename = video_url.split('/')[-1]
        video_id, extension = video_filename.split('.')

        info = {
            'id': video_id,
            'url': video_url,
            'uploader': None,
            'upload_date': None,
            'title': video_title,
            'ext': extension,  # Extension is always(?) mp4, but seems to be flv
            'thumbnail': None,
            'description': video_description,
        }

        return [info]
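
Reviewer note: the `jsclassref` value is base64-encoded and then percent-encoded, and the id and extension come from the last path component. The Python 3 sketch below walks through that chain. `decode_jsclassref` is a hypothetical name, and it uses `os.path.splitext` instead of the original `split('.')`, a deliberate swap so that a file name with more than one dot does not raise.

```python
import base64
import os.path
from urllib.parse import unquote


def decode_jsclassref(value):
    """Return (video_url, video_id, extension) for a jsclassref value."""
    real_id = unquote(base64.b64decode(value.encode('ascii')).decode('utf-8'))
    video_url = 'rtmpe://video.infoq.com/cfx/st/' + real_id
    filename = video_url.split('/')[-1]
    video_id, ext = os.path.splitext(filename)
    return video_url, video_id, ext.lstrip('.')
```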


@ -0,0 +1,35 @@
import re

from .common import InfoExtractor


class InstagramIE(InfoExtractor):
    _VALID_URL = r'(?:http://)?instagram.com/p/(.*?)/'
    _TEST = {
        u'url': u'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
        u'file': u'aye83DjauH.mp4',
        u'md5': u'0d2da106a9d2631273e192b372806516',
        u'info_dict': {
            u"uploader_id": u"naomipq",
            u"title": u"Video by naomipq",
            u'description': u'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group(1)
        webpage = self._download_webpage(url, video_id)
        uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
            webpage, u'uploader id', fatal=False)
        desc = self._search_regex(r'"caption":"(.*?)"', webpage, u'description',
            fatal=False)

        return [{
            'id': video_id,
            'url': self._og_search_video_url(webpage, secure=False),
            'ext': 'mp4',
            'title': u'Video by %s' % uploader_id,
            'thumbnail': self._og_search_thumbnail(webpage),
            'uploader_id': uploader_id,
            'description': desc,
        }]
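
Reviewer note: because `re.match` only anchors at the start, the URL pattern tolerates the trailing query string and fragment in the test URL, and the title is derived from the uploader name found in the page's embedded JSON. The sketch below checks both on a made-up page fragment. `parse_post` is a hypothetical helper and the sample markup in the usage is not real Instagram output.

```python
import re

VALID_URL = r'(?:http://)?instagram.com/p/(.*?)/'


def parse_post(url, webpage):
    """Return the id, uploader and derived title for a post page."""
    video_id = re.match(VALID_URL, url).group(1)
    m = re.search(r'"owner":{"username":"(.+?)"', webpage)
    uploader_id = m.group(1) if m else None
    return {
        'id': video_id,
        'uploader_id': uploader_id,
        'title': 'Video by %s' % uploader_id,
    }
```

If the uploader pattern does not match, this sketch yields the title "Video by None". The original does the same with `fatal=False` if the search helper returns None on failure, which is an assumption here.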

Some files were not shown because too many files have changed in this diff.