Compare commits

..

291 Commits

Author SHA1 Message Date
Philipp Hagemeister
89bb8e97ee release 2014.05.19 2014-05-19 11:42:37 +02:00
Sergey M․
4ea5c7b70d [ndr] Improve thumbnail extraction 2014-05-18 14:23:02 +07:00
Sergey M․
8dfa187b8a [generic] Support pagespeed_iframe for NovaMov embeds 2014-05-17 18:12:12 +07:00
Sergey M․
c1ed1f7055 [ndr] Fix title, description and duration extraction 2014-05-17 18:11:40 +07:00
Sergey M․
1514f74967 [ndr] Fix thumbnail extraction 2014-05-17 17:58:37 +07:00
Philipp Hagemeister
91994c2c81 release 2014.05.17 2014-05-17 00:17:40 +02:00
Jaime Marquínez Ferrándiz
76e92371ac [youtube] Recognize a second format of the upload_date in the 'watch-uploader-info' element (#2911) 2014-05-16 22:12:52 +02:00
Jaime Marquínez Ferrándiz
08af0205f9 Merge remote-tracking branch 'codesparkle/fix-photobucket-url' (closes #2934)
Fix photobucket url extraction
2014-05-16 20:44:52 +02:00
codesparkle
a725fb1f43 test_download works for photobucket after this change 2014-05-17 03:25:41 +10:00
Jaime Marquínez Ferrándiz
05ee2b6dad [youtube] Fix extraction of the feed 'paging' values (fixes #2925) 2014-05-16 16:01:13 +02:00
Philipp Hagemeister
b74feacac5 release 2014.05.16.1 2014-05-16 15:53:17 +02:00
Philipp Hagemeister
426b52fc5d Merge remote-tracking branch 'origin/master' 2014-05-16 15:52:01 +02:00
Philipp Hagemeister
5c30b26846 [francetv] Add support for non-numeric video IDs (Fixes #2927) 2014-05-16 15:51:01 +02:00
Philipp Hagemeister
f07b74fc18 [ffmpeg] Correct argument encoding on Windows with Python 2.x
Fixes #2924
2014-05-16 15:47:56 +02:00
Sergey M․
a5a45015ba [generic] Fix redirect 2014-05-16 20:32:53 +07:00
Philipp Hagemeister
beee53de06 [youtube] Look for published-on date if uploaded-on is not found
Fixes #2911
2014-05-16 13:21:44 +02:00
Philipp Hagemeister
8712f2bea7 release 2014.05.16 2014-05-16 12:04:52 +02:00
Philipp Hagemeister
ea102818c9 Merge remote-tracking branch 'origin/master' 2014-05-16 12:04:24 +02:00
Philipp Hagemeister
0a871f6880 Provide compatibility check_output for 2.6 (Fixes #2926) 2014-05-16 12:03:59 +02:00
Sergey M․
481efc84a8 [bliptv] Switch extraction to RSS (Closes #2920) 2014-05-15 22:20:40 +07:00
Jaime Marquínez Ferrándiz
01ed5c9be3 [youtube] Fix typo 2014-05-15 13:43:29 +02:00
Philipp Hagemeister
ad3bc6acd5 Document and test categories (#2923) 2014-05-15 12:41:42 +02:00
Philipp Hagemeister
5afa7f8bee [extractor/common] --write-pages: Correct file name if video_id is None 2014-05-15 12:39:33 +02:00
Dario Guarascio
ec8deefc27 [youtube] Video categories added to metadata 2014-05-15 13:59:27 +07:00
Sergey M․
a2d5a4ee64 [gamespot] Update test URL and modernize 2014-05-14 20:13:34 +07:00
Jaime Marquínez Ferrándiz
dffcc2ea0c Makefile: write the manpage to the right file and use the processed markdown document 2014-05-13 14:37:05 +02:00
Philipp Hagemeister
1800eeefed add prepare_manpage 2014-05-13 14:21:21 +02:00
Sergey M․
d7e7dedbde [noco] Skip test 2014-05-13 19:12:17 +07:00
Philipp Hagemeister
d19bb9c0aa Split man and README (Fixes #2892) 2014-05-13 11:16:11 +02:00
Philipp Hagemeister
3ef79a974a [README] Stress example URL
This seems to be the part most often overlooked in our README.
2014-05-13 10:28:58 +02:00
Philipp Hagemeister
bc6800fbed release 2014.05.13 2014-05-13 10:20:10 +02:00
Philipp Hagemeister
65314dccf8 [empflix] Simplify (#2903) 2014-05-13 10:14:05 +02:00
Philipp Hagemeister
feb7221209 Merge remote-tracking branch 'hojel/empflix' 2014-05-13 10:11:14 +02:00
Philipp Hagemeister
56a94d8cbb [hentaistigma] Simplified (#2902) 2014-05-13 10:10:59 +02:00
Philipp Hagemeister
24e6ec8ac8 Merge remote-tracking branch 'hojel/hentaistigma' 2014-05-13 10:09:04 +02:00
Philipp Hagemeister
87724af7a8 [nuvid] Simplify (#2901) 2014-05-13 10:08:32 +02:00
Philipp Hagemeister
b65c3e77e8 Merge remote-tracking branch 'hojel/nuvid' 2014-05-13 10:05:20 +02:00
Philipp Hagemeister
5301304bf2 [slutload] Simplify (#2898) 2014-05-13 10:04:29 +02:00
Philipp Hagemeister
948bcc60df Merge remote-tracking branch 'hojel/slutload' 2014-05-13 10:00:49 +02:00
Philipp Hagemeister
25dfe0eb10 Credit @hojel for fc2 and other extractors (#2877) 2014-05-13 10:00:27 +02:00
Philipp Hagemeister
8e71456a81 [fc2] Add new extractor (Fixes #2877)
This commit has been recreated, since there seems to have been a problem with GitHub; the PR doesn't have a branch.
2014-05-13 09:58:36 +02:00
Philipp Hagemeister
ccdd34ed78 Credit @jnormore for vine:user (#2888) 2014-05-13 09:53:58 +02:00
Philipp Hagemeister
26d886354f Merge remote-tracking branch 'frewsxcv/patch-1' 2014-05-13 09:52:28 +02:00
Philipp Hagemeister
a172b258ac [vine:user] Simplify 2014-05-13 09:50:03 +02:00
Philipp Hagemeister
7b93c2c204 Merge remote-tracking branch 'jnormore/vine_user' 2014-05-13 09:45:27 +02:00
Philipp Hagemeister
57c7411f46 [mixcloud] Shed API dependency (#2904) 2014-05-13 09:42:38 +02:00
Philipp Hagemeister
d0a122348e [test/helper] Clarify which field failed an assertion 2014-05-13 09:41:36 +02:00
Philipp Hagemeister
e4cbb5f382 [wdr] Add support for mobile URLs 2014-05-12 22:17:19 +02:00
Philipp Hagemeister
c1bce22f23 [extractor/common] Protect against long video IDs and URLs 2014-05-12 21:58:23 +02:00
Philipp Hagemeister
e3abbbe301 release 2014.05.12 2014-05-12 16:40:03 +02:00
Sergey M․
55b36e3710 [videott] Add support for video.tt (Closes #2889) 2014-05-12 20:23:08 +07:00
hojel
877bea9ce1 [empflix] Add new extractor 2014-05-12 04:10:29 -07:00
hojel
33c7ff861e [hentaistigma] Add new extractor 2014-05-12 03:58:07 -07:00
hojel
749fe60c1e [nuvid] Add new extractor 2014-05-12 03:48:40 -07:00
hojel
63b31b059c [slutload] Add new extractor 2014-05-12 01:29:19 -07:00
hojel
1476b497eb [slutload] Add new extractor 2014-05-12 01:28:56 -07:00
Jaime Marquínez Ferrándiz
e399853d0c [youtube:playlist] Improve detection of private lists (#2840) 2014-05-12 07:59:33 +02:00
Corey Farwell
fdb205b19e Enable testing on Python 3.4 2014-05-11 20:13:22 -07:00
Sergey M․
fbe8053120 [vk] Update test 2014-05-11 16:43:59 +07:00
Jason Normore
ea783d01e1 Added VineUserIE extractor for vine user timeline
Added vine user timeline extractor using unofficial
vine api user profile and timeline api endpoints.
2014-05-10 23:18:20 -04:00
Jaime Marquínez Ferrándiz
b7d73595dc Allow recoding the video to mkv 2014-05-10 15:09:56 +02:00
Sergey M․
e97e53eeed [vevo] Add friendly error output (#2874) 2014-05-10 04:34:53 +07:00
Sergey M․
342f630dbf [rutv] Add support for more live stream URLs (Closes #2875) 2014-05-10 02:23:24 +07:00
Sergey M․
69c8fb9e5d [vimeo] Add video duration extraction(Closes #2876) 2014-05-10 01:46:40 +07:00
Sergey M․
5f0f8013ac [vube] Consider optional fields and modernize 2014-05-09 01:45:34 +07:00
Sergey M․
b5368acee8 [vube] Improve URL detection and extract timestamp 2014-05-09 01:31:25 +07:00
Sergey M․
f71959fcf5 [nfb] Add support for videos with captions (#2866) 2014-05-08 22:07:14 +07:00
Philipp Hagemeister
5c9f3b8b16 [arte] Fix versionCode interpretation (#2588) 2014-05-08 02:00:47 +02:00
Sergey M․
bebd6f9308 [funnyordie] Extract more formats 2014-05-07 21:02:57 +07:00
Sergey M.
84a2806c16 Merge pull request #2859 from pulpe/FunnyOrDie_thumb
[FunnyOrDie] fix thumbnails + add test (fixes #2856)
2014-05-06 19:46:40 +07:00
pulpe
d0111a7409 [FunnyOrDie] simplify 2014-05-06 10:19:13 +02:00
pulpe
aab8874c55 [FunnyOrDie] fix thumbnails + add test (fixes #2856) 2014-05-06 08:57:28 +02:00
Sergey M․
fcf5b01746 [prosiebensat1] Simplify 2014-05-05 19:02:49 +07:00
Philipp Hagemeister
4de9e9a6db [canalplus] Fix id determination (Fixes #2851) 2014-05-05 03:30:05 +02:00
Philipp Hagemeister
0067d6c4be release 2014.05.05 2014-05-05 03:15:40 +02:00
Philipp Hagemeister
2099125333 [soundcloud/generic] Add support for playlists 2014-05-05 03:15:17 +02:00
Philipp Hagemeister
b48f147d5a [bandcamp] Add support for subdomains (Fixes #2850) 2014-05-05 02:44:44 +02:00
Jaime Marquínez Ferrándiz
4f3e943080 [vimeo] Some modernization and style fixes 2014-05-04 22:27:56 +02:00
Jaime Marquínez Ferrándiz
7558830fa3 [vimeo] Fix description extraction 2014-05-04 21:48:08 +02:00
Sergey M․
867274e997 [statigram] Update to fit new website name and rename extractor 2014-05-04 16:52:10 +07:00
Sergey M․
6515778305 [nytimes] Improve file size extraction 2014-05-03 03:11:38 +07:00
Sergey M․
3b1dfc0f2f [newstube] Do not shadow standard str 2014-05-03 02:30:50 +07:00
Sergey M․
d664de44b7 [nytimes] Add support for nytimes.com (Closes #2846) 2014-05-03 02:28:38 +07:00
Sergey M․
bbe99d26ec Credit @nicoe for rtbf.be (#2822) 2014-05-02 02:36:11 +07:00
Sergey M․
50fc59968e [ntv] Simplify 2014-05-02 02:26:07 +07:00
Sergey M․
b8b01bb92a [newstube] Add support for newstube.ru (Closes #2814) 2014-05-01 21:15:25 +07:00
Sergey M․
eb45133451 [rtmp] Add support for multiple AFM data entries 2014-05-01 21:14:21 +07:00
Jaime Marquínez Ferrándiz
10c0e2d818 [youtube:playlist] Raise an error if the list doesn't exist or is private (closes #2840) 2014-05-01 15:40:35 +02:00
Sergey M․
669f0e7cda [generic] Fix wrong entries index 2014-05-01 16:28:37 +07:00
Sergey M․
32fd27ec98 [http] Fix string/None comparison with int while in test 2014-04-30 20:02:17 +07:00
Philipp Hagemeister
0c13f378de Merge remote-tracking branch 'origin/master' 2014-04-30 14:12:41 +02:00
Philipp Hagemeister
0049594efb [vine] Remove debugging code 2014-04-30 14:12:30 +02:00
Sergey M․
113c7d3eb0 [canalplus] Update test file checksum 2014-04-30 18:54:12 +07:00
Sergey M․
549371fc99 [nrk] Update test file checksums 2014-04-30 18:51:50 +07:00
Sergey M․
957f27e5bb [scivee] Revert test file download 2014-04-30 18:49:29 +07:00
Philipp Hagemeister
1f8c19767b release 2014.04.30.1 2014-04-30 10:07:39 +02:00
Philipp Hagemeister
a383a98af6 [utils/_windows_write_string] Be defensive about fileno (Fixes #2820) 2014-04-30 10:07:32 +02:00
Philipp Hagemeister
acd69589a5 [YoutubeDL] Do not require default output template to be set 2014-04-30 10:02:08 +02:00
Philipp Hagemeister
b30b8698ea [generic] Allow multiple matches for generic hits (Fixes #2818) 2014-04-30 02:23:51 +02:00
Philipp Hagemeister
f1f25be6db release 2014.04.30 2014-04-30 02:05:03 +02:00
Philipp Hagemeister
deab8c1960 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-30 02:04:55 +02:00
Philipp Hagemeister
c57f775710 [YoutubeDL] Add simple tests for format_note (Closes #2825) 2014-04-30 02:02:41 +02:00
AGSPhoenix
e75cafe9fb Clean up format list for consistency
This should make the format list output look a bit nicer.
2014-04-30 01:52:05 +02:00
Philipp Hagemeister
33ab8453c4 Merge pull request #2813 from dstftw/test-real-download-improvement
Improve download mechanism when Range HTTP header is ignored
2014-04-30 01:50:33 +02:00
Philipp Hagemeister
ebd3c7b370 [generic] Add support for protocol-independent URLs (Fixes #2810) 2014-04-30 01:46:06 +02:00
Philipp Hagemeister
29645a1d44 Merge remote-tracking branch 'pulpe/moviezinese' 2014-04-30 01:37:05 +02:00
Philipp Hagemeister
22d99a801a [syfy] Add suppor for generic URLs (Fixes #2827) 2014-04-30 01:35:52 +02:00
Jaime Marquínez Ferrándiz
57b8d84cd9 [5min] Raise an error if the 'success' field is False
For example for georestricted videos.
2014-04-29 14:57:38 +02:00
Sergey M․
65e4ad5bfe [rtbf] Minor changes and YouTube videos support 2014-04-29 19:41:58 +07:00
Nicolas Évrard
98b7d476d9 [RTBFVideo] Remove useless print statement 2014-04-28 23:19:56 +02:00
Nicolas Évrard
201e3c99b9 [RTBFVideo] Add new extractor 2014-04-28 20:32:13 +02:00
Sergey M․
8a7a4a9796 [scivee] Skip test for now 2014-04-28 19:52:32 +07:00
Sergey M․
df297c8794 [http] Improve download mechanism when Range HTTP header is ignored 2014-04-27 09:32:01 +07:00
pulpe
3f53a75f02 [moviezine] Add extractor for moviezine.se (fixes #2808) 2014-04-26 18:55:29 +02:00
Sergey M․
7c360e3a04 [scivee] Add support for scivee.tv 2014-04-26 20:22:15 +07:00
Sergey M․
d2176c8011 [nrk] Add support for nrk.no (Closes #2804) 2014-04-25 21:34:44 +07:00
Jaime Marquínez Ferrándiz
aa92f06308 [youtube] Don't call 'unquote_plus' on the video title (fixes #2799)
It's already unquoted after calling 'compat_parse_qs'.
It replaced '+' with spaces, for example in https://www.youtube.com/watch?v=XC0b5YexO-I.
2014-04-25 13:19:03 +02:00
Jaime Marquínez Ferrándiz
e00c9cf599 [youtube] Update test description field 2014-04-25 13:14:15 +02:00
Jaime Marquínez Ferrándiz
ba60a3ebe0 [youtube] Update test description field 2014-04-25 12:57:04 +02:00
Jaime Marquínez Ferrándiz
efb7e11988 [vimeo] Add an extractor for the watch later list (closes #2787) 2014-04-24 21:51:20 +02:00
Sergey M․
a55c8b7aac [9gag] Fix post view regex 2014-04-24 19:52:34 +07:00
Jaime Marquínez Ferrándiz
a980bc4324 [vimeo] Fix logging in python 3.x
The POST data must be a bytes object.
2014-04-24 14:44:27 +02:00
Sergey M․
4b10aadffc [dailymotion] Fix user playlist extraction 2014-04-23 19:42:34 +07:00
Sergey M․
5bec574859 [ted] Update test 2014-04-22 19:49:41 +07:00
Philipp Hagemeister
d11271dd29 [youtube] Include video Id in common error message (Fixes #2786) 2014-04-21 20:34:03 +02:00
Philipp Hagemeister
1d9d26d09b release 2014.04.21.6 2014-04-21 16:18:32 +02:00
Philipp Hagemeister
c0292e8ab7 [generic] Improve jwplayer detection (Fixes #2731) 2014-04-21 16:16:53 +02:00
Philipp Hagemeister
f44e5d8b43 [vuclip] Fix VALID_URL regex 2014-04-21 16:14:21 +02:00
Philipp Hagemeister
6ea74538e3 release 2014.04.21.5 2014-04-21 15:56:23 +02:00
Philipp Hagemeister
24b8924b46 [facebook] Correct login (Fixes #2743) 2014-04-21 15:56:09 +02:00
Philipp Hagemeister
86a3c67112 release 2014.04.21.4 2014-04-21 15:25:16 +02:00
Philipp Hagemeister
8be874370d Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-21 15:24:51 +02:00
Philipp Hagemeister
aec74dd95a [vuclip] Add extractor (Fixes #2735) 2014-04-21 15:24:44 +02:00
Sergey M․
6890574256 [rutube] Add missing whitespace 2014-04-21 19:04:11 +07:00
Sergey M․
d03745c684 [jukebox] Update test md5 2014-04-21 19:00:27 +07:00
Philipp Hagemeister
28746fbd59 [bilibili] Add preliminary support (#2174)
The URL http://www.bilibili.tv/video/av636603/index_2.html does not work yet.
2014-04-21 13:46:41 +02:00
Philipp Hagemeister
0321213c11 [test_subtitles] Allow more subtitles for TED videos 2014-04-21 13:20:14 +02:00
Philipp Hagemeister
3f0aae4244 release 2014.04.21.3 2014-04-21 12:40:09 +02:00
Philipp Hagemeister
48099643cc [generic] Be more relaxed when looking for aparat embeds (Fixes #2784) 2014-04-21 12:37:41 +02:00
Philipp Hagemeister
621f33c9d0 [ted] Extend search for description 2014-04-21 12:37:16 +02:00
Philipp Hagemeister
f07a9f6f43 [ted] Remove superfluous u prefixes 2014-04-21 12:34:32 +02:00
Philipp Hagemeister
e51880fd32 [cnet] Correct JSON capturing 2014-04-21 07:59:29 +02:00
Philipp Hagemeister
88ce273da4 [arte] differentiate JSON outputs 2014-04-21 07:59:16 +02:00
Philipp Hagemeister
b9ba5dfa28 [test helper] Correct only_matching test gathering 2014-04-21 07:56:51 +02:00
Philipp Hagemeister
4086f11929 release 2014.04.21.2 2014-04-21 07:12:12 +02:00
Philipp Hagemeister
478c2c6193 [clubic] Add extractor (Fixes #2773) 2014-04-21 07:12:02 +02:00
Philipp Hagemeister
d2d6481afb [mdr] Remove unused imports 2014-04-21 06:49:21 +02:00
Philipp Hagemeister
43acb120f3 release 2014.04.21.1 2014-04-21 06:28:25 +02:00
Philipp Hagemeister
e8f2025edf [mdr] Add support for modern URLs (Fixes #2775) 2014-04-21 06:25:21 +02:00
Philipp Hagemeister
a4eb9578af [yahoo] Add support for movies (Fixes #2780) 2014-04-21 06:18:04 +02:00
Philipp Hagemeister
fa35cdad02 [condenast|generic] Add support for condenast embeds (Fixes #2783) 2014-04-21 05:47:52 +02:00
Philipp Hagemeister
d1b9c912a4 [utils] Fix _windows_write_string (Fixes #2779)
It turns out that the function did not work for outputs longer than 1024 UCS-2 tokens.
Write non-BMP characters one by one to ensure that we count correctly.
2014-04-21 04:59:46 +02:00
Philipp Hagemeister
edec83a025 [infoq] Add support for HTTP downloads (Fixes #722) 2014-04-21 03:21:34 +02:00
Philipp Hagemeister
c0a7c60815 [infoq] Simplify (#2777) 2014-04-21 02:55:35 +02:00
Philipp Hagemeister
117a7d1944 Merge remote-tracking branch 'kwbr/master' 2014-04-21 02:48:04 +02:00
Philipp Hagemeister
a40e0dd434 release 2014.04.21 2014-04-21 02:34:53 +02:00
Philipp Hagemeister
188b086dd9 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-21 02:34:44 +02:00
Philipp Hagemeister
1f27d2c0e1 [steam] Add support for steamcommunity.com (Fixes #2757) 2014-04-21 02:34:34 +02:00
Kai Weber
7560096db5 [infoq] Simplify playpath calculation 2014-04-20 01:10:30 +02:00
Kai Weber
282cb9c7ba [infoq] Fix extractor 2014-04-20 01:01:37 +02:00
Sergey M․
3a9d6790ad [ivi] Update playlist tests 2014-04-20 03:06:50 +07:00
Philipp Hagemeister
0610a3e0b2 Remove unused imports 2014-04-19 19:57:09 +02:00
Philipp Hagemeister
7f9c31df88 [steam] Simplify 2014-04-19 19:55:53 +02:00
Philipp Hagemeister
3fa6b6e293 [steam] Modernize 2014-04-19 19:51:04 +02:00
Philipp Hagemeister
3c50b99ab4 [extremetube] Modernize 2014-04-19 19:42:51 +02:00
Philipp Hagemeister
52fadd5fb2 [test_all_urls] Add support for distributed URL matching test definition 2014-04-19 19:41:06 +02:00
Philipp Hagemeister
5367fe7f4d [test_all_urls] Simplify 2014-04-19 13:01:15 +02:00
Philipp Hagemeister
427588f6e7 Merge remote-tracking branch 'MikeCol/extremetube-gay' 2014-04-19 12:59:52 +02:00
Philipp Hagemeister
51745be312 release 2014.04.19 2014-04-19 11:55:33 +02:00
Sergey M․
d7f1e7c88f [rutube] Fix extraction 2014-04-19 15:59:12 +07:00
MikeCol
4145a257be Extended regex match to include gay clips 2014-04-19 00:29:42 +02:00
Sergey M․
525dc9809e [noco] Fix test description md5 2014-04-18 21:36:04 +07:00
Sergey M․
1bf3210816 [noco] Add support for noco.tv (Closes #2712) 2014-04-18 21:11:09 +07:00
Sergey M․
e6c6d10d99 [podomatic] Improve video URL extraction (Closes #2763) 2014-04-17 19:59:52 +07:00
Jaime Marquínez Ferrándiz
f270256e06 [tlc] Add an extractor for tlc.com
It uses the same system as discovery.com
2014-04-16 20:29:31 +02:00
Jaime Marquínez Ferrándiz
f401c6f69f [canalplus] Download the video in the test
It doesn't use rtmpdump now.
2014-04-16 15:54:00 +02:00
Sergey M․
b075d25bed [canalplus] Prefer f4m and modernize (Closes #2749) 2014-04-16 20:47:39 +07:00
Jaime Marquínez Ferrándiz
3d1bb6b4dd Add an extractor for tlc.de (fixes #2748) 2014-04-16 15:45:05 +02:00
Philipp Hagemeister
1db2666916 [youtube:playlist] Correct playlist ID output
The ID now starts with PL, so we don't need to output that twice.
2014-04-15 17:55:52 +02:00
Jaime Marquínez Ferrándiz
8f5c0218d8 [fivemin] Get the 'sid' from the embed page (fixes #2745)
It allows to download some videos that failed.
2014-04-15 16:18:37 +02:00
Sergey M․
d7666dff82 [9gag] Fix and improve extraction 2014-04-15 19:49:38 +07:00
Jaime Marquínez Ferrándiz
2d4c98dbd1 [ted] Use the rtmp links if there http downloads are not available. 2014-04-14 15:23:12 +02:00
Sergey M․
fd50bf623c [generic] Modernize tests 2014-04-14 18:56:29 +07:00
Sergey M․
d360a14678 [generic] Update test 2014-04-14 18:51:46 +07:00
Philipp Hagemeister
d0f2ab6969 release 2014.04.13 2014-04-13 03:22:30 +02:00
Philipp Hagemeister
de906ef543 [aol] Add support for playlists (Fixes #2730) 2014-04-13 03:22:24 +02:00
Sergey M․
2fb3deeca1 [tube8] Fix extraction and modernize 2014-04-13 03:56:32 +07:00
Philipp Hagemeister
66398056f1 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-12 17:15:16 +02:00
Jaime Marquínez Ferrándiz
77477fa4c9 Merge branch 'atomicparsley' (closes #2436) 2014-04-12 15:52:42 +02:00
Jaime Marquínez Ferrándiz
a169e18ce1 [atomicparsley] Remove unneeded __init__ method 2014-04-12 15:51:40 +02:00
Jaime Marquínez Ferrándiz
381640e3ac [brightcove] Only use url from meta element if it has the 'playerKey' field (fixes #2738) 2014-04-12 12:53:48 +02:00
Sergey M․
37e3410137 [prosiebensat1] Add one more clip id pattern (Closes #2737) 2014-04-12 02:53:55 +07:00
Jaime Marquínez Ferrándiz
97b5196960 [weibo] Modernize 2014-04-11 16:02:34 +02:00
Sergey M․
6a4f3528c8 [firstpost] Fix extraction 2014-04-11 20:40:42 +07:00
Philipp Hagemeister
b9c76aa1a9 [youtube] Add support for cleanvideosearch.com (Fixes #2734) 2014-04-11 13:53:05 +02:00
Philipp Hagemeister
0d3070d364 release 2014.04.11.2 2014-04-11 09:44:33 +02:00
Philipp Hagemeister
7753cadbfa [comedycentral:shows] Add support for TDS special editions (Fixes #2733) 2014-04-11 09:30:07 +02:00
Philipp Hagemeister
3950450342 [pyvideo] Fix title 2014-04-11 02:20:50 +02:00
Philipp Hagemeister
c82b1fdad6 [slideshare] Fix description 2014-04-11 02:19:15 +02:00
Philipp Hagemeister
b0fb63abe8 [dailymotion:playlist] Fix title 2014-04-11 02:16:46 +02:00
Philipp Hagemeister
3ab34c603e [comedycentral] Fix test md5sum 2014-04-11 02:14:31 +02:00
Philipp Hagemeister
7d6413341a release 2014.04.11.1 2014-04-11 01:29:54 +02:00
Philipp Hagemeister
140012d0f6 release 2014.04.11 2014-04-11 01:28:30 +02:00
Philipp Hagemeister
4be9f8c814 [ninegag] Add support for p/ URLs 2014-04-11 01:25:24 +02:00
Sergey M․
5c802bac37 [byutv] Fix test 2014-04-10 19:37:55 +07:00
Sergey M․
6c30ff756a [mpora] Fix test 2014-04-10 19:10:03 +07:00
Jaime Marquínez Ferrándiz
62749e4708 [morningstar] Also support 'Cover' (#2729) 2014-04-09 20:51:28 +02:00
Jaime Marquínez Ferrándiz
6b7dee4b38 [morningstar] Recognize urls that use 'videoCenter' (fixes #2729) 2014-04-09 20:45:49 +02:00
Sergey M․
ef2041eb4e [br] Add audio extraction and support more URLs (Closes #2728) 2014-04-09 20:19:27 +07:00
Philipp Hagemeister
29e3e682af [comedycentral] Match more URLs
Looks like they only offer clips instead of full episodes now. We'll need to add new parsing code as well.
2014-04-09 11:43:15 +02:00
Philipp Hagemeister
f983c44199 Merge pull request #2725 from foolscap/subtitles-error-fix
Fix subtitle download error reporting (Fixes #2724)
2014-04-09 10:16:06 +02:00
robbie
e4db19511a Fix subtitle download error reporting (Fixes #2724) 2014-04-08 15:59:27 +01:00
Sergey M․
c47d21da80 [ntv] Update test 2014-04-08 19:11:40 +07:00
Philipp Hagemeister
269aecd0c0 [ffmpeg] Do not pass in byets to subprocess (Fixes #2717) 2014-04-07 23:33:05 +02:00
Philipp Hagemeister
aafddb2b0a Merge remote-tracking branch 'anisse/fix-content-encoding-charset' 2014-04-07 23:27:03 +02:00
Philipp Hagemeister
6262ac8ac5 release 2014.04.07.4 2014-04-07 23:23:54 +02:00
Philipp Hagemeister
89938c719e Fix Windows output for non-BMP unicode characters 2014-04-07 23:23:48 +02:00
Anisse Astier
ec0fafbb19 [extractor/common] fallback on utf-8 when charset is not found
fixes #2721
2014-04-07 23:10:16 +02:00
Philipp Hagemeister
a5863bdf33 release 2014.04.07.3 2014-04-07 22:48:45 +02:00
Philipp Hagemeister
b58ddb32ba [utils] Completely rewrite Windows output (Fixes #2672) 2014-04-07 22:48:13 +02:00
Philipp Hagemeister
b9e12a8140 release 2014.04.07.2 2014-04-07 21:41:20 +02:00
Philipp Hagemeister
104aa7388a Use our own encoding when writing strings 2014-04-07 21:40:34 +02:00
Philipp Hagemeister
c3855d28b0 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-07 19:57:51 +02:00
Philipp Hagemeister
734f90bb41 Use --encoding when outputting 2014-04-07 19:57:42 +02:00
Jaime Marquínez Ferrándiz
91a6addeeb Add support for rtve.es/alacarta 2014-04-07 17:30:32 +02:00
Philipp Hagemeister
9afb76c5ad release 2014.04.07.1 2014-04-07 15:28:55 +02:00
Philipp Hagemeister
dfb2cb5cfd [teamcoco] Simplify ID management (Closes #2715) 2014-04-07 15:25:35 +02:00
Philipp Hagemeister
650d688d10 release 2014.04.07 2014-04-07 13:11:37 +02:00
Philipp Hagemeister
0ba77818f3 [ted] Add width and height (Fixes #2716) 2014-04-07 13:11:30 +02:00
Sergey M․
09baa7da7e [rts] Update test 2014-04-07 00:34:23 +07:00
Sergey M․
85e787f51d [cbsnews] Add support for cbsnews.com (Closes #2691) 2014-04-06 06:03:58 +07:00
Philipp Hagemeister
2a9e1e453a Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-05 20:05:47 +02:00
Philipp Hagemeister
ee1e199685 [justin.tv] Modernize (Fixes #2705) 2014-04-05 17:56:36 +02:00
Sergey M․
17c5a00774 [novamov] Simplify 2014-04-05 19:36:22 +07:00
Sergey M․
15c0e8e7b2 [generic] Generalize novamov based embeds 2014-04-05 17:20:05 +07:00
Sergey M․
cca37fba48 [divxstage] Fix typo in IE_NAME 2014-04-05 17:15:43 +07:00
Sergey M․
9d0993ec4a [movshare] Support more domains 2014-04-05 17:00:18 +07:00
Sergey M․
342f33bf9e [divxstage] Support more domains 2014-04-05 16:50:05 +07:00
Sergey M․
7cd3bc5f99 [nowvideo] Support more domains 2014-04-05 16:38:57 +07:00
Sergey M․
931055e6cb [videoweed] Revert _FILE_DELETED_REGEX 2014-04-05 16:32:14 +07:00
Sergey M․
d0e4cf82f1 [movshare] Add _FILE_DELETED_REGEX 2014-04-05 16:31:38 +07:00
Sergey M․
6f88df2c57 [divxstage] Add support for divxstage.eu 2014-04-05 16:29:44 +07:00
Sergey M․
4479bf2762 [videoweed] Simplify 2014-04-05 16:09:28 +07:00
Sergey M․
1ff7c0f7d8 [movshare] Add support for movshare.net 2014-04-05 16:09:03 +07:00
Sergey M․
610e47c87e Credit @sainyamkapoor for videoweed extractor 2014-04-05 15:53:50 +07:00
Sergey M․
50f566076f [generic] Add support for videoweed embeds 2014-04-05 15:49:45 +07:00
Sergey M․
92810ff497 [nowvideo] Improve _VALID_URL 2014-04-05 15:35:21 +07:00
Sergey M․
60ccc59a1c [novamov] Improve _VALID_URL 2014-04-05 15:34:54 +07:00
Sergey M․
91745595d3 [videoweed] Simplify 2014-04-05 15:32:55 +07:00
Sainyam Kapoor
d6e40507d0 [videoweed]Cleanup 2014-04-05 10:53:22 +05:30
Sainyam Kapoor
deed48b472 [Videoweed] Added support for videoweed. 2014-04-05 10:40:03 +05:30
Philipp Hagemeister
e4d41bfca5 Merge pull request #2696 from anovicecodemonkey/support-ustream-embeds
[UstreamIE] [generic] Added support for Ustream embed URLs (Fixes #2694)
2014-04-04 23:33:08 +02:00
Philipp Hagemeister
a355b70f27 [cspan] Do not test number of playlist entries
Apparently, CSpan switches between single-file and multiple-file results. Either one is fine as long as we get the full four hours.
2014-04-04 23:16:22 +02:00
Philipp Hagemeister
f8514f6186 [rts] Use visible id in file names
Maybe the internal ID is more precise, but it's totally confusing, and the obvious ID still allows a google search.
2014-04-04 23:13:55 +02:00
Philipp Hagemeister
e09b8fcd9d [ro220] Make test case more flexible
Either one or two spaces is fine here.
2014-04-04 23:08:33 +02:00
Philipp Hagemeister
7d1b527ff9 [motorsport] Fix on Python 3 2014-04-04 23:06:27 +02:00
Philipp Hagemeister
f943c7b622 release 2014.04.04.7 2014-04-04 23:01:45 +02:00
Philipp Hagemeister
676eb3f2dd Fix unicode_escape (Fixes #2695) 2014-04-04 23:00:51 +02:00
Philipp Hagemeister
98b7cf1ace release 2014.04.04.6 2014-04-04 22:48:35 +02:00
Philipp Hagemeister
c465afd736 [teamcoco] Fix regex in 2.6 (#2700)
The re engine does not want to repeat an empty string, for fear that something like

    (.*)*

could be matching the tokens ...

    ""
    "" ""
    "" "" ""

Of course, that's harmless with a question mark, although still somewhat strange.
2014-04-04 22:46:47 +02:00
Philipp Hagemeister
b84d6e7fc4 Merge remote-tracking branch 'AGSPhoenix/teamcoco-fix' 2014-04-04 22:44:49 +02:00
Philipp Hagemeister
2efd5d78c1 release 2014.04.04.5 2014-04-04 22:24:45 +02:00
Philipp Hagemeister
c8edf47b3a [yahoo] Support https and -uploader URLs (Fixes #2701) 2014-04-04 22:23:59 +02:00
Philipp Hagemeister
3b4c26a428 [pornhd] Avoid shadowing variable url 2014-04-04 22:22:30 +02:00
Philipp Hagemeister
1525148114 Remove unused imports 2014-04-04 22:22:11 +02:00
Philipp Hagemeister
9e0c5791c1 release 2014.04.04.4 2014-04-04 22:15:32 +02:00
Philipp Hagemeister
29a1ab2afc Add alternative --prefer-unsecure spelling (Closes #2697) 2014-04-04 22:15:21 +02:00
AGSPhoenix
fa387d2d99 Revert "Workaround for regex engine limitation"
This reverts commit 6d0d573eca.
2014-04-04 15:37:49 -04:00
AGSPhoenix
6d0d573eca Workaround for regex engine limitation 2014-04-04 15:25:28 -04:00
AGSPhoenix
bb799e811b Add a test for the new URL pages
Add a test for the pages with the video_id in the URL.
2014-04-04 13:52:35 -04:00
AGSPhoenix
04ee53eca1 Support TeamCoco URLs with video_id in the title
If the URL has the video_id in it, use that since the current method of
finding the id breaks on those pages.

Fixes 2698.
2014-04-04 13:42:34 -04:00
Jaime Marquínez Ferrándiz
659eb98a53 [breakcom] Fix YouTube videos extraction (fixes #2699) 2014-04-04 19:01:18 +02:00
anovicecodemonkey
ca6aada48e Fix _TEST for Ustream embed URLs 2014-04-05 03:26:29 +10:30
Jaime Marquínez Ferrándiz
43df5a7e71 [keezmovies] Modernize 2014-04-04 18:52:43 +02:00
Jaime Marquínez Ferrándiz
88f1c6de7b [yahoo] Modernize 2014-04-04 18:52:43 +02:00
Sergey M․
65a40ab82b [pornhd] Update test checksum 2014-04-04 22:47:38 +07:00
Sergey M․
4b9cced103 [pornhd] Fix extraction (Closes #2693) 2014-04-04 22:45:39 +07:00
anovicecodemonkey
5c38625259 [UstreamIE] [generic] Added support for Ustream embed URLs (Fixes #2694) 2014-04-05 00:53:09 +10:30
Sergey M․
6344fa04bb [rts] Add more formats and audio support (Closes #2689) 2014-04-04 20:42:06 +07:00
Jaime Marquínez Ferrándiz
e3ced9ed61 [downloader/common] Use compat_str with the error in try_rename (appeared in #2389)
Otherwise on python 2.x we get `UnicodeDecodeError` because it may contain non ascii characters.
2014-04-04 14:59:11 +02:00
Philipp Hagemeister
5075d598bc release 2014.04.04.2 2014-04-04 02:24:21 +02:00
Philipp Hagemeister
68eb8e90e6 [youtube:playlist] Fix playlists for logged-in users (Fixes #2690) 2014-04-04 02:23:36 +02:00
Philipp Hagemeister
d3a96346c4 release 2014.04.04.3 2014-04-04 02:09:16 +02:00
Philipp Hagemeister
0e518e2fea [cnet] Fall back to "videos" key 2014-04-04 02:09:04 +02:00
Philipp Hagemeister
1e0a235f39 [dailymotion] Fix playlist+user 2014-04-04 02:04:16 +02:00
Philipp Hagemeister
9ad400f75e [generic] Remove test case that has become a 404 2014-04-04 01:47:17 +02:00
Philipp Hagemeister
3537b93d8a [tests] Fix YoutubeDL tests
Since bec1fad, the id, title, and url (also in formats) keys are mandatory. Change the tests to reflect that.
2014-04-04 01:45:49 +02:00
pulpe
784763c565 we don't need to run ffmpeg more times 2014-03-26 15:22:52 +01:00
pulpe
39c68260c0 fix ffmpeg metadatapp 2014-03-26 15:22:52 +01:00
pulpe
149254d0d5 fix ffmpeg error, if youtube-dl runs more than once with --embed-thumbnail with same video 2014-03-26 15:22:52 +01:00
pulpe
0c14e2fbe3 add post processor 2014-03-26 15:22:51 +01:00
111 changed files with 3674 additions and 1110 deletions

View File

@@ -3,6 +3,7 @@ python:
- "2.6"
- "2.7"
- "3.3"
- "3.4"
script: nosetests test --verbose
notifications:
email:

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion
clean:
rm -rf youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
cleanall: clean
rm -f youtube-dl youtube-dl.exe
@@ -55,7 +55,9 @@ README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
pandoc -s -f markdown -t man README.md -o youtube-dl.1
python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
python devscripts/bash-completion.py

View File

@@ -1,11 +1,24 @@
% YOUTUBE-DL(1)
# NAME
youtube-dl - download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** [OPTIONS] URL [URL...]
# INSTALLATION
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/2014.05.13/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
Alternatively, refer to the developer instructions below for how to check out and work with the git repository. For further options, including PGP signatures, see https://rg3.github.io/youtube-dl/download.html .
# DESCRIPTION
**youtube-dl** is a small command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version
@@ -250,6 +263,7 @@ which means you can modify it, redistribute it or use it however you like.
default
--embed-subs embed subtitles in the video (only for mp4
videos)
--embed-thumbnail embed thumbnail in the audio as cover art
--add-metadata write metadata to the video file
--xattrs write metadata to the video file's xattrs
(using dublin core and xdg standards)
@@ -457,7 +471,7 @@ If your report is shorter than two lines, it is almost certainly missing some of
For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
Site support requests must contain an example URL. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
Site support requests **must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
### Are you using the latest version?

View File

@@ -15,7 +15,7 @@ header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'^ (\w.+)$', r'## \1', options, flags=re.M)
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:

View File

@@ -0,0 +1,20 @@
import io
import os.path
import sys
import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
PREFIX = '%YOUTUBE-DL(1)\n\n# NAME\n'
readme = re.sub(r'(?s)# INSTALLATION.*?(?=# DESCRIPTION)', '', readme)
readme = PREFIX + readme
if sys.version_info < (3, 0):
print(readme.encode('utf-8'))
else:
print(readme)

View File

@@ -74,13 +74,19 @@ class FakeYDL(YoutubeDL):
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def gettestcases():
def gettestcases(include_onlymatching=False):
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
if t:
t['name'] = type(ie).__name__[:-len('IE')]
yield t
for t in getattr(ie, '_TESTS', []):
assert not hasattr(ie, '_TESTS'), \
'%s has _TEST and _TESTS' % type(ie).__name__
tests = [t]
else:
tests = getattr(ie, '_TESTS', [])
for t in tests:
if not include_onlymatching and t.get('only_matching', False):
continue
t['name'] = type(ie).__name__[:-len('IE')]
yield t
@@ -101,7 +107,7 @@ def expect_info_dict(self, expected_dict, got_dict):
elif isinstance(expected, type):
got = got_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
u'Expected type %r for field %s, but got value %r of type %r' % (expected, info_field, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(got_dict.get(info_field))
@@ -128,3 +134,17 @@ def expect_info_dict(self, expected_dict, got_dict):
missing_keys,
'Missing keys in test definition: %s' % (
', '.join(sorted(missing_keys))))
def assertRegexpMatches(self, text, regexp, msg=None):
if hasattr(self, 'assertRegexpMatches'):
return self.assertRegexpMatches(text, regexp, msg)
else:
m = re.match(regexp, text)
if not m:
note = 'Regexp didn\'t match: %r not found in %r' % (regexp, text)
if msg is None:
msg = note
else:
msg = note + ', ' + msg
self.assertTrue(m, msg)

View File

@@ -8,7 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL
from youtube_dl.extractor import YoutubeIE
@@ -26,16 +26,27 @@ class YDL(FakeYDL):
self.msgs.append(msg)
def _make_result(formats, **kwargs):
res = {
'formats': formats,
'id': 'testid',
'title': 'testttitle',
'extractor': 'testex',
}
res.update(**kwargs)
return res
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{'ext': 'webm', 'height': 460},
{'ext': 'mp4', 'height': 460},
{'ext': 'webm', 'height': 460, 'url': 'x'},
{'ext': 'mp4', 'height': 460, 'url': 'y'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
@@ -46,8 +57,8 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{'ext': 'webm', 'height': 720},
{'ext': 'mp4', 'height': 1080},
{'ext': 'webm', 'height': 720, 'url': 'a'},
{'ext': 'mp4', 'height': 1080, 'url': 'b'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@@ -60,9 +71,9 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{'ext': 'webm', 'height': 720},
{'ext': 'mp4', 'height': 720},
{'ext': 'flv', 'height': 720},
{'ext': 'webm', 'height': 720, 'url': '_'},
{'ext': 'mp4', 'height': 720, 'url': '_'},
{'ext': 'flv', 'height': 720, 'url': '_'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@@ -74,8 +85,8 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{'ext': 'flv', 'height': 720},
{'ext': 'webm', 'height': 720},
{'ext': 'flv', 'height': 720, 'url': '_'},
{'ext': 'webm', 'height': 720, 'url': '_'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@@ -91,8 +102,7 @@ class TestFormatSelection(unittest.TestCase):
{'format_id': 'great', 'url': 'http://example.com/great', 'preference': 3},
{'format_id': 'excellent', 'url': 'http://example.com/exc', 'preference': 4},
]
info_dict = {
'formats': formats, 'extractor': 'test', 'id': 'testvid'}
info_dict = _make_result(formats)
ydl = YDL()
ydl.process_ie_result(info_dict)
@@ -120,12 +130,12 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection(self):
formats = [
{'format_id': '35', 'ext': 'mp4', 'preference': 1},
{'format_id': '45', 'ext': 'webm', 'preference': 2},
{'format_id': '47', 'ext': 'webm', 'preference': 3},
{'format_id': '2', 'ext': 'flv', 'preference': 4},
{'format_id': '35', 'ext': 'mp4', 'preference': 1, 'url': '_'},
{'format_id': '45', 'ext': 'webm', 'preference': 2, 'url': '_'},
{'format_id': '47', 'ext': 'webm', 'preference': 3, 'url': '_'},
{'format_id': '2', 'ext': 'flv', 'preference': 4, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': '20/47'})
ydl.process_ie_result(info_dict.copy())
@@ -154,12 +164,12 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection_audio(self):
formats = [
{'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none'},
{'format_id': 'audio-mid', 'ext': 'webm', 'preference': 2, 'vcodec': 'none'},
{'format_id': 'audio-high', 'ext': 'flv', 'preference': 3, 'vcodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 4},
{'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none', 'url': '_'},
{'format_id': 'audio-mid', 'ext': 'webm', 'preference': 2, 'vcodec': 'none', 'url': '_'},
{'format_id': 'audio-high', 'ext': 'flv', 'preference': 3, 'vcodec': 'none', 'url': '_'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 4, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestaudio'})
ydl.process_ie_result(info_dict.copy())
@@ -172,10 +182,10 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(downloaded['format_id'], 'audio-low')
formats = [
{'format_id': 'vid-low', 'ext': 'mp4', 'preference': 1},
{'format_id': 'vid-high', 'ext': 'mp4', 'preference': 2},
{'format_id': 'vid-low', 'ext': 'mp4', 'preference': 1, 'url': '_'},
{'format_id': 'vid-high', 'ext': 'mp4', 'preference': 2, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestaudio/worstaudio/best'})
ydl.process_ie_result(info_dict.copy())
@@ -184,11 +194,11 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection_video(self):
formats = [
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none', 'url': '_'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none', 'url': '_'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestvideo'})
ydl.process_ie_result(info_dict.copy())
@@ -217,10 +227,12 @@ class TestFormatSelection(unittest.TestCase):
for f1id, f2id in zip(order, order[1:]):
f1 = YoutubeIE._formats[f1id].copy()
f1['format_id'] = f1id
f1['url'] = 'url:' + f1id
f2 = YoutubeIE._formats[f2id].copy()
f2['format_id'] = f2id
f2['url'] = 'url:' + f2id
info_dict = {'formats': [f1, f2], 'extractor': 'youtube'}
info_dict = _make_result([f1, f2], extractor='youtube')
ydl = YDL()
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
@@ -228,7 +240,7 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1id)
info_dict = {'formats': [f2, f1], 'extractor': 'youtube'}
info_dict = _make_result([f2, f1], extractor='youtube')
ydl = YDL()
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
@@ -262,6 +274,12 @@ class TestFormatSelection(unittest.TestCase):
# Replace missing fields with 'NA'
self.assertEqual(fname('%(uploader_date)s-%(id)s.%(ext)s'), 'NA-1234.mp4')
def test_format_note(self):
ydl = YoutubeDL()
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^x\s*10k$')
if __name__ == '__main__':
unittest.main()

View File

@@ -49,6 +49,7 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.cleanvideosearch.com/media/action/yt/watch?videoId=8v_4O44sfjM', ['youtube'])
def test_youtube_channel_matching(self):
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
@@ -76,20 +77,20 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"www.justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"www.twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"http://www.justin.tv/vanillatv/"))
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv/"))
self.assertTrue(JustinTVIE.suitable('justin.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('twitch.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('www.justin.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('www.twitch.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('http://www.justin.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('http://www.twitch.tv/vanillatv'))
self.assertTrue(JustinTVIE.suitable('http://www.justin.tv/vanillatv/'))
self.assertTrue(JustinTVIE.suitable('http://www.twitch.tv/vanillatv/'))
def test_justintv_videoid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/vanillatv/b/328087483"))
self.assertTrue(JustinTVIE.suitable('http://www.twitch.tv/vanillatv/b/328087483'))
def test_justin_tv_chapterid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
self.assertTrue(JustinTVIE.suitable('http://www.twitch.tv/tsm_theoddone/c/2349361'))
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
@@ -105,7 +106,7 @@ class TestAllURLsMatching(unittest.TestCase):
def test_no_duplicates(self):
ies = gen_extractors()
for tc in gettestcases():
for tc in gettestcases(include_onlymatching=True):
url = tc['url']
for ie in ies:
if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
@@ -156,6 +157,25 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch(
'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
['ComedyCentralShows'])
self.assertMatch(
'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
['ComedyCentralShows'])
def test_yahoo_https(self):
# https://github.com/rg3/youtube-dl/issues/2701
self.assertMatch(
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo'])
if __name__ == '__main__':
unittest.main()

View File

@@ -10,6 +10,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
assertRegexpMatches,
expect_info_dict,
FakeYDL,
)
@@ -22,9 +23,11 @@ from youtube_dl.extractor import (
VimeoUserIE,
VimeoAlbumIE,
VimeoGroupsIE,
VineUserIE,
UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE,
LivestreamIE,
NHLVideocenterIE,
BambuserChannelIE,
@@ -43,6 +46,7 @@ from youtube_dl.extractor import (
XTubeUserIE,
InstagramUserIE,
CSpanIE,
AolIE,
)
@@ -99,6 +103,13 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'Rolex Awards for Enterprise')
self.assertTrue(len(result['entries']) > 72)
def test_vine_user(self):
dl = FakeYDL()
ie = VineUserIE(dl)
result = ie.extract('https://vine.co/Visa')
self.assertIsPlaylist(result)
self.assertTrue(len(result['entries']) >= 50)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
@@ -123,6 +134,17 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_soundcloud_playlist(self):
dl = FakeYDL()
ie = SoundcloudPlaylistIE(dl)
result = ie.extract('http://api.soundcloud.com/playlists/4110309')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '4110309')
self.assertEqual(result['title'], 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]')
assertRegexpMatches(
self, result['description'], r'TILT Brass - Bowery Poetry Club')
self.assertEqual(len(result['entries']), 6)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
@@ -191,8 +213,8 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012)')
self.assertTrue(len(result['entries']) >= 36)
self.assertTrue(len(result['entries']) >= 23)
def test_ivi_compilation_season(self):
dl = FakeYDL()
ie = IviCompilationIE(dl)
@@ -200,7 +222,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel/season2')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012) 2 сезон')
self.assertTrue(len(result['entries']) >= 20)
self.assertTrue(len(result['entries']) >= 7)
def test_imdb_list(self):
dl = FakeYDL()
@@ -324,10 +346,19 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '342759')
self.assertEqual(
result['title'], 'General Motors Ignition Switch Recall')
self.assertEqual(len(result['entries']), 9)
whole_duration = sum(e['duration'] for e in result['entries'])
self.assertEqual(whole_duration, 14855)
def test_aol_playlist(self):
dl = FakeYDL()
ie = AolIE(dl)
result = ie.extract(
'http://on.aol.com/playlist/brace-yourself---todays-weirdest-news-152147?icid=OnHomepageC4_Omg_Img#_videoid=518184316')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '152147')
self.assertEqual(
result['title'], 'Brace Yourself - Today\'s Weirdest News')
self.assertTrue(len(result['entries']) >= 10)
if __name__ == '__main__':
unittest.main()

View File

@@ -181,7 +181,7 @@ class TestTedSubtitles(BaseTestSubtitles):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 28)
self.assertTrue(len(subtitles.keys()) >= 28)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')

View File

@@ -38,6 +38,7 @@ from youtube_dl.utils import (
xpath_with_ns,
parse_iso8601,
strip_jsonp,
uppercase_escape,
)
if sys.version_info < (3, 0):
@@ -279,6 +280,9 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, [{"id": "532cb", "x": 3}])
def test_uppercase_escpae(self):
self.assertEqual(uppercase_escape(u''), u'')
self.assertEqual(uppercase_escape(u'\\U0001d550'), u'𝕐')
if __name__ == '__main__':
unittest.main()

135
youtube_dl/YoutubeDL.py Normal file → Executable file
View File

@@ -31,6 +31,7 @@ from .utils import (
ContentTooShortError,
date_from_str,
DateRange,
DEFAULT_OUTTMPL,
determine_ext,
DownloadError,
encodeFilename,
@@ -286,6 +287,9 @@ class YoutubeDL(object):
"""Print message to stdout if not in quiet mode."""
return self.to_stdout(message, skip_eol, check_quiet=True)
def _write_string(self, s, out=None):
write_string(s, out=out, encoding=self.params.get('encoding'))
def to_stdout(self, message, skip_eol=False, check_quiet=False):
"""Print message to stdout if not in quiet mode."""
if self.params.get('logger'):
@@ -295,7 +299,7 @@ class YoutubeDL(object):
terminator = ['\n', ''][skip_eol]
output = message + terminator
write_string(output, self._screen_file)
self._write_string(output, self._screen_file)
def to_stderr(self, message):
"""Print message to stderr."""
@@ -305,7 +309,7 @@ class YoutubeDL(object):
else:
message = self._bidi_workaround(message)
output = message + '\n'
write_string(output, self._err_file)
self._write_string(output, self._err_file)
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
@@ -315,21 +319,21 @@ class YoutubeDL(object):
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
write_string('\033]0;%s\007' % message, self._screen_file)
self._write_string('\033]0;%s\007' % message, self._screen_file)
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Save the title on stack
write_string('\033[22;0t', self._screen_file)
self._write_string('\033[22;0t', self._screen_file)
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Restore the title from stack
write_string('\033[23;0t', self._screen_file)
self._write_string('\033[23;0t', self._screen_file)
def __enter__(self):
self.save_console_title()
@@ -437,7 +441,8 @@ class YoutubeDL(object):
if v is not None)
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
tmpl = os.path.expanduser(self.params['outtmpl'])
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
tmpl = os.path.expanduser(outtmpl)
filename = tmpl % template_dict
return filename
except ValueError as err:
@@ -933,7 +938,7 @@ class YoutubeDL(object):
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + descfn)
self.report_error('Cannot write subtitles file ' + sub_filename)
return
if self.params.get('writeinfojson', False):
@@ -1022,10 +1027,11 @@ class YoutubeDL(object):
def download(self, url_list):
"""Download a given list of URLs."""
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
if (len(url_list) > 1 and
'%' not in self.params['outtmpl']
'%' not in outtmpl
and self.params.get('max_downloads') != 1):
raise SameFileError(self.params['outtmpl'])
raise SameFileError(outtmpl)
for url in url_list:
try:
@@ -1136,57 +1142,57 @@ class YoutubeDL(object):
res = default
return res
def list_formats(self, info_dict):
def format_note(fdict):
res = ''
if fdict.get('ext') in ['f4f', 'f4m']:
res += '(unsupported) '
if fdict.get('format_note') is not None:
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:
res += '%4dk ' % fdict['tbr']
if fdict.get('container') is not None:
if res:
res += ', '
res += '%s container' % fdict['container']
if (fdict.get('vcodec') is not None and
fdict.get('vcodec') != 'none'):
if res:
res += ', '
res += fdict['vcodec']
if fdict.get('vbr') is not None:
res += '@'
elif fdict.get('vbr') is not None and fdict.get('abr') is not None:
res += 'video@'
def _format_note(self, fdict):
res = ''
if fdict.get('ext') in ['f4f', 'f4m']:
res += '(unsupported) '
if fdict.get('format_note') is not None:
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:
res += '%4dk ' % fdict['tbr']
if fdict.get('container') is not None:
if res:
res += ', '
res += '%s container' % fdict['container']
if (fdict.get('vcodec') is not None and
fdict.get('vcodec') != 'none'):
if res:
res += ', '
res += fdict['vcodec']
if fdict.get('vbr') is not None:
res += '%4dk' % fdict['vbr']
if fdict.get('acodec') is not None:
if res:
res += ', '
if fdict['acodec'] == 'none':
res += 'video only'
else:
res += '%-5s' % fdict['acodec']
elif fdict.get('abr') is not None:
if res:
res += ', '
res += 'audio'
if fdict.get('abr') is not None:
res += '@%3dk' % fdict['abr']
if fdict.get('asr') is not None:
res += ' (%5dHz)' % fdict['asr']
if fdict.get('filesize') is not None:
if res:
res += ', '
res += format_bytes(fdict['filesize'])
return res
res += '@'
elif fdict.get('vbr') is not None and fdict.get('abr') is not None:
res += 'video@'
if fdict.get('vbr') is not None:
res += '%4dk' % fdict['vbr']
if fdict.get('acodec') is not None:
if res:
res += ', '
if fdict['acodec'] == 'none':
res += 'video only'
else:
res += '%-5s' % fdict['acodec']
elif fdict.get('abr') is not None:
if res:
res += ', '
res += 'audio'
if fdict.get('abr') is not None:
res += '@%3dk' % fdict['abr']
if fdict.get('asr') is not None:
res += ' (%5dHz)' % fdict['asr']
if fdict.get('filesize') is not None:
if res:
res += ', '
res += format_bytes(fdict['filesize'])
return res
def list_formats(self, info_dict):
def line(format, idlen=20):
return (('%-' + compat_str(idlen + 1) + 's%-10s%-12s%s') % (
format['format_id'],
format['ext'],
self.format_resolution(format),
format_note(format),
self._format_note(format),
))
formats = info_dict.get('formats', [info_dict])
@@ -1194,8 +1200,8 @@ class YoutubeDL(object):
max(len(f['format_id']) for f in formats))
formats_s = [line(f, idlen) for f in formats]
if len(formats) > 1:
formats_s[0] += (' ' if format_note(formats[0]) else '') + '(worst)'
formats_s[-1] += (' ' if format_note(formats[-1]) else '') + '(best)'
formats_s[0] += (' ' if self._format_note(formats[0]) else '') + '(worst)'
formats_s[-1] += (' ' if self._format_note(formats[-1]) else '') + '(best)'
header_line = line({
'format_id': 'format code', 'ext': 'extension',
@@ -1211,9 +1217,16 @@ class YoutubeDL(object):
if not self.params.get('verbose'):
return
write_string('[debug] Encodings: locale %s, fs %s, out %s, pref %s\n' %
(locale.getpreferredencoding(), sys.getfilesystemencoding(), sys.stdout.encoding, self.get_encoding()))
write_string('[debug] youtube-dl version ' + __version__ + '\n')
write_string(
'[debug] Encodings: locale %s, fs %s, out %s, pref %s\n' % (
locale.getpreferredencoding(),
sys.getfilesystemencoding(),
sys.stdout.encoding,
self.get_encoding()),
encoding=None
)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
@@ -1222,20 +1235,20 @@ class YoutubeDL(object):
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
write_string('[debug] Git HEAD: ' + out + '\n')
self._write_string('[debug] Git HEAD: ' + out + '\n')
except:
try:
sys.exc_clear()
except:
pass
write_string('[debug] Python version %s - %s' %
self._write_string('[debug] Python version %s - %s' %
(platform.python_version(), platform_name()) + '\n')
proxy_map = {}
for handler in self._opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
self._write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
def _setup_opener(self):
timeout_val = self.params.get('socket_timeout')

View File

@@ -52,6 +52,10 @@ __authors__ = (
'Juan C. Olivares',
'Mattias Harrysson',
'phaer',
'Sainyam Kapoor',
'Nicolas Évrard',
'Jason Normore',
'Hoje Lee',
)
__license__ = 'Public Domain'
@@ -71,6 +75,7 @@ from .utils import (
compat_getpass,
compat_print,
DateRange,
DEFAULT_OUTTMPL,
decodeOption,
get_term_width,
DownloadError,
@@ -91,6 +96,8 @@ from .extractor import gen_extractors
from .version import __version__
from .YoutubeDL import YoutubeDL
from .postprocessor import (
AtomicParsleyPP,
FFmpegAudioFixPP,
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
@@ -242,7 +249,7 @@ def parseOpts(overrideArguments=None):
help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--prefer-insecure', action='store_true', dest='prefer_insecure',
'--prefer-insecure', '--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
@@ -502,6 +509,8 @@ def parseOpts(overrideArguments=None):
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--embed-thumbnail', action='store_true', dest='embedthumbnail', default=False,
help='embed thumbnail in the audio as cover art')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='write metadata to the video file')
postproc.add_option('--xattrs', action='store_true', dest='xattrs', default=False,
@@ -671,7 +680,7 @@ def _real_main(argv=None):
if not opts.audioquality.isdigit():
parser.error(u'invalid audio quality specified')
if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg']:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv']:
parser.error(u'invalid video recode format specified')
if opts.date is not None:
date = DateRange.day(opts.date)
@@ -700,7 +709,7 @@ def _real_main(argv=None):
or (opts.usetitle and u'%(title)s-%(id)s.%(ext)s')
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
or DEFAULT_OUTTMPL)
if not os.path.splitext(outtmpl)[1] and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "{0}.%(ext)s" instead of "{0}" as the output'
@@ -807,6 +816,10 @@ def _real_main(argv=None):
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
if opts.xattrs:
ydl.add_post_processor(XAttrMetadataPP())
if opts.embedthumbnail:
if not opts.addmetadata:
ydl.add_post_processor(FFmpegAudioFixPP())
ydl.add_post_processor(AtomicParsleyPP())
# Update version
if opts.update_self:

View File

@@ -4,9 +4,10 @@ import sys
import time
from ..utils import (
compat_str,
encodeFilename,
timeconvert,
format_bytes,
timeconvert,
)
@@ -173,7 +174,7 @@ class FileDownloader(object):
return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err:
self.report_error(u'unable to rename file: %s' % str(err))
self.report_error(u'unable to rename file: %s' % compat_str(err))
def try_utime(self, filename, last_modified_hdr):
"""Try to set the last-modified time of the given file."""

View File

@@ -14,6 +14,8 @@ from ..utils import (
class HttpFD(FileDownloader):
_TEST_FILE_SIZE = 10241
def real_download(self, filename, info_dict):
url = info_dict['url']
tmpfilename = self.temp_name(filename)
@@ -28,8 +30,10 @@ class HttpFD(FileDownloader):
basic_request = compat_urllib_request.Request(url, None, headers)
request = compat_urllib_request.Request(url, None, headers)
if self.params.get('test', False):
request.add_header('Range', 'bytes=0-10240')
is_test = self.params.get('test', False)
if is_test:
request.add_header('Range', 'bytes=0-%s' % str(self._TEST_FILE_SIZE - 1))
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
@@ -100,6 +104,15 @@ class HttpFD(FileDownloader):
return False
data_len = data.info().get('Content-length', None)
# Range HTTP header may be ignored/unsupported by a webserver
# (e.g. extractor/scivee.py, extractor/bambuser.py).
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get("min_filesize", None)
@@ -118,7 +131,7 @@ class HttpFD(FileDownloader):
while True:
# Download and write
before = time.time()
data_block = data.read(block_size)
data_block = data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
after = time.time()
if len(data_block) == 0:
break
@@ -162,6 +175,9 @@ class HttpFD(FileDownloader):
'speed': speed,
})
if is_test and byte_counter == data_len:
break
# Apply rate limit
self.slow_down(start, byte_counter - resume_len)

View File

@@ -10,6 +10,7 @@ from .common import FileDownloader
from ..utils import (
encodeFilename,
format_bytes,
compat_str,
)
@@ -127,7 +128,10 @@ class RtmpFD(FileDownloader):
basic_args += ['--flashVer', flash_version]
if live:
basic_args += ['--live']
if conn:
if isinstance(conn, list):
for entry in conn:
basic_args += ['--conn', entry]
elif isinstance(conn, compat_str):
basic_args += ['--conn', conn]
args = basic_args + [[], ['--resume', '--skip', '1']][not live and self.params.get('continuedl', False)]

View File

@@ -20,6 +20,7 @@ from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbccouk import BBCCoUkIE
from .bilibili import BiliBiliIE
from .blinkx import BlinkxIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
@@ -32,6 +33,7 @@ from .canal13cl import Canal13clIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chilloutzone import ChilloutzoneIE
@@ -39,6 +41,7 @@ from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .clubic import ClubicIE
from .cmt import CMTIE
from .cnet import CNETIE
from .cnn import (
@@ -62,12 +65,14 @@ from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .divxstage import DivxStageIE
from .dropbox import DropboxIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .elpais import ElPaisIE
from .empflix import EmpflixIE
from .engadget import EngadgetIE
from .escapist import EscapistIE
from .everyonesmixtape import EveryonesMixtapeIE
@@ -75,6 +80,7 @@ from .exfm import ExfmIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
@@ -105,10 +111,12 @@ from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import IGNIE, OneUPIE
from .imdb import (
ImdbIE,
@@ -156,6 +164,8 @@ from .mofosex import MofosexIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motorsport import MotorsportIE
from .moviezine import MoviezineIE
from .movshare import MovShareIE
from .mtv import (
MTVIE,
MTVIggyIE,
@@ -174,15 +184,20 @@ from .nbc import (
from .ndr import NDRIE
from .ndtv import NDTVIE
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
from .nfb import NFBIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .novamov import NovaMovIE
from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .nrk import NRKIE
from .ntv import NTVIE
from .nytimes import NYTimesIE
from .nuvid import NuvidIE
from .oe1 import OE1IE
from .ooyala import OoyalaIE
from .orf import ORFIE
@@ -203,8 +218,10 @@ from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rtlnow import RTLnowIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
@@ -213,9 +230,11 @@ from .rutube import (
)
from .rutv import RUTVIE
from .savefrom import SaveFromIE
from .scivee import SciVeeIE
from .servingsys import ServingSysIE
from .sina import SinaIE
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
@@ -223,7 +242,12 @@ from .smotri import (
SmotriBroadcastIE,
)
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE
)
from .southparkstudios import (
SouthParkStudiosIE,
SouthparkDeIE,
@@ -233,7 +257,6 @@ from .spankwire import SpankwireIE
from .spiegel import SpiegelIE
from .spike import SpikeIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
@@ -247,6 +270,7 @@ from .tf1 import TF1IE
from .theplatform import ThePlatformIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcIE, TlcDeIE
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
@@ -276,6 +300,8 @@ from .videodetective import VideoDetectiveIE
from .videolecturesnet import VideoLecturesNetIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .videoweed import VideoWeedIE
from .vimeo import (
VimeoIE,
VimeoChannelIE,
@@ -283,15 +309,21 @@ from .vimeo import (
VimeoAlbumIE,
VimeoGroupsIE,
VimeoReviewIE,
VimeoWatchLaterIE,
)
from .vine import (
VineIE,
VineUserIE,
)
from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wdr import (
WDRIE,
WDRMobileIE,
WDRMausIE,
)
from .weibo import WeiboIE

View File

@@ -8,7 +8,18 @@ from .fivemin import FiveMinIE
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'http://on\.aol\.com/video/.*-(?P<id>\d+)($|\?)'
_VALID_URL = r'''(?x)
(?:
aol-video:|
http://on\.aol\.com/
(?:
video/.*-|
playlist/(?P<playlist_display_id>[^/?#]+?)-(?P<playlist_id>[0-9]+)[?#].*_videoid=
)
)
(?P<id>[0-9]+)
(?:$|\?)
'''
_TEST = {
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@@ -24,5 +35,31 @@ class AolIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen('Downloading 5min.com video %s' % video_id)
playlist_id = mobj.group('playlist_id')
if playlist_id and not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(
r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
playlist_html = self._search_regex(
r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
'playlist HTML')
entries = [{
'_type': 'url',
'url': 'aol-video:%s' % m.group('id'),
'ie_key': 'Aol',
} for m in re.finditer(
r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
playlist_html)]
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': mobj.group('playlist_display_id'),
'title': title,
'entries': entries,
}
return FiveMinIE._build_result(video_id)

View File

@@ -74,7 +74,8 @@ class ArteTVPlus7IE(InfoExtractor):
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_url = self._html_search_regex(
r'arte_vp_url="(.*?)"', webpage, 'json vp url')
return self._extract_from_json_url(json_url, video_id, lang)
def _extract_from_json_url(self, json_url, video_id, lang):
@@ -120,14 +121,17 @@ class ArteTVPlus7IE(InfoExtractor):
return ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
def sort_key(f):
versionCode = f.get('versionCode')
if versionCode is None:
versionCode = ''
return (
# Sort first by quality
int(f.get('height',-1)),
int(f.get('bitrate',-1)),
int(f.get('height', -1)),
int(f.get('bitrate', -1)),
# The original version with subtitles has lower relevance
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
re.match(r'VO-ST(F|A)', versionCode) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
re.match(r'VO?(F|A)-STM\1', versionCode) is None,
# Prefer http downloads over m3u8
0 if f['url'].endswith('m3u8') else 1,
)

View File

@@ -12,7 +12,7 @@ from ..utils import (
class BandcampIE(InfoExtractor):
_VALID_URL = r'http://.*?\.bandcamp\.com/track/(?P<title>.*)'
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>.*)'
_TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'file': '1812978515.mp3',
@@ -100,7 +100,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'http://.*?\.bandcamp\.com/album/(?P<title>.*)'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+))?'
_TEST = {
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -128,8 +128,10 @@ class BandcampAlbumIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('subdomain')
title = mobj.group('title')
webpage = self._download_webpage(url, title)
display_id = title or playlist_id
webpage = self._download_webpage(url, display_id)
tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
if not tracks_paths:
raise ExtractorError('The page doesn\'t contain any tracks')
@@ -139,6 +141,8 @@ class BandcampAlbumIE(InfoExtractor):
title = self._search_regex(r'album_title : "(.*?)"', webpage, 'title')
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': display_id,
'title': title,
'entries': entries,
}

View File

@@ -0,0 +1,106 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_parse_qs,
ExtractorError,
int_or_none,
unified_strdate,
)
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.tv/video/av(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '2c301e4dab317596e837c3e7633e7d86',
'info_dict': {
'id': '1074402',
'ext': 'flv',
'title': '【金坷垃】金泡沫',
'duration': 308,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_code = self._search_regex(
r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')
title = self._html_search_meta(
'media:title', video_code, 'title', fatal=True)
duration_str = self._html_search_meta(
'duration', video_code, 'duration')
if duration_str is None:
duration = None
else:
duration_mobj = re.match(
r'^T(?:(?P<hours>[0-9]+)H)?(?P<minutes>[0-9]+)M(?P<seconds>[0-9]+)S$',
duration_str)
duration = (
int_or_none(duration_mobj.group('hours'), default=0) * 3600 +
int(duration_mobj.group('minutes')) * 60 +
int(duration_mobj.group('seconds')))
upload_date = unified_strdate(self._html_search_meta(
'uploadDate', video_code, fatal=False))
thumbnail = self._html_search_meta(
'thumbnailUrl', video_code, 'thumbnail', fatal=False)
player_params = compat_parse_qs(self._html_search_regex(
r'<iframe .*?class="player" src="https://secure.bilibili.tv/secure,([^"]+)"',
webpage, 'player params'))
if 'cid' in player_params:
cid = player_params['cid'][0]
lq_doc = self._download_xml(
'http://interface.bilibili.cn/v_cdn_play?cid=%s' % cid,
video_id,
note='Downloading LQ video info'
)
lq_durl = lq_doc.find('.//durl')
formats = [{
'format_id': 'lq',
'quality': 1,
'url': lq_durl.find('./url').text,
'filesize': int_or_none(
lq_durl.find('./size'), get_attr='text'),
}]
hq_doc = self._download_xml(
'http://interface.bilibili.cn/playurl?cid=%s' % cid,
video_id,
note='Downloading HQ video info',
fatal=False,
)
if hq_doc is not False:
hq_durl = hq_doc.find('.//durl')
formats.append({
'format_id': 'hq',
'quality': 2,
'ext': 'flv',
'url': hq_durl.find('./url').text,
'filesize': int_or_none(
hq_durl.find('./size'), get_attr='text'),
})
else:
raise ExtractorError('Unsupported player parameters: %r' % (player_params,))
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnail': thumbnail,
}

View File

@@ -1,102 +1,124 @@
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_str,
compat_urllib_request,
unescapeHTML,
parse_iso8601,
compat_urlparse,
clean_html,
compat_str,
)
class BlipTVIE(SubtitlesInfoExtractor):
"""Information extractor for blip.tv"""
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/(?:(?:.+-|rss/flash/)(?P<id>\d+)|((?:play/|api\.swf#)(?P<lookup_id>[\da-zA-Z]+)))'
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(?P<presumptive_id>.+)$'
_TESTS = [{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'upload_date': '20111205',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'uploader': 'Comic Book Resources - CBR TV',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
_TESTS = [
{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843,
'upload_date': '20111206',
'uploader': 'cbr',
'uploader_id': '679425',
'duration': 81,
}
},
{
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'title': 'Red vs. Blue Season 11 Episode 1',
'description': 'One-Zero-One',
'timestamp': 1371261608,
'upload_date': '20130615',
'uploader': 'redvsblue',
'uploader_id': '792887',
'duration': 279,
}
}
}, {
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'uploader': 'Red vs. Blue',
'description': 'One-Zero-One',
'upload_date': '20130614',
'title': 'Red vs. Blue Season 11 Episode 1',
}
}]
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
presumptive_id = mobj.group('presumptive_id')
lookup_id = mobj.group('lookup_id')
# See https://github.com/rg3/youtube-dl/issues/857
embed_mobj = re.match(r'https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', url)
if embed_mobj:
info_url = 'http://blip.tv/play/%s.x?p=1' % embed_mobj.group(1)
info_page = self._download_webpage(info_url, embed_mobj.group(1))
video_id = self._search_regex(
r'data-episode-id="([0-9]+)', info_page, 'video_id')
return self.url_result('http://blip.tv/a/a-' + video_id, 'BlipTV')
cchar = '&' if '?' in url else '?'
json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
request = compat_urllib_request.Request(json_url)
request.add_header('User-Agent', 'iTunes/10.6.1')
json_data = self._download_json(request, video_id=presumptive_id)
if 'Post' in json_data:
data = json_data['Post']
if lookup_id:
info_page = self._download_webpage(
'http://blip.tv/play/%s.x?p=1' % lookup_id, lookup_id, 'Resolving lookup id')
video_id = self._search_regex(r'data-episode-id="([0-9]+)', info_page, 'video_id')
else:
data = json_data
video_id = mobj.group('id')
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def blip(s):
return '{http://blip.tv/dtd/blip/1.0}%s' % s
def media(s):
return '{http://search.yahoo.com/mrss/}%s' % s
def itunes(s):
return '{http://www.itunes.com/dtds/podcast-1.0.dtd}%s' % s
item = rss.find('channel/item')
video_id = item.find(blip('item_id')).text
title = item.find('./title').text
description = clean_html(compat_str(item.find(blip('puredescription')).text))
timestamp = parse_iso8601(item.find(blip('datestamp')).text)
uploader = item.find(blip('user')).text
uploader_id = item.find(blip('userid')).text
duration = int(item.find(blip('runtime')).text)
media_thumbnail = item.find(media('thumbnail'))
thumbnail = media_thumbnail.get('url') if media_thumbnail is not None else item.find(itunes('image')).text
categories = [category.text for category in item.findall('category')]
video_id = compat_str(data['item_id'])
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
subtitles = {}
formats = []
if 'additionalMedia' in data:
for f in data['additionalMedia']:
if f.get('file_type_srt') == 1:
LANGS = {
'english': 'en',
}
lang = f['role'].rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = f['url']
continue
if not int(f['media_width']): # filter m3u8
continue
subtitles = {}
media_group = item.find(media('group'))
for media_content in media_group.findall(media('content')):
url = media_content.get('url')
role = media_content.get(blip('role'))
msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role)
real_url = compat_urlparse.parse_qs(msg)['message'][0]
media_type = media_content.get('type')
if media_type == 'text/srt' or url.endswith('.srt'):
LANGS = {
'english': 'en',
}
lang = role.rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = url
elif media_type.startswith('video/'):
formats.append({
'url': f['url'],
'format_id': f['role'],
'width': int(f['media_width']),
'height': int(f['media_height']),
'url': real_url,
'format_id': role,
'format_note': media_type,
'vcodec': media_content.get(blip('vcodec')),
'acodec': media_content.get(blip('acodec')),
'filesize': media_content.get('filesize'),
'width': int(media_content.get('width')),
'height': int(media_content.get('height')),
})
else:
formats.append({
'url': data['media']['url'],
'width': int(data['media']['width']),
'height': int(data['media']['height']),
})
self._sort_formats(formats)
# subtitles
@@ -107,12 +129,14 @@ class BlipTVIE(SubtitlesInfoExtractor):
return {
'id': video_id,
'uploader': data['display_name'],
'upload_date': upload_date,
'title': data['title'],
'thumbnail': data['thumbnailUrl'],
'description': data['description'],
'user_agent': 'iTunes/10.6.1',
'title': title,
'description': description,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
'categories': categories,
'formats': formats,
'subtitles': video_subtitles,
}

View File

@@ -4,39 +4,72 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-]+/)+(?P<id>[a-z0-9\-]+)\.html'
_BASE_URL = 'http://www.br.de'
_TESTS = [
{
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
'url': 'http://www.br.de/mediathek/video/anselm-gruen-114.html',
'md5': 'c4f83cf0f023ba5875aba0bf46860df2',
'info_dict': {
'id': '2c8d81c5-6fb7-4a74-88d4-e768e5856532',
'ext': 'mp4',
'title': 'Feiern und Verzichten',
'description': 'Anselm Grün: Feiern und Verzichten',
'uploader': 'BR/Birgit Baier',
'upload_date': '20140301',
}
},
{
"url": "http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html",
"md5": "ab451b09d861dbed7d7cc9ab0be19ebe",
"info_dict": {
"id": "2c060e69-3a27-4e13-b0f0-668fac17d812",
"ext": "mp4",
"title": "Über den Pass",
"description": "Die Eroberung der Alpen: Über den Pass",
"uploader": None,
"upload_date": None
'url': 'http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html',
'md5': 'ab451b09d861dbed7d7cc9ab0be19ebe',
'info_dict': {
'id': '2c060e69-3a27-4e13-b0f0-668fac17d812',
'ext': 'mp4',
'title': 'Über den Pass',
'description': 'Die Eroberung der Alpen: Über den Pass',
}
}
},
{
'url': 'http://www.br.de/nachrichten/schaeuble-haushaltsentwurf-bundestag-100.html',
'md5': '3db0df1a9a9cd9fa0c70e6ea8aa8e820',
'info_dict': {
'id': 'c6aae3de-2cf9-43f2-957f-f17fef9afaab',
'ext': 'aac',
'title': '"Keine neuen Schulden im nächsten Jahr"',
'description': 'Haushaltsentwurf: "Keine neuen Schulden im nächsten Jahr"',
}
},
{
'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',
'md5': 'dbab0aef2e047060ea7a21fc1ce1078a',
'info_dict': {
'id': '6ba73750-d405-45d3-861d-1ce8c524e059',
'ext': 'mp4',
'title': 'Umweltbewusster Häuslebauer',
'description': 'Uwe Erdelt: Umweltbewusster Häuslebauer',
}
},
{
'url': 'http://www.br.de/fernsehen/br-alpha/sendungen/kant-fuer-anfaenger/kritik-der-reinen-vernunft/kant-kritik-01-metaphysik100.html',
'md5': '23bca295f1650d698f94fc570977dae3',
'info_dict': {
'id': 'd982c9ce-8648-4753-b358-98abb8aec43d',
'ext': 'mp4',
'title': 'Folge 1 - Metaphysik',
'description': 'Kant für Anfänger: Folge 1 - Metaphysik',
'uploader': 'Eva Maria Steimle',
'upload_date': '20140117',
}
},
]
def _real_extract(self, url):
@@ -44,56 +77,63 @@ class BRIE(InfoExtractor):
display_id = mobj.group('id')
page = self._download_webpage(url, display_id)
xml_url = self._search_regex(
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/(?:[a-z0-9\-]+/)+[a-z0-9/~_.-]+)'}\)\);", page, 'XMLURL')
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = []
for xml_video in xml.findall("video"):
video = {
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"webpage_url": xml_video.find("permalink").text
}
if xml_video.find("author").text:
video["uploader"] = xml_video.find("author").text
if xml_video.find("broadcastDate").text:
video["upload_date"] = "".join(reversed(xml_video.find("broadcastDate").text.split(".")))
videos.append(video)
medias = []
if len(videos) > 1:
for xml_media in xml.findall('video') + xml.findall('audio'):
media = {
'id': xml_media.get('externalId'),
'title': xml_media.find('title').text,
'formats': self._extract_formats(xml_media.find('assets')),
'thumbnails': self._extract_thumbnails(xml_media.find('teaserImage/variants')),
'description': ' '.join(xml_media.find('shareTitle').text.splitlines()),
'webpage_url': xml_media.find('permalink').text
}
if xml_media.find('author').text:
media['uploader'] = xml_media.find('author').text
if xml_media.find('broadcastDate').text:
media['upload_date'] = ''.join(reversed(xml_media.find('broadcastDate').text.split('.')))
medias.append(media)
if len(medias) > 1:
self._downloader.report_warning(
'found multiple videos; please '
'found multiple medias; please '
'report this with the video URL to http://yt-dl.org/bug')
if not videos:
raise ExtractorError('No video entries found')
return videos[0]
if not medias:
raise ExtractorError('No media entries found')
return medias[0]
def _extract_formats(self, assets):
def text_or_none(asset, tag):
elem = asset.find(tag)
return None if elem is None else elem.text
formats = [{
"url": asset.find("downloadUrl").text,
"ext": asset.find("mediaType").text,
"format_id": asset.get("type"),
"width": int(asset.find("frameWidth").text),
"height": int(asset.find("frameHeight").text),
"tbr": int(asset.find("bitrateVideo").text),
"abr": int(asset.find("bitrateAudio").text),
"vcodec": asset.find("codecVideo").text,
"container": asset.find("mediaType").text,
"filesize": int(asset.find("size").text),
} for asset in assets.findall("asset")
if asset.find("downloadUrl") is not None]
'url': text_or_none(asset, 'downloadUrl'),
'ext': text_or_none(asset, 'mediaType'),
'format_id': asset.get('type'),
'width': int_or_none(text_or_none(asset, 'frameWidth')),
'height': int_or_none(text_or_none(asset, 'frameHeight')),
'tbr': int_or_none(text_or_none(asset, 'bitrateVideo')),
'abr': int_or_none(text_or_none(asset, 'bitrateAudio')),
'vcodec': text_or_none(asset, 'codecVideo'),
'acodec': text_or_none(asset, 'codecAudio'),
'container': text_or_none(asset, 'mediaType'),
'filesize': int_or_none(text_or_none(asset, 'size')),
} for asset in assets.findall('asset')
if asset.find('downloadUrl') is not None]
self._sort_formats(formats)
return formats
def _extract_thumbnails(self, variants):
thumbnails = [{
"url": self._BASE_URL + variant.find("url").text,
"width": int(variant.find("width").text),
"height": int(variant.find("height").text),
} for variant in variants.findall("variant")]
thumbnails.sort(key=lambda x: x["width"] * x["height"], reverse=True)
'url': self._BASE_URL + variant.find('url').text,
'width': int_or_none(variant.find('width').text),
'height': int_or_none(variant.find('height').text),
} for variant in variants.findall('variant')]
thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True)
return thumbnails

View File

@@ -27,9 +27,10 @@ class BreakIE(InfoExtractor):
webpage, 'info json', flags=re.DOTALL)
info = json.loads(info_json)
video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
youtube_id = info.get('youtubeId')
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
final_url = video_url + '?' + info['AuthToken']
return {
'id': video_id,

View File

@@ -140,7 +140,11 @@ class BrightcoveIE(InfoExtractor):
url_m = re.search(r'<meta\s+property="og:video"\s+content="(http://c.brightcove.com/[^"]+)"', webpage)
if url_m:
return [unescapeHTML(url_m.group(1))]
url = unescapeHTML(url_m.group(1))
# Some sites don't add it, we can't download with this url, for example:
# http://www.ktvu.com/videos/news/raw-video-caltrain-releases-video-of-man-almost/vCTZdY/
if 'playerKey' in url:
return [url]
matches = re.findall(
r'''(?sx)<object

View File

@@ -4,9 +4,7 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
from ..utils import ExtractorError
class BYUtvIE(InfoExtractor):
@@ -16,7 +14,7 @@ class BYUtvIE(InfoExtractor):
'info_dict': {
'id': 'granite-flats-talking',
'ext': 'mp4',
'description': 'md5:1a7ae3e153359b7cc355ef3963441e5f',
'description': 'md5:4e9a7ce60f209a33eca0ac65b4918e1c',
'title': 'Talking',
'thumbnail': 're:^https?://.*promo.*'
},

View File

@@ -1,53 +1,72 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import unified_strdate
from ..utils import (
unified_strdate,
url_basename,
)
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VALID_URL = r'https?://(?:www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = u'canalplus.fr'
IE_NAME = 'canalplus.fr'
_TEST = {
u'url': u'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
u'file': u'922470.flv',
u'info_dict': {
u'title': u'Zapping - 26/08/13',
u'description': u'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
u'upload_date': u'20130826',
},
u'params': {
u'skip_download': True,
'url': 'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
'md5': '3db39fb48b9685438ecf33a1078023e4',
'info_dict': {
'id': '922470',
'ext': 'flv',
'title': 'Zapping - 26/08/13',
'description': 'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
'upload_date': '20130826',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
# Beware, some subclasses do not define an id group
display_id = url_basename(mobj.group('path'))
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'<canal:player videoId="(\d+)"', webpage, u'video id')
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'<canal:player videoId="(\d+)"', webpage, 'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
doc = self._download_xml(info_url,video_id,
u'Downloading video info')
doc = self._download_xml(info_url, video_id, 'Downloading video XML')
self.report_extraction(video_id)
video_info = [video for video in doc if video.find('ID').text == video_id][0]
infos = video_info.find('INFOS')
media = video_info.find('MEDIA')
formats = [media.find('VIDEOS/%s' % format)
for format in ['BAS_DEBIT', 'HAUT_DEBIT', 'HD']]
video_url = [format.text for format in formats if format is not None][-1]
infos = video_info.find('INFOS')
return {'id': video_id,
'title': u'%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'url': video_url,
'ext': 'flv',
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
}
preferences = ['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS']
formats = [
{
'url': fmt.text + '?hdcore=2.11.3' if fmt.tag == 'HDS' else fmt.text,
'format_id': fmt.tag,
'ext': 'mp4' if fmt.tag == 'HLS' else 'flv',
'preference': preferences.index(fmt.tag) if fmt.tag in preferences else -1,
} for fmt in media.find('VIDEOS') if fmt.text
]
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': '%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
'like_count': int(infos.find('NB_LIKES').text),
'comment_count': int(infos.find('NB_COMMENTS').text),
'formats': formats,
}

View File

@@ -0,0 +1,87 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class CBSNewsIE(InfoExtractor):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
_TESTS = [
{
'url': 'http://www.cbsnews.com/news/tesla-and-spacex-elon-musks-industrial-empire/',
'info_dict': {
'id': 'tesla-and-spacex-elon-musks-industrial-empire',
'ext': 'flv',
'title': 'Tesla and SpaceX: Elon Musk\'s industrial empire',
'thumbnail': 'http://beta.img.cbsnews.com/i/2014/03/30/60147937-2f53-4565-ad64-1bdd6eb64679/60-0330-pelley-640x360.jpg',
'duration': 791,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'flv',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 'http://cbsnews2.cbsistatic.com/hub/i/r/2014/04/04/0c9fbc66-576b-41ca-8069-02d122060dd2/thumbnail/140x90/6dad7a502f88875ceac38202984b6d58/en-0404-werner-replace-640x360.jpg',
'duration': 205,
},
'params': {
# rtmp download
'skip_download': True,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_info = json.loads(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'))
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
uri = item.get('media' + format_id + 'URI')
if not uri:
continue
fmt = {
'url': uri,
'format_id': format_id,
}
if uri.startswith('rtmp'):
fmt.update({
'app': 'ondemand?auth=cbs',
'play_path': 'mp4:' + uri.split('<break>')[-1],
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
})
elif uri.endswith('.m3u8'):
fmt['ext'] = 'mp4'
formats.append(fmt)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -0,0 +1,58 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
qualities,
)
class ClubicIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?clubic\.com/video/[^/]+/video.*-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.clubic.com/video/clubic-week/video-clubic-week-2-0-le-fbi-se-lance-dans-la-photo-d-identite-448474.html',
'md5': '1592b694ba586036efac1776b0b43cd3',
'info_dict': {
'id': '448474',
'ext': 'mp4',
'title': 'Clubic Week 2.0 : le FBI se lance dans la photo d\u0092identité',
'description': 're:Gueule de bois chez Nokia. Le constructeur a indiqué cette.*',
'thumbnail': 're:^http://img\.clubic\.com/.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration')
config = json.loads(config_json)
video_info = config['videoInfo']
sources = config['sources']
quality_order = qualities(['sd', 'hq'])
formats = [{
'format_id': src['streamQuality'],
'url': src['src'],
'quality': quality_order(src['streamQuality']),
} for src in sources]
self._sort_formats(formats)
return {
'id': video_id,
'title': video_info['title'],
'formats': formats,
'description': clean_html(video_info.get('description')),
'thumbnail': config.get('poster'),
}

View File

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
@@ -32,10 +33,14 @@ class CNETIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"<div class=\"cnetVideoPlayer\" data-cnet-video-options='([^']+)'",
r"<div class=\"cnetVideoPlayer\"\s+.*?data-cnet-video-options='([^']+)'",
webpage, 'data json')
data = json.loads(data_json)
vdata = data['video']
if not vdata:
vdata = data['videos'][0]
if not vdata:
raise ExtractorError('Cannot find video data')
video_id = vdata['id']
title = vdata['headline']

View File

@@ -21,7 +21,7 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
_TEST = {
'url': 'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': '4167875aae411f903b751a21f357f1ee',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'ext': 'mp4',
@@ -41,9 +41,9 @@ class ComedyCentralShowsIE(InfoExtractor):
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
(full-episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos)/[^/]+/(?P<videotitle>[^/?#]+))
(?:(?:guests/[^/]+|videos|video-playlists|special-editions)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|

View File

@@ -113,6 +113,8 @@ class InfoExtractor(object):
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example
["Sports", "Berlin"]
Unless mentioned otherwise, the fields should be Unicode strings.
@@ -242,16 +244,20 @@ class InfoExtractor(object):
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
if len(url) > 200:
h = u'___' + hashlib.md5(url.encode('utf-8')).hexdigest()
url = url[:200 - len(h)] + h
raw_filename = ('%s_%s.dump' % (video_id, url))
basen = '%s_%s' % (video_id, url)
if len(basen) > 240:
h = u'___' + hashlib.md5(basen.encode('utf-8')).hexdigest()
basen = basen[:240 - len(h)] + h
raw_filename = basen + '.dump'
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace')
try:
content = webpage_bytes.decode(encoding, 'replace')
except LookupError:
content = webpage_bytes.decode('utf-8', 'replace')
if (u'<title>Access to this site is blocked</title>' in content and
u'Websense' in content[:512]):
@@ -276,9 +282,12 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note=u'Downloading XML', errnote=u'Unable to download XML',
transform_source=None):
transform_source=None, fatal=True):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(url_or_request, video_id, note, errnote)
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal)
if xml_string is False:
return xml_string
if transform_source:
xml_string = transform_source(xml_string)
return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
@@ -542,6 +551,23 @@ class InfoExtractor(object):
)
formats.sort(key=_formats_key)
def http_scheme(self):
""" Either "https:" or "https:", depending on the user's preferences """
return (
'http:'
if self._downloader.params.get('prefer_insecure', False)
else 'https:')
def _proto_relative_url(self, url, scheme=None):
if url is None:
return url
if url.startswith('//'):
if scheme is None:
scheme = self.http_scheme()
return scheme + url
else:
return url
class SearchInfoExtractor(InfoExtractor):
"""
@@ -585,3 +611,4 @@ class SearchInfoExtractor(InfoExtractor):
@property
def SEARCH_KEY(self):
return self._SEARCH_KEY

View File

@@ -28,16 +28,18 @@ class CondeNastIE(InfoExtractor):
'glamour': 'Glamour',
'wmagazine': 'W Magazine',
'vanityfair': 'Vanity Fair',
'cnevids': 'Condé Nast',
}
_VALID_URL = r'http://(video|www)\.(?P<site>%s)\.com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
_VALID_URL = r'http://(video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
_TEST = {
'url': 'http://video.wired.com/watch/3d-printed-speakers-lit-with-led',
'file': '5171b343c2b4c00dd0c1ccb3.mp4',
'md5': '1921f713ed48aabd715691f774c451f7',
'info_dict': {
'id': '5171b343c2b4c00dd0c1ccb3',
'ext': 'mp4',
'title': '3D Printed Speakers Lit With LED',
'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
}
@@ -55,12 +57,16 @@ class CondeNastIE(InfoExtractor):
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage):
description = self._html_search_regex([r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, 'description',
fatal=False, flags=re.DOTALL)
def _extract_video(self, webpage, url_type):
if url_type != 'embed':
description = self._html_search_regex(
[
r'<div class="cne-video-description">(.+?)</div>',
r'<div class="video-post-content">(.+?)</div>',
],
webpage, 'description', fatal=False, flags=re.DOTALL)
else:
description = None
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage,
'player params', flags=re.DOTALL)
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
@@ -99,12 +105,12 @@ class CondeNastIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
url_type = mobj.group('type')
id = mobj.group('id')
item_id = mobj.group('id')
self.to_screen(u'Extracting from %s with the Condé Nast extractor' % self._SITES[site])
webpage = self._download_webpage(url, id)
self.to_screen('Extracting from %s with the Condé Nast extractor' % self._SITES[site])
webpage = self._download_webpage(url, item_id)
if url_type == 'series':
return self._extract_series(url, webpage)
else:
return self._extract_video(webpage)
return self._extract_video(webpage, url_type)

View File

@@ -8,13 +8,11 @@ from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
str_to_int,
int_or_none,
ExtractorError,
unescapeHTML,
)
class DailymotionBaseInfoExtractor(InfoExtractor):
@@ -180,7 +178,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _extract_entries(self, id):
@@ -190,10 +188,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'row video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
video_ids.extend(re.findall(r'data-xid="(.+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
@@ -203,26 +200,26 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': self._extract_entries(playlist_id),
}
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
full_user = unescapeHTML(self._html_search_regex(
r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
webpage, u'user', flags=re.DOTALL))
return {
'_type': 'playlist',

View File

@@ -0,0 +1,27 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class DivxStageIE(NovaMovIE):
IE_NAME = 'divxstage'
IE_DESC = 'DivxStage'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'divxstage\.(?:eu|net|ch|co|at|ag)'}
_HOST = 'www.divxstage.eu'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<div class="video_det">\s*<strong>([^<]+)</strong>'
_DESCRIPTION_REGEX = r'<div class="video_det">\s*<strong>[^<]+</strong>\s*<p>([^<]+)</p>'
_TEST = {
'url': 'http://www.divxstage.eu/video/57f238e2e5e01',
'md5': '63969f6eb26533a1968c4d325be63e72',
'info_dict': {
'id': '57f238e2e5e01',
'ext': 'flv',
'title': 'youtubedl test video',
'description': 'This is a test video for youtubedl.',
}
}

View File

@@ -0,0 +1,48 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class EmpflixIE(InfoExtractor):
_VALID_URL = r'^https?://www\.empflix\.com/videos/.*?-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': '5e5cc160f38ca9857f318eb97146e13e',
'info_dict': {
'id': '33051',
'ext': 'flv',
'title': 'Amateur Finger Fuck',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
age_limit = self._rta_search(webpage)
video_title = self._html_search_regex(
r'name="title" value="(?P<title>[^"]*)"', webpage, 'title')
cfg_url = self._html_search_regex(
r'flashvars\.config = escape\("([^"]+)"',
webpage, 'flashvars.config')
cfg_xml = self._download_xml(
cfg_url, video_id, note='Downloading metadata')
video_url = cfg_xml.find('videoLink').text
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'age_limit': age_limit,
}

View File

@@ -1,4 +1,5 @@
import os
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -8,18 +9,23 @@ from ..utils import (
compat_urllib_parse,
)
class ExtremeTubeIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
u'file': u'652431.mp4',
u'md5': u'1fb9228f5e3332ec8c057d6ac36f33e0',
u'info_dict': {
u"title": u"Music Video 14 british euro brit european cumshots swallow",
u"uploader": u"unknown",
u"age_limit": 18,
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>extremetube\.com/.*?video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
'info_dict': {
'id': '652431',
'ext': 'mp4',
'title': 'Music Video 14 british euro brit european cumshots swallow',
'uploader': 'unknown',
'age_limit': 18,
}
}
}, {
'url': 'http://www.extremetube.com/gay/video/abcde-1234',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -30,11 +36,14 @@ class ExtremeTubeIE(InfoExtractor):
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, u'title')
uploader = self._html_search_regex(r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, u'uploader', fatal=False)
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
video_title = self._html_search_regex(
r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, 'title')
uploader = self._html_search_regex(
r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, 'uploader',
fatal=False)
video_url = compat_urllib_parse.unquote(self._html_search_regex(
r'video_url=(.+?)&amp;', webpage, 'video_url'))
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
@@ -43,7 +52,6 @@ class ExtremeTubeIE(InfoExtractor):
'title': video_title,
'uploader': uploader,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,

View File

@@ -76,9 +76,8 @@ class FacebookIE(InfoExtractor):
check_form = {
'fb_dtsg': self._search_regex(r'name="fb_dtsg" value="(.+?)"', login_results, 'fb_dtsg'),
'nh': self._search_regex(r'name="nh" value="(\w*?)"', login_results, 'nh'),
'h': self._search_regex(r'name="h" value="(\w*?)"', login_results, 'h'),
'name_action_selected': 'dont_save',
'submit[Continue]': self._search_regex(r'<button[^>]+value="(.*?)"[^>]+name="submit\[Continue\]"', login_results, 'continue'),
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')

View File

@@ -0,0 +1,60 @@
#! -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import hashlib
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_urllib_request,
compat_urlparse,
)
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/(?P<lang>[^/]+)/content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_TEST = {
'url': 'http://video.fc2.com/en/content/20121103kUan1KHs',
'md5': 'a6ebe8ebe0396518689d963774a54eb7',
'info_dict': {
'id': '20121103kUan1KHs',
'ext': 'flv',
'title': 'Boxing again with Puff',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
self._downloader.cookiejar.clear_session_cookies() # must clear
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
refer = url.replace('/content/', '/a/content/')
mimi = hashlib.md5(video_id + '_gGddgPfeaf_gzyr').hexdigest()
info_url = (
"http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
format(video_id, mimi, compat_urllib_request.quote(refer, safe='').replace('.','%2E')))
info_webpage = self._download_webpage(
info_url, video_id, note='Downloading info page')
info = compat_urlparse.parse_qs(info_webpage)
if 'err_code' in info:
raise ExtractorError('Error code: %s' % info['err_code'][0])
video_url = info['filepath'][0] + '?mid=' + info['mid'][0]
return {
'id': video_id,
'title': info['title'][0],
'url': video_url,
'ext': 'flv',
'thumbnail': thumbnail,
}

View File

@@ -6,7 +6,6 @@ from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
IE_NAME = 'Firstpost.com'
_VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
@@ -16,7 +15,6 @@ class FirstpostIE(InfoExtractor):
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'Its flight deck is over twice the size of a football field, its power unit can light up the entire Kochi city and the cabling is enough to cover the distance between here to Delhi.',
}
}
@@ -24,15 +22,26 @@ class FirstpostIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'<div.*?name="div_video".*?flashvars="([^"]+)">',
webpage, 'video URL')
data = self._download_xml(
'http://www.firstpost.com/getvideoxml-%s.xml' % video_id, video_id,
'Downloading video XML')
item = data.find('./playlist/item')
thumbnail = item.find('./image').text
title = item.find('./title').text
formats = [
{
'url': details.find('./file').text,
'format_id': details.find('./label').text.strip(),
'width': int(details.find('./width').text.strip()),
'height': int(details.find('./height').text.strip()),
} for details in item.findall('./source/file_details') if details.find('./file').text
]
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -5,6 +5,8 @@ import re
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
ExtractorError,
)
@@ -16,16 +18,28 @@ class FiveMinIE(InfoExtractor):
(?P<id>\d+)
'''
_TEST = {
# From http://www.engadget.com/2013/11/15/ipad-mini-retina-display-review/
'url': 'http://pshared.5min.com/Scripts/PlayerSeed.js?sid=281&width=560&height=345&playList=518013791',
'md5': '4f7b0b79bf1a470e5004f7112385941d',
'info_dict': {
'id': '518013791',
'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review',
_TESTS = [
{
# From http://www.engadget.com/2013/11/15/ipad-mini-retina-display-review/
'url': 'http://pshared.5min.com/Scripts/PlayerSeed.js?sid=281&width=560&height=345&playList=518013791',
'md5': '4f7b0b79bf1a470e5004f7112385941d',
'info_dict': {
'id': '518013791',
'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review',
},
},
}
{
# From http://on.aol.com/video/how-to-make-a-next-level-fruit-salad-518086247
'url': '5min:518086247',
'md5': 'e539a9dd682c288ef5a498898009f69e',
'info_dict': {
'id': '518086247',
'ext': 'mp4',
'title': 'How to Make a Next-Level Fruit Salad',
},
},
]
@classmethod
def _build_result(cls, video_id):
@@ -34,10 +48,28 @@ class FiveMinIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?func=GetResults&'
'playlist=%s&url=https' % video_id,
video_id)['binding'][0]
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed page')
sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
query = compat_urllib_parse.urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
})
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' + query,
video_id)
if not response['success']:
err_msg = response['errorMessage']
if err_msg == 'ErrorVideoUserNotGeo':
msg = 'Video not available from your location'
else:
msg = 'Aol said: %s' % err_msg
raise ExtractorError(msg, expected=True, video_id=video_id)
info = response['binding'][0]
second_id = compat_str(int(video_id[:-2]) + 1)
formats = []

View File

@@ -48,24 +48,36 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://www\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_TEST = {
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'file': '84981923.mp4',
'info_dict': {
'id': '84981923',
'ext': 'mp4',
'title': 'Soir 3',
},
'params': {
'skip_download': True,
},
}
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'info_dict': {
'id': 'EV_20019',
'ext': 'mp4',
'title': 'Débat des candidats à la Commission européenne',
'description': 'Débat des candidats à la Commission européenne',
},
'params': {
'skip_download': 'HLS (reqires ffmpeg)'
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)[@"]', webpage, 'video id')
video_id = self._search_regex(r'id-video=((?:[^0-9]*?_)?[0-9]+)[@"]', webpage, 'video id')
return self._extract_video(video_id)

View File

@@ -4,22 +4,32 @@ import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TEST = {
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'file': '0732f586d7.mp4',
'md5': 'f647e9e90064b53b6e046e75d0241fbd',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
'info_dict': {
'description': ('Lyrics changed to match the video. Spoken cameo '
'by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a '
'concept by Dustin McLean (DustFilms.com). Performed, edited, '
'and written by David A. Scott.'),
'id': '0732f586d7',
'ext': 'mp4',
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': 're:^http:.*\.jpg$',
},
}
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
'md5': 'ff4d83318f89776ed0250634cfaa8d36',
'info_dict': {
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'md5:2ed27d364f5a805a6dba199faaf6681d',
'thumbnail': 're:^http:.*\.jpg$',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -27,27 +37,34 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, 'video URL', flags=re.DOTALL)
links = re.findall(r'<source src="([^"]+/v)\d+\.([^"]+)" type=\'video', webpage)
if not links:
raise ExtractorError('No media links available for %s' % video_id)
if mobj.group('type') == 'embed':
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
title = post['name']
description = post.get('description')
thumbnail = post.get('picture')
else:
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = None
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
bitrates = self._html_search_regex(r'<source src="[^"]+/v,((?:\d+,)+)\.mp4\.csmil', webpage, 'video bitrates')
bitrates = [int(b) for b in bitrates.rstrip(',').split(',')]
bitrates.sort()
formats = []
for bitrate in bitrates:
for link in links:
formats.append({
'url': '%s%d.%s' % (link[0], bitrate, link[1]),
'format_id': '%s-%d' % (link[1], bitrate),
'vbr': bitrate,
})
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
'thumbnail': thumbnail,
'title': post['name'],
'description': post.get('description'),
'thumbnail': post.get('picture'),
'formats': formats,
}

View File

@@ -15,11 +15,12 @@ from ..utils import (
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
"url": "http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
"file": "gs-2300-6410818.mp4",
"md5": "b2a30deaa8654fcccd43713a6b6a4825",
"info_dict": {
"title": "Arma 3 - Community Guide: SITREP I",
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
'info_dict': {
'id': 'gs-2300-6410818',
'ext': 'mp4',
'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}
}

View File

@@ -35,9 +35,10 @@ class GenericIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
'file': '13601338388002.mp4',
'md5': '6e15c93721d7ec9e9ca3fdbf07982cfd',
'md5': '85b90ccc9d73b4acd9138d3af4c27f89',
'info_dict': {
'id': '13601338388002',
'ext': 'mp4',
'uploader': 'www.hodiho.fr',
'title': 'R\u00e9gis plante sa Jeep',
}
@@ -46,8 +47,9 @@ class GenericIE(InfoExtractor):
{
'add_ie': ['Bandcamp'],
'url': 'http://bronyrock.com/track/the-pony-mash',
'file': '3235767654.mp3',
'info_dict': {
'id': '3235767654',
'ext': 'mp3',
'title': 'The Pony Mash',
'uploader': 'M_Pallante',
},
@@ -73,9 +75,10 @@ class GenericIE(InfoExtractor):
{
# https://github.com/rg3/youtube-dl/issues/2253
'url': 'http://bcove.me/i6nfkrc3',
'file': '3101154703001.mp4',
'md5': '0ba9446db037002366bab3b3eb30c88c',
'info_dict': {
'id': '3101154703001',
'ext': 'mp4',
'title': 'Still no power',
'uploader': 'thestar.com',
'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
@@ -114,20 +117,6 @@ class GenericIE(InfoExtractor):
'title': '2cc213299525360.mov', # that's what we get
},
},
# second style of embedded ooyala videos
{
'url': 'http://www.smh.com.au/tv/business/show/financial-review-sunday/behind-the-scenes-financial-review-sunday--4350201.html',
'info_dict': {
'id': '13djJjYjptA1XpPx8r9kuzPyj3UZH0Uk',
'ext': 'mp4',
'title': 'Behind-the-scenes: Financial Review Sunday ',
'description': 'Step inside Channel Nine studios for an exclusive tour of its upcoming financial business show.',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@@ -198,6 +187,17 @@ class GenericIE(InfoExtractor):
'description': 'md5:ddb2a40ecd6b6a147e400e535874947b',
}
},
# Embeded Ustream video
{
'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
'md5': '27b99cdb639c9b12a79bca876a073417',
'info_dict': {
'id': '45734260',
'ext': 'flv',
'uploader': 'AU SPA: The NSA and Privacy',
'title': 'NSA and Privacy Forum Debate featuring General Hayden and Barton Gellman'
}
},
# nowvideo embed hidden behind percent encoding
{
'url': 'http://www.waoanime.tv/the-super-dimension-fortress-macross-episode-1/',
@@ -239,6 +239,28 @@ class GenericIE(InfoExtractor):
'uploader_id': 'rbctv_2012_4',
},
},
# Condé Nast embed
{
'url': 'http://www.wired.com/2014/04/honda-asimo/',
'md5': 'ba0dfe966fa007657bd1443ee672db0f',
'info_dict': {
'id': '53501be369702d3275860000',
'ext': 'mp4',
'title': 'Hondas New Asimo Robot Is More Human Than Ever',
}
},
# Dailymotion embed
{
'url': 'http://www.spi0n.com/zap-spi0n-com-n216/',
'md5': '441aeeb82eb72c422c7f14ec533999cd',
'info_dict': {
'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'uploader': 'Spi0n',
},
'add_ie': ['Dailymotion'],
}
]
def report_download_webpage(self, video_id):
@@ -323,6 +345,12 @@ class GenericIE(InfoExtractor):
}
def _real_extract(self, url):
if url.startswith('//'):
return {
'_type': 'url',
'url': self.http_scheme() + url,
}
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
default_search = self._downloader.params.get('default_search')
@@ -459,7 +487,7 @@ class GenericIE(InfoExtractor):
matches = re.findall(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Dailymotion')
urlrs = [self.url_result(unescapeHTML(tuppl[1]))
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
@@ -485,6 +513,22 @@ class GenericIE(InfoExtractor):
if mobj:
return self.url_result(mobj.group(1), 'BlipTV')
# Look for embedded condenast player
matches = re.findall(
r'<iframe\s+(?:[a-zA-Z-]+="[^"]+"\s+)*?src="(https?://player\.cnevids\.com/embed/[^"]+")',
webpage)
if matches:
return {
'_type': 'playlist',
'entries': [{
'_type': 'url',
'ie_key': 'CondeNast',
'url': ma,
} for ma in matches],
'title': video_title,
'id': video_id,
}
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
@@ -505,7 +549,7 @@ class GenericIE(InfoExtractor):
return OoyalaIE._build_url_result(mobj.group('ec'))
# Look for Aparat videos
mobj = re.search(r'<iframe src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
mobj = re.search(r'<iframe .*?src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
if mobj is not None:
return self.url_result(mobj.group(1), 'Aparat')
@@ -514,17 +558,18 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora')
# Look for embedded NovaMov player
# Look for embedded NovaMov-based player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?novamov\.com/embed\.php.+?)\1', webpage)
r'''(?x)<(?:pagespeed_)?iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
videoweed\.(?:es|com)|
movshare\.(?:net|sx|ag)|
divxstage\.(?:eu|net|ch|co|at|ag))
/embed\.php.+?)\1''', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'NovaMov')
# Look for embedded NowVideo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?nowvideo\.(?:ch|sx|eu)/embed\.php.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'NowVideo')
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
mobj = re.search(
@@ -570,6 +615,12 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
# Look for embedded Ustream videos
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Ustream')
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
@@ -582,65 +633,86 @@ class GenericIE(InfoExtractor):
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
# Look for gorilla-vid style embedding
mobj = re.search(r'(?s)(?:jw_plugins|JWPlayerOptions).*?file\s*:\s*["\'](.*?)["\']', webpage)
if mobj is None:
# Broaden the search a little bit
mobj = re.search(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)
if mobj is None:
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
# Look for embeded soundcloud player
mobj = re.search(
r'<iframe src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
if mobj is None:
# Start with something easy: JW Player in SWFObject
found = re.findall(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if not found:
# Look for gorilla-vid style embedding
found = re.findall(r'''(?sx)
(?:
jw_plugins|
JWPlayerOptions|
jwplayer\s*\(\s*["'][^'"]+["']\s*\)\s*\.setup
)
.*?file\s*:\s*["\'](.*?)["\']''', webpage)
if not found:
# Broaden the search a little bit
found = re.findall(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)
if not found:
# Broaden the findall a little bit: JWPlayer JS loader
found = re.findall(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
if not found:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
if mobj is None:
found = re.findall(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
if not found:
# We look for Open Graph info:
# We have to match any number spaces between elements, some sites try to align them (eg.: statigr.am)
m_video_type = re.search(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
found = re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if not found:
# HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
mobj = re.search(
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage)
if not found:
found = re.search(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="[0-9]{,2};url=\'([^\']+)\'"',
webpage)
if mobj:
new_url = mobj.group(1)
if found:
new_url = found.group(1)
self.report_following_redirect(new_url)
return {
'_type': 'url',
'url': new_url,
}
if mobj is None:
if not found:
raise ExtractorError('Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError('Did not find a valid video URL at %s' % url)
entries = []
for video_url in found:
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
# Sometimes, jwplayer extraction will result in a YouTube URL
if YoutubeIE.suitable(video_url):
entries.append(self.url_result(video_url, 'Youtube'))
continue
# Sometimes, jwplayer extraction will result in a YouTube URL
if YoutubeIE.suitable(video_url):
return self.url_result(video_url, 'Youtube')
# here's a fun little line of code for you:
video_id = os.path.splitext(video_id)[0]
# here's a fun little line of code for you:
video_id = os.path.splitext(video_id)[0]
entries.append({
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'title': video_title,
})
if len(entries) == 1:
return entries[0]
else:
for num, e in enumerate(entries, start=1):
e['title'] = '%s (%d)' % (e['title'], num)
return {
'_type': 'playlist',
'entries': entries,
}
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'title': video_title,
}

View File

@@ -0,0 +1,42 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class HentaiStigmaIE(InfoExtractor):
_VALID_URL = r'^https?://hentai\.animestigma\.com/(?P<id>[^/]+)'
_TEST = {
'url': 'http://hentai.animestigma.com/inyouchuu-etsu-bonus/',
'md5': '4e3d07422a68a4cc363d8f57c8bf0d23',
'info_dict': {
'id': 'inyouchuu-etsu-bonus',
'ext': 'mp4',
"title": "Inyouchuu Etsu Bonus",
"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h2 class="posttitle"><a[^>]*>([^<]+)</a>',
webpage, 'title')
wrap_url = self._html_search_regex(
r'<iframe src="([^"]+mp4)"', webpage, 'wrapper url')
wrap_webpage = self._download_webpage(wrap_url, video_id)
video_url = self._html_search_regex(
r'clip:\s*{\s*url: "([^"]*)"', wrap_webpage, 'video url')
return {
'id': video_id,
'url': video_url,
'title': title,
'age_limit': 18,
}

View File

@@ -5,8 +5,8 @@ import re
from .common import InfoExtractor
class StatigramIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?statigr\.am/p/(?P<id>[^/]+)'
class IconosquareIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
_TEST = {
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
@@ -15,6 +15,7 @@ class StatigramIE(InfoExtractor):
'ext': 'mp4',
'uploader_id': 'aguynamedpatrick',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
},
}
@@ -25,7 +26,7 @@ class StatigramIE(InfoExtractor):
html_title = self._html_search_regex(
r'<title>(.+?)</title>',
webpage, 'title')
title = re.sub(r'(?: *\(Videos?\))? \| Statigram$', '', html_title)
title = re.sub(r'(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)$', '', html_title)
uploader_id = self._html_search_regex(
r'@([^ ]+)', title, 'uploader name', fatal=False)
@@ -33,6 +34,7 @@ class StatigramIE(InfoExtractor):
'id': video_id,
'url': self._og_search_video_url(webpage),
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': uploader_id
}

View File

@@ -106,7 +106,7 @@ class OneUPIE(IGNIE):
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
_TEST = {
_TESTS = [{
'url': 'http://gamevideos.1up.com/video/id/34976',
'md5': '68a54ce4ebc772e4b71e3123d413163d',
'info_dict': {
@@ -115,10 +115,7 @@ class OneUPIE(IGNIE):
'title': 'Sniper Elite V2 - Trailer',
'description': 'md5:5d289b722f5a6d940ca3136e9dae89cf',
}
}
# Override IGN tests
_TESTS = []
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -11,16 +11,15 @@ from ..utils import (
class InfoQIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?infoq\.com/[^/]+/(?P<id>[^/]+)$'
_TEST = {
"name": "InfoQ",
"url": "http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
"file": "12-jan-pythonthings.mp4",
"info_dict": {
"description": "Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
"title": "A Few of My Favorite [Python] Things",
},
"params": {
"skip_download": True,
'url': 'http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things',
'md5': 'b5ca0e0a8c1fed93b0e65e48e462f9a2',
'info_dict': {
'id': '12-jan-pythonthings',
'ext': 'mp4',
'description': 'Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.',
'title': 'A Few of My Favorite [Python] Things',
},
}
@@ -30,26 +29,39 @@ class InfoQIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')
video_description = self._html_search_meta('description', webpage, 'description')
# The server URL is hardcoded
video_url = 'rtmpe://video.infoq.com/cfx/st/'
# Extract video URL
encoded_id = self._search_regex(r"jsclassref ?= ?'([^']*)'", webpage, 'encoded id')
encoded_id = self._search_regex(
r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id')
real_id = compat_urllib_parse.unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
video_url = 'rtmpe://video.infoq.com/cfx/st/' + real_id
playpath = 'mp4:' + real_id
# Extract title
video_title = self._search_regex(r'contentTitle = "(.*?)";',
webpage, 'title')
# Extract description
video_description = self._html_search_regex(r'<meta name="description" content="(.*)"(?:\s*/)?>',
webpage, 'description', fatal=False)
video_filename = video_url.split('/')[-1]
video_filename = playpath.split('/')[-1]
video_id, extension = video_filename.split('.')
http_base = self._search_regex(
r'EXPRESSINSTALL_SWF\s*=\s*"(https?://[^/"]+/)', webpage,
'HTTP base URL')
formats = [{
'format_id': 'rtmp',
'url': video_url,
'ext': extension,
'play_path': playpath,
}, {
'format_id': 'http',
'url': http_base + real_id,
}]
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'title': video_title,
'ext': extension, # Extension is always(?) mp4, but seems to be flv
'description': video_description,
'formats': formats,
}

View File

@@ -14,7 +14,7 @@ class JukeboxIE(InfoExtractor):
_VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<video_id>[a-z0-9\-]+)\.html'
_TEST = {
'url': 'http://www.jukebox.es/kosheen/videoclip,pride,r303r.html',
'md5': '5dc6477e74b1e37042ac5acedd8413e5',
'md5': '1574e9b4d6438446d5b7dbcdf2786276',
'info_dict': {
'id': 'r303r',
'ext': 'flv',

View File

@@ -1,9 +1,12 @@
from __future__ import unicode_literals
import json
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
formatSeconds,
)
@@ -24,34 +27,31 @@ class JustinTVIE(InfoExtractor):
/?(?:\#.*)?$
"""
_JUSTIN_PAGE_LIMIT = 100
IE_NAME = u'justin.tv'
IE_NAME = 'justin.tv'
IE_DESC = 'justin.tv and twitch.tv'
_TEST = {
u'url': u'http://www.twitch.tv/thegamedevhub/b/296128360',
u'file': u'296128360.flv',
u'md5': u'ecaa8a790c22a40770901460af191c9a',
u'info_dict': {
u"upload_date": u"20110927",
u"uploader_id": 25114803,
u"uploader": u"thegamedevhub",
u"title": u"Beginner Series - Scripting With Python Pt.1"
'url': 'http://www.twitch.tv/thegamedevhub/b/296128360',
'md5': 'ecaa8a790c22a40770901460af191c9a',
'info_dict': {
'id': '296128360',
'ext': 'flv',
'upload_date': '20110927',
'uploader_id': 25114803,
'uploader': 'thegamedevhub',
'title': 'Beginner Series - Scripting With Python Pt.1'
}
}
def report_download_page(self, channel, offset):
"""Report attempt to download a single page of videos."""
self.to_screen(u'%s: Downloading video information from %d to %d' %
(channel, offset, offset + self._JUSTIN_PAGE_LIMIT))
# Return count of items, list of *valid* items
def _parse_page(self, url, video_id):
info_json = self._download_webpage(url, video_id,
u'Downloading video info JSON',
u'unable to download video info JSON')
'Downloading video info JSON',
'unable to download video info JSON')
response = json.loads(info_json)
if type(response) != list:
error_text = response.get('error', 'unknown error')
raise ExtractorError(u'Justin.tv API: %s' % error_text)
raise ExtractorError('Justin.tv API: %s' % error_text)
info = []
for clip in response:
video_url = clip['video_file_url']
@@ -62,7 +62,7 @@ class JustinTVIE(InfoExtractor):
video_id = clip['id']
video_title = clip.get('title', video_id)
info.append({
'id': video_id,
'id': compat_str(video_id),
'url': video_url,
'title': video_title,
'uploader': clip.get('channel_name', video_uploader_id),
@@ -74,8 +74,6 @@ class JustinTVIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'invalid URL: %s' % url)
api_base = 'http://api.justin.tv'
paged = False
@@ -89,40 +87,41 @@ class JustinTVIE(InfoExtractor):
webpage = self._download_webpage(url, chapter_id)
m = re.search(r'PP\.archive_id = "([0-9]+)";', webpage)
if not m:
raise ExtractorError(u'Cannot find archive of a chapter')
raise ExtractorError('Cannot find archive of a chapter')
archive_id = m.group(1)
api = api_base + '/broadcast/by_chapter/%s.xml' % chapter_id
doc = self._download_xml(api, chapter_id,
note=u'Downloading chapter information',
errnote=u'Chapter information download failed')
doc = self._download_xml(
api, chapter_id,
note='Downloading chapter information',
errnote='Chapter information download failed')
for a in doc.findall('.//archive'):
if archive_id == a.find('./id').text:
break
else:
raise ExtractorError(u'Could not find chapter in chapter information')
raise ExtractorError('Could not find chapter in chapter information')
video_url = a.find('./video_file_url').text
video_ext = video_url.rpartition('.')[2] or u'flv'
video_ext = video_url.rpartition('.')[2] or 'flv'
chapter_api_url = u'https://api.twitch.tv/kraken/videos/c' + chapter_id
chapter_info_json = self._download_webpage(chapter_api_url, u'c' + chapter_id,
note='Downloading chapter metadata',
errnote='Download of chapter metadata failed')
chapter_info = json.loads(chapter_info_json)
chapter_api_url = 'https://api.twitch.tv/kraken/videos/c' + chapter_id
chapter_info = self._download_json(
chapter_api_url, 'c' + chapter_id,
note='Downloading chapter metadata',
errnote='Download of chapter metadata failed')
bracket_start = int(doc.find('.//bracket_start').text)
bracket_end = int(doc.find('.//bracket_end').text)
# TODO determine start (and probably fix up file)
# youtube-dl -v http://www.twitch.tv/firmbelief/c/1757457
#video_url += u'?start=' + TODO:start_timestamp
#video_url += '?start=' + TODO:start_timestamp
# bracket_start is 13290, but we want 51670615
self._downloader.report_warning(u'Chapter detected, but we can just download the whole file. '
u'Chapter starts at %s and ends at %s' % (formatSeconds(bracket_start), formatSeconds(bracket_end)))
self._downloader.report_warning('Chapter detected, but we can just download the whole file. '
'Chapter starts at %s and ends at %s' % (formatSeconds(bracket_start), formatSeconds(bracket_end)))
info = {
'id': u'c' + chapter_id,
'id': 'c' + chapter_id,
'url': video_url,
'ext': video_ext,
'title': chapter_info['title'],
@@ -131,14 +130,12 @@ class JustinTVIE(InfoExtractor):
'uploader': chapter_info['channel']['display_name'],
'uploader_id': chapter_info['channel']['name'],
}
return [info]
return info
else:
video_id = mobj.group('videoid')
api = api_base + '/broadcast/by_archive/%s.json' % video_id
self.report_extraction(video_id)
info = []
entries = []
offset = 0
limit = self._JUSTIN_PAGE_LIMIT
while True:
@@ -146,8 +143,12 @@ class JustinTVIE(InfoExtractor):
self.report_download_page(video_id, offset)
page_url = api + ('?offset=%d&limit=%d' % (offset, limit))
page_count, page_info = self._parse_page(page_url, video_id)
info.extend(page_info)
entries.extend(page_info)
if not paged or page_count != limit:
break
offset += limit
return info
return {
'_type': 'playlist',
'id': video_id,
'entries': entries,
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import os
import re
@@ -11,22 +13,22 @@ from ..aes import (
aes_decrypt_text
)
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>keezmovies\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'^https?://(?:www\.)?keezmovies\.com/video/.+?(?P<videoid>[0-9]+)(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
u'file': u'1214711.mp4',
u'md5': u'6e297b7e789329923fcf83abb67c9289',
u'info_dict': {
u"title": u"Petite Asian Lady Mai Playing In Bathtub",
u"age_limit": 18,
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
'file': '1214711.mp4',
'md5': '6e297b7e789329923fcf83abb67c9289',
'info_dict': {
'title': 'Petite Asian Lady Mai Playing In Bathtub',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
@@ -38,10 +40,10 @@ class KeezMoviesIE(InfoExtractor):
embedded_url = mobj.group(1)
return self.url_result(embedded_url)
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, u'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
if webpage.find('encrypted=true')!=-1:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, u'password')
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, 'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, 'video_url'))
if 'encrypted=true' in webpage:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, 'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]

View File

@@ -1,15 +1,18 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class MDRIE(InfoExtractor):
_VALID_URL = r'^(?P<domain>(?:https?://)?(?:www\.)?mdr\.de)/mediathek/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)_.*'
_VALID_URL = r'^(?P<domain>https?://(?:www\.)?mdr\.de)/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)(?:_|\.html)'
# No tests, MDR regularily deletes its videos
_TEST = {
'url': 'http://www.mdr.de/fakt/video189002.html',
'only_matching': True,
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
@@ -19,9 +22,9 @@ class MDRIE(InfoExtractor):
# determine title and media streams from webpage
html = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h2>(.*?)</h2>', html, u'title')
title = self._html_search_regex(r'<h[12]>(.*?)</h[12]>', html, 'title')
xmlurl = self._search_regex(
r'(/mediathek/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, u'XML URL')
r'dataURL:\'(/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, 'XML URL')
doc = self._download_xml(domain + xmlurl, video_id)
formats = []
@@ -41,7 +44,7 @@ class MDRIE(InfoExtractor):
if vbr_el is None:
format.update({
'vcodec': 'none',
'format_id': u'%s-%d' % (media_type, abr),
'format_id': '%s-%d' % (media_type, abr),
})
else:
vbr = int(vbr_el.text) // 1000
@@ -49,12 +52,9 @@ class MDRIE(InfoExtractor):
'vbr': vbr,
'width': int(a.find('frameWidth').text),
'height': int(a.find('frameHeight').text),
'format_id': u'%s-%d' % (media_type, vbr),
'format_id': '%s-%d' % (media_type, vbr),
})
formats.append(format)
if not formats:
raise ExtractorError(u'Could not find any valid formats')
self._sort_formats(formats)
return {

View File

@@ -4,9 +4,10 @@ import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
compat_urllib_parse,
ExtractorError,
int_or_none,
parse_iso8601,
)
@@ -24,6 +25,10 @@ class MixcloudIE(InfoExtractor):
'uploader': 'Daniel Holbach',
'uploader_id': 'dholbach',
'upload_date': '20111115',
'timestamp': 1321359578,
'thumbnail': 're:https?://.*\.jpg',
'view_count': int,
'like_count': int,
},
}
@@ -51,10 +56,6 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id)
api_url = 'http://api.mixcloud.com/%s/%s/' % (uploader, cloudcast_name)
info = self._download_json(
api_url, track_id, 'Downloading cloudcast info')
preview_url = self._search_regex(
r'\s(?:data-preview-url|m-preview)="(.+?)"', webpage, 'preview url')
song_url = preview_url.replace('/previews/', '/c/originals/')
@@ -65,16 +66,41 @@ class MixcloudIE(InfoExtractor):
template_url = template_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
final_song_url = self._get_url(template_url)
if final_song_url is None:
raise ExtractorError(u'Unable to extract track url')
raise ExtractorError('Unable to extract track url')
PREFIX = (
r'<div class="cloudcast-play-button-container"'
r'(?:\s+[a-zA-Z0-9-]+(?:="[^"]+")?)*?\s+')
title = self._html_search_regex(
PREFIX + r'm-title="([^"]+)"', webpage, 'title')
thumbnail = self._proto_relative_url(self._html_search_regex(
PREFIX + r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail',
fatal=False))
uploader = self._html_search_regex(
PREFIX + r'm-owner-name="([^"]+)"',
webpage, 'uploader', fatal=False)
uploader_id = self._search_regex(
r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
description = self._og_search_description(webpage)
like_count = int_or_none(self._search_regex(
r'<meta itemprop="interactionCount" content="UserLikes:([0-9]+)"',
webpage, 'like count', fatal=False))
view_count = int_or_none(self._search_regex(
r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
webpage, 'play count', fatal=False))
timestamp = parse_iso8601(self._search_regex(
r'<time itemprop="dateCreated" datetime="([^"]+)">',
webpage, 'upload date'))
return {
'id': track_id,
'title': info['name'],
'title': title,
'url': final_song_url,
'description': info.get('description'),
'thumbnail': info['pictures'].get('extra_large'),
'uploader': info['user']['name'],
'uploader_id': info['user']['username'],
'upload_date': unified_strdate(info['created_time']),
'view_count': info['play_count'],
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'timestamp': timestamp,
'view_count': view_count,
'like_count': like_count,
}

View File

@@ -1,22 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
compat_parse_qs,
compat_str,
int_or_none,
)
class MorningstarIE(InfoExtractor):
IE_DESC = 'morningstar.com'
_VALID_URL = r'https?://(?:www\.)?morningstar\.com/cover/videocenter\.aspx\?id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?morningstar\.com/[cC]over/video[cC]enter\.aspx\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.morningstar.com/cover/videocenter.aspx?id=615869',
'md5': '6c0acface7a787aadc8391e4bbf7b0f5',

View File

@@ -44,7 +44,7 @@ class MotorsportIE(InfoExtractor):
e = compat_str(int(time.time()) + 24 * 60 * 60)
base_video_url = params['location'] + '?e=' + e
s = 'h3hg713fh32'
h = hashlib.md5(s + base_video_url).hexdigest()
h = hashlib.md5((s + base_video_url).encode('utf-8')).hexdigest()
video_url = base_video_url + '&h=' + h
uploader = self._html_search_regex(

View File

@@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class MoviezineIE(InfoExtractor):
_VALID_URL = r'https?://www\.moviezine\.se/video/(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.moviezine.se/video/205866',
'info_dict': {
'id': '205866',
'ext': 'mp4',
'title': 'Oculus - Trailer 1',
'description': 'md5:40cc6790fc81d931850ca9249b40e8a4',
'thumbnail': 're:http://.*\.jpg',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
jsplayer = self._download_webpage('http://www.moviezine.se/api/player.js?video=%s' % video_id, video_id, 'Downloading js api player')
formats =[{
'format_id': 'sd',
'url': self._html_search_regex(r'file: "(.+?)",', jsplayer, 'file'),
'quality': 0,
'ext': 'mp4',
}]
self._sort_formats(formats)
return {
'id': video_id,
'title': self._search_regex(r'title: "(.+?)",', jsplayer, 'title'),
'thumbnail': self._search_regex(r'image: "(.+?)",', jsplayer, 'image'),
'formats': formats,
'description': self._og_search_description(webpage),
}

View File

@@ -0,0 +1,27 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class MovShareIE(NovaMovIE):
IE_NAME = 'movshare'
IE_DESC = 'MovShare'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'movshare\.(?:net|sx|ag)'}
_HOST = 'www.movshare.net'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
_DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
_TEST = {
'url': 'http://www.movshare.net/video/559e28be54d96',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': '559e28be54d96',
'ext': 'flv',
'title': 'dissapeared image',
'description': 'optical illusion dissapeared image magic illusion',
}
}

View File

@@ -4,9 +4,7 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
)
from ..utils import int_or_none
class MporaIE(InfoExtractor):
@@ -20,7 +18,7 @@ class MporaIE(InfoExtractor):
'info_dict': {
'title': 'Katy Curd - Winter in the Forest',
'duration': 416,
'uploader': 'petenewman',
'uploader': 'Peter Newman Media',
},
}

View File

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
class NDRIE(InfoExtractor):
@@ -45,13 +48,12 @@ class NDRIE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
title = self._og_search_title(page)
title = self._og_search_title(page).strip()
description = self._og_search_description(page)
if description:
description = description.strip()
mobj = re.search(
r'<div class="duration"><span class="min">(?P<minutes>\d+)</span>:<span class="sec">(?P<seconds>\d+)</span></div>',
page)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
duration = int_or_none(self._html_search_regex(r'duration: (\d+),\n', page, 'duration', fatal=False))
formats = []
@@ -66,10 +68,12 @@ class NDRIE(InfoExtractor):
video_url = re.search(r'''3: {src:'(?P<video>.+?)\.hi\.mp4', type:"video/mp4"},''', page)
if video_url:
thumbnail = self._html_search_regex(r'(?m)title: "NDR PLAYER",\s*poster: "([^"]+)",',
page, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = 'http://www.ndr.de' + thumbnail
thumbnails = re.findall(r'''\d+: {src: "([^"]+)"(?: \|\| '[^']+')?, quality: '([^']+)'}''', page)
if thumbnails:
QUALITIES = ['xs', 's', 'm', 'l', 'xl']
thumbnails.sort(key=lambda thumb: QUALITIES.index(thumb[1]) if thumb[1] in QUALITIES else -1)
thumbnail = 'http://www.ndr.de' + thumbnails[-1][0]
for format_id in ['lo', 'hi', 'hq']:
formats.append({
'url': '%s.%s.mp4' % (video_url.group('video'), format_id),

View File

@@ -0,0 +1,87 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class NewstubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
_TEST = {
'url': 'http://newstube.ru/media/na-korable-progress-prodolzhaetsya-testirovanie-sistemy-kurs',
'info_dict': {
'id': 'd156a237-a6e9-4111-a682-039995f721f1',
'ext': 'flv',
'title': 'На корабле «Прогресс» продолжается тестирование системы «Курс»',
'description': 'md5:d0cbe7b4a6f600552617e48548d5dc77',
'duration': 20.04,
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
video_guid = self._html_search_regex(
r'<meta property="og:video" content="https?://(?:www\.)?newstube\.ru/freshplayer\.swf\?guid=(?P<guid>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
page, 'video GUID')
player = self._download_xml(
'http://p.newstube.ru/v2/player.asmx/GetAutoPlayInfo6?state=&url=%s&sessionId=&id=%s&placement=profile&location=n2' % (url, video_guid),
video_guid, 'Downloading player XML')
def ns(s):
return s.replace('/', '/%(ns)s') % {'ns': '{http://app1.newstube.ru/N2SiteWS/player.asmx}'}
session_id = player.find(ns('./SessionId')).text
media_info = player.find(ns('./Medias/MediaInfo'))
title = media_info.find(ns('./Name')).text
description = self._og_search_description(page)
thumbnail = media_info.find(ns('./KeyFrame')).text
duration = int(media_info.find(ns('./Duration')).text) / 1000.0
formats = []
for stream_info in media_info.findall(ns('./Streams/StreamInfo')):
media_location = stream_info.find(ns('./MediaLocation'))
if media_location is None:
continue
server = media_location.find(ns('./Server')).text
app = media_location.find(ns('./App')).text
media_id = stream_info.find(ns('./Id')).text
quality_id = stream_info.find(ns('./QualityId')).text
name = stream_info.find(ns('./Name')).text
width = int(stream_info.find(ns('./Width')).text)
height = int(stream_info.find(ns('./Height')).text)
formats.append({
'url': 'rtmp://%s/%s' % (server, app),
'app': app,
'play_path': '01/%s' % video_guid.upper(),
'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
'page_url': url,
'ext': 'flv',
'format_id': quality_id,
'format_note': name,
'width': width,
'height': height,
})
self._sort_formats(formats)
return {
'id': video_guid,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -73,14 +73,16 @@ class NFBIE(InfoExtractor):
title = media.find('title').text
description = media.find('description').text
# It seems assets always go from lower to better quality, so no need to sort
formats = [{
'url': x.find('default/streamerURI').text,
'app': x.find('default/streamerURI').text.split('/', 3)[3],
'play_path': x.find('default/url').text,
'rtmp_live': False,
'ext': 'mp4',
'format_id': x.get('quality'),
} for x in media.findall('assets/asset')]
for asset in media.findall('assets/asset'):
for x in asset:
formats.append({
'url': x.find('streamerURI').text,
'app': x.find('streamerURI').text.split('/', 3)[3],
'play_path': x.find('url').text,
'rtmp_live': False,
'ext': 'mp4',
'format_id': '%s-%s' % (x.tag, asset.get('quality')),
})
return {
'id': video_id,

View File

@@ -1,15 +1,22 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import str_to_int
class NineGagIE(InfoExtractor):
IE_NAME = '9gag'
_VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
_VALID_URL = r'''(?x)^https?://(?:www\.)?9gag\.tv/
(?:
v/(?P<numid>[0-9]+)|
p/(?P<id>[a-zA-Z0-9]+)/(?P<display_id>[^?#/]+)
)
'''
_TEST = {
_TESTS = [{
"url": "http://9gag.tv/v/1912",
"info_dict": {
"id": "1912",
@@ -20,34 +27,42 @@ class NineGagIE(InfoExtractor):
"thumbnail": "re:^https?://",
},
'add_ie': ['Youtube']
}
},
{
'url': 'http://9gag.tv/p/KklwM/alternate-banned-opening-scene-of-gravity?ref=fsidebar',
'info_dict': {
'id': 'KklwM',
'ext': 'mp4',
'display_id': 'alternate-banned-opening-scene-of-gravity',
"description": "While Gravity was a pretty awesome movie already, YouTuber Krishna Shenoi came up with a way to improve upon it, introducing a much better solution to Sandra Bullock's seemingly endless tumble in space. The ending is priceless.",
'title': "Banned Opening Scene Of \"Gravity\" That Changes The Whole Movie",
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('numid') or mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
youtube_id = self._html_search_regex(
r'(?s)id="jsid-video-post-container".*?data-external-id="([^"]+)"',
webpage, 'video ID')
description = self._html_search_regex(
r'(?s)<div class="video-caption">.*?<p>(.*?)</p>', webpage,
'description', fatal=False)
view_count_str = self._html_search_regex(
r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage, 'view count',
fatal=False)
view_count = (
None if view_count_str is None
else int(view_count_str.replace(',', '')))
post_view = json.loads(self._html_search_regex(
r'var postView = new app\.PostView\({\s*post:\s*({.+?}),', webpage, 'post view'))
youtube_id = post_view['videoExternalId']
title = post_view['title']
description = post_view['description']
view_count = str_to_int(post_view['externalView'])
thumbnail = post_view.get('thumbnail_700w') or post_view.get('ogImageUrl') or post_view.get('thumbnail_300w')
return {
'_type': 'url_transparent',
'url': youtube_id,
'ie_key': 'Youtube',
'id': video_id,
'title': self._og_search_title(webpage),
'display_id': display_id,
'title': title,
'description': description,
'view_count': view_count,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,106 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unified_strdate,
compat_str,
)
class NocoIE(InfoExtractor):
_VALID_URL = r'http://(?:(?:www\.)?noco\.tv/emission/|player\.noco\.tv/\?idvideo=)(?P<id>\d+)'
_TEST = {
'url': 'http://noco.tv/emission/11538/nolife/ami-ami-idol-hello-france/',
'md5': '0a993f0058ddbcd902630b2047ef710e',
'info_dict': {
'id': '11538',
'ext': 'mp4',
'title': 'Ami Ami Idol - Hello! France',
'description': 'md5:4eaab46ab68fa4197a317a88a53d3b86',
'upload_date': '20140412',
'uploader': 'Nolife',
'uploader_id': 'NOL',
'duration': 2851.2,
},
'skip': 'Requires noco account',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
medias = self._download_json(
'http://api.noco.tv/1.0/video/medias/%s' % video_id, video_id, 'Downloading video JSON')
formats = []
for fmt in medias['fr']['video_list']['default']['quality_list']:
format_id = fmt['quality_key']
file = self._download_json(
'http://api.noco.tv/1.0/video/file/%s/fr/%s' % (format_id.lower(), video_id),
video_id, 'Downloading %s video JSON' % format_id)
file_url = file['file']
if not file_url:
continue
if file_url == 'forbidden':
raise ExtractorError(
'%s returned error: %s - %s' % (
self.IE_NAME, file['popmessage']['title'], file['popmessage']['message']),
expected=True)
formats.append({
'url': file_url,
'format_id': format_id,
'width': fmt['res_width'],
'height': fmt['res_lines'],
'abr': fmt['audiobitrate'],
'vbr': fmt['videobitrate'],
'filesize': fmt['filesize'],
'format_note': fmt['quality_name'],
'preference': fmt['priority'],
})
self._sort_formats(formats)
show = self._download_json(
'http://api.noco.tv/1.0/shows/show/%s' % video_id, video_id, 'Downloading show JSON')[0]
upload_date = unified_strdate(show['indexed'])
uploader = show['partner_name']
uploader_id = show['partner_key']
duration = show['duration_ms'] / 1000.0
thumbnail = show['screenshot']
episode = show.get('show_TT') or show.get('show_OT')
family = show.get('family_TT') or show.get('family_OT')
episode_number = show.get('episode_number')
title = ''
if family:
title += family
if episode_number:
title += ' #' + compat_str(episode_number)
if episode:
title += ' - ' + episode
description = show.get('show_resume') or show.get('family_resume')
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'formats': formats,
}

View File

@@ -13,7 +13,8 @@ class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': 'novamov\.com'}
_VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
_VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
_HOST = 'www.novamov.com'
@@ -36,18 +37,17 @@ class NovaMovIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
video_id = mobj.group('id')
page = self._download_webpage(
'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
if re.search(self._FILE_DELETED_REGEX, page) is not None:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
api_response = self._download_webpage(

View File

@@ -7,7 +7,7 @@ class NowVideoIE(NovaMovIE):
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': 'nowvideo\.(?:ch|sx|eu)'}
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|sx|eu|at|ag|co)'}
_HOST = 'www.nowvideo.ch'

View File

@@ -0,0 +1,67 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class NRKIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?nrk\.no/(?:video|lyd)/[^/]+/(?P<id>[\dA-F]{16})'
_TESTS = [
{
'url': 'http://www.nrk.no/video/dompap_og_andre_fugler_i_piip_show/D0FA54B5C8B6CE59/emne/piipshow/',
'md5': 'a6eac35052f3b242bb6bb7f43aed5886',
'info_dict': {
'id': '150533',
'ext': 'flv',
'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f'
}
},
{
'url': 'http://www.nrk.no/lyd/lyd_av_oppleser_for_blinde/AEFDDD5473BA0198/',
'md5': '3471f2a51718195164e88f46bf427668',
'info_dict': {
'id': '154915',
'ext': 'flv',
'title': 'Slik høres internett ut når du er blind',
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
video_id = self._html_search_regex(r'<div class="nrk-video" data-nrk-id="(\d+)">', page, 'video id')
data = self._download_json(
'http://v7.psapi.nrk.no/mediaelement/%s' % video_id, video_id, 'Downloading media JSON')
if data['usageRights']['isGeoBlocked']:
raise ExtractorError('NRK har ikke rettig-heter til å vise dette programmet utenfor Norge', expected=True)
video_url = data['mediaUrl'] + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124'
images = data.get('images')
if images:
thumbnails = images['webImages']
thumbnails.sort(key=lambda image: image['pixelWidth'])
thumbnail = thumbnails[-1]['imageUrl']
else:
thumbnail = None
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': data['title'],
'description': data['description'],
'thumbnail': thumbnail,
}

View File

@@ -24,9 +24,9 @@ class NTVIE(InfoExtractor):
'duration': 136,
},
'params': {
# rtmp download
'skip_download': True,
},
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/video/novosti/750370/',
@@ -38,9 +38,9 @@ class NTVIE(InfoExtractor):
'duration': 172,
},
'params': {
# rtmp download
'skip_download': True,
},
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
@@ -52,23 +52,23 @@ class NTVIE(InfoExtractor):
'duration': 1496,
},
'params': {
# rtmp download
'skip_download': True,
},
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/kino/Koma_film',
'info_dict': {
'id': '750783',
'id': '758100',
'ext': 'flv',
'title': 'Остросюжетный фильм «Кома» — 4 апреля вечером на НТВ',
'description': 'Остросюжетный фильм «Кома» — 4 апреля вечером на НТВ',
'duration': 28,
'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»',
'duration': 5592,
},
'params': {
# rtmp download
'skip_download': True,
},
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
@@ -80,33 +80,25 @@ class NTVIE(InfoExtractor):
'duration': 2590,
},
'params': {
# rtmp download
'skip_download': True,
},
# rtmp download
'skip_download': True,
},
},
]
_VIDEO_ID_REGEXES = [
r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)',
r'<video embed=[^>]+><id>(\d+)</id>',
r'<video restriction[^>]+><key>(\d+)</key>'
r'<video restriction[^>]+><key>(\d+)</key>',
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
page = self._download_webpage(url, video_id)
for pattern in self._VIDEO_ID_REGEXES:
mobj = re.search(pattern, page)
if mobj:
break
if not mobj:
raise ExtractorError('No media links available for %s' % video_id)
video_id = mobj.group(1)
video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, page, 'video id')
player = self._download_xml('http://www.ntv.ru/vi%s/' % video_id, video_id, 'Downloading video XML')
title = unescapeHTML(player.find('./data/title').text)
@@ -124,7 +116,7 @@ class NTVIE(InfoExtractor):
'7': 'video2',
}
app = apps[puid22] if puid22 in apps else apps['4']
app = apps.get(puid22, apps['4'])
formats = []
for format_id in ['', 'hi', 'webm']:

View File

@@ -0,0 +1,48 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class NuvidIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www|m)\.nuvid\.com/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://m.nuvid.com/video/1310741/',
'md5': 'eab207b7ac4fccfb4e23c86201f11277',
'info_dict': {
'id': '1310741',
'ext': 'mp4',
"title": "Horny babes show their awesome bodeis and",
"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
murl = url.replace('://www.', '://m.')
webpage = self._download_webpage(murl, video_id)
title = self._html_search_regex(
r'<div class="title">\s+<h2[^>]*>([^<]+)</h2>',
webpage, 'title').strip()
url_end = self._html_search_regex(
r'href="(/mp4/[^"]+)"[^>]*data-link_type="mp4"',
webpage, 'video_url')
video_url = 'http://m.nuvid.com' + url_end
thumbnail = self._html_search_regex(
r'href="(/thumbs/[^"]+)"[^>]*data-link_type="thumbs"',
webpage, 'thumbnail URL', fatal=False)
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'thumbnail': thumbnail,
'age_limit': 18,
}

View File

@@ -0,0 +1,77 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
class NYTimesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nytimes\.com/video/(?:[^/]+/)+(?P<id>\d+)'
_TEST = {
'url': 'http://www.nytimes.com/video/opinion/100000002847155/verbatim-what-is-a-photocopier.html?playlistId=100000001150263',
'md5': '18a525a510f942ada2720db5f31644c0',
'info_dict': {
'id': '100000002847155',
'ext': 'mov',
'title': 'Verbatim: What Is a Photocopier?',
'description': 'md5:93603dada88ddbda9395632fdc5da260',
'timestamp': 1398631707,
'upload_date': '20140427',
'uploader': 'Brett Weiner',
'duration': 419,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_data = self._download_json(
'http://www.nytimes.com/svc/video/api/v2/video/%s' % video_id, video_id, 'Downloading video JSON')
title = video_data['headline']
description = video_data['summary']
duration = video_data['duration'] / 1000.0
uploader = video_data['byline']
timestamp = parse_iso8601(video_data['publication_date'][:-8])
def get_file_size(file_size):
if isinstance(file_size, int):
return file_size
elif isinstance(file_size, dict):
return int(file_size.get('value', 0))
else:
return 0
formats = [
{
'url': video['url'],
'format_id': video['type'],
'vcodec': video['video_codec'],
'width': video['width'],
'height': video['height'],
'filesize': get_file_size(video['fileSize']),
} for video in video_data['renditions']
]
self._sort_formats(formats)
thumbnails = [
{
'url': 'http://www.nytimes.com/%s' % image['url'],
'resolution': '%dx%d' % (image['width'], image['height']),
} for image in video_data['images']
]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'uploader': uploader,
'duration': duration,
'formats': formats,
'thumbnails': thumbnails,
}

View File

@@ -1,10 +1,10 @@
from __future__ import unicode_literals
import datetime
import json
import re
from .common import InfoExtractor
from ..utils import compat_urllib_parse
class PhotobucketIE(InfoExtractor):
@@ -14,6 +14,7 @@ class PhotobucketIE(InfoExtractor):
'file': 'zpsc0c3b9fa.mp4',
'md5': '7dabfb92b0a31f6c16cebc0f8e60ff99',
'info_dict': {
'timestamp': 1367669341,
'upload_date': '20130504',
'uploader': 'rachaneronas',
'title': 'Tired of Link Building? Try BacklinkMyDomain.com!',
@@ -32,11 +33,12 @@ class PhotobucketIE(InfoExtractor):
info_json = self._search_regex(r'Pb\.Data\.Shared\.put\(Pb\.Data\.Shared\.MEDIA, (.*?)\);',
webpage, 'info json')
info = json.loads(info_json)
url = compat_urllib_parse.unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url'))
return {
'id': video_id,
'url': info['downloadUrl'],
'url': url,
'uploader': info['username'],
'upload_date': datetime.date.fromtimestamp(info['creationDate']).strftime('%Y%m%d'),
'timestamp': info['creationDate'],
'title': info['title'],
'ext': video_extension,
'thumbnail': info['thumbUrl'],

View File

@@ -6,22 +6,36 @@ import re
from .common import InfoExtractor
from ..utils import int_or_none
class PodomaticIE(InfoExtractor):
IE_NAME = 'podomatic'
_VALID_URL = r'^(?P<proto>https?)://(?P<channel>[^.]+)\.podomatic\.com/entry/(?P<id>[^?]+)'
_TEST = {
"url": "http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00",
"file": "2009-01-02T16_03_35-08_00.mp3",
"md5": "84bb855fcf3429e6bf72460e1eed782d",
"info_dict": {
"uploader": "Science Teaching Tips",
"uploader_id": "scienceteachingtips",
"title": "64. When the Moon Hits Your Eye",
"duration": 446,
}
}
_TESTS = [
{
'url': 'http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00',
'md5': '84bb855fcf3429e6bf72460e1eed782d',
'info_dict': {
'id': '2009-01-02T16_03_35-08_00',
'ext': 'mp3',
'uploader': 'Science Teaching Tips',
'uploader_id': 'scienceteachingtips',
'title': '64. When the Moon Hits Your Eye',
'duration': 446,
}
},
{
'url': 'http://ostbahnhof.podomatic.com/entry/2013-11-15T16_31_21-08_00',
'md5': 'd2cf443931b6148e27638650e2638297',
'info_dict': {
'id': '2013-11-15T16_31_21-08_00',
'ext': 'mp3',
'uploader': 'Ostbahnhof / Techno Mix',
'uploader_id': 'ostbahnhof',
'title': 'Einunddreizig',
'duration': 3799,
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -32,10 +46,12 @@ class PodomaticIE(InfoExtractor):
'?permalink=true&rtmp=0') %
(mobj.group('proto'), channel, video_id))
data_json = self._download_webpage(
json_url, video_id, note=u'Downloading video info')
json_url, video_id, 'Downloading video info')
data = json.loads(data_json)
video_url = data['downloadLink']
if not video_url:
video_url = '%s/%s' % (data['streamer'].replace('rtmp', 'http'), data['mediaLocation'])
uploader = data['podcast']
title = data['title']
thumbnail = data['imageLocation']

View File

@@ -1,44 +1,81 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_parse
from ..utils import int_or_none
class PornHdIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
_VALID_URL = r'http://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'file': '1962.flv',
'md5': '35272469887dca97abd30abecc6cdf75',
'md5': '956b8ca569f7f4d8ec563e2c41598441',
'info_dict': {
"title": "sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
"age_limit": 18,
'id': '1962',
'ext': 'mp4',
'title': 'Sierra loves doing laundry',
'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_title = mobj.group('video_title')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
next_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, 'video URL')
next_url = compat_urllib_parse.unquote(next_url)
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' porn HD Video | PornHD.com '
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
video_url = self._download_webpage(
next_url, video_id, note='Retrieving video URL',
errnote='Could not retrieve video URL')
age_limit = 18
description = self._html_search_regex(
r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
view_count = int_or_none(self._html_search_regex(
r'(\d+) views </span>', webpage, 'view count', fatal=False))
formats = [
{
'url': format_url,
'ext': format.lower(),
'format_id': '%s-%s' % (format.lower(), quality.lower()),
'quality': 1 if quality.lower() == 'high' else 0,
} for format, quality, format_url in re.findall(
r'var __video([\da-zA-Z]+?)(Low|High)StreamUrl = \'(http://.+?)\?noProxy=1\'', webpage)
]
mobj = re.search(r'flashVars = (?P<flashvars>{.+?});', webpage)
if mobj:
flashvars = json.loads(mobj.group('flashvars'))
formats.extend([
{
'url': flashvars['hashlink'].replace('?noProxy=1', ''),
'ext': 'flv',
'format_id': 'flv-low',
'quality': 0,
},
{
'url': flashvars['hd'].replace('?noProxy=1', ''),
'ext': 'flv',
'format_id': 'flv-high',
'quality': 1,
}
])
thumbnail = flashvars['urlWallpaper']
else:
thumbnail = self._og_search_thumbnail(webpage)
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'age_limit': age_limit,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}

View File

@@ -8,8 +8,6 @@ from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
unified_strdate,
clean_html,
RegexNotFoundError,
)
@@ -160,6 +158,7 @@ class ProSiebenSat1IE(InfoExtractor):
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
r'clipId=(\d+)',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',
@@ -187,16 +186,7 @@ class ProSiebenSat1IE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
def extract(patterns, name, page, fatal=False):
for pattern in patterns:
mobj = re.search(pattern, page)
if mobj:
return clean_html(mobj.group(1))
if fatal:
raise RegexNotFoundError(u'Unable to extract %s' % name)
return None
clip_id = extract(self._CLIPID_REGEXES, 'clip id', page, fatal=True)
clip_id = self._html_search_regex(self._CLIPID_REGEXES, page, 'clip id')
access_token = 'testclient'
client_name = 'kolibri-1.2.5'
@@ -245,13 +235,12 @@ class ProSiebenSat1IE(InfoExtractor):
urls = self._download_json(url_api_url, clip_id, 'Downloading urls JSON')
title = extract(self._TITLE_REGEXES, 'title', page, fatal=True)
description = extract(self._DESCRIPTION_REGEXES, 'description', page)
title = self._html_search_regex(self._TITLE_REGEXES, page, 'title')
description = self._html_search_regex(self._DESCRIPTION_REGEXES, page, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(page)
upload_date = extract(self._UPLOAD_DATE_REGEXES, 'upload date', page)
if upload_date:
upload_date = unified_strdate(upload_date)
upload_date = unified_strdate(self._html_search_regex(
self._UPLOAD_DATE_REGEXES, page, 'upload date', fatal=False))
formats = []

View File

@@ -46,7 +46,8 @@ class PyvideoIE(InfoExtractor):
return self.url_result(m_youtube.group(1), 'Youtube')
title = self._html_search_regex(
r'<div class="section">.*?<h3>([^>]+?)</h3>', webpage, 'title', flags=re.DOTALL)
r'<div class="section">.*?<h3(?:\s+class="[^"]*")?>([^>]+?)</h3>',
webpage, 'title', flags=re.DOTALL)
video_url = self._search_regex(
[r'<source src="(.*?)"', r'<dt>Download</dt>.*?<a href="(.+?)"'],
webpage, 'video url', flags=re.DOTALL)

View File

@@ -18,7 +18,7 @@ class Ro220IE(InfoExtractor):
'md5': '03af18b73a07b4088753930db7a34add',
'info_dict': {
"title": "Luati-le Banii sez 4 ep 1",
"description": "Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
"description": "re:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$",
}
}

View File

@@ -0,0 +1,49 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class RTBFIE(InfoExtractor):
_VALID_URL = r'https?://www.rtbf.be/video/[^\?]+\?id=(?P<id>\d+)'
_TEST = {
'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
'md5': '799f334ddf2c0a582ba80c44655be570',
'info_dict': {
'id': '1921274',
'ext': 'mp4',
'title': 'Les Diables au coeur (épisode 2)',
'description': 'Football - Diables Rouges',
'duration': 3099,
'timestamp': 1398456336,
'upload_date': '20140425',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage('https://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
data = json.loads(self._html_search_regex(
r'<div class="js-player-embed" data-video="([^"]+)"', page, 'data video'))['data']
video_url = data.get('downloadUrl') or data.get('url')
if data['provider'].lower() == 'youtube':
return self.url_result(video_url, 'Youtube')
return {
'id': video_id,
'url': video_url,
'title': data['title'],
'description': data.get('description') or data.get('subtitle'),
'thumbnail': data['thumbnail']['large'],
'duration': data.get('duration') or data.get('realDuration'),
'timestamp': data['created'],
'view_count': data['viewCount'],
}

View File

@@ -9,46 +9,136 @@ from ..utils import (
parse_duration,
parse_iso8601,
unescapeHTML,
compat_str,
)
class RTSIE(InfoExtractor):
IE_DESC = 'RTS.ch'
_VALID_URL = r'^https?://(?:www\.)?rts\.ch/archives/tv/[^/]+/(?P<id>[0-9]+)-.*?\.html'
_VALID_URL = r'^https?://(?:www\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-.*?\.html'
_TEST = {
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
'md5': '753b877968ad8afaeddccc374d4256a5',
'info_dict': {
'id': '3449373',
'ext': 'mp4',
'duration': 1488,
'title': 'Les Enfants Terribles',
'description': 'France Pommier et sa soeur Luce Feral, les deux filles de ce groupe de 5.',
'uploader': 'Divers',
'upload_date': '19680921',
'timestamp': -40280400,
'thumbnail': 're:^https?://.*\.image'
_TESTS = [
{
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
'md5': '753b877968ad8afaeddccc374d4256a5',
'info_dict': {
'id': '3449373',
'ext': 'mp4',
'duration': 1488,
'title': 'Les Enfants Terribles',
'description': 'France Pommier et sa soeur Luce Feral, les deux filles de ce groupe de 5.',
'uploader': 'Divers',
'upload_date': '19680921',
'timestamp': -40280400,
'thumbnail': 're:^https?://.*\.image'
},
},
}
{
'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
'md5': 'c148457a27bdc9e5b1ffe081a7a8337b',
'info_dict': {
'id': '5624067',
'ext': 'mp4',
'duration': 3720,
'title': 'Les yeux dans les cieux - Mon homard au Canada',
'description': 'md5:d22ee46f5cc5bac0912e5a0c6d44a9f7',
'uploader': 'Passe-moi les jumelles',
'upload_date': '20140404',
'timestamp': 1396635300,
'thumbnail': 're:^https?://.*\.image'
},
},
{
'url': 'http://www.rts.ch/video/sport/hockey/5745975-1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski.html',
'md5': 'b4326fecd3eb64a458ba73c73e91299d',
'info_dict': {
'id': '5745975',
'ext': 'mp4',
'duration': 48,
'title': '1/2, Kloten - Fribourg (5-2): second but pour Gottéron par Kwiatowski',
'description': 'Hockey - Playoff',
'uploader': 'Hockey',
'upload_date': '20140403',
'timestamp': 1396556882,
'thumbnail': 're:^https?://.*\.image'
},
'skip': 'Blocked outside Switzerland',
},
{
'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
'md5': '9bb06503773c07ce83d3cbd793cebb91',
'info_dict': {
'id': '5745356',
'ext': 'mp4',
'duration': 33,
'title': 'Londres cachée par un épais smog',
'description': 'Un important voile de smog recouvre Londres depuis mercredi, provoqué par la pollution et du sable du Sahara.',
'uploader': 'Le Journal en continu',
'upload_date': '20140403',
'timestamp': 1396537322,
'thumbnail': 're:^https?://.*\.image'
},
},
{
'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
'md5': 'dd8ef6a22dff163d063e2a52bc8adcae',
'info_dict': {
'id': '5706148',
'ext': 'mp3',
'duration': 123,
'title': '"Urban Hippie", de Damien Krisl',
'description': 'Des Hippies super glam.',
'upload_date': '20140403',
'timestamp': 1396551600,
},
},
]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
all_info = self._download_json(
'http://www.rts.ch/a/%s.html?f=json/article' % video_id, video_id)
info = all_info['video']['JSONinfo']
def download_json(internal_id):
return self._download_json(
'http://www.rts.ch/a/%s.html?f=json/article' % internal_id,
video_id)
all_info = download_json(video_id)
# video_id extracted out of URL is not always a real id
if 'video' not in all_info and 'audio' not in all_info:
page = self._download_webpage(url, video_id)
internal_id = self._html_search_regex(
r'<(?:video|audio) data-id="([0-9]+)"', page,
'internal video id')
all_info = download_json(internal_id)
info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
upload_timestamp = parse_iso8601(info.get('broadcast_date'))
duration = parse_duration(info.get('duration'))
duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
if isinstance(duration, compat_str):
duration = parse_duration(duration)
view_count = info.get('plays')
thumbnail = unescapeHTML(info.get('preview_image_url'))
def extract_bitrate(url):
return int_or_none(self._search_regex(
r'-([0-9]+)k\.', url, 'bitrate', default=None))
formats = [{
'format_id': fid,
'url': furl,
'tbr': int_or_none(self._search_regex(
r'-([0-9]+)k\.', furl, 'bitrate', default=None)),
'tbr': extract_bitrate(furl),
} for fid, furl in info['streams'].items()]
if 'media' in info:
formats.extend([{
'format_id': '%s-%sk' % (media['ext'], media['rate']),
'url': 'http://download-video.rts.ch/%s' % media['url'],
'tbr': media['rate'] or extract_bitrate(media['url']),
} for media in info['media'] if media.get('rate')])
self._sort_formats(formats)
return {
@@ -57,6 +147,7 @@ class RTSIE(InfoExtractor):
'title': info['title'],
'description': info.get('intro'),
'duration': duration,
'view_count': view_count,
'uploader': info.get('programName'),
'timestamp': upload_timestamp,
'thumbnail': thumbnail,

View File

@@ -0,0 +1,84 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..utils import (
struct_unpack,
)
class RTVEALaCartaIE(InfoExtractor):
IE_NAME = 'rtve.es:alacarta'
IE_DESC = 'RTVE a la carta'
_VALID_URL = r'http://www\.rtve\.es/alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
'md5': '18fcd45965bdd076efdb12cd7f6d7b9e',
'info_dict': {
'id': '2491869',
'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
},
}
def _decrypt_url(self, png):
encrypted_data = base64.b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index-4:]
length = struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8+length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index+1:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter)*10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % video_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
video_url = self._decrypt_url(png)
return {
'id': video_id,
'title': info['title'],
'url': video_url,
'thumbnail': info['image'],
}

View File

@@ -43,13 +43,14 @@ class RutubeIE(InfoExtractor):
'http://rutube.ru/api/video/%s/?format=json' % video_id,
video_id, 'Downloading video JSON')
trackinfo = self._download_json(
'http://rutube.ru/api/play/trackinfo/%s/?format=json' % video_id,
video_id, 'Downloading trackinfo JSON')
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
author = video.get('author') or {}
options = self._download_json(
'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
video_id, 'Downloading options JSON')
m3u8_url = options['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError('Couldn\'t find m3u8 manifest url')

View File

@@ -12,7 +12,12 @@ from ..utils import (
class RUTVIE(InfoExtractor):
IE_DESC = 'RUTV.RU'
_VALID_URL = r'https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:flash2v/container\.swf\?id=|iframe/(?P<type>swf|video|live)/id/)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://player\.(?:rutv\.ru|vgtrk\.com)/
(?P<path>flash2v/container\.swf\?id=
|iframe/(?P<type>swf|video|live)/id/
|index/iframe/cast_id/)
(?P<id>\d+)'''
_TESTS = [
{
@@ -90,7 +95,7 @@ class RUTVIE(InfoExtractor):
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage)
if mobj:
return mobj.group('url')
@@ -103,10 +108,16 @@ class RUTVIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_type = mobj.group('type')
video_path = mobj.group('path')
if not video_type or video_type == 'swf':
if video_path.startswith('flash2v'):
video_type = 'video'
elif video_path.startswith('iframe'):
video_type = mobj.group('type')
if video_type == 'swf':
video_type = 'video'
elif video_path.startswith('index/iframe/cast_id'):
video_type = 'live'
json_data = self._download_json(
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),

View File

@@ -0,0 +1,56 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class SciVeeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?scivee\.tv/node/(?P<id>\d+)'
_TEST = {
'url': 'http://www.scivee.tv/node/62352',
'md5': 'b16699b74c9e6a120f6772a44960304f',
'info_dict': {
'id': '62352',
'ext': 'mp4',
'title': 'Adam Arkin at the 2014 DOE JGI Genomics of Energy & Environment Meeting',
'description': 'md5:81f1710638e11a481358fab1b11059d7',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# annotations XML is malformed
annotations = self._download_webpage(
'http://www.scivee.tv/assets/annotations/%s' % video_id, video_id, 'Downloading annotations')
title = self._html_search_regex(r'<title>([^<]+)</title>', annotations, 'title')
description = self._html_search_regex(r'<abstract>([^<]+)</abstract>', annotations, 'abstract', fatal=False)
filesize = int_or_none(self._html_search_regex(
r'<filesize>([^<]+)</filesize>', annotations, 'filesize', fatal=False))
formats = [
{
'url': 'http://www.scivee.tv/assets/audio/%s' % video_id,
'ext': 'mp3',
'format_id': 'audio',
},
{
'url': 'http://www.scivee.tv/assets/video/%s' % video_id,
'ext': 'mp4',
'format_id': 'video',
'filesize': filesize,
},
]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': 'http://www.scivee.tv/assets/videothumb/%s' % video_id,
'formats': formats,
}

View File

@@ -39,7 +39,8 @@ class SlideshareIE(InfoExtractor):
ext = info['jsplayer']['video_extension']
video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
description = self._html_search_regex(
r'<p class="description.*?"[^>]*>(.*?)</p>', webpage, 'description')
r'<p\s+(?:style="[^"]*"\s+)?class="description.*?"[^>]*>(.*?)</p>', webpage,
'description', fatal=False)
return {
'_type': 'video',

View File

@@ -0,0 +1,47 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class SlutloadIE(InfoExtractor):
_VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
_TEST = {
'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
'md5': '0cf531ae8006b530bd9df947a6a0df77',
'info_dict': {
'id': 'TD73btpBqSxc',
'ext': 'mp4',
"title": "virginie baisee en cam",
"age_limit": 18,
'thumbnail': 're:https?://.*?\.jpg'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',
webpage, 'title').strip()
video_url = self._html_search_regex(
r'(?s)<div id="vidPlayer"\s+data-url="([^"]+)"',
webpage, 'video URL')
thumbnail = self._html_search_regex(
r'(?s)<div id="vidPlayer"\s+.*?previewer-file="([^"]+)"',
webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'url': video_url,
'title': video_title,
'thumbnail': thumbnail,
'age_limit': 18
}

View File

@@ -25,7 +25,7 @@ class SoundcloudIE(InfoExtractor):
of the stream token and uid
"""
_VALID_URL = r'''^(?:https?://)?
_VALID_URL = r'''(?x)^(?:https?://)?
(?:(?:(?:www\.|m\.)?soundcloud\.com/
(?P<uploader>[\w\d-]+)/
(?!sets/)(?P<title>[\w\d-]+)/?
@@ -94,10 +94,6 @@ class SoundcloudIE(InfoExtractor):
_CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
@classmethod
def suitable(cls, url):
return re.match(cls._VALID_URL, url, flags=re.VERBOSE) is not None
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen('%s: Resolving id' % video_id)
@@ -141,11 +137,10 @@ class SoundcloudIE(InfoExtractor):
# We have to retrieve the url
streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
stream_json = self._download_webpage(
format_dict = self._download_json(
streams_url,
track_id, 'Downloading track url')
format_dict = json.loads(stream_json)
for key, stream_url in format_dict.items():
if key.startswith('http'):
formats.append({
@@ -198,7 +193,7 @@ class SoundcloudIE(InfoExtractor):
full_title = track_id
elif mobj.group('player'):
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
return self.url_result(query['url'][0], ie='Soundcloud')
return self.url_result(query['url'][0])
else:
# extract uploader (which is in the url)
uploader = mobj.group('uploader')
@@ -213,11 +208,11 @@ class SoundcloudIE(InfoExtractor):
url = 'http://soundcloud.com/%s' % resolve_title
info_json_url = self._resolv_url(url)
info_json = self._download_webpage(info_json_url, full_title, 'Downloading info JSON')
info = self._download_json(info_json_url, full_title, 'Downloading info JSON')
info = json.loads(info_json)
return self._extract_info_dict(info, full_title, secret_token=token)
class SoundcloudSetIE(SoundcloudIE):
_VALID_URL = r'https?://(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)'
IE_NAME = 'soundcloud:set'
@@ -232,16 +227,15 @@ class SoundcloudSetIE(SoundcloudIE):
# extract uploader (which is in the url)
uploader = mobj.group(1)
# extract simple title (uploader + slug of song title)
slug_title = mobj.group(2)
slug_title = mobj.group(2)
full_title = '%s/sets/%s' % (uploader, slug_title)
self.report_resolve(full_title)
url = 'http://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
resolv_url = self._resolv_url(url)
info_json = self._download_webpage(resolv_url, full_title)
info = self._download_json(resolv_url, full_title)
info = json.loads(info_json)
if 'errors' in info:
for err in info['errors']:
self._downloader.report_error('unable to download video webpage: %s' % compat_str(err['error_message']))
@@ -268,26 +262,55 @@ class SoundcloudUserIE(SoundcloudIE):
url = 'http://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url)
user_json = self._download_webpage(resolv_url, uploader,
'Downloading user info')
user = json.loads(user_json)
user = self._download_json(
resolv_url, uploader, 'Downloading user info')
base_url = 'http://api.soundcloud.com/users/%s/tracks.json?' % uploader
tracks = []
entries = []
for i in itertools.count():
data = compat_urllib_parse.urlencode({'offset': i*50,
'client_id': self._CLIENT_ID,
})
tracks_url = 'http://api.soundcloud.com/users/%s/tracks.json?' % user['id'] + data
response = self._download_webpage(tracks_url, uploader,
'Downloading tracks page %s' % (i+1))
new_tracks = json.loads(response)
tracks.extend(self._extract_info_dict(track, quiet=True) for track in new_tracks)
if len(new_tracks) < 50:
data = compat_urllib_parse.urlencode({
'offset': i * 50,
'client_id': self._CLIENT_ID,
})
new_entries = self._download_json(
base_url + data, uploader, 'Downloading track page %s' % (i + 1))
entries.extend(self._extract_info_dict(e, quiet=True) for e in new_entries)
if len(new_entries) < 50:
break
return {
'_type': 'playlist',
'id': compat_str(user['id']),
'title': user['username'],
'entries': tracks,
'entries': entries,
}
class SoundcloudPlaylistIE(SoundcloudIE):
_VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)'
IE_NAME = 'soundcloud:playlist'
# it's in tests/test_playlists.py
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
base_url = '%s//api.soundcloud.com/playlists/%s.json?' % (self.http_scheme(), playlist_id)
data = compat_urllib_parse.urlencode({
'client_id': self._CLIENT_ID,
})
data = self._download_json(
base_url + data, playlist_id, 'Downloading playlist')
entries = [
self._extract_info_dict(t, quiet=True) for t in data['tracks']]
return {
'_type': 'playlist',
'id': playlist_id,
'title': data.get('title'),
'description': data.get('description'),
'entries': entries,
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -8,78 +10,114 @@ from ..utils import (
class SteamIE(InfoExtractor):
_VALID_URL = r"""http://store\.steampowered\.com/
(agecheck/)?
(?P<urltype>video|app)/ #If the page is only for videos or for a game
(?P<gameID>\d+)/?
(?P<videoID>\d*)(?P<extra>\??) #For urltype == video we sometimes get the videoID
"""
_VALID_URL = r"""(?x)
https?://store\.steampowered\.com/
(agecheck/)?
(?P<urltype>video|app)/ #If the page is only for videos or for a game
(?P<gameID>\d+)/?
(?P<videoID>\d*)(?P<extra>\??) # For urltype == video we sometimes get the videoID
|
https?://(?:www\.)?steamcommunity\.com/sharedfiles/filedetails/\?id=(?P<fileID>[0-9]+)
"""
_VIDEO_PAGE_TEMPLATE = 'http://store.steampowered.com/video/%s/'
_AGECHECK_TEMPLATE = 'http://store.steampowered.com/agecheck/video/%s/?snr=1_agecheck_agecheck__age-gate&ageDay=1&ageMonth=January&ageYear=1970'
_TEST = {
u"url": u"http://store.steampowered.com/video/105600/",
u"playlist": [
_TESTS = [{
"url": "http://store.steampowered.com/video/105600/",
"playlist": [
{
u"file": u"81300.flv",
u"md5": u"f870007cee7065d7c76b88f0a45ecc07",
u"info_dict": {
u"title": u"Terraria 1.1 Trailer",
u'playlist_index': 1,
"md5": "f870007cee7065d7c76b88f0a45ecc07",
"info_dict": {
'id': '81300',
'ext': 'flv',
"title": "Terraria 1.1 Trailer",
'playlist_index': 1,
}
},
{
u"file": u"80859.flv",
u"md5": u"61aaf31a5c5c3041afb58fb83cbb5751",
u"info_dict": {
u"title": u"Terraria Trailer",
u'playlist_index': 2,
"md5": "61aaf31a5c5c3041afb58fb83cbb5751",
"info_dict": {
'id': '80859',
'ext': 'flv',
"title": "Terraria Trailer",
'playlist_index': 2,
}
}
]
}
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
],
'params': {
'playlistend': 2,
}
}, {
'url': 'http://steamcommunity.com/sharedfiles/filedetails/?id=242472205',
'info_dict': {
'id': 'WB5DvDOOvAY',
'ext': 'mp4',
'upload_date': '20140329',
'title': 'FRONTIERS - Final Greenlight Trailer',
'description': "The final trailer for the Steam Greenlight launch. Hooray, progress! Here's the official Greenlight page: http://steamcommunity.com/sharedfiles/filedetails/?id=242472205",
'uploader': 'AAD Productions',
'uploader_id': 'AtomicAgeDogGames',
}
}]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url, re.VERBOSE)
gameID = m.group('gameID')
videourl = self._VIDEO_PAGE_TEMPLATE % gameID
webpage = self._download_webpage(videourl, gameID)
m = re.match(self._VALID_URL, url)
fileID = m.group('fileID')
if fileID:
videourl = url
playlist_id = fileID
else:
gameID = m.group('gameID')
playlist_id = gameID
videourl = self._VIDEO_PAGE_TEMPLATE % playlist_id
webpage = self._download_webpage(videourl, playlist_id)
if re.search('<h2>Please enter your birth date to continue:</h2>', webpage) is not None:
videourl = self._AGECHECK_TEMPLATE % gameID
videourl = self._AGECHECK_TEMPLATE % playlist_id
self.report_age_confirmation()
webpage = self._download_webpage(videourl, gameID)
webpage = self._download_webpage(videourl, playlist_id)
self.report_extraction(gameID)
game_title = self._html_search_regex(r'<h2 class="pageheader">(.*?)</h2>',
webpage, 'game title')
if fileID:
playlist_title = self._html_search_regex(
r'<div class="workshopItemTitle">(.+)</div>', webpage, 'title')
mweb = re.finditer(r'''(?x)
'movie_(?P<videoID>[0-9]+)':\s*\{\s*
YOUTUBE_VIDEO_ID:\s*"(?P<youtube_id>[^"]+)",
''', webpage)
videos = [{
'_type': 'url',
'url': vid.group('youtube_id'),
'ie_key': 'Youtube',
} for vid in mweb]
else:
playlist_title = self._html_search_regex(
r'<h2 class="pageheader">(.*?)</h2>', webpage, 'game title')
urlRE = r"'movie_(?P<videoID>\d+)': \{\s*FILENAME: \"(?P<videoURL>[\w:/\.\?=]+)\"(,\s*MOVIE_NAME: \"(?P<videoName>[\w:/\.\?=\+-]+)\")?\s*\},"
mweb = re.finditer(urlRE, webpage)
namesRE = r'<span class="title">(?P<videoName>.+?)</span>'
titles = re.finditer(namesRE, webpage)
thumbsRE = r'<img class="movie_thumb" src="(?P<thumbnail>.+?)">'
thumbs = re.finditer(thumbsRE, webpage)
videos = []
for vid,vtitle,thumb in zip(mweb,titles,thumbs):
video_id = vid.group('videoID')
title = vtitle.group('videoName')
video_url = vid.group('videoURL')
video_thumb = thumb.group('thumbnail')
if not video_url:
raise ExtractorError(u'Cannot find video url for %s' % video_id)
info = {
'id':video_id,
'url':video_url,
'ext': 'flv',
'title': unescapeHTML(title),
'thumbnail': video_thumb
}
videos.append(info)
return [self.playlist_result(videos, gameID, game_title)]
mweb = re.finditer(r'''(?x)
'movie_(?P<videoID>[0-9]+)':\s*\{\s*
FILENAME:\s*"(?P<videoURL>[\w:/\.\?=]+)"
(,\s*MOVIE_NAME:\s*\"(?P<videoName>[\w:/\.\?=\+-]+)\")?\s*\},
''', webpage)
titles = re.finditer(
r'<span class="title">(?P<videoName>.+?)</span>', webpage)
thumbs = re.finditer(
r'<img class="movie_thumb" src="(?P<thumbnail>.+?)">', webpage)
videos = []
for vid, vtitle, thumb in zip(mweb, titles, thumbs):
video_id = vid.group('videoID')
title = vtitle.group('videoName')
video_url = vid.group('videoURL')
video_thumb = thumb.group('thumbnail')
if not video_url:
raise ExtractorError('Cannot find video url for %s' % video_id)
videos.append({
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': unescapeHTML(title),
'thumbnail': video_thumb
})
if not videos:
raise ExtractorError('Could not find any videos')
return self.playlist_result(videos, playlist_id, playlist_title)

View File

@@ -6,9 +6,9 @@ from .common import InfoExtractor
class SyfyIE(InfoExtractor):
_VALID_URL = r'https?://www\.syfy\.com/videos/.+?vid:(?P<id>\d+)'
_VALID_URL = r'https?://www\.syfy\.com/(?:videos/.+?vid:(?P<id>[0-9]+)|(?!videos)(?P<video_name>[^/]+)(?:$|[?#]))'
_TEST = {
_TESTS = [{
'url': 'http://www.syfy.com/videos/Robot%20Combat%20League/Behind%20the%20Scenes/vid:2631458',
'md5': 'e07de1d52c7278adbb9b9b1c93a66849',
'info_dict': {
@@ -18,10 +18,30 @@ class SyfyIE(InfoExtractor):
'description': 'Listen to what insights George Lucas give his daughter Amanda.',
},
'add_ie': ['ThePlatform'],
}
}, {
'url': 'http://www.syfy.com/wilwheaton',
'md5': '94dfa54ee3ccb63295b276da08c415f6',
'info_dict': {
'id': '4yoffOOXC767',
'ext': 'flv',
'title': 'The Wil Wheaton Project - Premiering May 27th at 10/9c.',
'description': 'The Wil Wheaton Project premieres May 27th at 10/9c. Don\'t miss it.',
},
'add_ie': ['ThePlatform'],
'skip': 'Blocked outside the US',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_name = mobj.group('video_name')
if video_name:
generic_webpage = self._download_webpage(url, video_name)
video_id = self._search_regex(
r'<iframe.*?class="video_iframe_page"\s+src="/_utils/video/thP_video_controller.php.*?_vid([0-9]+)">',
generic_webpage, 'video ID')
url = 'http://www.syfy.com/videos/%s/%s/vid:%s' % (
video_name, video_name, video_id)
else:
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
return self.url_result(self._og_search_video_url(webpage))

View File

@@ -3,14 +3,21 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class TeamcocoIE(InfoExtractor):
_VALID_URL = r'http://teamcoco\.com/video/(?P<url_title>.*)'
_TEST = {
_VALID_URL = r'http://teamcoco\.com/video/(?P<video_id>[0-9]+)?/?(?P<display_id>.*)'
_TESTS = [
{
'url': 'http://teamcoco.com/video/80187/conan-becomes-a-mary-kay-beauty-consultant',
'file': '80187.mp4',
'md5': '3f7746aa0dc86de18df7539903d399ea',
'info_dict': {
'title': 'Conan Becomes A Mary Kay Beauty Consultant',
'description': 'Mary Kay is perhaps the most trusted name in female beauty, so of course Conan is a natural choice to sell their products.'
}
},
{
'url': 'http://teamcoco.com/video/louis-ck-interview-george-w-bush',
'file': '19705.mp4',
'md5': 'cde9ba0fa3506f5f017ce11ead928f9a',
@@ -19,22 +26,23 @@ class TeamcocoIE(InfoExtractor):
"title": "Louis C.K. Interview Pt. 1 11/3/11"
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError('Invalid URL: %s' % url)
url_title = mobj.group('url_title')
webpage = self._download_webpage(url, url_title)
video_id = self._html_search_regex(
r'<article class="video" data-id="(\d+?)"',
webpage, 'video id')
self.report_extraction(video_id)
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
video_id = mobj.group("video_id")
if not video_id:
video_id = self._html_search_regex(
r'<article class="video" data-id="(\d+?)"',
webpage, 'video id')
data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id
data = self._download_xml(data_url, video_id, 'Downloading data webpage')
data = self._download_xml(
data_url, display_id, 'Downloading data webpage')
qualities = ['500k', '480p', '1000k', '720p', '1080p']
formats = []
@@ -69,6 +77,7 @@ class TeamcocoIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),

View File

@@ -37,6 +37,7 @@ class TEDIE(SubtitlesInfoExtractor):
'consciousness, but that half the time our brains are '
'actively fooling us.'),
'uploader': 'Dan Dennett',
'width': 854,
}
}, {
'url': 'http://www.ted.com/watch/ted-institute/ted-bcg/vishal-sikka-the-beauty-and-power-of-algorithms',
@@ -48,12 +49,22 @@ class TEDIE(SubtitlesInfoExtractor):
'thumbnail': 're:^https?://.+\.jpg',
'description': 'Adaptive, intelligent, and consistent, algorithms are emerging as the ultimate app for everything from matching consumers to products to assessing medical diagnoses. Vishal Sikka shares his appreciation for the algorithm, charting both its inherent beauty and its growing power.',
}
}, {
'url': 'http://www.ted.com/talks/gabby_giffords_and_mark_kelly_be_passionate_be_courageous_be_your_best',
'md5': '49144e345a899b8cb34d315f3b9cfeeb',
'info_dict': {
'id': '1972',
'ext': 'mp4',
'title': 'Be passionate. Be courageous. Be your best.',
'uploader': 'Gabby Giffords and Mark Kelly',
'description': 'md5:5174aed4d0f16021b704120360f72b92',
},
}]
_FORMATS_PREFERENCE = {
'low': 1,
'medium': 2,
'high': 3,
_NATIVE_FORMATS = {
'low': {'preference': 1, 'width': 320, 'height': 180},
'medium': {'preference': 2, 'width': 512, 'height': 288},
'high': {'preference': 3, 'width': 854, 'height': 480},
}
def _extract_info(self, webpage):
@@ -83,7 +94,7 @@ class TEDIE(SubtitlesInfoExtractor):
playlist_info = info['playlist']
playlist_entries = [
self.url_result(u'http://www.ted.com/talks/' + talk['slug'], self.ie_key())
self.url_result('http://www.ted.com/talks/' + talk['slug'], self.ie_key())
for talk in info['talks']
]
return self.playlist_result(
@@ -98,12 +109,26 @@ class TEDIE(SubtitlesInfoExtractor):
talk_info = self._extract_info(webpage)['talks'][0]
formats = [{
'ext': 'mp4',
'url': format_url,
'format_id': format_id,
'format': format_id,
'preference': self._FORMATS_PREFERENCE.get(format_id, -1),
} for (format_id, format_url) in talk_info['nativeDownloads'].items()]
} for (format_id, format_url) in talk_info['nativeDownloads'].items() if format_url is not None]
if formats:
for f in formats:
finfo = self._NATIVE_FORMATS.get(f['format_id'])
if finfo:
f.update(finfo)
else:
# Use rtmp downloads
formats = [{
'format_id': f['name'],
'url': talk_info['streamer'],
'play_path': f['file'],
'ext': 'flv',
'width': f['width'],
'height': f['height'],
'tbr': f['bitrate'],
} for f in talk_info['resources']['rtmp']]
self._sort_formats(formats)
video_id = compat_str(talk_info['id'])
@@ -135,7 +160,7 @@ class TEDIE(SubtitlesInfoExtractor):
sub_lang_list[l] = url
return sub_lang_list
else:
self._downloader.report_warning(u'video doesn\'t have subtitles')
self._downloader.report_warning('video doesn\'t have subtitles')
return {}
def _watch_info(self, url, name):
@@ -150,7 +175,10 @@ class TEDIE(SubtitlesInfoExtractor):
title = self._html_search_regex(
r"(?s)<h1(?:\s+class='[^']+')?>(.+?)</h1>", webpage, 'title')
description = self._html_search_regex(
r'(?s)<h4 class="[^"]+" id="h3--about-this-talk">.*?</h4>(.*?)</div>',
[
r'(?s)<h4 class="[^"]+" id="h3--about-this-talk">.*?</h4>(.*?)</div>',
r'(?s)<p><strong>About this talk:</strong>\s+(.*?)</p>',
],
webpage, 'description', fatal=False)
return {

View File

@@ -52,7 +52,7 @@ class ThePlatformIE(InfoExtractor):
head = meta.find(_x('smil:head'))
body = meta.find(_x('smil:body'))
f4m_node = body.find(_x('smil:seq/smil:video'))
f4m_node = body.find(_x('smil:seq//smil:video'))
if f4m_node is not None:
f4m_url = f4m_node.attrib['src']
if 'manifest.f4m?' not in f4m_url:

View File

@@ -0,0 +1,60 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .brightcove import BrightcoveIE
from .discovery import DiscoveryIE
class TlcIE(DiscoveryIE):
IE_NAME = 'tlc.com'
_VALID_URL = r'http://www\.tlc\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://www.tlc.com/tv-shows/cake-boss/videos/too-big-to-fly.htm',
'md5': 'c4038f4a9b44d0b5d74caaa64ed2a01a',
'info_dict': {
'id': '853232',
'ext': 'mp4',
'title': 'Cake Boss: Too Big to Fly',
'description': 'Buddy has taken on a high flying task.',
'duration': 119,
},
}
class TlcDeIE(InfoExtractor):
IE_NAME = 'tlc.de'
_VALID_URL = r'http://www\.tlc\.de/sendungen/[^/]+/videos/(?P<title>[^/?]+)'
_TEST = {
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
'info_dict': {
'id': '3235167922001',
'ext': 'mp4',
'title': 'Breaking Amish: Die Welt da draußen',
'uploader': 'Discovery Networks - Germany',
'description': 'Vier Amische und eine Mennonitin wagen in New York'
' den Sprung in ein komplett anderes Leben. Begleitet sie auf'
' ihrem spannenden Weg.',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
iframe_url = self._search_regex(
'<iframe src="(http://www\.tlc\.de/wp-content/.+?)"', webpage,
'iframe url')
# Otherwise we don't get the correct 'BrightcoveExperience' element,
# example: http://www.tlc.de/sendungen/cake-boss/videos/cake-boss-cannoli-drama/
iframe_url = iframe_url.replace('.htm?', '.php?')
iframe = self._download_webpage(iframe_url, title)
return {
'_type': 'url',
'url': BrightcoveIE._extract_brightcove_url(iframe),
'ie': BrightcoveIE.ie_key(),
}

View File

@@ -1,63 +1,83 @@
import os
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
int_or_none,
str_to_int,
)
from ..aes import (
aes_decrypt_text
)
from ..aes import aes_decrypt_text
class Tube8IE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>tube8\.com/.+?/(?P<videoid>\d+)/?)$'
_VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
u'url': u'http://www.tube8.com/teen/kasia-music-video/229795/',
u'file': u'229795.mp4',
u'md5': u'e9e0b0c86734e5e3766e653509475db0',
u'info_dict': {
u"description": u"hot teen Kasia grinding",
u"uploader": u"unknown",
u"title": u"Kasia music video",
u"age_limit": 18,
'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
'file': '229795.mp4',
'md5': 'e9e0b0c86734e5e3766e653509475db0',
'info_dict': {
'description': 'hot teen Kasia grinding',
'uploader': 'unknown',
'title': 'Kasia music video',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
video_id = mobj.group('id')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'videotitle ="([^"]+)', webpage, u'title')
video_description = self._html_search_regex(r'>Description:</strong>(.+?)<', webpage, u'description', fatal=False)
video_uploader = self._html_search_regex(r'>Submitted by:</strong>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = thumbnail.replace('\\/', '/')
flashvars = json.loads(self._html_search_regex(
r'var flashvars\s*=\s*({.+?})', webpage, 'flashvars'))
video_url = self._html_search_regex(r'"video_url":"([^"]+)', webpage, u'video_url')
if webpage.find('"encrypted":true')!=-1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
video_url = flashvars['video_url']
if flashvars.get('encrypted') is True:
video_url = aes_decrypt_text(video_url, flashvars['video_title'], 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join(format)
format_id = '-'.join(path.split('/')[4].split('_')[:2])
thumbnail = flashvars.get('image_url')
title = self._html_search_regex(
r'videotitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex(
r'>Description:</strong>(.+?)<', webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<strong class="video-username">(?:<a href="[^"]+">)?([^<]+)(?:</a>)?</strong>',
webpage, 'uploader', fatal=False)
like_count = int_or_none(self._html_search_regex(
r"rupVar\s*=\s*'(\d+)'", webpage, 'like count', fatal=False))
dislike_count = int_or_none(self._html_search_regex(
r"rdownVar\s*=\s*'(\d+)'", webpage, 'dislike count', fatal=False))
view_count = self._html_search_regex(
r'<strong>Views: </strong>([\d,\.]+)</li>', webpage, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
comment_count = self._html_search_regex(
r'<span id="allCommentsCount">(\d+)</span>', webpage, 'comment count', fatal=False)
if comment_count:
comment_count = str_to_int(comment_count)
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': video_description,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'format_id': format_id,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'age_limit': 18,
}

View File

@@ -11,7 +11,7 @@ from ..utils import (
class UstreamIE(InfoExtractor):
_VALID_URL = r'https?://www\.ustream\.tv/recorded/(?P<videoID>\d+)'
_VALID_URL = r'https?://www\.ustream\.tv/(?P<type>recorded|embed)/(?P<videoID>\d+)'
IE_NAME = 'ustream'
_TEST = {
'url': 'http://www.ustream.tv/recorded/20274954',
@@ -25,6 +25,13 @@ class UstreamIE(InfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
if m.group('type') == 'embed':
video_id = m.group('videoID')
webpage = self._download_webpage(url, video_id)
desktop_video_id = self._html_search_regex(r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
return self.url_result(desktop_url, 'Ustream')
video_id = m.group('videoID')
video_url = 'http://tcdn.ustream.tv/video/%s' % video_id

View File

@@ -134,7 +134,13 @@ class VevoIE(InfoExtractor):
video_id = mobj.group('id')
json_url = 'http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
video_info = self._download_json(json_url, video_id)['video']
response = self._download_json(json_url, video_id)
video_info = response['video']
if not video_info:
if 'statusMessage' in response:
raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusMessage']), expected=True)
raise ExtractorError('Unable to extract videos')
formats = self._formats_from_json(video_info)

View File

@@ -0,0 +1,58 @@
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..utils import unified_strdate
class VideoTtIE(InfoExtractor):
ID_NAME = 'video.tt'
IE_DESC = 'video.tt - Your True Tube'
_VALID_URL = r'http://(?:www\.)?video\.tt/(?:video/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
_TEST = {
'url': 'http://www.video.tt/watch_video.php?v=amd5YujV8',
'md5': 'b13aa9e2f267effb5d1094443dff65ba',
'info_dict': {
'id': 'amd5YujV8',
'ext': 'flv',
'title': 'Motivational video Change your mind in just 2.50 mins',
'description': '',
'upload_date': '20130827',
'uploader': 'joseph313',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
settings = self._download_json(
'http://www.video.tt/player_control/settings.php?v=%s' % video_id, video_id,
'Downloading video JSON')['settings']
video = settings['video_details']['video']
formats = [
{
'url': base64.b64decode(res['u']).decode('utf-8'),
'ext': 'flv',
'format_id': res['l'],
} for res in settings['res'] if res['u']
]
return {
'id': video_id,
'title': video['title'],
'description': video['description'],
'thumbnail': settings['config']['thumbnail'],
'upload_date': unified_strdate(video['added']),
'uploader': video['owner'],
'view_count': int(video['view_count']),
'comment_count': int(video['comment_count']),
'like_count': int(video['liked']),
'dislike_count': int(video['disliked']),
'formats': formats,
}

View File

@@ -0,0 +1,26 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class VideoWeedIE(NovaMovIE):
IE_NAME = 'videoweed'
IE_DESC = 'VideoWeed'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
_HOST = 'www.videoweed.es'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
_TEST = {
'url': 'http://www.videoweed.es/file/b42178afbea14',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': 'b42178afbea14',
'ext': 'flv',
'title': 'optical illusion dissapeared image magic illusion',
'description': ''
},
}

View File

@@ -17,10 +17,39 @@ from ..utils import (
RegexNotFoundError,
std_headers,
unsmuggle_url,
urlencode_postdata,
int_or_none,
)
class VimeoIE(SubtitlesInfoExtractor):
class VimeoBaseInfoExtractor(InfoExtractor):
_NETRC_MACHINE = 'vimeo'
_LOGIN_REQUIRED = False
def _login(self):
(username, password) = self._get_login_info()
if username is None:
if self._LOGIN_REQUIRED:
raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
return
self.report_login()
login_url = 'https://vimeo.com/log_in'
webpage = self._download_webpage(login_url, None, False)
token = self._search_regex(r'xsrft: \'(.*?)\'', webpage, 'login token')
data = urlencode_postdata({
'email': username,
'password': password,
'action': 'login',
'service': 'vimeo',
'token': token,
})
login_request = compat_urllib_request.Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
login_request.add_header('Cookie', 'xsrft=%s' % token)
self._download_webpage(login_request, None, False, 'Wrong login info')
class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information extractor for vimeo.com."""
# _VALID_URL matches Vimeo URLs
@@ -33,7 +62,6 @@ class VimeoIE(SubtitlesInfoExtractor):
(?:videos?/)?
(?P<id>[0-9]+)
/?(?:[?&].*)?(?:[#].*)?$'''
_NETRC_MACHINE = 'vimeo'
IE_NAME = 'vimeo'
_TESTS = [
{
@@ -47,40 +75,47 @@ class VimeoIE(SubtitlesInfoExtractor):
"uploader_id": "user7108434",
"uploader": "Filippo Valsorda",
"title": "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
"duration": 10,
},
},
{
'url': 'http://vimeopro.com/openstreetmapus/state-of-the-map-us-2013/video/68093876',
'file': '68093876.mp4',
'md5': '3b5ca6aa22b60dfeeadf50b72e44ed82',
'note': 'Vimeo Pro video (#1197)',
'info_dict': {
'id': '68093876',
'ext': 'mp4',
'uploader_id': 'openstreetmapus',
'uploader': 'OpenStreetMap US',
'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
'duration': 1595,
},
},
{
'url': 'http://player.vimeo.com/video/54469442',
'file': '54469442.mp4',
'md5': '619b811a4417aa4abe78dc653becf511',
'note': 'Videos that embed the url in the player page',
'info_dict': {
'id': '54469442',
'ext': 'mp4',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software',
'uploader': 'The BLN & Business of Software',
'uploader_id': 'theblnbusinessofsoftware',
'duration': 3610,
},
},
{
'url': 'http://vimeo.com/68375962',
'file': '68375962.mp4',
'md5': 'aaf896bdb7ddd6476df50007a0ac0ae7',
'note': 'Video protected with password',
'info_dict': {
'id': '68375962',
'ext': 'mp4',
'title': 'youtube-dl password protected test video',
'upload_date': '20130614',
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
},
'params': {
'videopassword': 'youtube-dl',
@@ -98,6 +133,7 @@ class VimeoIE(SubtitlesInfoExtractor):
'upload_date': '20131015',
'uploader_id': 'staff',
'uploader': 'Vimeo Staff',
'duration': 62,
}
},
]
@@ -111,38 +147,21 @@ class VimeoIE(SubtitlesInfoExtractor):
else:
return super(VimeoIE, cls).suitable(url)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
self.report_login()
login_url = 'https://vimeo.com/log_in'
webpage = self._download_webpage(login_url, None, False)
token = self._search_regex(r'xsrft: \'(.*?)\'', webpage, 'login token')
data = compat_urllib_parse.urlencode({'email': username,
'password': password,
'action': 'login',
'service': 'vimeo',
'token': token,
})
login_request = compat_urllib_request.Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
login_request.add_header('Cookie', 'xsrft=%s' % token)
self._download_webpage(login_request, None, False, 'Wrong login info')
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword', None)
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option')
token = self._search_regex(r'xsrft: \'(.*?)\'', webpage, 'login token')
data = compat_urllib_parse.urlencode({'password': password,
'token': token})
data = compat_urllib_parse.urlencode({
'password': password,
'token': token,
})
# I didn't manage to use the password with https
if url.startswith('https'):
pass_url = url.replace('https','http')
pass_url = url.replace('https', 'http')
else:
pass_url = url
password_request = compat_urllib_request.Request(pass_url+'/password', data)
password_request = compat_urllib_request.Request(pass_url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
password_request.add_header('Cookie', 'xsrft=%s' % token)
self._download_webpage(password_request, video_id,
@@ -249,8 +268,9 @@ class VimeoIE(SubtitlesInfoExtractor):
# Extract video description
video_description = None
try:
video_description = get_element_by_attribute("itemprop", "description", webpage)
if video_description: video_description = clean_html(video_description)
video_description = get_element_by_attribute("class", "description_wrapper", webpage)
if video_description:
video_description = clean_html(video_description)
except AssertionError as err:
# On some pages like (http://player.vimeo.com/video/54469442) the
# html tags are not closed, python 2.6 cannot handle it
@@ -259,6 +279,9 @@ class VimeoIE(SubtitlesInfoExtractor):
else:
raise
# Extract video duration
video_duration = int_or_none(config["video"].get("duration"))
# Extract upload date
video_upload_date = None
mobj = re.search(r'<meta itemprop="dateCreated" content="(\d{4})-(\d{2})-(\d{2})T', webpage)
@@ -296,7 +319,7 @@ class VimeoIE(SubtitlesInfoExtractor):
file_info = {}
if video_url is None:
video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
%(video_id, sig, timestamp, quality, codec_name.upper())
% (video_id, sig, timestamp, quality, codec_name.upper())
files[key].append({
'ext': codec_extension,
@@ -330,6 +353,7 @@ class VimeoIE(SubtitlesInfoExtractor):
'title': video_title,
'thumbnail': video_thumbnail,
'description': video_description,
'duration': video_duration,
'formats': formats,
'webpage_url': url,
'view_count': view_count,
@@ -355,7 +379,7 @@ class VimeoChannelIE(InfoExtractor):
video_ids = []
for pagenum in itertools.count(1):
webpage = self._download_webpage(
self._page_url(base_url, pagenum) ,list_id,
self._page_url(base_url, pagenum), list_id,
'Downloading page %s' % pagenum)
video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
@@ -371,7 +395,7 @@ class VimeoChannelIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
channel_id = mobj.group('id')
return self._extract_videos(channel_id, 'http://vimeo.com/channels/%s' % channel_id)
@@ -438,3 +462,25 @@ class VimeoReviewIE(InfoExtractor):
video_id = mobj.group('id')
player_url = 'https://player.vimeo.com/player/' + video_id
return self.url_result(player_url, 'Vimeo', video_id)
class VimeoWatchLaterIE(VimeoBaseInfoExtractor, VimeoChannelIE):
IE_NAME = 'vimeo:watchlater'
IE_DESC = 'Vimeo watch later list, "vimeowatchlater" keyword (requires authentication)'
_VALID_URL = r'https?://vimeo\.com/home/watchlater|:vimeowatchlater'
_LOGIN_REQUIRED = True
_TITLE_RE = r'href="/home/watchlater".*?>(.*?)<'
def _real_initialize(self):
self._login()
def _page_url(self, base_url, pagenum):
url = '%s/page:%d/' % (base_url, pagenum)
request = compat_urllib_request.Request(url)
# Set the header to get a partial html page with the ids,
# the normal page doesn't contain them.
request.add_header('X-Requested-With', 'XMLHttpRequest')
return request
def _real_extract(self, url):
return self._extract_videos('watchlater', 'https://vimeo.com/home/watchlater')

View File

@@ -2,6 +2,7 @@ from __future__ import unicode_literals
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import unified_strdate
@@ -57,4 +58,34 @@ class VineIE(InfoExtractor):
'comment_count': data['comments']['count'],
'repost_count': data['reposts']['count'],
'formats': formats,
}
}
class VineUserIE(InfoExtractor):
IE_NAME = 'vine:user'
_VALID_URL = r'(?:https?://)?vine\.co/(?P<user>[^/]+)/?(\?.*)?$'
_VINE_BASE_URL = "https://vine.co/"
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
profile_url = "%sapi/users/profiles/vanity/%s" % (
self._VINE_BASE_URL, user)
profile_data = self._download_json(
profile_url, user, note='Downloading user profile data')
user_id = profile_data['data']['userId']
timeline_data = []
for pagenum in itertools.count(1):
timeline_url = "%sapi/timelines/users/%s?page=%s" % (
self._VINE_BASE_URL, user_id, pagenum)
timeline_page = self._download_json(
timeline_url, user, note='Downloading page %d' % pagenum)
timeline_data.extend(timeline_page['data']['records'])
if timeline_page['data']['nextPage'] is None:
break
entries = [
self.url_result(e['permalinkUrl'], 'Vine') for e in timeline_data]
return self.playlist_result(entries, user)

View File

@@ -37,7 +37,7 @@ class VKIE(InfoExtractor):
'info_dict': {
'id': '163339118',
'ext': 'mp4',
'uploader': 'Elvira Dzhonik',
'uploader': 'Elya Iskhakova',
'title': 'Dream Theater - Hollow Years Live at Budokan 720*',
'duration': 558,
}
@@ -108,7 +108,7 @@ class VKIE(InfoExtractor):
m_yt = re.search(r'src="(http://www.youtube.com/.*?)"', info_page)
if m_yt is not None:
self.to_screen(u'Youtube video detected')
self.to_screen('Youtube video detected')
return self.url_result(m_yt.group(1), 'Youtube')
data_json = self._search_regex(r'var vars = ({.*?});', info_page, 'vars')
data = json.loads(data_json)

View File

@@ -1,47 +1,69 @@
from __future__ import unicode_literals
import re
import datetime
from .common import InfoExtractor
from ..utils import int_or_none
class VubeIE(InfoExtractor):
IE_NAME = 'vube'
IE_DESC = 'Vube.com'
_VALID_URL = r'http://vube\.com/[^/]+/(?P<id>[\da-zA-Z]{10})'
_VALID_URL = r'http://vube\.com/(?:[^/]+/)+(?P<id>[\da-zA-Z]{10})\b'
_TEST = {
'url': 'http://vube.com/Chiara+Grispo+Video+Channel/YL2qNPkqon',
'md5': 'db7aba89d4603dadd627e9d1973946fe',
'info_dict': {
'id': 'YL2qNPkqon',
'ext': 'mp4',
'title': 'Chiara Grispo - Price Tag by Jessie J',
'description': 'md5:8ea652a1f36818352428cb5134933313',
'thumbnail': 'http://frame.thestaticvube.com/snap/228x128/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f.jpg',
'uploader': 'Chiara.Grispo',
'uploader_id': '1u3hX0znhP',
'upload_date': '20140103',
'duration': 170.56
_TESTS = [
{
'url': 'http://vube.com/Chiara+Grispo+Video+Channel/YL2qNPkqon',
'md5': 'db7aba89d4603dadd627e9d1973946fe',
'info_dict': {
'id': 'YL2qNPkqon',
'ext': 'mp4',
'title': 'Chiara Grispo - Price Tag by Jessie J',
'description': 'md5:8ea652a1f36818352428cb5134933313',
'thumbnail': 'http://frame.thestaticvube.com/snap/228x128/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f.jpg',
'uploader': 'Chiara.Grispo',
'uploader_id': '1u3hX0znhP',
'timestamp': 1388743358,
'upload_date': '20140103',
'duration': 170.56
}
},
{
'url': 'http://vube.com/SerainaMusic/my-7-year-old-sister-and-i-singing-alive-by-krewella/UeBhTudbfS?t=s&n=1',
'md5': '5d4a52492d76f72712117ce6b0d98d08',
'info_dict': {
'id': 'UeBhTudbfS',
'ext': 'mp4',
'title': 'My 7 year old Sister and I singing "Alive" by Krewella',
'description': 'md5:40bcacb97796339f1690642c21d56f4a',
'thumbnail': 'http://frame.thestaticvube.com/snap/228x128/102265d5a9f-0f17-4f6b-5753-adf08484ee1e.jpg',
'uploader': 'Seraina',
'uploader_id': 'XU9VE2BQ2q',
'timestamp': 1396492438,
'upload_date': '20140403',
'duration': 240.107
}
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video = self._download_json('http://vube.com/api/v2/video/%s' % video_id,
video_id, 'Downloading video JSON')
video = self._download_json(
'http://vube.com/api/v2/video/%s' % video_id, video_id, 'Downloading video JSON')
public_id = video['public_id']
formats = [{'url': 'http://video.thestaticvube.com/video/%s/%s.mp4' % (fmt['media_resolution_id'], public_id),
'height': int(fmt['height']),
'abr': int(fmt['audio_bitrate']),
'vbr': int(fmt['video_bitrate']),
'format_id': fmt['media_resolution_id']
} for fmt in video['mtm'] if fmt['transcoding_status'] == 'processed']
formats = [
{
'url': 'http://video.thestaticvube.com/video/%s/%s.mp4' % (fmt['media_resolution_id'], public_id),
'height': int(fmt['height']),
'abr': int(fmt['audio_bitrate']),
'vbr': int(fmt['video_bitrate']),
'format_id': fmt['media_resolution_id']
} for fmt in video['mtm'] if fmt['transcoding_status'] == 'processed'
]
self._sort_formats(formats)
@@ -52,16 +74,16 @@ class VubeIE(InfoExtractor):
thumbnail = 'http:' + thumbnail
uploader = video['user_alias']
uploader_id = video['user_url_id']
upload_date = datetime.datetime.fromtimestamp(int(video['upload_time'])).strftime('%Y%m%d')
timestamp = int(video['upload_time'])
duration = video['duration']
view_count = video['raw_view_count']
like_count = video['total_likes']
dislike_count= video['total_hates']
view_count = video.get('raw_view_count')
like_count = video.get('total_likes')
dislike_count= video.get('total_hates')
comment = self._download_json('http://vube.com/api/video/%s/comment' % video_id,
video_id, 'Downloading video comment JSON')
comment = self._download_json(
'http://vube.com/api/video/%s/comment' % video_id, video_id, 'Downloading video comment JSON')
comment_count = comment['total']
comment_count = int_or_none(comment.get('total'))
return {
'id': video_id,
@@ -71,7 +93,7 @@ class VubeIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'timestamp': timestamp,
'duration': duration,
'view_count': view_count,
'like_count': like_count,

Some files were not shown because too many files have changed in this diff Show More