Compare commits

..

385 Commits

Author SHA1 Message Date
Philipp Hagemeister
cceb5ec237 release 2014.07.20 2014-07-20 18:47:03 +02:00
Philipp Hagemeister
71a6eaff83 Merge remote-tracking branch 'origin/master' 2014-07-20 18:32:59 +02:00
Philipp Hagemeister
7fd48d0413 [youtube] Correct signature testcase 2014-07-20 18:30:27 +02:00
Philipp Hagemeister
1b38b5be86 [swfinterp] Remove debugging code 2014-07-20 18:29:09 +02:00
Philipp Hagemeister
decf2ae400 [swfinterp] Correct array access 2014-07-20 18:28:49 +02:00
Philipp Hagemeister
0d989011ff [swfinterp] Add support for calling methods on objects 2014-07-20 14:49:10 +02:00
Philipp Hagemeister
01b4b74574 [swfinterp] Add support for calls to instance methods 2014-07-20 12:47:15 +02:00
Philipp Hagemeister
70f767dc65 [swfinterp] Add support for multiple classes 2014-07-20 00:25:58 +02:00
Philipp Hagemeister
e75c24e889 [swfinterp] Extend tests and fix parsing 2014-07-20 00:03:54 +02:00
Philipp Hagemeister
0cb2056304 [swfinterp] Start working on basic tests 2014-07-19 23:05:07 +02:00
Sergey M․
604f292ab7 [sapo] Add extractor (Closes #2816) 2014-07-20 00:00:20 +07:00
Sergey M․
23d3c422ab [francetv] Add support for mobile URLs (Closes #3275) 2014-07-19 17:47:50 +07:00
Sergey M․
0c1ffe980d [mlb] Fix _VALID_URL 2014-07-18 21:43:01 +07:00
Sergey M․
5e95cb27d6 Credit @hassaanaliw for cracked (#3274) 2014-07-18 21:41:34 +07:00
Sergey M․
3b86f936c5 Merge branch 'hassaanaliw-cracked' 2014-07-18 21:39:38 +07:00
Sergey M․
e0942e37aa [crackled] Improve, fix invalid regexes and extract more metadata 2014-07-18 21:39:21 +07:00
Sergey M․
c45a6caa95 [utils] Add None check in str_to_int 2014-07-18 21:37:40 +07:00
Sergey M․
61bbddbaa6 Merge branch 'cracked' of https://github.com/hassaanaliw/youtube-dl 2014-07-18 20:29:35 +07:00
Philipp Hagemeister
5425626790 [youtube] Move swfinterp into its own file 2014-07-18 10:24:28 +02:00
Philipp Hagemeister
5dc3552d85 [youtube] Add support for classes in swf parser 2014-07-18 00:54:17 +02:00
Philipp Hagemeister
3fbd27f73e [youtube] SWF parser: Add opcode 86
Yes, I know we need 96, but an implementation of 86 could help avoid a similar issue.
2014-07-17 23:22:49 +02:00
Philipp Hagemeister
0382ecb78d Merge pull request #3289 from Reventl0v/patch-1
Fix the url in the INSTALLATION section
2014-07-17 22:54:24 +02:00
Philipp Hagemeister
72edb6fc8c Merge remote-tracking branch 'origin/master' 2014-07-17 22:32:54 +02:00
Jaime Marquínez Ferrándiz
66149e3f2b [npo] Fix the json extraction (fixes #3282)
The comment in the javascript file is not always the same.
2014-07-17 22:29:03 +02:00
Reventl0v
6e74521d98 Fix the url in the INSTALLATION section 2014-07-17 21:08:43 +02:00
Philipp Hagemeister
cf01013161 [youtube] Find more swf players (Closes #3270, refer #3271) 2014-07-17 16:28:36 +02:00
Jaime Marquínez Ferrándiz
1e179c7528 Merge pull request #3283 from MikeCol/redtube_thumb_new
Redtube changed player config, new place to get thumb URL
2014-07-17 12:44:21 +02:00
MikeCol
530ed178b7 Redtube changed player config, new place to get thumb URL 2014-07-17 11:17:27 +02:00
Jaime Marquínez Ferrándiz
74aa18f68f [dfb] Add extractor (closes #3280) 2014-07-17 10:07:51 +02:00
Jaime Marquínez Ferrándiz
d9222264a8 [adultswim] The bitrate must be an integer or None (reported in #2952) 2014-07-17 09:31:48 +02:00
Jaime Marquínez Ferrándiz
ca14211e93 [adultswim] Simplify (closes #2952) 2014-07-17 09:27:06 +02:00
Jaime Marquínez Ferrándiz
b1d65c3369 Merge remote-tracking branch 'adammw/adultswim' 2014-07-17 09:21:43 +02:00
Jaime Marquínez Ferrándiz
b4c538b02b [comedycentral] Only recognize the cc.com domain
The old comedycentral.com urls redirect to the new urls.
2014-07-16 23:05:56 +02:00
Jaime Marquínez Ferrándiz
13059bceb2 [comedycentral] Recognize 'full-episodes' urls (fixes #3277) 2014-07-16 23:05:56 +02:00
Sergey M․
d8894e24a4 [rtbf] Fix data video regex 2014-07-17 01:57:38 +07:00
Sergey M․
3b09757bac Credit @chaochichen for mlb (#3252) 2014-07-16 21:03:30 +07:00
Sergey M․
2f97f76877 Merge branch 'cracked' of https://github.com/hassaanaliw/youtube-dl into hassaanaliw-cracked 2014-07-16 20:55:38 +07:00
hassaanaliw
43f0537c06 [cracked] Add new extractor 2014-07-16 18:45:42 +05:00
Sergey M․
a816da0dc3 Merge branch 'chaochichen-MLB' 2014-07-16 20:42:01 +07:00
Sergey M․
7bb49d1057 [mlb] Extract more metadata and all formats, provide more tests 2014-07-16 20:40:28 +07:00
Sergey M․
1aa42fedee Merge branch 'MLB' of https://github.com/chaochichen/youtube-dl into chaochichen-MLB 2014-07-16 19:13:35 +07:00
Philipp Hagemeister
ee90ddab94 release 2014.07.15 2014-07-15 22:59:12 +02:00
Charles Chen
172240c0a4 Switched to use media detail XML to extract video URL 2014-07-15 13:55:23 -07:00
Jaime Marquínez Ferrándiz
ad25aee245 [youtube & jsinterp] Fix signature extraction (fixes #3255)
Some functions are defined now inside an object, the jsinterp will search its definition if the variable is not defined in the local namespace.
2014-07-15 22:46:39 +02:00
Sergey M․
bd1f325b42 [tutv] Replace 404 test and modernize 2014-07-15 19:32:42 +07:00
Sergey M․
00a82ea805 [soundcloud] Replace 404 test 2014-07-15 19:18:06 +07:00
Charles Chen
b1b01841af [MLB] Add new extractor 2014-07-14 11:00:55 -07:00
Filippo Valsorda
816930c485 Fix utils.strip_jsonp 2014-07-14 00:41:23 +02:00
Sergey M․
76233cda34 [pyvideo] Fix title extraction 2014-07-14 00:38:10 +07:00
Jaime Marquínez Ferrándiz
9dcea39985 [tlc.de] If the url contains a fragment, use if in the iframe url (reported in #2748)
The fragment is used in the webpage for selecting different videos.
2014-07-13 14:38:26 +02:00
Jaime Marquínez Ferrándiz
10d00a756a rename southparkstudios.py to southpark.py
And make the extractor only recognize southpark.cc.com urls, the old urls are redirected.
2014-07-13 14:08:23 +02:00
Jaime Marquínez Ferrándiz
eb50741129 Merge remote-tracking branch 'adammw/southpark' 2014-07-13 14:01:09 +02:00
Adam Malcontenti-Wilson
3804b01276 Update test 2014-07-13 21:29:04 +10:00
Adam Malcontenti-Wilson
b1298d8e06 Test for colon in mgid 2014-07-13 21:15:18 +10:00
Adam Malcontenti-Wilson
6a46dc8db7 Add southpark.cc.com to southpark IE 2014-07-13 12:48:30 +10:00
Filippo Valsorda
36cb99f958 [ReverbNation] Add new IE - closes #2250 2014-07-13 00:47:20 +02:00
Sergey M․
81650f95e2 [ruhd] Add extractor 2014-07-13 04:03:22 +07:00
Sergey M․
34dbcb8505 [ndr] Replace 404 test 2014-07-12 22:08:33 +07:00
Philipp Hagemeister
c993c829e2 [firedrive] Simplify 2014-07-12 14:27:14 +02:00
Philipp Hagemeister
0d90e0f067 Credit @naglis for firedrive (#3242) 2014-07-12 14:23:54 +02:00
Naglis Jonaitis
678f58de4b [firedrive] Add new extractor. Addresses #3095 2014-07-12 00:42:42 +03:00
Sergey M․
c961a0e63e [screencast] Add one more format and improve title extraction 2014-07-11 22:52:48 +07:00
Sergey M․
aaefb347c0 [gorillavid] Fix embedded videos extraction 2014-07-11 22:23:00 +07:00
Philipp Hagemeister
09018e19a5 release 2014.07.11.3 2014-07-11 17:21:16 +02:00
Sergey M․
345e37831c [youtube] Update nosubtitles test 2014-07-11 22:08:04 +07:00
Sergey M․
00ac799b68 [vine:user] Update test 2014-07-11 22:04:24 +07:00
Jaime Marquínez Ferrándiz
133af9385b Update supported formats for the --recode-video option (#3228) 2014-07-11 16:16:30 +02:00
Philipp Hagemeister
40c696e5c6 [screencast] Add suppot for more video types (#3236) 2014-07-11 15:39:24 +02:00
Philipp Hagemeister
d6d5028922 release 2014.07.11.2 2014-07-11 13:34:48 +02:00
Philipp Hagemeister
38ad119f97 [screencast] Add new extractor (Fixes #3236) 2014-07-11 13:34:19 +02:00
Philipp Hagemeister
4e415288d7 [criterion] Simplify and modernize 2014-07-11 13:21:32 +02:00
Philipp Hagemeister
fada438acf release 2014.07.11.1 2014-07-11 11:53:28 +02:00
Philipp Hagemeister
1df0ae2170 Credit @tobidope for gameone (#2941) 2014-07-11 11:29:17 +02:00
Philipp Hagemeister
d96b9d40f0 [gameone] Sort formats 2014-07-11 11:27:44 +02:00
Philipp Hagemeister
fa19dfccf9 Merge remote-tracking branch 'tobidope/gameone' 2014-07-11 11:17:57 +02:00
Philipp Hagemeister
cdc22cb886 Credit @adammw for tenplay (#2954) 2014-07-11 11:16:04 +02:00
Philipp Hagemeister
04c77a54b0 [tenplay] PEP8 2014-07-11 11:15:35 +02:00
Philipp Hagemeister
64a8c39a1f Merge remote-tracking branch 'adammw/tenplay' 2014-07-11 11:12:41 +02:00
Philipp Hagemeister
3d55f2806e Credit @irtusb for vimple (#3073) 2014-07-11 11:11:52 +02:00
Philipp Hagemeister
1eb867f33f [vimple] Simplify and PEP8 2014-07-11 11:11:09 +02:00
Philipp Hagemeister
e93f4f7578 [vodlocker] Remove unused imports 2014-07-11 11:09:01 +02:00
Philipp Hagemeister
45ead916d1 [vimple] Do not fail if duration is missing 2014-07-11 11:08:36 +02:00
Philipp Hagemeister
3a0879c8c8 Merge remote-tracking branch 'irtusb/vimple' 2014-07-11 11:07:44 +02:00
Philipp Hagemeister
ebf361ce18 Merge remote-tracking branch 'azeem/soundcloud_likes' 2014-07-11 11:06:33 +02:00
Philipp Hagemeister
953b358668 [gorillavid] Add support for daclips.in (Closes #3213) 2014-07-11 11:05:16 +02:00
Philipp Hagemeister
3dfd25b3aa [goshgay] PEP8 and test for age_limit (#3220) 2014-07-11 11:01:59 +02:00
Philipp Hagemeister
6f66eedc5d Merge remote-tracking branch 'MikeCol/goshgay' 2014-07-11 11:00:37 +02:00
Philipp Hagemeister
4094b6e36d [vodlocker] PEP8, generalization, and simplification (#3223) 2014-07-11 10:57:40 +02:00
Philipp Hagemeister
c09cbf0ed9 Merge remote-tracking branch 'pachacamac/vodlocker' 2014-07-11 10:54:53 +02:00
Philipp Hagemeister
391d53e1dd release 2014.07.11 2014-07-11 10:49:41 +02:00
Philipp Hagemeister
f64ebfe3e5 [youtube] Correct signature test 2014-07-11 10:46:11 +02:00
Philipp Hagemeister
fc040bfd05 [jsinterp] Prevent mis-recognitions of local functions 2014-07-11 10:44:56 +02:00
Philipp Hagemeister
c8bf86d50d [youtube] Correct signature extraction error detection 2014-07-11 10:44:39 +02:00
Philipp Hagemeister
61989fb5e9 [jsinterp] Remove superfluous u 2014-07-11 10:40:02 +02:00
Philipp Hagemeister
6f9d4d542f [youtube] Add test for new signature scheme (#3232) 2014-07-11 10:34:01 +02:00
Philipp Hagemeister
b3a8878080 [youtube] Remove static signatures
The always fail by now. Instead, use only automatic signature extraction
2014-07-11 10:23:19 +02:00
Philipp Hagemeister
f4d66a99cf release 2014.07.10 2014-07-10 14:49:16 +02:00
pachacamac
537ba6f381 [Vodlocker] Add new extractor 2014-07-09 18:21:46 +02:00
Sergey M․
411f691b21 [mpora] Fix player regex 2014-07-09 19:12:42 +07:00
MikeCol
d6aa1967ad GoshGay Extractor 2014-07-09 12:14:53 +02:00
Sergey M․
6e1e0e4b5b [veoh] Skip deleted test video 2014-07-08 20:22:27 +07:00
azeem
3941669d69 [soundcloud] Adding likes support to SoundcloudUserIE 2014-07-07 23:59:57 +05:30
Sergey M․
1aac03797e [ninegag] Fix extraction 2014-07-07 20:12:59 +07:00
Jaime Marquínez Ferrándiz
459af43494 [arte] Manually set the rtmp play_path (fix #3198)
rtmpdump doesn't parse it right
2014-07-07 14:10:57 +02:00
Philipp Hagemeister
f4f7e3cf41 Merge branch 'master' of github.com:rg3/youtube-dl 2014-07-06 20:57:05 +02:00
Sergey M․
1fd015516e [newstube] Replace test 2014-07-06 19:32:13 +07:00
Sergey M․
76bafa8ffe [newstube] Capture error message 2014-07-06 18:53:31 +07:00
Philipp Hagemeister
8d5797b00f [YoutubeDL] Show download URL when -v is set
This will allow us to debug issues like #3204
2014-07-06 11:28:51 +02:00
Philipp Hagemeister
7571c02c8a [generic] Set default-search to error
This prevents users from submitting bug reports where they mistyped a URL, and prevents me from getting a weird video when holding shift and thus searching for :Tds
2014-07-06 11:22:44 +02:00
Petr Půlpán
49cbe7c8e3 [allocine] add extractor for allocine.fr (fixes #3189) 2014-07-05 14:42:26 +02:00
Sergey M․
ba4133c9eb Credit @hakatashi for #3181 #3182 2014-07-04 22:30:43 +07:00
Sergey M․
b67f1840a1 [niconico] Remove unused import 2014-07-04 22:26:56 +07:00
Sergey M.
165c46690f Merge pull request #3180 from hakatashi/niconico-without-authentication
[niconico] Download without authentication
2014-07-04 22:25:05 +07:00
Sergey M․
16bc9ab601 Merge branch 'hakatashi-niconico-channel-video' 2014-07-04 22:06:31 +07:00
Sergey M․
15ce1338b4 [niconico] Extract more metadata and simplify (Closes #3181) 2014-07-04 22:05:46 +07:00
Sergey M․
0ff30c5333 Merge branch 'niconico-channel-video' of https://github.com/hakatashi/youtube-dl into hakatashi-niconico-channel-video 2014-07-04 21:39:54 +07:00
Sergey M․
6feb2d5e80 [youtube:search_url] Update regexes 2014-07-04 19:21:19 +07:00
Sergey M․
1e07fea200 [teachertube] Add support for new video URL format 2014-07-03 21:11:56 +07:00
Sergey M․
7aeb67b39b [teachertube:user:collection] Update media regex 2014-07-03 21:08:44 +07:00
Sergey M․
93881db22a [anitube] Modernize 2014-07-02 19:24:01 +07:00
hakatashi
64ed7a38f9 [niconico] Add support for channel video 2014-07-02 03:13:12 +09:00
hakatashi
2fd466fcfc [niconico] Download without authentication 2014-07-02 02:32:54 +09:00
Philipp Hagemeister
dc2fc73691 [youtube:truncated_url] Move test to extractor 2014-07-01 15:49:34 +02:00
Philipp Hagemeister
c4808c6009 [youtube_truncated_url] Add support for truncated watch URLs with annotations (#3178) 2014-07-01 15:49:16 +02:00
Sergey M․
c67f584eb3 [rai] Skip test 2014-07-01 19:24:18 +07:00
Petr Půlpán
29f6ed78e8 [tagesschau] replace 404 test 2014-07-01 10:35:49 +02:00
pulpe
7807ee664d [wdr] fix test 2014-07-01 09:59:57 +02:00
Sergey M․
d518d06efd [vk] Skip georestricted ivi embed test 2014-06-30 03:16:31 +07:00
Petr Půlpán
25a0cc44b9 [teachertube:user] fix regex 2014-06-29 20:33:46 +02:00
Petr Půlpán
825cdcec3c Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-29 16:44:37 +02:00
Petr Půlpán
41b610acab [GooglePlus] fix video title extraction 2014-06-29 16:43:31 +02:00
Sergey M․
0364fa8b65 [generic] Add support for ivi.ru embedded player 2014-06-29 20:18:23 +07:00
Sergey M․
849086a1ae [vk] Better support for embeds 2014-06-29 20:07:59 +07:00
Sergey M․
36fbc6887f [ivi] Add support for embedded URLs 2014-06-29 20:06:47 +07:00
Sergey M․
a8a98e43f2 [vk] Add support for mobile URLs 2014-06-29 19:51:00 +07:00
Sergey M․
57bdc730e2 [vk] Add support for more URL formats (#3172) 2014-06-29 19:33:39 +07:00
Petr Půlpán
31a196d7f5 [TeacherTube] add user + collection, removed classrooms 2014-06-29 13:45:10 +02:00
Petr Půlpán
9b27e6c3b4 [Tumblr] fix encoding (PEP0263) 2014-06-29 09:32:53 +02:00
Petr Půlpán
62f1f9507f [Tumblr] fix test + add description 2014-06-29 09:08:46 +02:00
Petr Půlpán
ee8dda41ae [Toypics] support https urls 2014-06-29 08:21:23 +02:00
Sergey M․
01ba178097 [vk] Update test 2014-06-29 04:51:47 +07:00
Petr Půlpán
78ff59d052 [Motherless] simplify 2014-06-28 20:02:02 +02:00
Petr Půlpán
f3f1cd6b3b Merge pull request #3167 from Schnouki/motherless
* mother/motherless:
  [Motherless] Add new extractor
2014-06-28 19:12:31 +02:00
Sergey M․
803540e811 [drtv] Add missing extractor import 2014-06-28 17:36:13 +07:00
Petr Půlpán
458ade6361 [ArteTVFuture] fix empty formats list 2014-06-28 10:22:53 +02:00
Thomas Jost
a69969ee05 [Motherless] Add new extractor 2014-06-27 18:12:11 +02:00
Sergey M․
f2b8db57eb [drtv] Add extractor for DR TV (Closes #3126) 2014-06-27 20:53:59 +07:00
Jaime Marquínez Ferrándiz
331ae266ff [npo] Add extractor (closes #3145) 2014-06-26 20:30:44 +02:00
Philipp Hagemeister
4242001863 release 2014.06.26 2014-06-26 16:44:01 +02:00
Jaime Marquínez Ferrándiz
78338f71ca [livestream:original] Add support for folder urls (closes #2631)
The webpage only contains shortened links for the videos, since the server
doesn't support HEAD requests, we use an specific extractor for them.
2014-06-26 16:34:36 +02:00
Sergey M․
f5172a3084 [teachertube] Add support for new URL formats 2014-06-26 20:01:59 +07:00
Sergey M․
c7df67edbd [teachertube] Improve extraction 2014-06-26 20:00:47 +07:00
Petr Půlpán
d410fee91d [VideoTt] fix ValueError (#3161) 2014-06-26 07:35:47 +02:00
Philipp Hagemeister
ba7aa464de [soundgasm] PEP8 and add a display_id (#3155) 2014-06-25 23:47:38 +02:00
Philipp Hagemeister
8333034dce Merge remote-tracking branch 'pachacamac/soundgasm' 2014-06-25 23:45:03 +02:00
Philipp Hagemeister
637b6af80f release 2014.06.25 2014-06-25 21:24:01 +02:00
pachacamac
1044f8afd2 [Soundgasm] Add new extractor 2014-06-25 18:07:23 +02:00
Petr Půlpán
2f775107f9 Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-25 17:45:24 +02:00
Petr Půlpán
85342674b2 [Dailymotion] fix uploader name (fixes #3153) 2014-06-25 17:44:19 +02:00
Sergey M․
fd69098a45 [rutube] Update playlist tests 2014-06-25 19:06:11 +07:00
Petr Půlpán
8867f908fc Merge pull request #3148 from crazedpsyc/master
[BlipTV] Allow plus sign in video ID
2014-06-25 07:14:04 +02:00
Michael Smith
b7c33124c8 [BlipTV] Allow plus sign in video ID 2014-06-24 17:55:08 -06:00
Jaime Marquínez Ferrándiz
89a8c423c7 Merge pull request #3146 from pvdl/patch-1
[discovery] Change default url
2014-06-24 18:11:26 +02:00
Peter
cea2582df2 [discovery] Change default url
URL does a redirect from dsc.discovery.com to www.discovery.com
This commit fixes the correct URL.
2014-06-24 17:41:53 +02:00
Sergey M․
e423e0baaa [wistia] Add duration and modernize 2014-06-24 19:34:39 +07:00
Philipp Hagemeister
60b2dd1285 [comedycentral] Correct handling when latest tds episode is a special-episode instead of a regular one 2014-06-24 10:50:41 +02:00
Philipp Hagemeister
36ddd8b3f7 release 2014.06.24.1 2014-06-24 09:03:52 +02:00
Philipp Hagemeister
7575d52a73 release 2014.06.24 2014-06-24 08:59:40 +02:00
Sergey M․
9a2dc4f7ac [teachertube] Fix extraction 2014-06-23 03:07:10 +07:00
Jaime Marquínez Ferrándiz
c5cd249e41 [generic] Extract mtvservices embedded videos 2014-06-22 21:39:36 +02:00
Jaime Marquínez Ferrándiz
8940c1c058 [mtv] Add an extractor for the mtvservices embedded player (closes #2995) 2014-06-22 21:39:27 +02:00
Petr Půlpán
27ec04b232 [BR] replace test 2014-06-22 17:33:27 +02:00
Sergey M․
d2824416aa [firstpost] Fix title extraction and add description 2014-06-22 01:20:40 +07:00
Petr Půlpán
18061bbab0 [Youtube] add DASH format 272 (fixes #3128) 2014-06-21 12:03:27 +02:00
Sergey M․
4ecbbcbcea Merge branch 'eliasp-spiegel' 2014-06-21 16:32:01 +07:00
Sergey M․
55c97a03e1 [spiegel] Add description and modernize 2014-06-21 16:31:18 +07:00
Elias Probst
98aeac6ea9 Use the 'base_url' for building the resulting 'url' as well. 2014-06-21 01:10:10 +02:00
Elias Probst
8bfb6723cb Extract the base_url for the XML download from the JS snippet's 'server' variable. 2014-06-21 01:00:48 +02:00
Elias Probst
a20575e8ae Make debug message useful and also report, which URL failed to download. 2014-06-21 00:35:12 +02:00
Sergey M․
7724572519 [noco] Switch to HTTPS (Closes #3116) 2014-06-20 18:40:47 +07:00
Philipp Hagemeister
d763637f6a release 2014.06.19 2014-06-19 17:13:50 +02:00
Jaime Marquínez Ferrándiz
c26e9ac4b2 [youtube] Recognize signature functions that contain '$' (fixes #3104) 2014-06-19 16:42:49 +02:00
Petr Půlpán
896bf55352 [LifeNews] update thumbnail in test 2014-06-19 16:34:48 +02:00
Petr Půlpán
a23ba9b53c [Steam] update description in test 2014-06-19 16:32:11 +02:00
Sergey M․
38a9339baf [prosiebensat1] Update some regexes 2014-06-19 19:51:49 +07:00
Sergey M․
def8b4039f [bilibili] Fix extraction 2014-06-18 18:53:25 +07:00
Petr Půlpán
a14e1538fe [ustream:channel] replace test for an updated channel 2014-06-17 16:03:03 +02:00
Petr Půlpán
5f28a1acad [GorillaVid] improve extractor 2014-06-17 15:18:46 +02:00
pulpe
25e9953c6f Merge pull request #3059 from marcwebbie/gorillavid
* marcwebbie/gorillavid:
  Changed video url to a public video
  [GorillaVid] Added GorillaVid extractor
2014-06-17 15:14:18 +02:00
Petr Půlpán
f9df094ca5 Merge pull request #3089 from pulpe/ard_fix
[ARDIE] fix formats extraction (fixes #3087)
2014-06-17 14:53:51 +02:00
Sergey M.
b60a469023 Merge pull request #3090 from Kagee/patch-1
tv.nrk.no urls mostly contain capital characters
2014-06-17 02:21:10 +07:00
Anders Einar Hilden
7012631257 Fix test
Didn't use .lower() as planned, so update test with new ID.
2014-06-16 19:37:59 +02:00
Anders Einar Hilden
e6c9f80c48 tv.nrk.no urls mostly contain capital characters
Updated regexp and one of the test cases to reflect this.
tv.nrksuper.no mostly uses lowercase, so that is still there.
2014-06-16 19:29:23 +02:00
pulpe
895ce482b1 [ARDIE] adjustments suggested by @jaimeMF 2014-06-16 18:15:41 +02:00
pulpe
e5da4021eb [ARDIE] fix formats extraction (fixes #3087) 2014-06-16 16:17:49 +02:00
Sergey M․
2371053565 [rai] Skip test 2014-06-16 18:50:15 +07:00
Philipp Hagemeister
33bf9033e0 release 2014.06.16 2014-06-16 10:15:24 +02:00
Jaime Marquínez Ferrándiz
35eacd0dae [brightcove] Set the filesize of the formats and use _sort_formats 2014-06-15 11:37:39 +02:00
Jaime Marquínez Ferrándiz
96bef88f5f [brightcove] Modernize some tests 2014-06-15 11:24:05 +02:00
Jaime Marquínez Ferrándiz
5524b242a7 [brightcove] Add support for renditions with 'remote' set to True (fixes #3081)
The url needs to be modified to get the flv video.
2014-06-15 11:20:40 +02:00
Jaime Marquínez Ferrándiz
a013eba65f [brightcove] Improve the 'experienceJSON' regex (#3081)
One of the strings may contain ';', we would get an invalid json string.
2014-06-15 11:08:24 +02:00
Sergey M.
36755d40b4 Merge pull request #3078 from pulpe/youtube_fix
[Youtube] Recognize playlists with LL
2014-06-15 02:49:16 +07:00
pulpe
7d568f5ab8 [Youtube] Recognize playlists with LL 2014-06-14 13:23:28 +02:00
Sergey M․
a7207cd580 [wrzuta] Add age limit 2014-06-14 17:00:59 +07:00
Sergey M.
e8ef659cd9 Merge pull request #3075 from pulpe/wrzuta
[WrzutaIE] Add extractor for wrzuta.pl (fixes #3072)
2014-06-14 16:51:27 +07:00
Sergey M․
b0adbe98fb [rai] Add support for Rai websites (Closes #2930) 2014-06-13 23:44:44 +07:00
pulpe
0c361c41b8 [WrzutaIE] Add extractor for wrzuta.pl (fixes #3072) 2014-06-13 08:51:35 +02:00
Ariset Llerena
e66ab17a36 Verified with pep8 and pyflakes 2014-06-12 23:08:06 -04:00
Ariset Llerena
cb437dc2ad removed extra char in regexp 2014-06-12 22:33:50 -04:00
Ariset Llerena
0d933b2ad5 Added vimple.ru support 2014-06-12 22:31:08 -04:00
Sergey M․
c5469e046a [livestream] Modernize 2014-06-12 20:42:46 +07:00
Sergey M․
4d2f143ce5 [ted] Update test md5 2014-06-12 20:33:53 +07:00
Sergey M․
8f93030c85 [blinkx] Modernize 2014-06-11 18:38:13 +07:00
Sergey M․
fdb9aebead [tube8] Update test and modernize 2014-06-11 18:20:14 +07:00
Sergey M․
3141feb73b [ndtv] Fix title extraction and modernize 2014-06-10 19:37:38 +07:00
Philipp Hagemeister
9706f3f802 release 2014.06.09 2014-06-09 23:16:37 +02:00
Philipp Hagemeister
d5e944359e Remove unused import 2014-06-09 23:14:04 +02:00
Philipp Hagemeister
826ec77fb2 [Vulture] Add support for vulture.com 2014-06-09 23:06:39 +02:00
Philipp Hagemeister
2656f4eb6a [hypem] Modernize 2014-06-09 22:34:41 +02:00
Philipp Hagemeister
2b88feedf7 [generic] Add support for <embed YouTube 2014-06-09 22:06:45 +02:00
Jaime Marquínez Ferrándiz
23566e0d78 rtmp and hls downloaders: Clarify error message when the external tools are not installed
Ask to install them, as we do in the postprocessor.
We get some reports with it, like #3061 or #3048.
2014-06-09 20:23:20 +02:00
Sergey M․
828553b614 [nuvid] Remove superfluous slash 2014-06-09 20:41:33 +07:00
Sergey M․
3048e82a94 [nuvid] Improve extraction 2014-06-09 20:37:04 +07:00
Sergey M․
09ffa08ba1 [veoh] Capture error message 2014-06-08 23:05:20 +07:00
Sergey M․
e0b4cc489f [dreisat] Modernize 2014-06-08 22:45:12 +07:00
Sergey M․
15e423407f [dreisat] Fix thumbnails' width and height 2014-06-08 22:41:24 +07:00
Sergey M․
702e522044 [teachertube] Fix extraction for Python 3 2014-06-08 22:16:48 +07:00
marcwebbie
77abae55df Changed video url to a public video 2014-06-08 03:13:45 -03:00
marcwebbie
617c0b2239 [GorillaVid] Added GorillaVid extractor 2014-06-07 23:09:45 -03:00
Philipp Hagemeister
814d4257df Remove unused imports 2014-06-07 16:52:34 +02:00
Philipp Hagemeister
23ae281b31 [fc2] Fall back to webpage title if needed 2014-06-07 16:52:11 +02:00
Philipp Hagemeister
94128d6b0d [nrk] Fix test checksum 2014-06-07 16:50:19 +02:00
Philipp Hagemeister
059009c592 release 2014.06.07 2014-06-07 16:42:53 +02:00
Philipp Hagemeister
9cc977f104 Credit @ralfharing for vh1 2014-06-07 16:41:44 +02:00
Philipp Hagemeister
1c0ade7afa [vh1] Skip tests (Do not work from Germany) 2014-06-07 16:40:16 +02:00
Philipp Hagemeister
f2741c8d3a [vh1] Simplify 2014-06-07 16:39:08 +02:00
Philipp Hagemeister
6ab8f3584a Merge remote-tracking branch 'ralfharing/vh1' 2014-06-07 15:53:30 +02:00
Philipp Hagemeister
8ae5ce1726 [cmt] Simplify (mentioned in #2072) 2014-06-07 15:52:49 +02:00
Philipp Hagemeister
eb92077720 [soundcloud] Add duration information (Closes #3035, Fixes #3034) 2014-06-07 15:51:01 +02:00
Philipp Hagemeister
90e0fd4bad [ku6] Improve (#3015) 2014-06-07 15:46:33 +02:00
codelol
05741e05d9 [ku6] Add new extractor 2014-06-07 15:42:33 +02:00
Philipp Hagemeister
9aa6637644 Merge branch 'master' of github.com:rg3/youtube-dl 2014-06-07 15:41:12 +02:00
Philipp Hagemeister
d30d28156d Credit @georgjaehnig for spiegeltv 2014-06-07 15:40:27 +02:00
Philipp Hagemeister
be6d722904 [cnn] Improve thumbnail extraction 2014-06-07 15:39:21 +02:00
Philipp Hagemeister
d551980823 [spiegeltv] Simplify and PEP8 2014-06-07 15:35:13 +02:00
Sergey M․
f0a6c3d2bc [teachertube] Add support for audios 2014-06-07 20:32:23 +07:00
Philipp Hagemeister
4e0fb1280a Merge remote-tracking branch 'georgjaehnig/spiegeltv' 2014-06-07 15:21:33 +02:00
Philipp Hagemeister
24f5251cce Merge remote-tracking branch 'pulpe/teachertube'
Conflicts:
	youtube_dl/extractor/__init__.py
2014-06-07 15:20:12 +02:00
Philipp Hagemeister
ac1390eee8 Merge branch 'master' of github.com:rg3/youtube-dl
Conflicts:
	youtube_dl/extractor/__init__.py
2014-06-07 15:15:39 +02:00
Philipp Hagemeister
4a5b4d34dc [tagesschau] Add support for width/height 2014-06-07 15:14:20 +02:00
Jaime Marquínez Ferrándiz
63adb0cc61 Merge pull request #3057 from pulpe/yt_fmt
[Youtube] Add format code 271 (1440p webm)
2014-06-07 14:39:28 +02:00
pulpe
3c80377b69 [Youtube] Add format code 271 (1440p webm) 2014-06-07 14:31:10 +02:00
Jaime Marquínez Ferrándiz
24577db241 [test/test_youtube_lists] Replace mix list
The old video doesn't have a mix anymore.
2014-06-07 13:43:27 +02:00
Jaime Marquínez Ferrándiz
566bd96da8 [teachingchannel] Add extractor (closes #3048) 2014-06-07 13:11:04 +02:00
Philipp Hagemeister
ebdb64d605 Merge remote-tracking branch 'pulpe/tagesschau' 2014-06-07 12:43:31 +02:00
Sergey M․
a6ffb92f0b [xvideos] Replace test 2014-06-06 21:23:36 +07:00
Sergey M․
3217377b3c [xvideos] Capture and output inline error if any 2014-06-06 21:15:06 +07:00
Jaime Marquínez Ferrándiz
24da5893fc [naver] Modernize 2014-06-06 14:57:37 +02:00
Jaime Marquínez Ferrándiz
087ca2cb07 [naver] Add rtmp formats (fixes #3054) 2014-06-06 14:55:19 +02:00
pulpe
b4e7447458 [TeacherTubeIE] Add extractor for teachertube.com videos + classrooms (fixes #3046) 2014-06-06 11:21:59 +02:00
pulpe
a45e6aadd7 [TagesschauIE] Fix possible error if quality is not defined 2014-06-06 09:00:28 +02:00
Jaime Marquínez Ferrándiz
70e322695d [youtube:playlist] Fix mixes extraction (fixes #3051)
The username seems to be empty now.
2014-06-05 21:23:27 +02:00
pulpe
6a15923b77 [TagesschauIE] Add note to 2nd _download_webpage 2014-06-05 19:34:30 +02:00
pulpe
7ffad0af5a [TagesschauIE] Remove unused import 2014-06-05 18:49:34 +02:00
pulpe
0e3ae92441 [TagesschauIE] Add extractor for tagesschau.de (fixes #3049) 2014-06-05 18:48:03 +02:00
Sergey M.
b3ae826f7a Merge pull request #3047 from pulpe/yahoo_thumb
[yahoo] improve thumbnail extraction
2014-06-05 19:31:28 +07:00
pulpe
dede691aca [yahoo] improve thumbnail extraction 2014-06-04 17:38:41 +02:00
Sergey M․
fb6a5b965b [yahoo] Improve content id extraction 2014-06-04 20:13:36 +07:00
Sergey M․
6340716b3a [yahoo] Make thumbnail optional (Closes #3043) 2014-06-04 20:11:23 +07:00
Philipp Hagemeister
b675b32e6b release 2014.06.04 2014-06-04 06:47:57 +02:00
Jaime Marquínez Ferrándiz
6a3fa81ffb [ard] Fix format extraction (fixes #3006 and #3032) 2014-06-03 21:56:49 +02:00
Georg Jaehnig
df53a98f2b [Spiegeltv] remove the md5 field to pass Travis test build 2014-06-03 17:52:39 +02:00
Georg Jaehnig
db23d8d2a2 [Spiegeltv] skip rtmp download to pass Travis test build 2014-06-03 16:50:54 +02:00
Jaime Marquínez Ferrándiz
0d69795014 Merge pull request #2962 from simonwjackson/patch-1
Update test_age_restriction.py
2014-06-03 16:47:59 +02:00
Sergey M.
3374f3fdc2 Merge pull request #3022 from MikeCol/Extremetube_title
title extraction condition less restrictive
2014-06-03 19:59:08 +07:00
Sergey M.
4bf0727b1f Merge pull request #3033 from Forever-Young/patch-2
Recognize a third format of the upload_date in the 'watch-uploader-info'...
2014-06-02 20:20:21 +07:00
Anton Novosyolov
263bd4ec50 Recognize a third format of the upload_date in the 'watch-uploader-info' element 2014-06-02 13:30:23 +04:00
Philipp Hagemeister
b7e8b6e37a release 2014.06.02 2014-06-02 10:47:24 +02:00
Sergey M․
ceb7a17f34 [mailru] Add support for new mail.ru URL format (Closes #3024) 2014-06-01 14:38:36 +07:00
Philipp Hagemeister
1a2f2e1e66 release 2014.05.31.4 2014-05-31 20:45:24 +02:00
Philipp Hagemeister
6803016858 release 2014.05.31.3 2014-05-31 20:40:48 +02:00
Philipp Hagemeister
9b7c4fd981 release 2014.05.31.2 2014-05-31 20:35:12 +02:00
Philipp Hagemeister
dc31942f42 release 2014.05.31.1 2014-05-31 20:29:53 +02:00
Philipp Hagemeister
1f6b8f3115 release 2014.05.31 2014-05-31 20:28:03 +02:00
MikeCol
9c7b79acd9 title extraction condition less restrictive 2014-05-31 18:31:39 +02:00
Jaime Marquínez Ferrándiz
9168308579 [vevo] The title in the url is optional (fixes #3020) 2014-05-31 17:55:03 +02:00
Jaime Marquínez Ferrándiz
7e8fdb1aae [fc2] Recognize urls without language part (reported in #1154) 2014-05-31 14:45:46 +02:00
Jaime Marquínez Ferrándiz
386ba39cac [fc2] Encode the string used for the md5 checksum
In python 3 it must be a bytes object.
2014-05-31 14:40:05 +02:00
Sergey M․
236d0cd07c [nrktv] Recognize tv.nrksuper.no URL 2014-05-31 17:45:00 +07:00
Jaime Marquínez Ferrándiz
ed86f38a11 [theplatform] Use unicode_literals and _download_json 2014-05-30 21:10:48 +02:00
Jaime Marquínez Ferrándiz
6db80ad2db [comedycentralshows] Transform the rtmp urls so that rtmpdump can download them (fixes #3010)
From 'rtmpe://viacomccstrmfs.fplive.net/viacomccstrm/gsp.comedystor/*' to 'rtmpe://viacommtvstrmfs.fplive.net:1935/viacommtvstrm/gsp.comedystor/*'
2014-05-30 20:59:15 +02:00
Georg Jaehnig
14470ac87b tabs as spaces 2014-05-30 17:56:13 +02:00
Georg Jaehnig
0cdf576d86 use provided function to get JSON 2014-05-30 17:51:36 +02:00
Georg Jaehnig
4ffeca4ea2 cleanup 2014-05-30 16:39:24 +02:00
Georg Jaehnig
211fd6c674 added spiegel.tv 2014-05-30 16:35:17 +02:00
Sergey M․
6ebb46c106 [ivi] Replace tests 2014-05-30 19:12:55 +07:00
Philipp Hagemeister
0f97c9a06f [ard] Fix title (#3006) 2014-05-30 04:59:18 +02:00
Philipp Hagemeister
77fb72646f release 2014.05.30.1 2014-05-30 03:26:03 +02:00
Philipp Hagemeister
aae74e3832 [Makefile] Remove CHANGELOG entry 2014-05-30 03:26:00 +02:00
Philipp Hagemeister
894e730911 release 2014.05.30 2014-05-30 03:19:51 +02:00
Philipp Hagemeister
63961d87a6 [devscripts/release] Do not commit CHANGELOG 2014-05-30 03:19:37 +02:00
Jaime Marquínez Ferrándiz
87fe568c28 [nbcnews] Add support for /feature/* pages (closes #3007) 2014-05-30 00:38:57 +02:00
Sergey M․
46531b374d Merge branch 'anovicecodemonkey-ustream-embed-recorded2' 2014-05-29 20:23:36 +07:00
Sergey M․
9e8753911c [ustream] Modernize 2014-05-29 20:22:36 +07:00
Sergey M․
5c6b1e578c [ustream] Remove unnecessary webpage download 2014-05-29 20:20:11 +07:00
Sergey M․
8f0c8fb452 Merge branch 'ustream-embed-recorded2' of https://github.com/anovicecodemonkey/youtube-dl into anovicecodemonkey-ustream-embed-recorded2 2014-05-29 19:57:42 +07:00
anovicecodemonkey
b702ecebf0 [UstreamIE] added support for "/embed/recorded/" style URLs (Fixes #2990) 2014-05-28 22:17:13 +09:30
Sergey M․
950dc95e97 Merge branch 'rzhxeo-cinemassacre' 2014-05-28 19:38:55 +07:00
Sergey M․
d9dd3584e1 [cinemassacre] Improve formats extraction and modernize 2014-05-28 19:38:44 +07:00
Sergey M․
15a9f36849 Merge branch 'cinemassacre' of https://github.com/rzhxeo/youtube-dl into rzhxeo-cinemassacre 2014-05-28 19:31:23 +07:00
Sergey M․
d0087d4ff2 [nuvid] Fix video URL extraction 2014-05-27 18:46:30 +07:00
Sergey M․
cc5ada6f4c [ivi] Update playlist tests 2014-05-26 00:16:10 +07:00
Sergey M․
dfb2e1a325 [nrktv] Add support for tv.nrk.no (Closes #2980) 2014-05-25 07:14:18 +07:00
Sergey M.
65bab327b4 Merge pull request #2953 from codesparkle/ndr-regexes-escape-correctly
[ndr] fix regexes containing illegal characters
2014-05-25 05:42:06 +07:00
Sergey M.
9eeb7abc6b Merge pull request #2960 from codesparkle/fix-test-format-note-regex
[test] fixed typo in test_format_note (test_YoutubeDL)
2014-05-25 05:36:03 +07:00
Sergey M․
c70df21099 [streamcz] Workaround CertificateError 2014-05-25 05:32:19 +07:00
Sergey M․
418424e5f5 [streamcz] Use compat_str 2014-05-25 05:30:15 +07:00
Sergey M.
8477466125 Merge pull request #2979 from pulpe/streamcz_fix
[StreamCZ] correct video id + add test
2014-05-25 05:28:49 +07:00
pulpe
865dbd4a26 [StreamCZ] correct video id + add test 2014-05-24 16:01:37 +02:00
Sergey M․
b1e6f55912 [empflix] Fix extraction 2014-05-24 01:06:03 +07:00
Sergey M․
4d78f3b770 [pornhub] Fix uploader extraction 2014-05-24 00:44:34 +07:00
Sergey M․
7f739999e9 [swrmediathek] Extract direct links from JSON and add support for audio files 2014-05-23 21:04:21 +07:00
Sergey M․
0f8a01d4f3 [swrmediathek] Simplify 2014-05-22 19:35:46 +07:00
Sergey M.
e2bf499b14 Merge pull request #2944 from pulpe/SWRMediathek
[SWRMediathek] add support for swrmediathek.de (fixes #2929)
2014-05-22 19:30:09 +07:00
rzhxeo
7cf4547ab6 [CinemassacreIE] Extract all available video/audio formats 2014-05-22 10:33:30 +02:00
Simon W. Jackson
8ae980807a Update test_age_restriction.py
typo
2014-05-21 16:35:49 +02:00
Sergey M․
eec4d8ef96 [gamekings] Update test description 2014-05-21 19:53:58 +07:00
codesparkle
1c783bca88 fixed (what I assume was a typo) that caused test_format_note to always fail.
This test was introduced in c57f775710.
2014-05-21 18:03:17 +10:00
Philipp Hagemeister
ac73651f66 Merge pull request #2940 from codesparkle/remove-unused-files
Remove old, unused CHANGELOG and LATEST_VERSION files
2014-05-21 08:33:13 +02:00
codesparkle
e5ceb3bfda Bringing back LATEST_VERSION 2014-05-21 00:55:54 +10:00
Sergey M․
c2ef29234c Credit @codesparkle for #2928, #2934, #2938, #2939 2014-05-20 20:12:57 +07:00
Sergey M.
1a1826c1af Merge pull request #2939 from codesparkle/upload-date-fix
No longer erroneously calculate upload_date within some extractors
2014-05-20 19:53:28 +07:00
Sergey M․
c7c6d43fe1 Merge branch 'codesparkle-bandcamp-albums-regex-duplicate-fix' 2014-05-20 19:45:28 +07:00
Sergey M․
2902d44f99 [bandcamp] Replace maxsplit keyword argument with regular one
Named arguments are not supported by methods implemented in native C (see http://bugs.python.org/issue1176)
2014-05-20 19:44:42 +07:00
Sergey M․
d6e4ba287b Merge branch 'bandcamp-albums-regex-duplicate-fix' of https://github.com/codesparkle/youtube-dl into codesparkle-bandcamp-albums-regex-duplicate-fix 2014-05-20 19:38:28 +07:00
Tobias Bell
e5c3a4b549 [gameone] Fix indentation and removed unused constants 2014-05-19 22:33:51 +02:00
Philipp Hagemeister
f50ee8d1c3 Merge branch 'master' of github.com:rg3/youtube-dl 2014-05-19 17:10:19 +02:00
Philipp Hagemeister
0e67ab0d8e [generic] Abort if user passes in URL "url" (#2942) 2014-05-19 17:10:11 +02:00
Adam Malcontenti-Wilson
1d0668ed5a [tenplay] Add new extractor 2014-05-19 23:28:21 +10:00
Adam Malcontenti-Wilson
d415299a80 [adultswim] Fix tests 2014-05-19 22:32:45 +10:00
codesparkle
77541837e5 The opening curly brace, '{', is a regex reserved control character, so it needs to be escaped (see http://stackoverflow.com/a/400316/1106367)
Minor improvements:
no need to sort the whole list if all we need is the maximum element, also instead of reinventing the wheel we can use utils to get indices from qualities.
2014-05-19 22:17:54 +10:00
Adam Malcontenti-Wilson
48fbb1003d [adultswim] Add new extractor 2014-05-19 22:05:46 +10:00
Sergey M․
e3a6576f35 [nowness] Update test file md5 and modernize 2014-05-19 19:05:18 +07:00
Philipp Hagemeister
89bb8e97ee release 2014.05.19 2014-05-19 11:42:37 +02:00
pulpe
375696b1b1 [SWRMediathek] add support for swrmediathek.de 2014-05-18 14:56:35 +02:00
Sergey M․
4ea5c7b70d [ndr] Improve thumbnail extraction 2014-05-18 14:23:02 +07:00
Tobias Bell
305d068362 [gameone] Added timestamp extraction 2014-05-17 19:04:02 +02:00
Tobias Bell
a231ce87b5 [gameone] Added extraction of age_limit 2014-05-17 18:35:11 +02:00
Tobias Bell
a84d20fc14 [gameone] Simplified extraction of description 2014-05-17 18:20:29 +02:00
Tobias Bell
9e30092361 [gameone] Added extraction of description and fixed failing tests 2014-05-17 17:07:40 +02:00
Tobias Bell
10d5c7aa5f [gameone] Added explanation for usage of http://cdn.riptide-mtvn.com/ 2014-05-17 15:10:19 +02:00
Tobias Bell
412f356e04 [gameone] Add new extractor gameone
Currently only usable for downloading tv episodes residing under
http://www.gameone.de/tv/
2014-05-17 14:47:23 +02:00
Sergey M․
8dfa187b8a [generic] Support pagespeed_iframe for NovaMov embeds 2014-05-17 18:12:12 +07:00
Sergey M․
c1ed1f7055 [ndr] Fix title, description and duration extraction 2014-05-17 18:11:40 +07:00
Sergey M․
1514f74967 [ndr] Fix thumbnail extraction 2014-05-17 17:58:37 +07:00
codesparkle
2e8323e3f7 CHANGELOG and LATEST_VERSION seem to serve no purpose at all. They haven't been changed in years. Unless these are actually used somewhere, let's get rid of them. 2014-05-17 17:07:50 +10:00
codesparkle
69f8364042 removed duplicate and somemtimes incorrect logic for parsing upload date as this job is already taken care of automatically by YoutubeDL.py 2014-05-17 15:21:46 +10:00
codesparkle
79981f039b Fixed test failure in test_all_urls: test_no_duplicates: BandcampAlbumIE inappropriately matched non-album bandcamp links as well.
BandcampIE changed to report full-accuracy duration instead of unnecessarily rounding it to the nearest integer.
Simplified conditionals and parsing a bit. Fixed typos.
2014-05-17 14:22:24 +10:00
Ralf Haring
34d863f3fc [vh1] use standard sort (#2072) 2014-05-16 23:49:41 -04:00
Philipp Hagemeister
91994c2c81 release 2014.05.17 2014-05-17 00:17:40 +02:00
Ralf Haring
3ee4b60d56 [vh1] Add new extractor (#2072) 2014-05-16 18:15:02 -04:00
Jaime Marquínez Ferrándiz
76e92371ac [youtube] Recognize a second format of the upload_date in the 'watch-uploader-info' element (#2911) 2014-05-16 22:12:52 +02:00
Jaime Marquínez Ferrándiz
08af0205f9 Merge remote-tracking branch 'codesparkle/fix-photobucket-url' (closes #2934)
Fix photobucket url extraction
2014-05-16 20:44:52 +02:00
codesparkle
a725fb1f43 test_download works for photobucket after this change 2014-05-17 03:25:41 +10:00
Jaime Marquínez Ferrándiz
05ee2b6dad [youtube] Fix extraction of the feed 'paging' values (fixes #2925) 2014-05-16 16:01:13 +02:00
Philipp Hagemeister
b74feacac5 release 2014.05.16.1 2014-05-16 15:53:17 +02:00
Philipp Hagemeister
426b52fc5d Merge remote-tracking branch 'origin/master' 2014-05-16 15:52:01 +02:00
Philipp Hagemeister
5c30b26846 [francetv] Add support for non-numeric video IDs (Fixes #2927) 2014-05-16 15:51:01 +02:00
Philipp Hagemeister
f07b74fc18 [ffmpeg] Correct argument encoding on Windows with Python 2.x
Fixes #2924
2014-05-16 15:47:56 +02:00
Sergey M․
a5a45015ba [generic] Fix redirect 2014-05-16 20:32:53 +07:00
Philipp Hagemeister
beee53de06 [youtube] Look for published-on date if uploaded-on is not found
Fixes #2911
2014-05-16 13:21:44 +02:00
Philipp Hagemeister
8712f2bea7 release 2014.05.16 2014-05-16 12:04:52 +02:00
Philipp Hagemeister
ea102818c9 Merge remote-tracking branch 'origin/master' 2014-05-16 12:04:24 +02:00
Philipp Hagemeister
0a871f6880 Provide compatibility check_output for 2.6 (Fixes #2926) 2014-05-16 12:03:59 +02:00
Sergey M․
481efc84a8 [bliptv] Switch extraction to RSS (Closes #2920) 2014-05-15 22:20:40 +07:00
Jaime Marquínez Ferrándiz
01ed5c9be3 [youtube] Fix typo 2014-05-15 13:43:29 +02:00
Philipp Hagemeister
ad3bc6acd5 Document and test categories (#2923) 2014-05-15 12:41:42 +02:00
Philipp Hagemeister
5afa7f8bee [extractor/common] --write-pages: Correct file name if video_id is None 2014-05-15 12:39:33 +02:00
Dario Guarascio
ec8deefc27 [youtube] Video categories added to metadata 2014-05-15 13:59:27 +07:00
Sergey M․
a2d5a4ee64 [gamespot] Update test URL and modernize 2014-05-14 20:13:34 +07:00
Jaime Marquínez Ferrándiz
dffcc2ea0c Makefile: write the manpage to the right file and use the processed markdown document 2014-05-13 14:37:05 +02:00
Philipp Hagemeister
1800eeefed add prepare_manpage 2014-05-13 14:21:21 +02:00
Sergey M․
d7e7dedbde [noco] Skip test 2014-05-13 19:12:17 +07:00
Philipp Hagemeister
d19bb9c0aa Split man and README (Fixes #2892) 2014-05-13 11:16:11 +02:00
Philipp Hagemeister
3ef79a974a [README] Stress example URL
This seems to be the part most often overlooked in our README.
2014-05-13 10:28:58 +02:00
139 changed files with 4940 additions and 1274 deletions

View File

@@ -1,14 +0,0 @@
2013.01.02 Codename: GIULIA
* Add support for ComedyCentral clips <nto>
* Corrected Vimeo description fetching <Nick Daniels>
* Added the --no-post-overwrites argument <Barbu Paul - Gheorghe>
* --verbose offers more environment info
* New info_dict field: uploader_id
* New updates system, with signature checking
* New IEs: NBA, JustinTV, FunnyOrDie, TweetReel, Steam, Ustream
* Fixed IEs: BlipTv
* Fixed for Python 3 IEs: Xvideo, Youku, XNXX, Dailymotion, Vimeo, InfoQ
* Simplified IEs and test code
* Various (Python 3 and other) fixes
* Revamped and expanded tests

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion
clean:
rm -rf youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz
cleanall: clean
rm -f youtube-dl youtube-dl.exe
@@ -55,7 +55,9 @@ README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
pandoc -s -f markdown -t man README.md -o youtube-dl.1
python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
python devscripts/bash-completion.py
@@ -75,6 +77,6 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \
-- \
bin devscripts test youtube_dl docs \
CHANGELOG LICENSE README.md README.txt \
LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion setup.py \
youtube-dl

View File

@@ -1,11 +1,24 @@
% YOUTUBE-DL(1)
# NAME
youtube-dl - download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** [OPTIONS] URL [URL...]
# INSTALLATION
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+x /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
Alternatively, refer to the developer instructions below for how to check out and work with the git repository. For further options, including PGP signatures, see https://rg3.github.io/youtube-dl/download.html .
# DESCRIPTION
**youtube-dl** is a small command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version
@@ -57,8 +70,9 @@ which means you can modify it, redistribute it or use it however you like.
--default-search PREFIX Use this prefix for unqualified URLs. For
example "gvsearch2:" downloads two videos
from google videos for youtube-dl "large
apple". By default (with value "auto")
youtube-dl guesses.
apple". Use the value "auto" to let
youtube-dl guess. The default value "error"
just throws an error.
--ignore-config Do not read configuration files. When given
in the global configuration file /etc
/youtube-dl.conf: do not read the user
@@ -241,7 +255,7 @@ which means you can modify it, redistribute it or use it however you like.
128K (default 5)
--recode-video FORMAT Encode the video to another format if
necessary (currently supported:
mp4|flv|ogg|webm)
mp4|flv|ogg|webm|mkv)
-k, --keep-video keeps the video file on disk after the
post-processing; the video is erased by
default
@@ -458,7 +472,7 @@ If your report is shorter than two lines, it is almost certainly missing some of
For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
Site support requests must contain an example URL. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
Site support requests **must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
### Are you using the latest version?

View File

@@ -15,7 +15,7 @@ header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'^ (\w.+)$', r'## \1', options, flags=re.M)
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:

View File

@@ -0,0 +1,20 @@
import io
import os.path
import sys
import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
PREFIX = '%YOUTUBE-DL(1)\n\n# NAME\n'
readme = re.sub(r'(?s)# INSTALLATION.*?(?=# DESCRIPTION)', '', readme)
readme = PREFIX + readme
if sys.version_info < (3, 0):
print(readme.encode('utf-8'))
else:
print(readme)

View File

@@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing CHANGELOG README.md and youtube_dl/version.py..."
/bin/echo -e "\n### Committing README.md and youtube_dl/version.py..."
make README.md
git add CHANGELOG README.md youtube_dl/version.py
git add README.md youtube_dl/version.py
git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

1
test/swftests/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
*.swf

View File

@@ -0,0 +1,19 @@
// input: [["a", "b", "c", "d"]]
// output: ["c", "b", "a", "d"]
package {
public class ArrayAccess {
public static function main(ar:Array):Array {
var aa:ArrayAccess = new ArrayAccess();
return aa.f(ar, 2);
}
private function f(ar:Array, num:Number):Array{
var x:String = ar[0];
var y:String = ar[num % ar.length];
ar[0] = y;
ar[num] = x;
return ar;
}
}
}

View File

@@ -0,0 +1,17 @@
// input: []
// output: 121
package {
public class ClassCall {
public static function main():int{
var f:OtherClass = new OtherClass();
return f.func(100,20);
}
}
}
class OtherClass {
public function func(x: int, y: int):int {
return x+y+1;
}
}

View File

@@ -0,0 +1,15 @@
// input: []
// output: 0
package {
public class ClassConstruction {
public static function main():int{
var f:Foo = new Foo();
return 0;
}
}
}
class Foo {
}

View File

@@ -0,0 +1,13 @@
// input: [1, 2]
// output: 3
package {
public class LocalVars {
public static function main(a:int, b:int):int{
var c:int = a + b + b;
var d:int = c - b;
var e:int = d;
return e;
}
}
}

View File

@@ -0,0 +1,21 @@
// input: []
// output: 9
package {
public class PrivateCall {
public static function main():int{
var f:OtherClass = new OtherClass();
return f.func();
}
}
}
class OtherClass {
private function pf():int {
return 9;
}
public function func():int {
return this.pf();
}
}

View File

@@ -0,0 +1,13 @@
// input: [1]
// output: 1
package {
public class StaticAssignment {
public static var v:int;
public static function main(a:int):int{
v = a;
return v;
}
}
}

View File

@@ -0,0 +1,16 @@
// input: []
// output: 1
package {
public class StaticRetrieval {
public static var v:int;
public static function main():int{
if (v) {
return 0;
} else {
return 1;
}
}
}
}

View File

@@ -67,7 +67,7 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['ext'], 'mp4')
# No prefer_free_formats => prefer mp4 and flv for greater compatibilty
# No prefer_free_formats => prefer mp4 and flv for greater compatibility
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
@@ -279,7 +279,7 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^x\s*10k$')
}), '^\s*10k$')
if __name__ == '__main__':
unittest.main()

View File

@@ -13,7 +13,7 @@ from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
""" Returns true if the file has been downloaded """
params = {
'age_limit': age,

View File

@@ -69,9 +69,6 @@ class TestAllURLsMatching(unittest.TestCase):
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_youtube_truncated(self):
self.assertMatch('http://www.youtube.com/watch?', ['youtube:truncated_url'])
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])

View File

@@ -28,7 +28,9 @@ from youtube_dl.extractor import (
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE,
TeacherTubeUserIE,
LivestreamIE,
LivestreamOriginalIE,
NHLVideocenterIE,
BambuserChannelIE,
BandcampAlbumIE,
@@ -39,6 +41,7 @@ from youtube_dl.extractor import (
KhanAcademyIE,
EveryonesMixtapeIE,
RutubeChannelIE,
RutubePersonIE,
GoogleSearchIE,
GenericIE,
TEDIE,
@@ -108,15 +111,15 @@ class TestPlaylists(unittest.TestCase):
ie = VineUserIE(dl)
result = ie.extract('https://vine.co/Visa')
self.assertIsPlaylist(result)
self.assertTrue(len(result['entries']) >= 50)
self.assertTrue(len(result['entries']) >= 47)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
result = ie.extract('http://www.ustream.tv/channel/channeljapan')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '5124905')
self.assertTrue(len(result['entries']) >= 6)
self.assertEqual(result['id'], '10874166')
self.assertTrue(len(result['entries']) >= 54)
def test_soundcloud_set(self):
dl = FakeYDL()
@@ -134,6 +137,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_soundcloud_likes(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band/likes')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '9615865')
self.assertTrue(len(result['entries']) >= 1)
def test_soundcloud_playlist(self):
dl = FakeYDL()
ie = SoundcloudPlaylistIE(dl)
@@ -153,6 +164,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_livestreamoriginal_folder(self):
dl = FakeYDL()
ie = LivestreamOriginalIE(dl)
result = ie.extract('https://www.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'a07bf706-d0e4-4e75-a747-b021d84f2fd3')
self.assertTrue(len(result['entries']) >= 28)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
@@ -209,20 +228,20 @@ class TestPlaylists(unittest.TestCase):
def test_ivi_compilation(self):
dl = FakeYDL()
ie = IviCompilationIE(dl)
result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel')
result = ie.extract('http://www.ivi.ru/watch/dvoe_iz_lartsa')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012)')
self.assertTrue(len(result['entries']) >= 23)
self.assertEqual(result['id'], 'dvoe_iz_lartsa')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008)')
self.assertTrue(len(result['entries']) >= 24)
def test_ivi_compilation_season(self):
dl = FakeYDL()
ie = IviCompilationIE(dl)
result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel/season2')
result = ie.extract('http://www.ivi.ru/watch/dvoe_iz_lartsa/season1')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'dezhurnyi_angel/season2')
self.assertEqual(result['title'], 'Дежурный ангел (2010 - 2012) 2 сезон')
self.assertTrue(len(result['entries']) >= 7)
self.assertEqual(result['id'], 'dvoe_iz_lartsa/season1')
self.assertEqual(result['title'], 'Двое из ларца (2006 - 2008) 1 сезон')
self.assertTrue(len(result['entries']) >= 12)
def test_imdb_list(self):
dl = FakeYDL()
@@ -255,10 +274,18 @@ class TestPlaylists(unittest.TestCase):
def test_rutube_channel(self):
dl = FakeYDL()
ie = RutubeChannelIE(dl)
result = ie.extract('http://rutube.ru/tags/video/1409')
result = ie.extract('http://rutube.ru/tags/video/1800/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '1409')
self.assertTrue(len(result['entries']) >= 34)
self.assertEqual(result['id'], '1800')
self.assertTrue(len(result['entries']) >= 68)
def test_rutube_person(self):
dl = FakeYDL()
ie = RutubePersonIE(dl)
result = ie.extract('http://rutube.ru/video/person/313878/')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '313878')
self.assertTrue(len(result['entries']) >= 37)
def test_multiple_brightcove_videos(self):
# https://github.com/rg3/youtube-dl/issues/2283
@@ -360,5 +387,13 @@ class TestPlaylists(unittest.TestCase):
result['title'], 'Brace Yourself - Today\'s Weirdest News')
self.assertTrue(len(result['entries']) >= 10)
def test_TeacherTubeUser(self):
dl = FakeYDL()
ie = TeacherTubeUserIE(dl)
result = ie.extract('http://www.teachertube.com/user/profile/rbhagwati2')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'rbhagwati2')
self.assertTrue(len(result['entries']) >= 179)
if __name__ == '__main__':
unittest.main()

View File

@@ -87,7 +87,7 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.url = 'n5BB19UTcdA'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()

76
test/test_swfinterp.py Normal file
View File

@@ -0,0 +1,76 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import json
import re
import subprocess
from youtube_dl.swfinterp import SWFInterpreter
TEST_DIR = os.path.join(
os.path.dirname(os.path.abspath(__file__)), 'swftests')
class TestSWFInterpreter(unittest.TestCase):
pass
def _make_testfunc(testfile):
m = re.match(r'^(.*)\.(as)$', testfile)
if not m:
return
test_id = m.group(1)
def test_func(self):
as_file = os.path.join(TEST_DIR, testfile)
swf_file = os.path.join(TEST_DIR, test_id + '.swf')
if ((not os.path.exists(swf_file))
or os.path.getmtime(swf_file) < os.path.getmtime(as_file)):
# Recompile
try:
subprocess.check_call(['mxmlc', '-output', swf_file, as_file])
except OSError as ose:
if ose.errno == errno.ENOENT:
print('mxmlc not found! Skipping test.')
return
raise
with open(swf_file, 'rb') as swf_f:
swf_content = swf_f.read()
swfi = SWFInterpreter(swf_content)
with io.open(as_file, 'r', encoding='utf-8') as as_f:
as_content = as_f.read()
def _find_spec(key):
m = re.search(
r'(?m)^//\s*%s:\s*(.*?)\n' % re.escape(key), as_content)
if not m:
raise ValueError('Cannot find %s in %s' % (key, testfile))
return json.loads(m.group(1))
input_args = _find_spec('input')
output = _find_spec('output')
swf_class = swfi.extract_class(test_id)
func = swfi.extract_function(swf_class, 'main')
res = func(input_args)
self.assertEqual(res, output)
test_func.__name__ = str('test_swf_' + test_id)
setattr(TestSWFInterpreter, test_func.__name__, test_func)
for testfile in os.listdir(TEST_DIR):
_make_testfunc(testfile)
if __name__ == '__main__':
unittest.main()

View File

@@ -112,11 +112,11 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_mix(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('http://www.youtube.com/watch?v=lLJf9qJHR3E&list=RDrjFaenf1T-Y')
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 20)
original_video = entries[0]
self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
def test_youtube_toptracks(self):
print('Skipping: The playlist page gives error 500')

View File

@@ -33,6 +33,24 @@ _TESTS = [
90,
u']\\[@?>=<;:/.-,+*)(\'&%$#"hZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjiagfedcb39876',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-en_US-vfl0Cbn9e.js',
u'js',
84,
u'O1I3456789abcde0ghijklmnopqrstuvwxyzABCDEFGHfJKLMN2PQRSTUVW@YZ!"#$%&\'()*+,-./:;<=',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js',
u'js',
u'2ACFC7A61CA478CD21425E5A57EBD73DDC78E22A.2094302436B2D377D14A3BBA23022D023B8BC25AA',
u'A52CB8B320D22032ABB3A41D773D2B6342034902.A22E87CDD37DBE75A5E52412DC874AC16A7CFCA2',
),
(
u'http://s.ytimg.com/yts/swfbin/player-vfl5vIhK2/watch_as3.swf',
u'swf',
86,
u'O1I3456789abcde0ghijklmnopqrstuvwxyzABCDEFGHfJKLMN2PQRSTUVWXY\\!"#$%&\'()*+,-./:;<=>?'
),
]
@@ -44,13 +62,13 @@ class TestSignature(unittest.TestCase):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
def make_tfunc(url, stype, sig_input, expected_sig):
m = re.match(r'.*-([a-zA-Z0-9_-]+)(?:/watch_as3)?\.[a-z]+$', url)
assert m, '%r should follow URL format' % url
test_id = m.group(1)
def test_func(self):
basename = 'player-%s.%s' % (test_id, stype)
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
@@ -66,7 +84,9 @@ def make_tfunc(url, stype, sig_length, expected_sig):
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)

View File

@@ -717,6 +717,17 @@ class YoutubeDL(object):
info_dict['playlist'] = None
info_dict['playlist_index'] = None
thumbnails = info_dict.get('thumbnails')
if thumbnails:
thumbnails.sort(key=lambda t: (
t.get('width'), t.get('height'), t.get('url')))
for t in thumbnails:
if 'width' in t and 'height' in t:
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if thumbnails and 'thumbnail' not in info_dict:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
@@ -982,6 +993,8 @@ class YoutubeDL(object):
fd = get_suitable_downloader(info)(self, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
downloaded = []

View File

@@ -56,6 +56,16 @@ __authors__ = (
'Nicolas Évrard',
'Jason Normore',
'Hoje Lee',
'Adam Thalhammer',
'Georg Jähnig',
'Ralf Haring',
'Koki Takahashi',
'Ariset Llerena',
'Adam Malcontenti-Wilson',
'Tobias Bell',
'Naglis Jonaitis',
'Charles Chen',
'Hassaan Ali',
)
__license__ = 'Public Domain'
@@ -266,7 +276,7 @@ def parseOpts(overrideArguments=None):
general.add_option(
'--default-search',
dest='default_search', metavar='PREFIX',
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". By default (with value "auto") youtube-dl guesses.')
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". Use the value "auto" to let youtube-dl guess. The default value "error" just throws an error.')
general.add_option(
'--ignore-config',
action='store_true',
@@ -502,7 +512,7 @@ def parseOpts(overrideArguments=None):
postproc.add_option('--audio-quality', metavar='QUALITY', dest='audioquality', default='5',
help='ffmpeg/avconv audio quality specification, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default 5)')
postproc.add_option('--recode-video', metavar='FORMAT', dest='recodevideo', default=None,
help='Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm)')
help='Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv)')
postproc.add_option('-k', '--keep-video', action='store_true', dest='keepvideo', default=False,
help='keeps the video file on disk after the post-processing; the video is erased by default')
postproc.add_option('--no-post-overwrites', action='store_true', dest='nopostoverwrites', default=False,

View File

@@ -25,7 +25,7 @@ class HlsFD(FileDownloader):
except (OSError, IOError):
pass
else:
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found')
self.report_error(u'm3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
cmd = [program] + args
retval = subprocess.call(cmd)

View File

@@ -96,6 +96,7 @@ class RtmpFD(FileDownloader):
flash_version = info_dict.get('flash_version', None)
live = info_dict.get('rtmp_live', False)
conn = info_dict.get('rtmp_conn', None)
protocol = info_dict.get('rtmp_protocol', None)
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
@@ -105,7 +106,7 @@ class RtmpFD(FileDownloader):
try:
subprocess.call(['rtmpdump', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error('RTMP download detected but "rtmpdump" could not be run')
self.report_error('RTMP download detected but "rtmpdump" could not be run. Please install it.')
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
@@ -133,6 +134,8 @@ class RtmpFD(FileDownloader):
basic_args += ['--conn', entry]
elif isinstance(conn, compat_str):
basic_args += ['--conn', conn]
if protocol is not None:
basic_args += ['--protocol', protocol]
args = basic_args + [[], ['--resume', '--skip', '1']][not live and self.params.get('continuedl', False)]
if sys.platform == 'win32' and sys.version_info < (3, 0):

View File

@@ -1,8 +1,10 @@
from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .adultswim import AdultSwimIE
from .aftonbladet import AftonbladetIE
from .anitube import AnitubeIE
from .aol import AolIE
from .allocine import AllocineIE
from .aparat import AparatIE
from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE
@@ -51,6 +53,7 @@ from .cnn import (
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .criterion import CriterionIE
from .crunchyroll import CrunchyrollIE
from .cspan import CSpanIE
@@ -61,8 +64,10 @@ from .dailymotion import (
DailymotionUserIE,
)
from .daum import DaumIE
from .dfb import DFBIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .drtv import DRTVIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .divxstage import DivxStageIE
@@ -81,6 +86,7 @@ from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .firedrive import FiredriveIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
@@ -103,12 +109,15 @@ from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gameone import GameOneIE
from .gamespot import GameSpotIE
from .gametrailers import GametrailersIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .hark import HarkIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
@@ -142,10 +151,15 @@ from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .kontrtube import KontrTubeIE
from .ku6 import Ku6IE
from .la7 import LA7IE
from .lifenews import LifeNewsIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lynda import (
LyndaIE,
LyndaCourseIE
@@ -159,15 +173,18 @@ from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mixcloud import MixcloudIE
from .mlb import MLBIE
from .mpora import MporaIE
from .mofosex import MofosexIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .moviezine import MoviezineIE
from .movshare import MovShareIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
MTVIggyIE,
)
from .musicplayon import MusicPlayOnIE
@@ -194,7 +211,11 @@ from .normalboots import NormalbootsIE
from .novamov import NovaMovIE
from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .nrk import NRKIE
from .npo import NPOIE
from .nrk import (
NRKIE,
NRKTVIE,
)
from .ntv import NTVIE
from .nytimes import NYTimesIE
from .nuvid import NuvidIE
@@ -212,8 +233,10 @@ from .pornotube import PornotubeIE
from .prosiebensat1 import ProSiebenSat1IE
from .pyvideo import PyvideoIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .reverbnation import ReverbNationIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
@@ -222,6 +245,7 @@ from .rtbf import RTBFIE
from .rtlnow import RTLnowIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE
from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
@@ -229,8 +253,10 @@ from .rutube import (
RutubePersonIE,
)
from .rutv import RUTVIE
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .servingsys import ServingSysIE
from .sina import SinaIE
from .slideshare import SlideshareIE
@@ -248,23 +274,33 @@ from .soundcloud import (
SoundcloudUserIE,
SoundcloudPlaylistIE
)
from .southparkstudios import (
SouthParkStudiosIE,
from .soundgasm import SoundgasmIE
from .southpark import (
SouthParkIE,
SouthparkDeIE,
)
from .space import SpaceIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
)
from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theplatform import ThePlatformIE
@@ -294,6 +330,7 @@ from .veehd import VeeHDIE
from .veoh import VeohIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vh1 import VH1IE
from .viddler import ViddlerIE
from .videobam import VideoBamIE
from .videodetective import VideoDetectiveIE
@@ -311,14 +348,17 @@ from .vimeo import (
VimeoReviewIE,
VimeoWatchLaterIE,
)
from .vimple import VimpleIE
from .vine import (
VineIE,
VineUserIE,
)
from .viki import VikiIE
from .vk import VKIE
from .vodlocker import VodlockerIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vulture import VultureIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wdr import (
@@ -330,6 +370,7 @@ from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE
from .xbef import XBefIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
@@ -360,6 +401,7 @@ from .youtube import (
YoutubeUserIE,
YoutubeWatchLaterIE,
)
from .zdf import ZDFIE

View File

@@ -0,0 +1,139 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class AdultSwimIE(InfoExtractor):
_VALID_URL = r'https?://video\.adultswim\.com/(?P<path>.+?)(?:\.html)?(?:\?.*)?(?:#.*)?$'
_TEST = {
'url': 'http://video.adultswim.com/rick-and-morty/close-rick-counters-of-the-rick-kind.html?x=y#title',
'playlist': [
{
'md5': '4da359ec73b58df4575cd01a610ba5dc',
'info_dict': {
'id': '8a250ba1450996e901453d7f02ca02f5',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 1',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'ffbdf55af9331c509d95350bd0cc1819',
'info_dict': {
'id': '8a250ba1450996e901453d7f4bd102f6',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 2',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'b92409635540304280b4b6c36bd14a0a',
'info_dict': {
'id': '8a250ba1450996e901453d7fa73c02f7',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 3',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
},
{
'md5': 'e8818891d60e47b29cd89d7b0278156d',
'info_dict': {
'id': '8a250ba1450996e901453d7fc8ba02f8',
'ext': 'flv',
'title': 'Rick and Morty Close Rick-Counters of the Rick Kind part 4',
'description': 'Rick has a run in with some old associates, resulting in a fallout with Morty. You got any chips, broh?',
'uploader': 'Rick and Morty',
'thumbnail': 'http://i.cdn.turner.com/asfix/repository/8a250ba13f865824013fc9db8b6b0400/thumbnail_267549017116827057.jpg'
}
}
]
}
_video_extensions = {
'3500': 'flv',
'640': 'mp4',
'150': 'mp4',
'ipad': 'm3u8',
'iphone': 'm3u8'
}
_video_dimensions = {
'3500': (1280, 720),
'640': (480, 270),
'150': (320, 180)
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_path = mobj.group('path')
webpage = self._download_webpage(url, video_path)
episode_id = self._html_search_regex(r'<link rel="video_src" href="http://i\.adultswim\.com/adultswim/adultswimtv/tools/swf/viralplayer.swf\?id=([0-9a-f]+?)"\s*/?\s*>', webpage, 'episode_id')
title = self._og_search_title(webpage)
index_url = 'http://asfix.adultswim.com/asfix-svc/episodeSearch/getEpisodesByIDs?networkName=AS&ids=%s' % episode_id
idoc = self._download_xml(index_url, title, 'Downloading episode index', 'Unable to download episode index')
episode_el = idoc.find('.//episode')
show_title = episode_el.attrib.get('collectionTitle')
episode_title = episode_el.attrib.get('title')
thumbnail = episode_el.attrib.get('thumbnailUrl')
description = episode_el.find('./description').text.strip()
entries = []
segment_els = episode_el.findall('./segments/segment')
for part_num, segment_el in enumerate(segment_els):
segment_id = segment_el.attrib.get('id')
segment_title = '%s %s part %d' % (show_title, episode_title, part_num + 1)
thumbnail = segment_el.attrib.get('thumbnailUrl')
duration = segment_el.attrib.get('duration')
segment_url = 'http://asfix.adultswim.com/asfix-svc/episodeservices/getCvpPlaylist?networkName=AS&id=%s' % segment_id
idoc = self._download_xml(segment_url, segment_title, 'Downloading segment information', 'Unable to download segment information')
formats = []
file_els = idoc.findall('.//files/file')
for file_el in file_els:
bitrate = file_el.attrib.get('bitrate')
type = file_el.attrib.get('type')
width, height = self._video_dimensions.get(bitrate, (None, None))
formats.append({
'format_id': '%s-%s' % (bitrate, type),
'url': file_el.text,
'ext': self._video_extensions.get(bitrate, 'mp4'),
# The bitrate may not be a number (for example: 'iphone')
'tbr': int(bitrate) if bitrate.isdigit() else None,
'height': height,
'width': width
})
self._sort_formats(formats)
entries.append({
'id': segment_id,
'title': segment_title,
'formats': formats,
'uploader': show_title,
'thumbnail': thumbnail,
'duration': duration,
'description': description
})
return {
'_type': 'playlist',
'id': episode_id,
'display_id': video_path,
'entries': entries,
'title': '%s %s' % (show_title, episode_title),
'description': description,
'thumbnail': thumbnail
}

View File

@@ -1,7 +1,6 @@
# encoding: utf-8
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
@@ -16,6 +15,7 @@ class AftonbladetIE(InfoExtractor):
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
@@ -27,17 +27,17 @@ class AftonbladetIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
# find internal video meta data
META_URL = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
internal_meta_id = self._html_search_regex(
r'data-aptomaId="([\w\d]+)"', webpage, 'internal_meta_id')
internal_meta_url = META_URL % internal_meta_id
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
FORMATS_URL = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = FORMATS_URL % internal_video_id
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
@@ -54,16 +54,13 @@ class AftonbladetIE(InfoExtractor):
})
self._sort_formats(formats)
timestamp = datetime.datetime.fromtimestamp(internal_meta_json['timePublished'])
upload_date = timestamp.strftime('%Y%m%d')
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json['imageUrl'],
'description': internal_meta_json['shortPreamble'],
'upload_date': upload_date,
'timestamp': internal_meta_json['timePublished'],
'duration': internal_meta_json['duration'],
'view_count': internal_meta_json['views'],
}

View File

@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_str,
qualities,
determine_ext,
)
class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': {
'id': '19546517',
'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html',
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': {
'id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'md5:c4b1f7bd682a91de6491ada267ec0f4d',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': {
'id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:e74a4dc750894bac300ece46c7036490',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
typ = mobj.group('typ')
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player')
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xml.find('.//AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd'])
formats = []
for k, v in video.items():
if re.match(r'.+_path', k):
format_id = k.split('_')[0]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': v,
'ext': determine_ext(v),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video['videoTitle'],
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
}

View File

@@ -1,22 +1,24 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class AnitubeIE(InfoExtractor):
IE_NAME = u'anitube.se'
IE_NAME = 'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.anitube.se/video/36621',
u'md5': u'59d0eeae28ea0bc8c05e7af429998d43',
u'file': u'36621.mp4',
u'info_dict': {
u'id': u'36621',
u'ext': u'mp4',
u'title': u'Recorder to Randoseru 01',
'url': 'http://www.anitube.se/video/36621',
'md5': '59d0eeae28ea0bc8c05e7af429998d43',
'info_dict': {
'id': '36621',
'ext': 'mp4',
'title': 'Recorder to Randoseru 01',
'duration': 180.19,
},
u'skip': u'Blocked in the US',
'skip': 'Blocked in the US',
}
def _real_extract(self, url):
@@ -24,13 +26,15 @@ class AnitubeIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
key = self._html_search_regex(r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)',
webpage, u'key')
key = self._html_search_regex(
r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)', webpage, 'key')
config_xml = self._download_xml('http://www.anitube.se/nuevo/econfig.php?key=%s' % key,
key)
config_xml = self._download_xml(
'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, key)
video_title = config_xml.find('title').text
thumbnail = config_xml.find('image').text
duration = float(config_xml.find('duration').text)
formats = []
video_url = config_xml.find('file')
@@ -49,5 +53,7 @@ class AnitubeIE(InfoExtractor):
return {
'id': video_id,
'title': video_title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats
}

View File

@@ -38,37 +38,43 @@ class ARDIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', webpage, 'title')
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>'],
webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
streams = [
mo.groupdict()
for mo in re.finditer(
r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)', webpage)]
media_info = self._download_json(
'http://www.ardmediathek.de/play/media/%s' % video_id, video_id)
# The second element of the _mediaArray contains the standard http urls
streams = media_info['_mediaArray'][1]['_mediaStreamArray']
if not streams:
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
formats = []
for s in streams:
format = {
'quality': int(s['quality']),
}
if s.get('rtmp_url'):
format['protocol'] = 'rtmp'
format['url'] = s['rtmp_url']
format['playpath'] = s['video_url']
else:
format['url'] = s['video_url']
quality_name = self._search_regex(
r'[,.]([a-zA-Z0-9_-]+),?\.mp4', format['url'],
'quality name', default='NA')
format['format_id'] = '%s-%s-%s-%s' % (
determine_ext(format['url']), quality_name, s['media_type'],
s['quality'])
for s in streams:
if type(s['_stream']) == list:
for index, url in enumerate(s['_stream'][::-1]):
quality = s['_quality'] + index
formats.append({
'quality': quality,
'url': url,
'format_id': '%s-%s' % (determine_ext(url), quality)
})
continue
format = {
'quality': s['_quality'],
'url': s['_stream'],
}
format['format_id'] = '%s-%s' % (
determine_ext(format['url']), format['quality'])
formats.append(format)

View File

@@ -39,7 +39,10 @@ class ArteTvIE(InfoExtractor):
formats = [{
'forma_id': q.attrib['quality'],
'url': q.text,
# The playpath starts at 'mp4:', if we don't manually
# split the url, rtmpdump will incorrectly parse them
'url': q.text.split('mp4:', 1)[0],
'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
@@ -111,7 +114,7 @@ class ArteTVPlus7IE(InfoExtractor):
if not formats:
# Some videos are only available in the 'Originalversion'
# they aren't tagged as being in French or German
if all(f['versionCode'] == 'VO' for f in all_formats):
if all(f['versionCode'] == 'VO' or f['versionCode'] == 'VA' for f in all_formats):
formats = all_formats
else:
raise ExtractorError(u'The formats list is empty')
@@ -189,9 +192,10 @@ class ArteTVFutureIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
'info_dict': {
'id': '050940-003',
'id': '5201',
'ext': 'mp4',
'title': 'Les champignons au secours de la planète',
'upload_date': '20131101',
},
}

View File

@@ -19,7 +19,7 @@ class BandcampIE(InfoExtractor):
'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': {
"title": "youtube-dl \"'/\\\u00e4\u21ad - youtube-dl test song \"'/\\\u00e4\u21ad",
"duration": 10,
"duration": 9.8485,
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}]
@@ -28,36 +28,32 @@ class BandcampIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
# We get the link to the free download page
m_download = re.search(r'freeDownloadPage: "(.*?)"', webpage)
if m_download is None:
if not m_download:
m_trackinfo = re.search(r'trackinfo: (.+),\s*?\n', webpage)
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)
d = data[0]
data = json.loads(json_code)[0]
duration = int(round(d['duration']))
formats = []
for format_id, format_url in d['file'].items():
ext, _, abr_str = format_id.partition('-')
for format_id, format_url in data['file'].items():
ext, abr_str = format_id.split('-', 1)
formats.append({
'format_id': format_id,
'url': format_url,
'ext': format_id.partition('-')[0],
'ext': ext,
'vcodec': 'none',
'acodec': format_id.partition('-')[0],
'abr': int(format_id.partition('-')[2]),
'acodec': ext,
'abr': int(abr_str),
})
self._sort_formats(formats)
return {
'id': compat_str(d['id']),
'title': d['title'],
'id': compat_str(data['id']),
'title': data['title'],
'formats': formats,
'duration': duration,
'duration': float(data['duration']),
}
else:
raise ExtractorError('No free songs found')
@@ -67,11 +63,9 @@ class BandcampIE(InfoExtractor):
r'var TralbumData = {(.*?)id: (?P<id>\d*?)$',
webpage, re.MULTILINE | re.DOTALL).group('id')
download_webpage = self._download_webpage(download_link, video_id,
'Downloading free downloads page')
# We get the dictionary of the track from some javascrip code
info = re.search(r'items: (.*?),$',
download_webpage, re.MULTILINE).group(1)
download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page')
# We get the dictionary of the track from some javascript code
info = re.search(r'items: (.*?),$', download_webpage, re.MULTILINE).group(1)
info = json.loads(info)[0]
# We pick mp3-320 for now, until format selection can be easily implemented.
mp3_info = info['downloads']['mp3-320']
@@ -100,7 +94,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+))?'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+))'
_TEST = {
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -123,7 +117,7 @@ class BandcampAlbumIE(InfoExtractor):
'params': {
'playlistend': 2
},
'skip': 'Bancamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
'skip': 'Bandcamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
}
def _real_extract(self, url):

View File

@@ -13,7 +13,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.tv/video/av(?P<id>[0-9]+)/'
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/',
@@ -56,7 +56,7 @@ class BiliBiliIE(InfoExtractor):
'thumbnailUrl', video_code, 'thumbnail', fatal=False)
player_params = compat_parse_qs(self._html_search_regex(
r'<iframe .*?class="player" src="https://secure.bilibili.tv/secure,([^"]+)"',
r'<iframe .*?class="player" src="https://secure\.bilibili\.(?:tv|com)/secure,([^"]+)"',
webpage, 'player params'))
if 'cid' in player_params:

View File

@@ -1,13 +1,10 @@
from __future__ import unicode_literals
import datetime
import json
import re
from .common import InfoExtractor
from ..utils import (
remove_start,
)
from ..utils import remove_start
class BlinkxIE(InfoExtractor):
@@ -16,18 +13,21 @@ class BlinkxIE(InfoExtractor):
_TEST = {
'url': 'http://www.blinkx.com/ce/8aQUy7GVFYgFzpKhT0oqsilwOGFRVXk3R1ZGWWdGenBLaFQwb3FzaWx3OGFRVXk3R1ZGWWdGenB',
'file': '8aQUy7GV.mp4',
'md5': '2e9a07364af40163a908edbf10bb2492',
'info_dict': {
"title": "Police Car Rolls Away",
"uploader": "stupidvideos.com",
"upload_date": "20131215",
"description": "A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!",
"duration": 14.886,
"thumbnails": [{
"width": 100,
"height": 76,
"url": "http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg",
'id': '8aQUy7GV',
'ext': 'mp4',
'title': 'Police Car Rolls Away',
'uploader': 'stupidvideos.com',
'upload_date': '20131215',
'timestamp': 1387068000,
'description': 'A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!',
'duration': 14.886,
'thumbnails': [{
'width': 100,
'height': 76,
'resolution': '100x76',
'url': 'http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg',
}],
},
}
@@ -37,13 +37,10 @@ class BlinkxIE(InfoExtractor):
video_id = m.group('id')
display_id = video_id[:8]
api_url = (u'https://apib4.blinkx.com/api.php?action=play_video&' +
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&' +
'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
dt = datetime.datetime.fromtimestamp(data['pubdate_epoch'])
pload_date = dt.strftime('%Y%m%d')
duration = None
thumbnails = []
formats = []
@@ -58,16 +55,13 @@ class BlinkxIE(InfoExtractor):
duration = m['d']
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen(u'Youtube video detected: %s' % yt_id)
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
tbr = (int(m['vbr']) + int(m['abr'])) // 1000
format_id = (u'%s-%sk-%s' %
(vcodec,
tbr,
m['w']))
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
@@ -88,7 +82,7 @@ class BlinkxIE(InfoExtractor):
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'upload_date': pload_date,
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,

View File

@@ -1,102 +1,124 @@
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_str,
compat_urllib_request,
unescapeHTML,
parse_iso8601,
compat_urlparse,
clean_html,
compat_str,
)
class BlipTVIE(SubtitlesInfoExtractor):
"""Information extractor for blip.tv"""
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/(?:(?:.+-|rss/flash/)(?P<id>\d+)|((?:play/|api\.swf#)(?P<lookup_id>[\da-zA-Z+]+)))'
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(?P<presumptive_id>.+)$'
_TESTS = [{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'upload_date': '20111205',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'uploader': 'Comic Book Resources - CBR TV',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
_TESTS = [
{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843,
'upload_date': '20111206',
'uploader': 'cbr',
'uploader_id': '679425',
'duration': 81,
}
},
{
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'title': 'Red vs. Blue Season 11 Episode 1',
'description': 'One-Zero-One',
'timestamp': 1371261608,
'upload_date': '20130615',
'uploader': 'redvsblue',
'uploader_id': '792887',
'duration': 279,
}
}
}, {
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'uploader': 'Red vs. Blue',
'description': 'One-Zero-One',
'upload_date': '20130614',
'title': 'Red vs. Blue Season 11 Episode 1',
}
}]
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
presumptive_id = mobj.group('presumptive_id')
lookup_id = mobj.group('lookup_id')
# See https://github.com/rg3/youtube-dl/issues/857
embed_mobj = re.match(r'https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', url)
if embed_mobj:
info_url = 'http://blip.tv/play/%s.x?p=1' % embed_mobj.group(1)
info_page = self._download_webpage(info_url, embed_mobj.group(1))
video_id = self._search_regex(
r'data-episode-id="([0-9]+)', info_page, 'video_id')
return self.url_result('http://blip.tv/a/a-' + video_id, 'BlipTV')
cchar = '&' if '?' in url else '?'
json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
request = compat_urllib_request.Request(json_url)
request.add_header('User-Agent', 'iTunes/10.6.1')
json_data = self._download_json(request, video_id=presumptive_id)
if 'Post' in json_data:
data = json_data['Post']
if lookup_id:
info_page = self._download_webpage(
'http://blip.tv/play/%s.x?p=1' % lookup_id, lookup_id, 'Resolving lookup id')
video_id = self._search_regex(r'data-episode-id="([0-9]+)', info_page, 'video_id')
else:
data = json_data
video_id = mobj.group('id')
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def blip(s):
return '{http://blip.tv/dtd/blip/1.0}%s' % s
def media(s):
return '{http://search.yahoo.com/mrss/}%s' % s
def itunes(s):
return '{http://www.itunes.com/dtds/podcast-1.0.dtd}%s' % s
item = rss.find('channel/item')
video_id = item.find(blip('item_id')).text
title = item.find('./title').text
description = clean_html(compat_str(item.find(blip('puredescription')).text))
timestamp = parse_iso8601(item.find(blip('datestamp')).text)
uploader = item.find(blip('user')).text
uploader_id = item.find(blip('userid')).text
duration = int(item.find(blip('runtime')).text)
media_thumbnail = item.find(media('thumbnail'))
thumbnail = media_thumbnail.get('url') if media_thumbnail is not None else item.find(itunes('image')).text
categories = [category.text for category in item.findall('category')]
video_id = compat_str(data['item_id'])
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
subtitles = {}
formats = []
if 'additionalMedia' in data:
for f in data['additionalMedia']:
if f.get('file_type_srt') == 1:
LANGS = {
'english': 'en',
}
lang = f['role'].rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = f['url']
continue
if not int(f['media_width']): # filter m3u8
continue
subtitles = {}
media_group = item.find(media('group'))
for media_content in media_group.findall(media('content')):
url = media_content.get('url')
role = media_content.get(blip('role'))
msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role)
real_url = compat_urlparse.parse_qs(msg)['message'][0]
media_type = media_content.get('type')
if media_type == 'text/srt' or url.endswith('.srt'):
LANGS = {
'english': 'en',
}
lang = role.rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = url
elif media_type.startswith('video/'):
formats.append({
'url': f['url'],
'format_id': f['role'],
'width': int(f['media_width']),
'height': int(f['media_height']),
'url': real_url,
'format_id': role,
'format_note': media_type,
'vcodec': media_content.get(blip('vcodec')),
'acodec': media_content.get(blip('acodec')),
'filesize': media_content.get('filesize'),
'width': int(media_content.get('width')),
'height': int(media_content.get('height')),
})
else:
formats.append({
'url': data['media']['url'],
'width': int(data['media']['width']),
'height': int(data['media']['height']),
})
self._sort_formats(formats)
# subtitles
@@ -107,12 +129,14 @@ class BlipTVIE(SubtitlesInfoExtractor):
return {
'id': video_id,
'uploader': data['display_name'],
'upload_date': upload_date,
'title': data['title'],
'thumbnail': data['thumbnailUrl'],
'description': data['description'],
'user_agent': 'iTunes/10.6.1',
'title': title,
'description': description,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
'categories': categories,
'formats': formats,
'subtitles': video_subtitles,
}

View File

@@ -17,15 +17,13 @@ class BRIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.br.de/mediathek/video/anselm-gruen-114.html',
'md5': 'c4f83cf0f023ba5875aba0bf46860df2',
'url': 'http://www.br.de/mediathek/video/sendungen/heimatsound/heimatsound-festival-2014-trailer-100.html',
'md5': '93556dd2bcb2948d9259f8670c516d59',
'info_dict': {
'id': '2c8d81c5-6fb7-4a74-88d4-e768e5856532',
'id': '25e279aa-1ffd-40fd-9955-5325bd48a53a',
'ext': 'mp4',
'title': 'Feiern und Verzichten',
'description': 'Anselm Grün: Feiern und Verzichten',
'uploader': 'BR/Birgit Baier',
'upload_date': '20140301',
'title': 'Am 1. und 2. August in Oberammergau',
'description': 'md5:dfd224e5aa6819bc1fcbb7826a932021',
}
},
{

View File

@@ -15,6 +15,7 @@ from ..utils import (
compat_urllib_request,
compat_parse_qs,
determine_ext,
ExtractorError,
unsmuggle_url,
unescapeHTML,
@@ -29,10 +30,11 @@ class BrightcoveIE(InfoExtractor):
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
'file': '2371591881001.mp4',
'md5': '5423e113865d26e40624dce2e4b45d95',
'note': 'Test Brightcove downloads and detection in GenericIE',
'info_dict': {
'id': '2371591881001',
'ext': 'mp4',
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
@@ -41,8 +43,9 @@ class BrightcoveIE(InfoExtractor):
{
# From http://medianetwork.oracle.com/video/player/1785452137001
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
'file': '1785452137001.flv',
'info_dict': {
'id': '1785452137001',
'ext': 'flv',
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle',
@@ -70,7 +73,20 @@ class BrightcoveIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada',
},
}
},
{
# test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download
'info_dict': {
'id': '2996102916001',
'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
},
},
]
@classmethod
@@ -187,7 +203,7 @@ class BrightcoveIE(InfoExtractor):
webpage = self._download_webpage(req, video_id)
self.report_extraction(video_id)
info = self._search_regex(r'var experienceJSON = ({.*?});', webpage, 'json')
info = self._search_regex(r'var experienceJSON = ({.*});', webpage, 'json')
info = json.loads(info)['data']
video_info = info['programmedContent']['videoPlayer']['mediaDTO']
video_info['_youtubedl_adServerURL'] = info.get('adServerURL')
@@ -219,12 +235,26 @@ class BrightcoveIE(InfoExtractor):
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
info['formats'] = [{
'url': rend['defaultURL'],
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
} for rend in renditions]
formats = []
for rend in renditions:
url = rend['defaultURL']
if rend['remote']:
# This type of renditions are served through akamaihd.net,
# but they don't use f4m manifests
url = url.replace('control/', '') + '?&v=3.3.0&fp=13&r=FEEFJ&g=RTSJIMBMPFPB'
ext = 'flv'
else:
ext = determine_ext(url)
size = rend.get('size')
formats.append({
'url': url,
'ext': ext,
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
'filesize': size if size != 0 else None,
})
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],

View File

@@ -1,10 +1,12 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
@@ -13,9 +15,10 @@ class CinemassacreIE(InfoExtractor):
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'file': '19911.mp4',
'md5': '782f8504ca95a0eba8fc9177c373eec7',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'info_dict': {
'id': '19911',
'ext': 'mp4',
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b',
@@ -23,9 +26,10 @@ class CinemassacreIE(InfoExtractor):
},
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'file': '521be8ef82b16.mp4',
'md5': 'dec39ee5118f8d9cc067f45f9cbe3a35',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'info_dict': {
'id': '521be8ef82b16',
'ext': 'mp4',
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
},
@@ -50,29 +54,40 @@ class CinemassacreIE(InfoExtractor):
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
playerdata = self._download_webpage(playerdata_url, video_id)
playerdata = self._download_webpage(playerdata_url, video_id, 'Downloading player webpage')
video_thumbnail = self._search_regex(
r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
sd_url = self._search_regex(r'file: \'([^\']+)\', label: \'SD\'', playerdata, 'sd_file')
videolist_url = self._search_regex(r'file: \'([^\']+\.smil)\'}', playerdata, 'videolist_url')
sd_url = self._html_search_regex(r'file: \'([^\']+)\', label: \'SD\'', playerdata, 'sd_file')
hd_url = self._html_search_regex(
r'file: \'([^\']+)\', label: \'HD\'', playerdata, 'hd_file',
default=None)
video_thumbnail = self._html_search_regex(r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
videolist = self._download_xml(videolist_url, video_id, 'Downloading videolist XML')
formats = [{
'url': sd_url,
'ext': 'mp4',
'format': 'sd',
'format_id': 'sd',
'quality': 1,
}]
if hd_url:
formats.append({
'url': hd_url,
'ext': 'mp4',
'format': 'hd',
'format_id': 'hd',
'quality': 2,
})
formats = []
baseurl = sd_url[:sd_url.rfind('/')+1]
for video in videolist.findall('.//video'):
src = video.get('src')
if not src:
continue
file_ = src.partition(':')[-1]
width = int_or_none(video.get('width'))
height = int_or_none(video.get('height'))
bitrate = int_or_none(video.get('system-bitrate'))
format = {
'url': baseurl + file_,
'format_id': src.rpartition('.')[0].rpartition('_')[-1],
}
if width or height:
format.update({
'tbr': bitrate // 1000 if bitrate else None,
'width': width,
'height': height,
})
else:
format.update({
'abr': bitrate // 1000 if bitrate else None,
'vcodec': 'none',
})
formats.append(format)
self._sort_formats(formats)
return {

View File

@@ -1,19 +1,19 @@
from __future__ import unicode_literals
from .mtv import MTVIE
class CMTIE(MTVIE):
IE_NAME = u'cmt.com'
IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/videos/.+?/(?P<videoid>[^/]+)\.jhtml'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [
{
u'url': u'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
u'md5': u'e6b7ef3c4c45bbfae88061799bbba6c2',
u'info_dict': {
u'id': u'989124',
u'ext': u'mp4',
u'title': u'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
u'description': u'Blame It All On My Roots',
},
_TESTS = [{
'url': 'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
'md5': 'e6b7ef3c4c45bbfae88061799bbba6c2',
'info_dict': {
'id': '989124',
'ext': 'mp4',
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots',
},
]
}]

View File

@@ -79,8 +79,11 @@ class CNNIE(InfoExtractor):
self._sort_formats(formats)
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
thumbnails = [{
'height': int(t.attrib['height']),
'width': int(t.attrib['width']),
'url': t.text,
} for t in info.findall('images/image')]
metas_el = info.find('metas')
upload_date = (
@@ -93,8 +96,7 @@ class CNNIE(InfoExtractor):
'id': info.attrib['id'],
'title': info.find('headline').text,
'formats': formats,
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'thumbnails': thumbnails,
'description': info.find('description').text,
'duration': duration,
'upload_date': upload_date,

View File

@@ -14,13 +14,13 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
(video-clips|episodes|cc-studios|video-collections)
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|full-episodes)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TEST = {
'url': 'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
@@ -130,7 +130,7 @@ class ComedyCentralShowsIE(InfoExtractor):
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = mobj.group('episode').rpartition('/')[-1]
epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
@@ -188,7 +188,7 @@ class ComedyCentralShowsIE(InfoExtractor):
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url,
'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,

View File

@@ -1,11 +1,12 @@
import base64
import hashlib
import json
import netrc
import os
import re
import socket
import sys
import netrc
import time
import xml.etree.ElementTree
from ..utils import (
@@ -92,8 +93,12 @@ class InfoExtractor(object):
unique, but available before title. Typically, id is
something like "4234987", title "Dancing naked mole rats",
and display_id "dancing-naked-mole-rats"
thumbnails: A list of dictionaries (with the entries "resolution" and
"url") for the varying thumbnails
thumbnails: A list of dictionaries, with the following entries:
* "url"
* "width" (optional, int)
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
deprecated)
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
@@ -113,6 +118,8 @@ class InfoExtractor(object):
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example
["Sports", "Berlin"]
Unless mentioned otherwise, the fields should be Unicode strings.
@@ -242,7 +249,7 @@ class InfoExtractor(object):
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
basen = video_id + '_' + url
basen = '%s_%s' % (video_id, url)
if len(basen) > 240:
h = u'___' + hashlib.md5(basen.encode('utf-8')).hexdigest()
basen = basen[:240 - len(h)] + h
@@ -453,14 +460,17 @@ class InfoExtractor(object):
if secure: regexes = self._og_regexes('video:secure_url') + regexes
return self._html_search_regex(regexes, html, name, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False):
def _og_search_url(self, html, **kargs):
return self._og_search_property('url', html, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
if display_name is None:
display_name = name
return self._html_search_regex(
r'''(?ix)<meta
(?=[^>]+(?:itemprop|name|property)=["\']%s["\'])
[^>]+content=["\']([^"\']+)["\']''' % re.escape(name),
html, display_name, fatal=fatal)
html, display_name, fatal=fatal, **kwargs)
def _dc_search_uploader(self, html):
return self._html_search_meta('dc.creator', html, 'uploader')
@@ -566,6 +576,13 @@ class InfoExtractor(object):
else:
return url
def _sleep(self, timeout, video_id, msg_template=None):
if msg_template is None:
msg_template = u'%(video_id)s: Waiting for %(timeout)s seconds'
msg = msg_template % {'video_id': video_id, 'timeout': timeout}
self.to_screen(msg)
time.sleep(timeout)
class SearchInfoExtractor(InfoExtractor):
"""
@@ -609,4 +626,3 @@ class SearchInfoExtractor(InfoExtractor):
@property
def SEARCH_KEY(self):
return self._SEARCH_KEY

View File

@@ -0,0 +1,65 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
str_to_int,
)
class CrackedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cracked\.com/video_(?P<id>\d+)_[\da-z-]+\.html'
_TEST = {
'url': 'http://www.cracked.com/video_19006_4-plot-holes-you-didnt-notice-in-your-favorite-movies.html',
'md5': '4b29a5eeec292cd5eca6388c7558db9e',
'info_dict': {
'id': '19006',
'ext': 'mp4',
'title': '4 Plot Holes You Didn\'t Notice in Your Favorite Movies',
'description': 'md5:3b909e752661db86007d10e5ec2df769',
'timestamp': 1405659600,
'upload_date': '20140718',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
[r'var\s+CK_vidSrc\s*=\s*"([^"]+)"', r'<video\s+src="([^"]+)"'], webpage, 'video URL')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
timestamp = self._html_search_regex(r'<time datetime="([^"]+)"', webpage, 'upload date', fatal=False)
if timestamp:
timestamp = parse_iso8601(timestamp[:-6])
view_count = str_to_int(self._html_search_regex(
r'<span class="views" id="viewCounts">([\d,\.]+) Views</span>', webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span id="commentCounts">([\d,\.]+)</span>', webpage, 'comment count', fatal=False))
m = re.search(r'_(?P<width>\d+)X(?P<height>\d+)\.mp4$', video_url)
if m:
width = int(m.group('width'))
height = int(m.group('height'))
else:
width = height = None
return {
'id': video_id,
'url':video_url,
'title': title,
'description': description,
'timestamp': timestamp,
'view_count': view_count,
'comment_count': comment_count,
'height': height,
'width': width,
}

View File

@@ -1,40 +1,43 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(\d*)-.+'
_VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
u'url': u'http://www.criterion.com/films/184-le-samourai',
u'file': u'184.mp4',
u'md5': u'bc51beba55685509883a9a7830919ec3',
u'info_dict': {
u"title": u"Le Samouraï",
u"description" : u'md5:a2b4b116326558149bef81f76dcbb93f',
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
'info_dict': {
'id': '184',
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(r'so.addVariable\("videoURL", "(.+?)"\)\;',
webpage, 'video url')
title = self._html_search_regex(r'<meta content="(.+?)" property="og:title" />',
webpage, 'video title')
description = self._html_search_regex(r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_regex(
r'<meta name="description" content="(.+?)" />',
webpage, 'video description')
thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {'id': video_id,
'url' : final_url,
'title': title,
'ext': determine_ext(final_url),
'description': description,
'thumbnail': thumbnail,
}
return {
'id': video_id,
'url': final_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -150,7 +150,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
return {
'id': video_id,
'formats': formats,
'uploader': info['owner_screenname'],
'uploader': info['owner.screenname'],
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'subtitles': video_subtitles,

View File

@@ -0,0 +1,44 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DFBIE(InfoExtractor):
IE_NAME = 'tv.dfb.de'
_VALID_URL = r'https?://tv\.dfb\.de/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://tv.dfb.de/video/highlights-des-empfangs-in-berlin/9070/',
# The md5 is different each time
'info_dict': {
'id': '9070',
'ext': 'flv',
'title': 'Highlights des Empfangs in Berlin',
'upload_date': '20140716',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
video_id)
video_info = player_info.find('video')
f4m_info = self._download_xml(video_info.find('url').text, video_id)
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
return {
'id': video_id,
'title': video_info.find('title').text,
'url': manifest_url,
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': ''.join(video_info.find('time_date').text.split('.')[::-1]),
}

View File

@@ -7,9 +7,9 @@ from .common import InfoExtractor
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://dsc\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://dsc.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'info_dict': {
'id': '614784',

View File

@@ -1,39 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
)
from ..utils import unified_strdate
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
u'file': u'36983.mp4',
u'md5': u'9dcfe344732808dbfcc901537973c922',
u'info_dict': {
u"title": u"Kaffeeland Schweiz",
u"description": u"Über 80 Kaffeeröstereien liefern in der Schweiz das Getränk, in das das Land so vernarrt ist: Mehr als 1000 Tassen trinkt ein Schweizer pro Jahr. SCHWEIZWEIT nimmt die Kaffeekultur unter die...",
u"uploader": u"3sat",
u"upload_date": u"20130622"
'url': 'http://www.3sat.de/mediathek/index.php?obj=36983',
'md5': '9dcfe344732808dbfcc901537973c922',
'info_dict': {
'id': '36983',
'ext': 'mp4',
'title': 'Kaffeeland Schweiz',
'description': 'md5:cc4424b18b75ae9948b13929a0814033',
'uploader': '3sat',
'upload_date': '20130622'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
details_doc = self._download_xml(details_url, video_id, note=u'Downloading video details')
details_doc = self._download_xml(details_url, video_id, 'Downloading video details')
thumbnail_els = details_doc.findall('.//teaserimage')
thumbnails = [{
'width': te.attrib['key'].partition('x')[0],
'height': te.attrib['key'].partition('x')[2],
'width': int(te.attrib['key'].partition('x')[0]),
'height': int(te.attrib['key'].partition('x')[2]),
'url': te.text,
} for te in thumbnail_els]

View File

@@ -0,0 +1,91 @@
from __future__ import unicode_literals
import re
from .subtitles import SubtitlesInfoExtractor
from .common import ExtractorError
from ..utils import parse_iso8601
class DRTVIE(SubtitlesInfoExtractor):
_VALID_URL = r'http://(?:www\.)?dr\.dk/tv/se/[^/]+/(?P<id>[\da-z-]+)'
_TEST = {
'url': 'http://www.dr.dk/tv/se/partiets-mand/partiets-mand-7-8',
'md5': '4a7e1dd65cdb2643500a3f753c942f25',
'info_dict': {
'id': 'partiets-mand-7-8',
'ext': 'mp4',
'title': 'Partiets mand (7:8)',
'description': 'md5:a684b90a8f9336cd4aab94b7647d7862',
'timestamp': 1403047940,
'upload_date': '20140617',
'duration': 1299.040,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
programcard = self._download_json(
'http://www.dr.dk/mu/programcard/expanded/%s' % video_id, video_id, 'Downloading video JSON')
data = programcard['Data'][0]
title = data['Title']
description = data['Description']
timestamp = parse_iso8601(data['CreatedTime'][:-5])
thumbnail = None
duration = None
restricted_to_denmark = False
formats = []
subtitles = {}
for asset in data['Assets']:
if asset['Kind'] == 'Image':
thumbnail = asset['Uri']
elif asset['Kind'] == 'VideoResource':
duration = asset['DurationInMilliseconds'] / 1000.0
restricted_to_denmark = asset['RestrictedToDenmark']
for link in asset['Links']:
target = link['Target']
uri = link['Uri']
formats.append({
'url': uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43' if target == 'HDS' else uri,
'format_id': target,
'ext': link['FileFormat'],
'preference': -1 if target == 'HDS' else -2,
})
subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list):
LANGS = {
'Danish': 'dk',
}
for subs in subtitles_list:
lang = subs['Language']
subtitles[LANGS.get(lang, lang)] = subs['Uri']
if not formats and restricted_to_denmark:
raise ExtractorError(
'Unfortunately, DR is not allowed to show this program outside Denmark.', expected=True)
self._sort_formats(formats)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)
return
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': self.extract_subtitles(video_id, subtitles),
}

View File

@@ -3,20 +3,18 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class EmpflixIE(InfoExtractor):
_VALID_URL = r'^https?://www\.empflix\.com/videos/.*?-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': '5e5cc160f38ca9857f318eb97146e13e',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'ext': 'flv',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'age_limit': 18,
}
}
@@ -30,6 +28,8 @@ class EmpflixIE(InfoExtractor):
video_title = self._html_search_regex(
r'name="title" value="(?P<title>[^"]*)"', webpage, 'title')
video_description = self._html_search_regex(
r'name="description" value="([^"]*)"', webpage, 'description', fatal=False)
cfg_url = self._html_search_regex(
r'flashvars\.config = escape\("([^"]+)"',
@@ -37,12 +37,18 @@ class EmpflixIE(InfoExtractor):
cfg_xml = self._download_xml(
cfg_url, video_id, note='Downloading metadata')
video_url = cfg_xml.find('videoLink').text
formats = [
{
'url': item.find('videoLink').text,
'format_id': item.find('res').text,
} for item in cfg_xml.findall('./quality/item')
]
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'description': video_description,
'formats': formats,
'age_limit': age_limit,
}

View File

@@ -37,7 +37,7 @@ class ExtremeTubeIE(InfoExtractor):
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(
r'<h1 [^>]*?title="([^"]+)"[^>]*>\1<', webpage, 'title')
r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
uploader = self._html_search_regex(
r'>Posted by:(?=<)(?:\s|<[^>]*>)*(.+?)\|', webpage, 'uploader',
fatal=False)

View File

@@ -13,7 +13,7 @@ from ..utils import (
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/(?P<lang>[^/]+)/content/(?P<id>[^/]+)'
_VALID_URL = r'^http://video\.fc2\.com/((?P<lang>[^/]+)/)?content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_TEST = {
'url': 'http://video.fc2.com/en/content/20121103kUan1KHs',
@@ -36,7 +36,7 @@ class FC2IE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
refer = url.replace('/content/', '/a/content/')
mimi = hashlib.md5(video_id + '_gGddgPfeaf_gzyr').hexdigest()
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
info_url = (
"http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
@@ -50,10 +50,13 @@ class FC2IE(InfoExtractor):
raise ExtractorError('Error code: %s' % info['err_code'][0])
video_url = info['filepath'][0] + '?mid=' + info['mid'][0]
title_info = info.get('title')
if title_info:
title = title_info[0]
return {
'id': video_id,
'title': info['title'][0],
'title': title,
'url': video_url,
'ext': 'flv',
'thumbnail': thumbnail,

View File

@@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_urllib_parse,
compat_urllib_request,
determine_ext,
)
class FiredriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?firedrive\.com/' + \
'(?:file|embed)/(?P<id>[0-9a-zA-Z]+)'
_FILE_DELETED_REGEX = r'<div class="removed_file_image">'
_TESTS = [{
'url': 'https://www.firedrive.com/file/FEB892FA160EBD01',
'md5': 'd5d4252f80ebeab4dc2d5ceaed1b7970',
'info_dict': {
'id': 'FEB892FA160EBD01',
'ext': 'flv',
'title': 'bbb_theora_486kbit.flv',
'thumbnail': 're:^http://.*\.jpg$',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = 'http://firedrive.com/file/%s' % video_id
webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
# Apparently, this header is required for confirmation to work.
req.add_header('Host', 'www.firedrive.com')
webpage = self._download_webpage(req, video_id,
'Downloading video page')
title = self._search_regex(r'class="external_title_left">(.+)</div>',
webpage, 'title')
thumbnail = self._search_regex(r'image:\s?"(//[^\"]+)', webpage,
'thumbnail', fatal=False)
if thumbnail is not None:
thumbnail = 'http:' + thumbnail
ext = self._search_regex(r'type:\s?\'([^\']+)\',',
webpage, 'extension', fatal=False)
video_url = self._search_regex(
r'file:\s?\'(http[^\']+)\',', webpage, 'file url')
formats = [{
'format_id': 'sd',
'url': video_url,
'ext': ext,
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -15,6 +15,7 @@ class FirstpostIE(InfoExtractor):
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'md5:feef3041cb09724e0bdc02843348f5f4',
}
}
@@ -22,13 +23,16 @@ class FirstpostIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
title = self._html_search_meta('twitter:title', page, 'title')
description = self._html_search_meta('twitter:description', page, 'title')
data = self._download_xml(
'http://www.firstpost.com/getvideoxml-%s.xml' % video_id, video_id,
'Downloading video XML')
item = data.find('./playlist/item')
thumbnail = item.find('./image').text
title = item.find('./title').text
formats = [
{
@@ -42,6 +46,7 @@ class FirstpostIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -48,24 +48,36 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_TEST = {
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'file': '84981923.mp4',
'info_dict': {
'id': '84981923',
'ext': 'mp4',
'title': 'Soir 3',
},
'params': {
'skip_download': True,
},
}
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'info_dict': {
'id': 'EV_20019',
'ext': 'mp4',
'title': 'Débat des candidats à la Commission européenne',
'description': 'Débat des candidats à la Commission européenne',
},
'params': {
'skip_download': 'HLS (reqires ffmpeg)'
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)[@"]', webpage, 'video id')
video_id = self._search_regex(r'id-video=((?:[^0-9]*?_)?[0-9]+)[@"]', webpage, 'video id')
return self._extract_video(video_id)
@@ -199,7 +211,7 @@ class GenerationQuoiIE(InfoExtractor):
class CultureboxIE(FranceTVBaseInfoExtractor):
IE_NAME = 'culturebox.francetvinfo.fr'
_VALID_URL = r'https?://culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_VALID_URL = r'https?://(?:m\.)?culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_TEST = {
'url': 'http://culturebox.francetvinfo.fr/einstein-on-the-beach-au-theatre-du-chatelet-146813',

View File

@@ -15,7 +15,7 @@ class GamekingsIE(InfoExtractor):
'id': '20130811',
'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review',
'description': 'md5:632e61a9f97d700e83f43d77ddafb6a4',
'description': 'md5:36fd701e57e8c15ac8682a2374c99731',
}
}

View File

@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
xpath_with_ns,
parse_iso8601
)
NAMESPACE_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
# URL prefix to download the mp4 files directly instead of streaming via rtmp
# Credits go to XBox-Maniac
# http://board.jdownloader.org/showpost.php?p=185835&postcount=31
RAW_MP4_URL = 'http://cdn.riptide-mtvn.com/'
class GameOneIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameone\.de/tv/(?P<id>\d+)'
_TEST = {
'url': 'http://www.gameone.de/tv/288',
'md5': '136656b7fb4c9cb4a8e2d500651c499b',
'info_dict': {
'id': '288',
'ext': 'mp4',
'title': 'Game One - Folge 288',
'duration': 1238,
'thumbnail': 'http://s3.gameone.de/gameone/assets/video_metas/teaser_images/000/643/636/big/640x360.jpg',
'description': 'FIFA-Pressepokal 2014, Star Citizen, Kingdom Come: Deliverance, Project Cars, Schöner Trants Nerdquiz Folge 2 Runde 1',
'age_limit': 16,
'upload_date': '20140513',
'timestamp': 1399980122,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage, secure=False)
description = self._html_search_meta('description', webpage)
age_limit = int(
self._search_regex(
r'age=(\d+)',
self._html_search_meta(
'age-de-meta-label',
webpage),
'age_limit',
'0'))
mrss_url = self._search_regex(r'mrss=([^&]+)', og_video, 'mrss')
mrss = self._download_xml(mrss_url, video_id, 'Downloading mrss')
title = mrss.find('.//item/title').text
thumbnail = mrss.find('.//item/image').get('url')
timestamp = parse_iso8601(mrss.find('.//pubDate').text, delimiter=' ')
content = mrss.find(xpath_with_ns('.//media:content', NAMESPACE_MAP))
content_url = content.get('url')
content = self._download_xml(
content_url,
video_id,
'Downloading media:content')
rendition_items = content.findall('.//rendition')
duration = int(rendition_items[0].get('duration'))
formats = [
{
'url': re.sub(r'.*/(r2)', RAW_MP4_URL + r'\1', r.find('./src').text),
'width': int(r.get('width')),
'height': int(r.get('height')),
'tbr': int(r.get('bitrate')),
}
for r in rendition_items
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'description': description,
'age_limit': age_limit,
'timestamp': timestamp,
}

View File

@@ -15,11 +15,12 @@ from ..utils import (
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
"url": "http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
"file": "gs-2300-6410818.mp4",
"md5": "b2a30deaa8654fcccd43713a6b6a4825",
"info_dict": {
"title": "Arma 3 - Community Guide: SITREP I",
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
'info_dict': {
'id': 'gs-2300-6410818',
'ext': 'mp4',
'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}
}

View File

@@ -260,7 +260,35 @@ class GenericIE(InfoExtractor):
'uploader': 'Spi0n',
},
'add_ie': ['Dailymotion'],
}
},
# YouTube embed
{
'url': 'http://www.badzine.de/ansicht/datum/2014/06/09/so-funktioniert-die-neue-englische-badminton-liga.html',
'info_dict': {
'id': 'FXRb4ykk4S0',
'ext': 'mp4',
'title': 'The NBL Auction 2014',
'uploader': 'BADMINTON England',
'uploader_id': 'BADMINTONEvents',
'upload_date': '20140603',
'description': 'md5:9ef128a69f1e262a700ed83edb163a73',
},
'add_ie': ['Youtube'],
'params': {
'skip_download': True,
}
},
# MTVSercices embed
{
'url': 'http://www.gametrailers.com/news-post/76093/north-america-europe-is-getting-that-mario-kart-8-mercedes-dlc-too',
'md5': '35727f82f58c76d996fc188f9755b0d5',
'info_dict': {
'id': '0306a69b-8adf-4fb5-aace-75f8e8cbfca9',
'ext': 'mp4',
'title': 'Review',
'description': 'Mario\'s life in the fast lane has never looked so good.',
},
},
]
def report_download_webpage(self, video_id):
@@ -355,7 +383,7 @@ class GenericIE(InfoExtractor):
if not parsed_url.scheme:
default_search = self._downloader.params.get('default_search')
if default_search is None:
default_search = 'auto_warning'
default_search = 'error'
if default_search in ('auto', 'auto_warning'):
if '/' in url:
@@ -363,9 +391,19 @@ class GenericIE(InfoExtractor):
return self.url_result('http://' + url)
else:
if default_search == 'auto_warning':
self._downloader.report_warning(
'Falling back to youtube search for %s . Set --default-search to "auto" to suppress this warning.' % url)
if re.match(r'^(?:url|URL)$', url):
raise ExtractorError(
'Invalid URL: %r . Call youtube-dl like this: youtube-dl -v "https://www.youtube.com/watch?v=BaW_jenozKc" ' % url,
expected=True)
else:
self._downloader.report_warning(
'Falling back to youtube search for %s . Set --default-search "auto" to suppress this warning.' % url)
return self.url_result('ytsearch:' + url)
elif default_search == 'error':
raise ExtractorError(
('%r is not a valid URL. '
'Set --default-search "ytseach" (or run youtube-dl "ytsearch:%s" ) to search YouTube'
) % (url, url), expected=True)
else:
assert ':' in default_search
return self.url_result(default_search + url)
@@ -473,8 +511,13 @@ class GenericIE(InfoExtractor):
# Look for embedded YouTube player
matches = re.findall(r'''(?x)
(?:<iframe[^>]+?src=|embedSWF\(\s*)
(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
(?:
<iframe[^>]+?src=|
<embed[^>]+?src=|
embedSWF\(?:\s*
)
(["\'])
(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
(?:embed|v)/.+?)
\1''', webpage)
if matches:
@@ -560,7 +603,7 @@ class GenericIE(InfoExtractor):
# Look for embedded NovaMov-based player
mobj = re.search(
r'''(?x)<iframe[^>]+?src=(["\'])
r'''(?x)<(?:pagespeed_)?iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
@@ -582,6 +625,11 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Ivi')
# Look for embedded Huffington Post player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed\.live\.huffingtonpost\.com/.+?)\1', webpage)
@@ -641,6 +689,22 @@ class GenericIE(InfoExtractor):
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
# Look for embedded vulture.com player
mobj = re.search(
r'<iframe src="(?P<url>https?://video\.vulture\.com/[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url, ie='Vulture')
# Look for embedded mtvservices player
mobj = re.search(
r'<iframe src="(?P<url>https?://media\.mtvnservices\.com/embed/[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url, ie='MTVServicesEmbedded')
# Start with something easy: JW Player in SWFObject
found = re.findall(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if not found:
@@ -672,7 +736,7 @@ class GenericIE(InfoExtractor):
# HTML5 video
found = re.findall(r'(?s)<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage)
if not found:
found = re.findall(
found = re.search(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="[0-9]{,2};url=\'([^\']+)\'"',
webpage)

View File

@@ -52,8 +52,7 @@ class GooglePlusIE(InfoExtractor):
# Extract title
# Get the first line for title
video_title = self._html_search_regex(r'<meta name\=\"Description\" content\=\"(.*?)[\n<"]',
webpage, 'title', default='NA')
video_title = self._og_search_description(webpage).splitlines()[0]
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'

View File

@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
compat_urllib_parse,
compat_urllib_request,
)
class GorillaVidIE(InfoExtractor):
IE_DESC = 'GorillaVid.in and daclips.in'
_VALID_URL = r'''(?x)
https?://(?P<host>(?:www\.)?
(?:daclips\.in|gorillavid\.in))/
(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
'''
_TESTS = [{
'url': 'http://gorillavid.in/06y9juieqpmi',
'md5': '5ae4a3580620380619678ee4875893ba',
'info_dict': {
'id': '06y9juieqpmi',
'ext': 'flv',
'title': 'Rebecca Black My Moment Official Music Video Reaction',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
'md5': 'c9e293ca74d46cad638e199c3f3fe604',
'info_dict': {
'id': 'z08zf8le23c6',
'ext': 'mp4',
'title': 'Say something nice',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://daclips.in/3rso4kdn6f9m',
'md5': '1ad8fd39bb976eeb66004d3a4895f106',
'info_dict': {
'id': '3rso4kdn6f9m',
'ext': 'mp4',
'title': 'Micro Pig piglets ready on 16th July 2009',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage('http://%s/%s' % (mobj.group('host'), video_id), video_id)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
if fields['op'] == 'download1':
post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id, 'Downloading video page')
title = self._search_regex(r'style="z-index: [0-9]+;">([0-9a-zA-Z ]+)(?:-.+)?</span>', webpage, 'title')
thumbnail = self._search_regex(r'image:\'(http[^\']+)\',', webpage, 'thumbnail')
url = self._search_regex(r'file: \'(http[^\']+)\',', webpage, 'file url')
formats = [{
'format_id': 'sd',
'url': url,
'ext': determine_ext(url),
'quality': 1,
}]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
str_to_int,
ExtractorError,
)
import json
class GoshgayIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)www.goshgay.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video4116282',
'md5': '268b9f3c3229105c57859e166dd72b03',
'info_dict': {
'id': '4116282',
'ext': 'flv',
'title': 'md5:089833a4790b5e103285a07337f245bf',
'thumbnail': 're:http://.*\.jpg',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(r'class="video-title"><h1>(.+?)<', webpage, 'title')
player_config = self._search_regex(
r'(?s)jwplayer\("player"\)\.setup\(({.+?})\)', webpage, 'config settings')
player_vars = json.loads(player_config.replace("'", '"'))
width = str_to_int(player_vars.get('width'))
height = str_to_int(player_vars.get('height'))
config_uri = player_vars.get('config')
if config_uri is None:
raise ExtractorError('Missing config URI')
node = self._download_xml(config_uri, video_id, 'Downloading player config XML',
errnote='Unable to download XML')
if node is None:
raise ExtractorError('Missing config XML')
if node.tag != 'config':
raise ExtractorError('Missing config attribute')
fns = node.findall('file')
imgs = node.findall('image')
if len(fns) != 1:
raise ExtractorError('Missing media URI')
video_url = fns[0].text
if len(imgs) < 1:
thumbnail = None
else:
thumbnail = imgs[0].text
url_comp = compat_urlparse.urlparse(url)
ref = "%s://%s%s" % (url_comp[0], url_comp[1], url_comp[2])
return {
'id': video_id,
'url': video_url,
'title': title,
'width': width,
'height': height,
'thumbnail': thumbnail,
'http_referer': ref,
'age_limit': 18,
}

View File

@@ -1,10 +1,11 @@
from __future__ import unicode_literals
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
@@ -13,59 +14,55 @@ from ..utils import (
class HypemIE(InfoExtractor):
"""Information Extractor for hypem"""
_VALID_URL = r'(?:http://)?(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_VALID_URL = r'http://(?:www\.)?hypem\.com/track/([^/]+)/([^/]+)'
_TEST = {
u'url': u'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
u'file': u'1v6ga.mp3',
u'md5': u'b9cc91b5af8995e9f0c1cee04c575828',
u'info_dict': {
u"title": u"Tame"
'url': 'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
'md5': 'b9cc91b5af8995e9f0c1cee04c575828',
'info_dict': {
'id': '1v6ga',
'ext': 'mp3',
'title': 'Tame',
'uploader': 'BODYWORK',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group(1)
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded
request = compat_urllib_request.Request(complete_url)
response, urlh = self._download_webpage_handle(request, track_id, u'Downloading webpage with the url')
response, urlh = self._download_webpage_handle(
request, track_id, 'Downloading webpage with the url')
cookie = urlh.headers.get('Set-Cookie', '')
self.report_extraction(track_id)
html_tracks = self._html_search_regex(r'<script type="application/json" id="displayList-data">(.*?)</script>',
response, u'tracks', flags=re.MULTILINE|re.DOTALL).strip()
html_tracks = self._html_search_regex(
r'(?ms)<script type="application/json" id="displayList-data">\s*(.*?)\s*</script>',
response, 'tracks')
try:
track_list = json.loads(html_tracks)
track = track_list[u'tracks'][0]
track = track_list['tracks'][0]
except ValueError:
raise ExtractorError(u'Hypemachine contained invalid JSON.')
raise ExtractorError('Hypemachine contained invalid JSON.')
key = track[u"key"]
track_id = track[u"id"]
artist = track[u"artist"]
title = track[u"song"]
key = track['key']
track_id = track['id']
artist = track['artist']
title = track['song']
serve_url = "http://hypem.com/serve/source/%s/%s" % (compat_str(track_id), compat_str(key))
request = compat_urllib_request.Request(serve_url, "" , {'Content-Type': 'application/json'})
serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key)
request = compat_urllib_request.Request(
serve_url, '', {'Content-Type': 'application/json'})
request.add_header('cookie', cookie)
song_data_json = self._download_webpage(request, track_id, u'Downloading metadata')
try:
song_data = json.loads(song_data_json)
except ValueError:
raise ExtractorError(u'Hypemachine contained invalid JSON.')
final_url = song_data[u"url"]
song_data = self._download_json(request, track_id, 'Downloading metadata')
final_url = song_data["url"]
return [{
'id': track_id,
'url': final_url,
'ext': "mp3",
'title': title,
'artist': artist,
}]
return {
'id': track_id,
'url': final_url,
'ext': 'mp3',
'title': title,
'uploader': artist,
}

View File

@@ -14,7 +14,7 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<videoid>\d+)'
_TESTS = [
# Single movie
@@ -33,14 +33,14 @@ class IviIE(InfoExtractor):
},
# Serial's serie
{
'url': 'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
'md5': '3e6cc9a848c1d2ebcc6476444967baa9',
'url': 'http://www.ivi.ru/watch/dvoe_iz_lartsa/9549',
'md5': '221f56b35e3ed815fde2df71032f4b3e',
'info_dict': {
'id': '74791',
'id': '9549',
'ext': 'mp4',
'title': 'Дежурный ангел - 1 серия',
'duration': 2490,
'thumbnail': 'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
'title': 'Двое из ларца - Серия 1',
'duration': 2655,
'thumbnail': 'http://thumbs.ivi.ru/f15.vcp.digitalaccess.ru/contents/8/4/0068dc0677041f3336b7c2baad8fc0.jpg',
},
'skip': 'Only works from Russia',
}

View File

@@ -0,0 +1,35 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class Ku6IE(InfoExtractor):
_VALID_URL = r'http://v\.ku6\.com/show/(?P<id>[a-zA-Z0-9\-\_]+)(?:\.)*html'
_TEST = {
'url': 'http://v.ku6.com/show/JG-8yS14xzBr4bCn1pu0xw...html',
'md5': '01203549b9efbb45f4b87d55bdea1ed1',
'info_dict': {
'id': 'JG-8yS14xzBr4bCn1pu0xw',
'ext': 'f4v',
'title': 'techniques test',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(r'<h1 title=.*>(.*?)</h1>', webpage, 'title')
dataUrl = 'http://v.ku6.com/fetchVideo4Player/%s.html' % video_id
jsonData = self._download_json(dataUrl, video_id)
downloadUrl = jsonData['data']['f']
return {
'id': video_id,
'title': title,
'url': downloadUrl
}

View File

@@ -24,7 +24,7 @@ class LifeNewsIE(InfoExtractor):
'ext': 'mp4',
'title': 'МВД разыскивает мужчин, оставивших в IKEA сумку с автоматом',
'description': 'Камеры наблюдения гипермаркета зафиксировали троих мужчин, спрятавших оружейный арсенал в камере хранения.',
'thumbnail': 'http://lifenews.ru/static/posts/2014/1/126342/.video.jpg',
'thumbnail': 're:http://.*\.jpg',
'upload_date': '20140130',
}
}

View File

@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
import json
@@ -6,31 +8,35 @@ from ..utils import (
compat_urllib_parse_urlparse,
compat_urlparse,
xpath_with_ns,
compat_str,
orderedSet,
)
class LivestreamIE(InfoExtractor):
IE_NAME = u'livestream'
IE_NAME = 'livestream'
_VALID_URL = r'http://new\.livestream\.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>\d+))?/?$'
_TEST = {
u'url': u'http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370',
u'file': u'4719370.mp4',
u'md5': u'0d2186e3187d185a04b3cdd02b828836',
u'info_dict': {
u'title': u'Live from Webster Hall NYC',
u'upload_date': u'20121012',
'url': 'http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370',
'md5': '53274c76ba7754fb0e8d072716f2292b',
'info_dict': {
'id': '4719370',
'ext': 'mp4',
'title': 'Live from Webster Hall NYC',
'upload_date': '20121012',
}
}
def _extract_video_info(self, video_data):
video_url = video_data.get('progressive_url_hd') or video_data.get('progressive_url')
return {'id': video_data['id'],
'url': video_url,
'ext': 'mp4',
'title': video_data['caption'],
'thumbnail': video_data['thumbnail_url'],
'upload_date': video_data['updated_at'].replace('-','')[:8],
}
return {
'id': compat_str(video_data['id']),
'url': video_url,
'ext': 'mp4',
'title': video_data['caption'],
'thumbnail': video_data['thumbnail_url'],
'upload_date': video_data['updated_at'].replace('-', '')[:8],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -40,43 +46,43 @@ class LivestreamIE(InfoExtractor):
if video_id is None:
# This is an event page:
config_json = self._search_regex(r'window.config = ({.*?});',
webpage, u'window config')
config_json = self._search_regex(
r'window.config = ({.*?});', webpage, 'window config')
info = json.loads(config_json)['event']
videos = [self._extract_video_info(video_data['data'])
for video_data in info['feed']['data'] if video_data['type'] == u'video']
for video_data in info['feed']['data'] if video_data['type'] == 'video']
return self.playlist_result(videos, info['id'], info['full_name'])
else:
og_video = self._og_search_video_url(webpage, name=u'player url')
og_video = self._og_search_video_url(webpage, 'player url')
query_str = compat_urllib_parse_urlparse(og_video).query
query = compat_urlparse.parse_qs(query_str)
api_url = query['play_url'][0].replace('.smil', '')
info = json.loads(self._download_webpage(api_url, video_id,
u'Downloading video info'))
info = json.loads(self._download_webpage(
api_url, video_id, 'Downloading video info'))
return self._extract_video_info(info)
# The original version of Livestream uses a different system
class LivestreamOriginalIE(InfoExtractor):
IE_NAME = u'livestream:original'
_VALID_URL = r'https?://www\.livestream\.com/(?P<user>[^/]+)/video\?.*?clipId=(?P<id>.*?)(&|$)'
IE_NAME = 'livestream:original'
_VALID_URL = r'''(?x)https?://www\.livestream\.com/
(?P<user>[^/]+)/(?P<type>video|folder)
(?:\?.*?Id=|/)(?P<id>.*?)(&|$)
'''
_TEST = {
u'url': u'http://www.livestream.com/dealbook/video?clipId=pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
u'info_dict': {
u'id': u'pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
u'ext': u'flv',
u'title': u'Spark 1 (BitCoin) with Cameron Winklevoss & Tyler Winklevoss of Winklevoss Capital',
'url': 'http://www.livestream.com/dealbook/video?clipId=pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
'info_dict': {
'id': 'pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
'ext': 'flv',
'title': 'Spark 1 (BitCoin) with Cameron Winklevoss & Tyler Winklevoss of Winklevoss Capital',
},
u'params': {
'params': {
# rtmp
u'skip_download': True,
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
user = mobj.group('user')
def _extract_video(self, user, video_id):
api_url = 'http://x{0}x.api.channel.livestream.com/2.0/clipdetails?extendedInfo=true&id={1}'.format(user, video_id)
info = self._download_xml(api_url, video_id)
@@ -84,7 +90,7 @@ class LivestreamOriginalIE(InfoExtractor):
ns = {'media': 'http://search.yahoo.com/mrss'}
thumbnail_url = item.find(xpath_with_ns('media:thumbnail', ns)).attrib['url']
# Remove the extension and number from the path (like 1.jpg)
path = self._search_regex(r'(user-files/.+)_.*?\.jpg$', thumbnail_url, u'path')
path = self._search_regex(r'(user-files/.+)_.*?\.jpg$', thumbnail_url, 'path')
return {
'id': video_id,
@@ -94,3 +100,44 @@ class LivestreamOriginalIE(InfoExtractor):
'ext': 'flv',
'thumbnail': thumbnail_url,
}
def _extract_folder(self, url, folder_id):
webpage = self._download_webpage(url, folder_id)
urls = orderedSet(re.findall(r'<a href="(https?://livestre\.am/.*?)"', webpage))
return {
'_type': 'playlist',
'id': folder_id,
'entries': [{
'_type': 'url',
'url': video_url,
} for video_url in urls],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('id')
user = mobj.group('user')
url_type = mobj.group('type')
if url_type == 'folder':
return self._extract_folder(url, id)
else:
return self._extract_video(user, id)
# The server doesn't support HEAD request, the generic extractor can't detect
# the redirection
class LivestreamShortenerIE(InfoExtractor):
IE_NAME = 'livestream:shortener'
IE_DESC = False # Do not list
_VALID_URL = r'https?://livestre\.am/(?P<id>.+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('id')
webpage = self._download_webpage(url, id)
return {
'_type': 'url',
'url': self._og_search_url(webpage),
}

View File

@@ -2,7 +2,6 @@
from __future__ import unicode_literals
import re
import datetime
from .common import InfoExtractor
@@ -10,28 +9,48 @@ from .common import InfoExtractor
class MailRuIE(InfoExtractor):
IE_NAME = 'mailru'
IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'http://(?:www\.)?my\.mail\.ru/video/.*#video=/?(?P<id>[^/]+/[^/]+/[^/]+/\d+)'
_VALID_URL = r'http://(?:www\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
_TEST = {
'url': 'http://my.mail.ru/video/top#video=/mail/sonypicturesrus/75/76',
'md5': 'dea205f03120046894db4ebb6159879a',
'info_dict': {
'id': '46301138',
'ext': 'mp4',
'title': 'Новый Человек-Паук. Высокое напряжение. Восстание Электро',
'upload_date': '20140224',
'uploader': 'sonypicturesrus',
'uploader_id': 'sonypicturesrus@mail.ru',
'duration': 184,
}
}
_TESTS = [
{
'url': 'http://my.mail.ru/video/top#video=/mail/sonypicturesrus/75/76',
'md5': 'dea205f03120046894db4ebb6159879a',
'info_dict': {
'id': '46301138',
'ext': 'mp4',
'title': 'Новый Человек-Паук. Высокое напряжение. Восстание Электро',
'timestamp': 1393232740,
'upload_date': '20140224',
'uploader': 'sonypicturesrus',
'uploader_id': 'sonypicturesrus@mail.ru',
'duration': 184,
},
},
{
'url': 'http://my.mail.ru/corp/hitech/video/news_hi-tech_mail_ru/1263.html',
'md5': '00a91a58c3402204dcced523777b475f',
'info_dict': {
'id': '46843144',
'ext': 'mp4',
'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
'timestamp': 1397217632,
'upload_date': '20140411',
'uploader': 'hitech',
'uploader_id': 'hitech@corp.mail.ru',
'duration': 245,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('idv1')
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
video_data = self._download_json(
'http://videoapi.my.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
author = video_data['author']
uploader = author['name']
@@ -40,10 +59,11 @@ class MailRuIE(InfoExtractor):
movie = video_data['movie']
content_id = str(movie['contentId'])
title = movie['title']
if title.endswith('.mp4'):
title = title[:-4]
thumbnail = movie['poster']
duration = movie['duration']
upload_date = datetime.datetime.fromtimestamp(video_data['timestamp']).strftime('%Y%m%d')
view_count = video_data['views_count']
formats = [
@@ -57,7 +77,7 @@ class MailRuIE(InfoExtractor):
'id': content_id,
'title': title,
'thumbnail': thumbnail,
'upload_date': upload_date,
'timestamp': video_data['timestamp'],
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,

102
youtube_dl/extractor/mlb.py Normal file
View File

@@ -0,0 +1,102 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
find_xpath_attr,
)
class MLBIE(InfoExtractor):
_VALID_URL = r'https?://m\.mlb\.com/video/(?:topic/[\da-z_-]+/)?v(?P<id>n?\d+)'
_TESTS = [
{
'url': 'http://m.mlb.com/video/topic/81536970/v34496663/mianym-stanton-practices-for-the-home-run-derby',
'md5': 'd9c022c10d21f849f49c05ae12a8a7e9',
'info_dict': {
'id': '34496663',
'ext': 'mp4',
'title': 'Stanton prepares for Derby',
'description': 'md5:d00ce1e5fd9c9069e9c13ab4faedfa57',
'duration': 46,
'timestamp': 1405105800,
'upload_date': '20140711',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/topic/vtp_hrd_sponsor/v34578115/hrd-cespedes-wins-2014-gillette-home-run-derby',
'md5': '0e6e73d509321e142409b695eadd541f',
'info_dict': {
'id': '34578115',
'ext': 'mp4',
'title': 'Cespedes repeats as Derby champ',
'description': 'md5:08df253ce265d4cf6fb09f581fafad07',
'duration': 488,
'timestamp': 1405399936,
'upload_date': '20140715',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/v34577915/bautista-on-derby-captaining-duties-his-performance',
'md5': 'b8fd237347b844365d74ea61d4245967',
'info_dict': {
'id': '34577915',
'ext': 'mp4',
'title': 'Bautista on Home Run Derby',
'description': 'md5:b80b34031143d0986dddc64a8839f0fb',
'duration': 52,
'timestamp': 1405390722,
'upload_date': '20140715',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
detail = self._download_xml(
'http://m.mlb.com/gen/multimedia/detail/%s/%s/%s/%s.xml'
% (video_id[-3], video_id[-2], video_id[-1], video_id), video_id)
title = detail.find('./headline').text
description = detail.find('./big-blurb').text
duration = parse_duration(detail.find('./duration').text)
timestamp = parse_iso8601(detail.attrib['date'][:-5])
thumbnail = find_xpath_attr(
detail, './thumbnailScenarios/thumbnailScenario', 'type', '45').text
formats = []
for media_url in detail.findall('./url'):
playback_scenario = media_url.attrib['playback_scenario']
fmt = {
'url': media_url.text,
'format_id': playback_scenario,
}
m = re.search(r'(?P<vbr>\d+)K_(?P<width>\d+)X(?P<height>\d+)', playback_scenario)
if m:
fmt.update({
'vbr': int(m.group('vbr')) * 1000,
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(fmt)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,87 @@
from __future__ import unicode_literals
import datetime
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
)
class MotherlessIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?motherless\.com/(?P<id>[A-Z0-9]+)'
_TESTS = [
{
'url': 'http://motherless.com/AC3FFE1',
'md5': '5527fef81d2e529215dad3c2d744a7d9',
'info_dict': {
'id': 'AC3FFE1',
'ext': 'flv',
'title': 'Fucked in the ass while playing PS3',
'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
'upload_date': '20100913',
'uploader_id': 'famouslyfuckedup',
'thumbnail': 're:http://.*\.jpg',
'age_limit': 18,
}
},
{
'url': 'http://motherless.com/532291B',
'md5': 'bc59a6b47d1f958e61fbd38a4d31b131',
'info_dict': {
'id': '532291B',
'ext': 'mp4',
'title': 'Amazing girl playing the omegle game, PERFECT!',
'categories': ['Amateur', 'webcam', 'omegle', 'pink', 'young', 'masturbate', 'teen', 'game', 'hairy'],
'upload_date': '20140622',
'uploader_id': 'Sulivana7x',
'thumbnail': 're:http://.*\.jpg',
'age_limit': 18,
}
}
]
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
video_url = self._html_search_regex(r'setup\(\{\s+"file".+: "([^"]+)",', webpage, 'video_url')
age_limit = self._rta_search(webpage)
view_count = self._html_search_regex(r'<strong>Views</strong>\s+([^<]+)<', webpage, 'view_count')
upload_date = self._html_search_regex(r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload_date')
if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
else:
upload_date = unified_strdate(upload_date)
like_count = self._html_search_regex(r'<strong>Favorited</strong>\s+([^<]+)<', webpage, 'like_count')
comment_count = webpage.count('class="media-comment-contents"')
uploader_id = self._html_search_regex(r'"thumb-member-username">\s+<a href="/m/([^"]+)"', webpage, 'uploader_id')
categories = self._html_search_meta('keywords', webpage)
if categories:
categories = [cat.strip() for cat in categories.split(',')]
return {
'id': video_id,
'title': title,
'upload_date': upload_date,
'uploader_id': uploader_id,
'thumbnail': self._og_search_thumbnail(webpage),
'categories': categories,
'view_count': int_or_none(view_count.replace(',', '')),
'like_count': int_or_none(like_count.replace(',', '')),
'comment_count': comment_count,
'age_limit': age_limit,
'url': video_url,
}

View File

@@ -28,7 +28,7 @@ class MporaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_json = self._search_regex(
r"new FM\.Player\('[^']+',\s*(\{.*?)\);\n", webpage, 'json')
r"new FM\.Player\('[^']+',\s*(\{.*?)\).player;", webpage, 'json')
data = json.loads(data_json)

View File

@@ -22,6 +22,7 @@ def _media_xml_tag(tag):
class MTVServicesInfoExtractor(InfoExtractor):
_MOBILE_TEMPLATE = None
@staticmethod
def _id_from_uri(uri):
return uri.split(':')[-1]
@@ -35,6 +36,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _get_feed_url(self, uri):
return self._FEED_URL
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
thumb_node = itemdoc.find(search_path)
@@ -80,6 +84,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
})
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
self._sort_formats(formats)
return formats
def _get_video_info(self, itemdoc):
@@ -135,10 +140,10 @@ class MTVServicesInfoExtractor(InfoExtractor):
def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri)
feed_url = self._get_feed_url(uri)
data = compat_urllib_parse.urlencode({'uri': uri})
idoc = self._download_xml(
self._FEED_URL + '?' + data, video_id,
feed_url + '?' + data, video_id,
'Downloading info', transform_source=fix_xml_ampersands)
return [self._get_video_info(item) for item in idoc.findall('.//item')]
@@ -153,12 +158,46 @@ class MTVServicesInfoExtractor(InfoExtractor):
if mgid.endswith('.swf'):
mgid = mgid[:-4]
except RegexNotFoundError:
mgid = None
if mgid is None or ':' not in mgid:
mgid = self._search_regex(
[r'data-mgid="(.*?)"', r'swfobject.embedSWF\(".*?(mgid:.*?)"'],
webpage, u'mgid')
return self._get_videos_info(mgid)
class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
IE_NAME = 'mtvservices:embedded'
_VALID_URL = r'https?://media\.mtvnservices\.com/embed/(?P<mgid>.+?)(\?|/|$)'
_TEST = {
# From http://www.thewrap.com/peter-dinklage-sums-up-game-of-thrones-in-45-seconds-video/
'url': 'http://media.mtvnservices.com/embed/mgid:uma:video:mtv.com:1043906/cp~vid%3D1043906%26uri%3Dmgid%3Auma%3Avideo%3Amtv.com%3A1043906',
'md5': 'cb349b21a7897164cede95bd7bf3fbb9',
'info_dict': {
'id': '1043906',
'ext': 'mp4',
'title': 'Peter Dinklage Sums Up \'Game Of Thrones\' In 45 Seconds',
'description': '"Sexy sexy sexy, stabby stabby stabby, beautiful language," says Peter Dinklage as he tries summarizing "Game of Thrones" in under a minute.',
},
}
def _get_feed_url(self, uri):
video_id = self._id_from_uri(uri)
site_id = uri.replace(video_id, '')
config_url = 'http://media.mtvnservices.com/pmt/e1/players/{0}/config.xml'.format(site_id)
config_doc = self._download_xml(config_url, video_id)
feed_node = config_doc.find('.//feed')
feed_url = feed_node.text.strip().split('?')[0]
return feed_url
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mgid = mobj.group('mgid')
return self._get_videos_info(mgid)
class MTVIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)^https?://
(?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|

View File

@@ -1,4 +1,6 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -12,12 +14,13 @@ class NaverIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
_TEST = {
u'url': u'http://tvcast.naver.com/v/81652',
u'file': u'81652.mp4',
u'info_dict': {
u'title': u'[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
u'description': u'합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
u'upload_date': u'20130903',
'url': 'http://tvcast.naver.com/v/81652',
'info_dict': {
'id': '81652',
'ext': 'mp4',
'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20',
'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
'upload_date': '20130903',
},
}
@@ -28,7 +31,7 @@ class NaverIE(InfoExtractor):
m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
webpage)
if m_id is None:
raise ExtractorError(u'couldn\'t extract vid and key')
raise ExtractorError('couldn\'t extract vid and key')
vid = m_id.group(1)
key = m_id.group(2)
query = compat_urllib_parse.urlencode({'vid': vid, 'inKey': key,})
@@ -39,22 +42,27 @@ class NaverIE(InfoExtractor):
})
info = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
video_id, u'Downloading video info')
video_id, 'Downloading video info')
urls = self._download_xml(
'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
video_id, u'Downloading video formats info')
video_id, 'Downloading video formats info')
formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text
if domain.startswith('rtmp'):
continue
formats.append({
f = {
'url': domain + format_el.find('uri').text,
'ext': 'mp4',
'width': int(format_el.find('width').text),
'height': int(format_el.find('height').text),
})
}
if domain.startswith('rtmp'):
f.update({
'ext': 'flv',
'rtmp_protocol': '1', # rtmpt
})
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -1,6 +1,7 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import find_xpath_attr, compat_str
@@ -31,30 +32,68 @@ class NBCIE(InfoExtractor):
class NBCNewsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcnews\.com/video/.+?/(?P<id>\d+)'
_VALID_URL = r'''(?x)https?://www\.nbcnews\.com/
((video/.+?/(?P<id>\d+))|
(feature/[^/]+/(?P<title>.+)))
'''
_TEST = {
'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
'info_dict': {
'id': '52753292',
'ext': 'flv',
'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
_TESTS = [
{
'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
'info_dict': {
'id': '52753292',
'ext': 'flv',
'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
},
}
{
'url': 'http://www.nbcnews.com/feature/edward-snowden-interview/how-twitter-reacted-snowden-interview-n117236',
'md5': 'b2421750c9f260783721d898f4c42063',
'info_dict': {
'id': 'I1wpAI_zmhsQ',
'ext': 'flv',
'title': 'How Twitter Reacted To The Snowden Interview',
'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
},
'add_ie': ['ThePlatform'],
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video')
if video_id is not None:
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video')
return {
'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': compat_str(info.find('caption').text),
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}
return {
'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': compat_str(info.find('caption').text),
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}
else:
# "feature" pages use theplatform.com
title = mobj.group('title')
webpage = self._download_webpage(url, title)
bootstrap_json = self._search_regex(
r'var bootstrapJson = ({.+})\s*$', webpage, 'bootstrap json',
flags=re.MULTILINE)
bootstrap = json.loads(bootstrap_json)
info = bootstrap['results'][0]['video']
playlist_url = info['fallbackPlaylistUrl'] + '?form=MPXNBCNewsAPI'
mpxid = info['mpxId']
all_videos = self._download_json(playlist_url, title)['videos']
# The response contains additional videos
info = next(v for v in all_videos if v['mpxId'] == mpxid)
return {
'_type': 'url',
# We get the best quality video
'url': info['videoAssets'][-1]['publicUrl'],
'ie_key': 'ThePlatform',
}

View File

@@ -4,7 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
qualities,
)
class NDRIE(InfoExtractor):
@@ -14,15 +18,15 @@ class NDRIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.ndr.de/fernsehen/sendungen/markt/markt7959.html',
'md5': 'e7a6079ca39d3568f4996cb858dd6708',
'url': 'http://www.ndr.de/fernsehen/media/dienordreportage325.html',
'md5': '4a4eeafd17c3058b65f0c8f091355855',
'note': 'Video file',
'info_dict': {
'id': '7959',
'id': '325',
'ext': 'mp4',
'title': 'Markt - die ganze Sendung',
'description': 'md5:af9179cf07f67c5c12dc6d9997e05725',
'duration': 2655,
'title': 'Blaue Bohnen aus Blocken',
'description': 'md5:190d71ba2ccddc805ed01547718963bc',
'duration': 1715,
},
},
{
@@ -45,17 +49,16 @@ class NDRIE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
title = self._og_search_title(page)
title = self._og_search_title(page).strip()
description = self._og_search_description(page)
if description:
description = description.strip()
mobj = re.search(
r'<div class="duration"><span class="min">(?P<minutes>\d+)</span>:<span class="sec">(?P<seconds>\d+)</span></div>',
page)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
duration = int_or_none(self._html_search_regex(r'duration: (\d+),\n', page, 'duration', fatal=False))
formats = []
mp3_url = re.search(r'''{src:'(?P<audio>[^']+)', type:"audio/mp3"},''', page)
mp3_url = re.search(r'''\{src:'(?P<audio>[^']+)', type:"audio/mp3"},''', page)
if mp3_url:
formats.append({
'url': mp3_url.group('audio'),
@@ -64,13 +67,15 @@ class NDRIE(InfoExtractor):
thumbnail = None
video_url = re.search(r'''3: {src:'(?P<video>.+?)\.hi\.mp4', type:"video/mp4"},''', page)
video_url = re.search(r'''3: \{src:'(?P<video>.+?)\.hi\.mp4', type:"video/mp4"},''', page)
if video_url:
thumbnail = self._html_search_regex(r'(?m)title: "NDR PLAYER",\s*poster: "([^"]+)",',
page, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = 'http://www.ndr.de' + thumbnail
for format_id in ['lo', 'hi', 'hq']:
thumbnails = re.findall(r'''\d+: \{src: "([^"]+)"(?: \|\| '[^']+')?, quality: '([^']+)'}''', page)
if thumbnails:
quality_key = qualities(['xs', 's', 'm', 'l', 'xl'])
largest = max(thumbnails, key=lambda thumb: quality_key(thumb[1]))
thumbnail = 'http://www.ndr.de' + largest[0]
for format_id in 'lo', 'hi', 'hq':
formats.append({
'url': '%s.%s.mp4' % (video_url.group('video'), format_id),
'format_id': format_id,

View File

@@ -1,22 +1,28 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import month_by_name
from ..utils import (
month_by_name,
int_or_none,
)
class NDTVIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?ndtv\.com/video/player/[^/]*/[^/]*/(?P<id>[a-z0-9]+)'
_TEST = {
u"url": u"http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710",
u"file": u"300710.mp4",
u"md5": u"39f992dbe5fb531c395d8bbedb1e5e88",
u"info_dict": {
u"title": u"NDTV exclusive: Don't need character certificate from Rahul Gandhi, says Arvind Kejriwal",
u"description": u"In an exclusive interview to NDTV, Aam Aadmi Party's Arvind Kejriwal says it makes no difference to him that Rahul Gandhi said the Congress needs to learn from his party.",
u"upload_date": u"20131208",
u"duration": 1327,
u"thumbnail": u"http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg",
'url': 'http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710',
'md5': '39f992dbe5fb531c395d8bbedb1e5e88',
'info_dict': {
'id': '300710',
'ext': 'mp4',
'title': "NDTV exclusive: Don't need character certificate from Rahul Gandhi, says Arvind Kejriwal",
'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02',
'upload_date': '20131208',
'duration': 1327,
'thumbnail': 'http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg',
},
}
@@ -27,13 +33,12 @@ class NDTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
filename = self._search_regex(
r"__filename='([^']+)'", webpage, u'video filename')
video_url = (u'http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
r"__filename='([^']+)'", webpage, 'video filename')
video_url = ('http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
filename)
duration_str = filename = self._search_regex(
r"__duration='([^']+)'", webpage, u'duration', fatal=False)
duration = None if duration_str is None else int(duration_str)
duration = int_or_none(self._search_regex(
r"__duration='([^']+)'", webpage, 'duration', fatal=False))
date_m = re.search(r'''(?x)
<p\s+class="vod_dateline">\s*
@@ -41,7 +46,7 @@ class NDTVIE(InfoExtractor):
(?P<monthname>[A-Za-z]+)\s+(?P<day>[0-9]+),\s*(?P<year>[0-9]+)
''', webpage)
upload_date = None
assert date_m
if date_m is not None:
month = month_by_name(date_m.group('monthname'))
if month is not None:
@@ -49,14 +54,19 @@ class NDTVIE(InfoExtractor):
date_m.group('year'), month, int(date_m.group('day')))
description = self._og_search_description(webpage)
READ_MORE = u' (Read more)'
READ_MORE = ' (Read more)'
if description.endswith(READ_MORE):
description = description[:-len(READ_MORE)]
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' - NDTV'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage),
'duration': duration,

View File

@@ -4,18 +4,19 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class NewstubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
_TEST = {
'url': 'http://newstube.ru/media/na-korable-progress-prodolzhaetsya-testirovanie-sistemy-kurs',
'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
'info_dict': {
'id': 'd156a237-a6e9-4111-a682-039995f721f1',
'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
'ext': 'flv',
'title': 'На корабле «Прогресс» продолжается тестирование системы «Курс»',
'description': 'md5:d0cbe7b4a6f600552617e48548d5dc77',
'duration': 20.04,
'title': 'Телеканал CNN переместил город Славянск в Крым',
'description': 'md5:419a8c9f03442bc0b0a794d689360335',
'duration': 31.05,
},
'params': {
# rtmp download
@@ -40,6 +41,10 @@ class NewstubeIE(InfoExtractor):
def ns(s):
return s.replace('/', '/%(ns)s') % {'ns': '{http://app1.newstube.ru/N2SiteWS/player.asmx}'}
error_message = player.find(ns('./ErrorMessage'))
if error_message is not None:
raise ExtractorError('%s returned error: %s' % (self.IE_NAME, error_message.text), expected=True)
session_id = player.find(ns('./SessionId')).text
media_info = player.find(ns('./Medias/MediaInfo'))
title = media_info.find(ns('./Name')).text

View File

@@ -8,10 +8,9 @@ from ..utils import (
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
compat_str,
ExtractorError,
unified_strdate,
parse_duration,
int_or_none,
)
@@ -30,6 +29,7 @@ class NiconicoIE(InfoExtractor):
'uploader_id': '2698420',
'upload_date': '20131123',
'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
'duration': 33,
},
'params': {
'username': 'ydl.niconico@gmail.com',
@@ -37,17 +37,20 @@ class NiconicoIE(InfoExtractor):
},
}
_VALID_URL = r'^https?://(?:www\.|secure\.)?nicovideo\.jp/watch/([a-z][a-z][0-9]+)(?:.*)$'
_VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/((?:[a-z]{2})?[0-9]+)'
_NETRC_MACHINE = 'niconico'
# Determine whether the downloader uses authentication to download video
_AUTHENTICATE = False
def _real_initialize(self):
self._login()
if self._downloader.params.get('username', None) is not None:
self._AUTHENTICATE = True
if self._AUTHENTICATE:
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
# Login is required
raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
# Log in
login_form_strs = {
@@ -79,44 +82,66 @@ class NiconicoIE(InfoExtractor):
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
note='Downloading video info page')
# Get flv info
flv_info_webpage = self._download_webpage(
'http://flapi.nicovideo.jp/api/getflv?v=' + video_id,
video_id, 'Downloading flv info')
if self._AUTHENTICATE:
# Get flv info
flv_info_webpage = self._download_webpage(
'http://flapi.nicovideo.jp/api/getflv?v=' + video_id,
video_id, 'Downloading flv info')
else:
# Get external player info
ext_player_info = self._download_webpage(
'http://ext.nicovideo.jp/thumb_watch/' + video_id, video_id)
thumb_play_key = self._search_regex(
r'\'thumbPlayKey\'\s*:\s*\'(.*?)\'', ext_player_info, 'thumbPlayKey')
# Get flv info
flv_info_data = compat_urllib_parse.urlencode({
'k': thumb_play_key,
'v': video_id
})
flv_info_request = compat_urllib_request.Request(
'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
{'Content-Type': 'application/x-www-form-urlencoded'})
flv_info_webpage = self._download_webpage(
flv_info_request, video_id,
note='Downloading flv info', errnote='Unable to download flv info')
video_real_url = compat_urlparse.parse_qs(flv_info_webpage)['url'][0]
# Start extracting information
video_title = video_info.find('.//title').text
video_extension = video_info.find('.//movie_type').text
video_format = video_extension.upper()
video_thumbnail = video_info.find('.//thumbnail_url').text
video_description = video_info.find('.//description').text
video_uploader_id = video_info.find('.//user_id').text
video_upload_date = unified_strdate(video_info.find('.//first_retrieve').text.split('+')[0])
video_view_count = video_info.find('.//view_counter').text
video_webpage_url = video_info.find('.//watch_url').text
title = video_info.find('.//title').text
extension = video_info.find('.//movie_type').text
video_format = extension.upper()
thumbnail = video_info.find('.//thumbnail_url').text
description = video_info.find('.//description').text
upload_date = unified_strdate(video_info.find('.//first_retrieve').text.split('+')[0])
view_count = int_or_none(video_info.find('.//view_counter').text)
comment_count = int_or_none(video_info.find('.//comment_num').text)
duration = parse_duration(video_info.find('.//length').text)
webpage_url = video_info.find('.//watch_url').text
# uploader
video_uploader = video_uploader_id
url = 'http://seiga.nicovideo.jp/api/user/info?id=' + video_uploader_id
try:
user_info = self._download_xml(
url, video_id, note='Downloading user information')
video_uploader = user_info.find('.//nickname').text
except ExtractorError as err:
self._downloader.report_warning('Unable to download user info webpage: %s' % compat_str(err))
if video_info.find('.//ch_id') is not None:
uploader_id = video_info.find('.//ch_id').text
uploader = video_info.find('.//ch_name').text
elif video_info.find('.//user_id') is not None:
uploader_id = video_info.find('.//user_id').text
uploader = video_info.find('.//user_nickname').text
else:
uploader_id = uploader = None
return {
'id': video_id,
'url': video_real_url,
'title': video_title,
'ext': video_extension,
'title': title,
'ext': extension,
'format': video_format,
'thumbnail': video_thumbnail,
'description': video_description,
'uploader': video_uploader,
'upload_date': video_upload_date,
'uploader_id': video_uploader_id,
'view_count': video_view_count,
'webpage_url': video_webpage_url,
'thumbnail': thumbnail,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'uploader_id': uploader_id,
'view_count': view_count,
'comment_count': comment_count,
'duration': duration,
'webpage_url': webpage_url,
}

View File

@@ -47,7 +47,7 @@ class NineGagIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
post_view = json.loads(self._html_search_regex(
r'var postView = new app\.PostView\({\s*post:\s*({.+?}),', webpage, 'post view'))
r'var postView = new app\.PostView\({\s*post:\s*({.+?}),\s*posts:\s*prefetchedCurrentPost', webpage, 'post view'))
youtube_id = post_view['videoExternalId']
title = post_view['title']

View File

@@ -26,7 +26,8 @@ class NocoIE(InfoExtractor):
'uploader': 'Nolife',
'uploader_id': 'NOL',
'duration': 2851.2,
}
},
'skip': 'Requires noco account',
}
def _real_extract(self, url):
@@ -34,7 +35,7 @@ class NocoIE(InfoExtractor):
video_id = mobj.group('id')
medias = self._download_json(
'http://api.noco.tv/1.0/video/medias/%s' % video_id, video_id, 'Downloading video JSON')
'https://api.noco.tv/1.0/video/medias/%s' % video_id, video_id, 'Downloading video JSON')
formats = []
@@ -42,7 +43,7 @@ class NocoIE(InfoExtractor):
format_id = fmt['quality_key']
file = self._download_json(
'http://api.noco.tv/1.0/video/file/%s/fr/%s' % (format_id.lower(), video_id),
'https://api.noco.tv/1.0/video/file/%s/fr/%s' % (format_id.lower(), video_id),
video_id, 'Downloading %s video JSON' % format_id)
file_url = file['file']
@@ -70,7 +71,7 @@ class NocoIE(InfoExtractor):
self._sort_formats(formats)
show = self._download_json(
'http://api.noco.tv/1.0/shows/show/%s' % video_id, video_id, 'Downloading show JSON')[0]
'https://api.noco.tv/1.0/shows/show/%s' % video_id, video_id, 'Downloading show JSON')[0]
upload_date = unified_strdate(show['indexed'])
uploader = show['partner_name']

View File

@@ -4,9 +4,7 @@ import re
from .brightcove import BrightcoveIE
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
from ..utils import ExtractorError
class NownessIE(InfoExtractor):
@@ -14,9 +12,10 @@ class NownessIE(InfoExtractor):
_TEST = {
'url': 'http://www.nowness.com/day/2013/6/27/3131/candor--the-art-of-gesticulation',
'file': '2520295746001.mp4',
'md5': '0ece2f70a7bd252c7b00f3070182d418',
'md5': '068bc0202558c2e391924cb8cc470676',
'info_dict': {
'id': '2520295746001',
'ext': 'mp4',
'description': 'Candor: The Art of Gesticulation',
'uploader': 'Nowness',
'title': 'Candor: The Art of Gesticulation',

View File

@@ -0,0 +1,62 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
)
class NPOIE(InfoExtractor):
IE_NAME = 'npo.nl'
_VALID_URL = r'https?://www\.npo\.nl/[^/]+/[^/]+/(?P<id>[^/?]+)'
_TEST = {
'url': 'http://www.npo.nl/nieuwsuur/22-06-2014/VPWON_1220719',
'md5': '4b3f9c429157ec4775f2c9cb7b911016',
'info_dict': {
'id': 'VPWON_1220719',
'ext': 'mp4',
'title': 'Nieuwsuur',
'description': 'Dagelijks tussen tien en elf: nieuws, sport en achtergronden.',
'upload_date': '20140622',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
metadata = self._download_json(
'http://e.omroep.nl/metadata/aflevering/%s' % video_id,
video_id,
# We have to remove the javascript callback
transform_source=lambda j: re.sub(r'parseMetadata\((.*?)\);\n//.*$', r'\1', j)
)
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id,
note='Downloading token'
)
token = self._search_regex(r'npoplayer.token = "(.+?)"', token_page, 'token')
streams_info = self._download_json(
'http://ida.omroep.nl/odi/?prid=%s&puboptions=h264_std&adaptive=yes&token=%s' % (video_id, token),
video_id
)
stream_info = self._download_json(
streams_info['streams'][0] + '&type=json',
video_id,
'Downloading stream info'
)
return {
'id': video_id,
'title': metadata['titel'],
'ext': 'mp4',
'url': stream_info['url'],
'description': metadata['info'],
'thumbnail': metadata['images'][-1]['url'],
'upload_date': unified_strdate(metadata['gidsdatum']),
}

View File

@@ -4,7 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
float_or_none,
unified_strdate,
)
class NRKIE(InfoExtractor):
@@ -64,4 +68,77 @@ class NRKIE(InfoExtractor):
'title': data['title'],
'description': data['description'],
'thumbnail': thumbnail,
}
}
class NRKTVIE(InfoExtractor):
_VALID_URL = r'http://tv\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})'
_TESTS = [
{
'url': 'http://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '7b96112fbae1faf09a6f9ae1aff6cb84',
'info_dict': {
'id': 'MUHH48000314',
'ext': 'flv',
'title': '20 spørsmål',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'upload_date': '20140523',
'duration': 1741.52,
}
},
{
'url': 'http://tv.nrk.no/program/mdfp15000514',
'md5': 'af01795a31f1cf7265c8657534d8077b',
'info_dict': {
'id': 'mdfp15000514',
'ext': 'flv',
'title': 'Kunnskapskanalen: Grunnlovsjubiléet - Stor ståhei for ingenting',
'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
'upload_date': '20140524',
'duration': 4605.0,
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
title = self._html_search_meta('title', page, 'title')
description = self._html_search_meta('description', page, 'description')
thumbnail = self._html_search_regex(r'data-posterimage="([^"]+)"', page, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._html_search_meta('rightsfrom', page, 'upload date', fatal=False))
duration = float_or_none(
self._html_search_regex(r'data-duration="([^"]+)"', page, 'duration', fatal=False))
formats = []
f4m_url = re.search(r'data-media="([^"]+)"', page)
if f4m_url:
formats.append({
'url': f4m_url.group(1) + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
'format_id': 'f4m',
'ext': 'flv',
})
m3u8_url = re.search(r'data-hls-media="([^"]+)"', page)
if m3u8_url:
formats.append({
'url': m3u8_url.group(1),
'format_id': 'm3u8',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
}

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unescapeHTML
)

View File

@@ -3,6 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
compat_urllib_request,
)
class NuvidIE(InfoExtractor):
@@ -13,8 +18,10 @@ class NuvidIE(InfoExtractor):
'info_dict': {
'id': '1310741',
'ext': 'mp4',
"title": "Horny babes show their awesome bodeis and",
"age_limit": 18,
'title': 'Horny babes show their awesome bodeis and',
'duration': 129,
'upload_date': '20140508',
'age_limit': 18,
}
}
@@ -22,27 +29,41 @@ class NuvidIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
murl = url.replace('://www.', '://m.')
webpage = self._download_webpage(murl, video_id)
formats = []
for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
request = compat_urllib_request.Request(
'http://m.nuvid.com/play/%s' % video_id)
request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
webpage = self._download_webpage(
request, video_id, 'Downloading %s page' % format_id)
video_url = self._html_search_regex(
r'<a href="([^"]+)"\s*>Continue to watch video', webpage, '%s video URL' % format_id, fatal=False)
if not video_url:
continue
formats.append({
'url': video_url,
'format_id': format_id,
})
webpage = self._download_webpage(
'http://m.nuvid.com/video/%s' % video_id, video_id, 'Downloading video page')
title = self._html_search_regex(
r'<div class="title">\s+<h2[^>]*>([^<]+)</h2>',
webpage, 'title').strip()
url_end = self._html_search_regex(
r'href="(/mp4/[^"]+)"[^>]*data-link_type="mp4"',
webpage, 'video_url')
video_url = 'http://m.nuvid.com' + url_end
r'<div class="title">\s+<h2[^>]*>([^<]+)</h2>', webpage, 'title').strip()
thumbnail = self._html_search_regex(
r'href="(/thumbs/[^"]+)"[^>]*data-link_type="thumbs"',
webpage, 'thumbnail URL', fatal=False)
duration = parse_duration(self._html_search_regex(
r'Length:\s*<span>(\d{2}:\d{2})</span>',webpage, 'duration', fatal=False))
upload_date = unified_strdate(self._html_search_regex(
r'Added:\s*<span>(\d{4}-\d{2}-\d{2})</span>', webpage, 'upload date', fatal=False))
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'thumbnail': thumbnail,
'thumbnail': 'http://m.nuvid.com%s' % thumbnail,
'duration': duration,
'upload_date': upload_date,
'age_limit': 18,
}
'formats': formats,
}

View File

@@ -1,10 +1,10 @@
from __future__ import unicode_literals
import datetime
import json
import re
from .common import InfoExtractor
from ..utils import compat_urllib_parse
class PhotobucketIE(InfoExtractor):
@@ -14,6 +14,7 @@ class PhotobucketIE(InfoExtractor):
'file': 'zpsc0c3b9fa.mp4',
'md5': '7dabfb92b0a31f6c16cebc0f8e60ff99',
'info_dict': {
'timestamp': 1367669341,
'upload_date': '20130504',
'uploader': 'rachaneronas',
'title': 'Tired of Link Building? Try BacklinkMyDomain.com!',
@@ -32,11 +33,12 @@ class PhotobucketIE(InfoExtractor):
info_json = self._search_regex(r'Pb\.Data\.Shared\.put\(Pb\.Data\.Shared\.MEDIA, (.*?)\);',
webpage, 'info json')
info = json.loads(info_json)
url = compat_urllib_parse.unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url'))
return {
'id': video_id,
'url': info['downloadUrl'],
'url': url,
'uploader': info['username'],
'upload_date': datetime.date.fromtimestamp(info['creationDate']).strftime('%Y%m%d'),
'timestamp': info['creationDate'],
'title': info['title'],
'ext': video_extension,
'thumbnail': info['thumbUrl'],

View File

@@ -45,7 +45,7 @@ class PornHubIE(InfoExtractor):
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'(?s)<div class="video-info-row">\s*From:&nbsp;.+?<(?:a href="/users/|<span class="username)[^>]+>(.+?)<',
r'(?s)From:&nbsp;.+?<(?:a href="/users/|<span class="username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail:

View File

@@ -158,19 +158,19 @@ class ProSiebenSat1IE(InfoExtractor):
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
r'clipId=(\d+)',
r'clip[iI]d=(\d+)',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',
r'<header class="clearfix">\s*<h3>(.+?)</h3>',
r'<!-- start video -->\s*<h1>(.+?)</h1>',
r'<div class="ep-femvideos-pi4-video-txt">\s*<h2>(.+?)</h2>',
r'<h1 class="att-name">\s*(.+?)</h1>',
]
_DESCRIPTION_REGEXES = [
r'<p itemprop="description">\s*(.+?)</p>',
r'<div class="videoDecription">\s*<p><strong>Beschreibung</strong>: (.+?)</p>',
r'<div class="g-plusone" data-size="medium"></div>\s*</div>\s*</header>\s*(.+?)\s*<footer>',
r'<p>(.+?)</p>\s*<div class="ep-femvideos-pi4-video-footer">',
r'<p class="att-description">\s*(.+?)\s*</p>',
]
_UPLOAD_DATE_REGEXES = [
r'<meta property="og:published_time" content="(.+?)">',

View File

@@ -46,7 +46,7 @@ class PyvideoIE(InfoExtractor):
return self.url_result(m_youtube.group(1), 'Youtube')
title = self._html_search_regex(
r'<div class="section">.*?<h3(?:\s+class="[^"]*")?>([^>]+?)</h3>',
r'<div class="section">\s*<h3(?:\s+class="[^"]*"[^>]*)?>([^>]+?)</h3>',
webpage, 'title', flags=re.DOTALL)
video_url = self._search_regex(
[r'<source src="(.*?)"', r'<dt>Download</dt>.*?<a href="(.+?)"'],

122
youtube_dl/extractor/rai.py Normal file
View File

@@ -0,0 +1,122 @@
from __future__ import unicode_literals
import re
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
compat_urllib_parse,
)
class RaiIE(SubtitlesInfoExtractor):
_VALID_URL = r'(?P<url>http://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html)'
_TESTS = [
{
'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
'md5': 'c064c0b2d09c278fb293116ef5d0a32d',
'info_dict': {
'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
'ext': 'mp4',
'title': 'Report del 07/04/2014',
'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
'upload_date': '20140407',
'duration': 6160,
}
},
{
'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
'md5': '8bb9c151924ce241b74dd52ef29ceafa',
'info_dict': {
'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
'ext': 'mp4',
'title': 'TG PRIMO TEMPO',
'description': '',
'upload_date': '20140612',
'duration': 1758,
},
'skip': 'Error 404',
},
{
'url': 'http://www.rainews.it/dl/rainews/media/state-of-the-net-Antonella-La-Carpia-regole-virali-7aafdea9-0e5d-49d5-88a6-7e65da67ae13.html',
'md5': '35cf7c229f22eeef43e48b5cf923bef0',
'info_dict': {
'id': '7aafdea9-0e5d-49d5-88a6-7e65da67ae13',
'ext': 'mp4',
'title': 'State of the Net, Antonella La Carpia: regole virali',
'description': 'md5:b0ba04a324126903e3da7763272ae63c',
'upload_date': '20140613',
},
'skip': 'Error 404',
},
{
'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-b4a49761-e0cc-4b14-8736-2729f6f73132-tg2.html',
'md5': '35694f062977fe6619943f08ed935730',
'info_dict': {
'id': 'b4a49761-e0cc-4b14-8736-2729f6f73132',
'ext': 'mp4',
'title': 'Alluvione in Sardegna e dissesto idrogeologico',
'description': 'Edizione delle ore 20:30 ',
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
media = self._download_json('%s?json' % mobj.group('url'), video_id, 'Downloading video JSON')
title = media.get('name')
description = media.get('desc')
thumbnail = media.get('image_300') or media.get('image_medium') or media.get('image')
duration = parse_duration(media.get('length'))
uploader = media.get('author')
upload_date = unified_strdate(media.get('date'))
formats = []
for format_id in ['wmv', 'm3u8', 'mediaUri', 'h264']:
media_url = media.get(format_id)
if not media_url:
continue
formats.append({
'url': media_url,
'format_id': format_id,
'ext': 'mp4',
})
if self._downloader.params.get('listsubtitles', False):
page = self._download_webpage(url, video_id)
self._list_available_subtitles(video_id, page)
return
subtitles = {}
if self._have_to_download_any_subtitles:
page = self._download_webpage(url, video_id)
subtitles = self.extract_subtitles(video_id, page)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
def _get_available_subtitles(self, video_id, webpage):
subtitles = {}
m = re.search(r'<meta name="closedcaption" content="(?P<captions>[^"]+)"', webpage)
if m:
captions = m.group('captions')
STL_EXT = '.stl'
SRT_EXT = '.srt'
if captions.endswith(STL_EXT):
captions = captions[:-len(STL_EXT)] + SRT_EXT
subtitles['it'] = 'http://www.rai.tv%s' % compat_urllib_parse.quote(captions)
return subtitles

View File

@@ -35,9 +35,7 @@ class RedTubeIE(InfoExtractor):
r'<h1 class="videoTitle[^"]*">(.+?)</h1>',
webpage, u'title')
video_thumbnail = self._html_search_regex(
r'playerInnerHTML.+?<img\s+src="(.+?)"',
webpage, u'thumbnail', fatal=False)
video_thumbnail = self._og_search_thumbnail(webpage)
# No self-labeling, but they describe themselves as
# "Home of Videos Porno"

View File

@@ -0,0 +1,45 @@
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..utils import strip_jsonp
class ReverbNationIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?reverbnation\.com/.*?/song/(?P<id>\d+).*?$'
_TESTS = [{
'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
'file': '16965047.mp3',
'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
'info_dict': {
"title": "MONA LISA",
"uploader": "ALKILADOS",
"uploader_id": 216429,
"thumbnail": "//gp1.wac.edgecastcdn.net/802892/production_public/Photo/13761700/image/1366002176_AVATAR_MONA_LISA.jpg"
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
api_res = self._download_json(
'https://api.reverbnation.com/song/%s?callback=api_response_5&_=%d'
% (song_id, int(time.time() * 1000)),
song_id,
transform_source=strip_jsonp,
note='Downloading information of song %s' % song_id
)
return {
'id': song_id,
'title': api_res.get('name'),
'url': api_res.get('url'),
'uploader': api_res.get('artist', {}).get('name'),
'uploader_id': api_res.get('artist', {}).get('id'),
'thumbnail': api_res.get('image', api_res.get('thumbnail')),
'ext': 'mp3',
'vcodec': 'none',
}

View File

@@ -30,7 +30,7 @@ class RTBFIE(InfoExtractor):
page = self._download_webpage('https://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
data = json.loads(self._html_search_regex(
r'<div class="js-player-embed" data-video="([^"]+)"', page, 'data video'))['data']
r'<div class="js-player-embed(?: player-embed)?" data-video="([^"]+)"', page, 'data video'))['data']
video_url = data.get('downloadUrl') or data.get('url')

View File

@@ -0,0 +1,46 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class RUHDIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?ruhd\.ru/play\.php\?vid=(?P<id>\d+)'
_TEST = {
'url': 'http://www.ruhd.ru/play.php?vid=207',
'md5': 'd1a9ec4edf8598e3fbd92bb16072ba83',
'info_dict': {
'id': '207',
'ext': 'divx',
'title': 'КОТ бааааам',
'description': 'классный кот)',
'thumbnail': 're:^http://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'<param name="src" value="([^"]+)"', webpage, 'video url')
title = self._html_search_regex(
r'<title>([^<]+)&nbsp;&nbsp; RUHD.ru - Видео Высокого качества №1 в России!</title>', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="longdesc">(.+?)<span id="showlink">', webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'<param name="previewImage" value="([^"]+)"', webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = 'http://www.ruhd.ru' + thumbnail
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -0,0 +1,119 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
)
class SapoIE(InfoExtractor):
IE_DESC = 'SAPO Vídeos'
_VALID_URL = r'https?://(?:(?:v2|www)\.)?videos\.sapo\.(?:pt|cv|ao|mz|tl)/(?P<id>[\da-zA-Z]{20})'
_TESTS = [
{
'url': 'http://videos.sapo.pt/UBz95kOtiWYUMTA5Ghfi',
'md5': '79ee523f6ecb9233ac25075dee0eda83',
'note': 'SD video',
'info_dict': {
'id': 'UBz95kOtiWYUMTA5Ghfi',
'ext': 'mp4',
'title': 'Benfica - Marcas na Hitória',
'description': 'md5:c9082000a128c3fd57bf0299e1367f22',
'duration': 264,
'uploader': 'tiago_1988',
'upload_date': '20080229',
'categories': ['benfica', 'cabral', 'desporto', 'futebol', 'geovanni', 'hooijdonk', 'joao', 'karel', 'lisboa', 'miccoli'],
},
},
{
'url': 'http://videos.sapo.pt/IyusNAZ791ZdoCY5H5IF',
'md5': '90a2f283cfb49193fe06e861613a72aa',
'note': 'HD video',
'info_dict': {
'id': 'IyusNAZ791ZdoCY5H5IF',
'ext': 'mp4',
'title': 'Codebits VII - Report',
'description': 'md5:6448d6fd81ce86feac05321f354dbdc8',
'duration': 144,
'uploader': 'codebits',
'upload_date': '20140427',
'categories': ['codebits', 'codebits2014'],
},
},
{
'url': 'http://v2.videos.sapo.pt/yLqjzPtbTimsn2wWBKHz',
'md5': 'e5aa7cc0bdc6db9b33df1a48e49a15ac',
'note': 'v2 video',
'info_dict': {
'id': 'yLqjzPtbTimsn2wWBKHz',
'ext': 'mp4',
'title': 'Hipnose Condicionativa 4',
'description': 'md5:ef0481abf8fb4ae6f525088a6dadbc40',
'duration': 692,
'uploader': 'sapozen',
'upload_date': '20090609',
'categories': ['condicionativa', 'heloisa', 'hipnose', 'miranda', 'sapo', 'zen'],
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
item = self._download_xml(
'http://rd3.videos.sapo.pt/%s/rss2' % video_id, video_id).find('./channel/item')
title = item.find('./title').text
description = item.find('./{http://videos.sapo.pt/mrss/}synopse').text
thumbnail = item.find('./{http://search.yahoo.com/mrss/}content').get('url')
duration = parse_duration(item.find('./{http://videos.sapo.pt/mrss/}time').text)
uploader = item.find('./{http://videos.sapo.pt/mrss/}author').text
upload_date = unified_strdate(item.find('./pubDate').text)
view_count = int(item.find('./{http://videos.sapo.pt/mrss/}views').text)
comment_count = int(item.find('./{http://videos.sapo.pt/mrss/}comment_count').text)
tags = item.find('./{http://videos.sapo.pt/mrss/}tags').text
categories = tags.split() if tags else []
age_limit = 18 if item.find('./{http://videos.sapo.pt/mrss/}m18').text == 'true' else 0
video_url = item.find('./{http://videos.sapo.pt/mrss/}videoFile').text
video_size = item.find('./{http://videos.sapo.pt/mrss/}videoSize').text.split('x')
formats = [{
'url': video_url,
'ext': 'mp4',
'format_id': 'sd',
'width': int(video_size[0]),
'height': int(video_size[1]),
}]
if item.find('./{http://videos.sapo.pt/mrss/}HD').text == 'true':
formats.append({
'url': re.sub(r'/mov/1$', '/mov/39', video_url),
'ext': 'mp4',
'format_id': 'hd',
'width': 1280,
'height': 720,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'uploader': uploader,
'upload_date': upload_date,
'view_count': view_count,
'comment_count': comment_count,
'categories': categories,
'age_limit': age_limit,
'formats': formats,
}

View File

@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_parse_qs,
compat_urllib_request,
)
class ScreencastIE(InfoExtractor):
_VALID_URL = r'https?://www\.screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://www.screencast.com/t/3ZEjQXlT',
'md5': '917df1c13798a3e96211dd1561fded83',
'info_dict': {
'id': '3ZEjQXlT',
'ext': 'm4v',
'title': 'Color Measurement with Ocean Optics Spectrometers',
'description': 'md5:240369cde69d8bed61349a199c5fb153',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
}
}, {
'url': 'http://www.screencast.com/t/V2uXehPJa1ZI',
'md5': 'e8e4b375a7660a9e7e35c33973410d34',
'info_dict': {
'id': 'V2uXehPJa1ZI',
'ext': 'mov',
'title': 'The Amadeus Spectrometer',
'description': 're:^In this video, our friends at.*To learn more about Amadeus, visit',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
}
}, {
'url': 'http://www.screencast.com/t/aAB3iowa',
'md5': 'dedb2734ed00c9755761ccaee88527cd',
'info_dict': {
'id': 'aAB3iowa',
'ext': 'mp4',
'title': 'Google Earth Export',
'description': 'Provides a demo of a CommunityViz export to Google Earth, one of the 3D viewing options.',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
}
}, {
'url': 'http://www.screencast.com/t/X3ddTrYh',
'md5': '669ee55ff9c51988b4ebc0877cc8b159',
'info_dict': {
'id': 'X3ddTrYh',
'ext': 'wmv',
'title': 'Toolkit 6 User Group Webinar (2014-03-04) - Default Judgment and First Impression',
'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'<embed name="Video".*?src="([^"]+)"', webpage,
'QuickTime embed', default=None)
if video_url is None:
flash_vars_s = self._html_search_regex(
r'<param name="flashVars" value="([^"]+)"', webpage, 'flash vars',
default=None)
if not flash_vars_s:
flash_vars_s = self._html_search_regex(
r'<param name="initParams" value="([^"]+)"', webpage, 'flash vars',
default=None)
if flash_vars_s:
flash_vars_s = flash_vars_s.replace(',', '&')
if flash_vars_s:
flash_vars = compat_parse_qs(flash_vars_s)
video_url_raw = compat_urllib_request.quote(
flash_vars['content'][0])
video_url = video_url_raw.replace('http%3A', 'http:')
if video_url is None:
video_meta = self._html_search_meta(
'og:video', webpage, default=None)
if video_meta:
video_url = self._search_regex(
r'src=(.*?)(?:$|&)', video_meta,
'meta tag video URL', default=None)
if video_url is None:
raise ExtractorError('Cannot find video')
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
[r'<b>Title:</b> ([^<]*)</div>',
r'class="tabSeperator">></span><span class="tabText">(.*?)<'],
webpage, 'title')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage, default=None)
if description is None:
description = self._html_search_meta('description', webpage)
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -3,9 +3,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class SlutloadIE(InfoExtractor):

View File

@@ -1,7 +1,6 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
import re
import itertools
@@ -12,6 +11,7 @@ from ..utils import (
compat_urllib_parse,
ExtractorError,
int_or_none,
unified_strdate,
)
@@ -44,7 +44,8 @@ class SoundcloudIE(InfoExtractor):
"upload_date": "20121011",
"description": "No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o'd",
"uploader": "E.T. ExTerrestrial Music",
"title": "Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1"
"title": "Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1",
"duration": 143,
}
},
# not streamable song
@@ -57,6 +58,7 @@ class SoundcloudIE(InfoExtractor):
'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
'uploader': 'The Royal Concept',
'upload_date': '20120521',
'duration': 227,
},
'params': {
# rtmp
@@ -74,19 +76,21 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'jaimeMF',
'description': 'test chars: \"\'/\\ä↭',
'upload_date': '20131209',
'duration': 9,
},
},
# downloadable song
{
'url': 'https://soundcloud.com/simgretina/just-your-problem-baby-1',
'md5': '56a8b69568acaa967b4c49f9d1d52d19',
'url': 'https://soundcloud.com/oddsamples/bus-brakes',
'md5': 'fee7b8747b09bb755cefd4b853e7249a',
'info_dict': {
'id': '105614606',
'id': '128590877',
'ext': 'wav',
'title': 'Just Your Problem Baby (Acapella)',
'description': 'Vocals',
'uploader': 'Sim Gretina',
'upload_date': '20130815',
'title': 'Bus Brakes',
'description': 'md5:0170be75dd395c96025d210d261c784e',
'uploader': 'oddsamples',
'upload_date': '20140109',
'duration': 17,
},
},
]
@@ -119,6 +123,7 @@ class SoundcloudIE(InfoExtractor):
'title': info['title'],
'description': info['description'],
'thumbnail': thumbnail,
'duration': int_or_none(info.get('duration'), 1000),
}
formats = []
if info.get('downloadable', False):
@@ -250,7 +255,7 @@ class SoundcloudSetIE(SoundcloudIE):
class SoundcloudUserIE(SoundcloudIE):
_VALID_URL = r'https?://(www\.)?soundcloud\.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
_VALID_URL = r'https?://(www\.)?soundcloud\.com/(?P<user>[^/]+)/?((?P<rsrc>tracks|likes)/?)?(\?.*)?$'
IE_NAME = 'soundcloud:user'
# it's in tests/test_playlists.py
@@ -259,24 +264,31 @@ class SoundcloudUserIE(SoundcloudIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group('user')
resource = mobj.group('rsrc')
if resource is None:
resource = 'tracks'
elif resource == 'likes':
resource = 'favorites'
url = 'http://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url)
user = self._download_json(
resolv_url, uploader, 'Downloading user info')
base_url = 'http://api.soundcloud.com/users/%s/tracks.json?' % uploader
base_url = 'http://api.soundcloud.com/users/%s/%s.json?' % (uploader, resource)
entries = []
for i in itertools.count():
data = compat_urllib_parse.urlencode({
'offset': i * 50,
'limit': 50,
'client_id': self._CLIENT_ID,
})
new_entries = self._download_json(
base_url + data, uploader, 'Downloading track page %s' % (i + 1))
entries.extend(self._extract_info_dict(e, quiet=True) for e in new_entries)
if len(new_entries) < 50:
if len(new_entries) == 0:
self.to_screen('%s: End page received' % uploader)
break
entries.extend(self._extract_info_dict(e, quiet=True) for e in new_entries)
return {
'_type': 'playlist',

Some files were not shown because too many files have changed in this diff Show More