Compare commits

...

346 Commits

Author SHA1 Message Date
0bf5cf9886 release 2014.02.24 2014-02-24 09:44:22 +01:00
919052d094 [zdf] Fix podcast extraction and use unicode literals (Closes #2446) 2014-02-24 13:47:47 +07:00
a2dafe2887 [youtube] Fix mix video regex
Attributes' order in <li> is arbitrary and changes every time playlist
page is fetched, so we can't rely on `data-index` to be before
`data-video-username`.
2014-02-24 12:52:02 +07:00
92661c994b [normalboots] Modernize and simplify 2014-02-23 18:28:22 +01:00
ffe8fe356a [normalboots] Fix video url extraction 2014-02-23 18:06:51 +01:00
bc2f773b4f [youtube:playlist] Fix mixes extraction (fixes #2444) 2014-02-23 17:17:36 +01:00
f919201ecc [vine] Extract more metadata and support low format 2014-02-23 19:02:31 +07:00
7ff5d5c2e2 Add one more format to unified_strdate 2014-02-23 19:00:51 +07:00
9b77f951c7 [breakcom] Fix error when calling _search_regex
I passed `’webpage’` instead of the variable `webpage`.
2014-02-23 12:28:44 +01:00
a25f2f990a [breakcom] Fix info json extraction 2014-02-23 12:20:58 +01:00
78b373975d [vine] Fix uploader extraction 2014-02-23 12:08:30 +01:00
2fcc873c4c release 2014.02.22.1 2014-02-22 23:17:56 +01:00
23c2baadb3 [videobam] Set age_limit to 18
From [their ToS](http://videobam.com/terms): "User must be eighteen 18[sic] years of age or older to use or access this web site."
2014-02-22 23:15:41 +01:00
521ee82334 Fix imports 2014-02-22 23:03:12 +01:00
1df96e59ce [f4m] Clean up 2014-02-22 23:03:00 +01:00
3e123c1e28 [videobam] Add support for videobam.com (Closes #2411) 2014-02-23 04:50:05 +07:00
f38da66731 Credit @soult for br 2014-02-22 20:19:41 +01:00
06aabfc422 [br] Simplify 2014-02-22 20:17:26 +01:00
1052d2bfec Merge remote-tracking branch 'soult/br' 2014-02-22 17:14:47 +01:00
5e0b652344 release 2014.02.22 2014-02-22 15:07:25 +01:00
0f8f097183 [release.sh] Do not run tests by default
We are at the point that testing takes waay too long for a release cycle, and fails way too often.
Tests through travis are a better indicator than testing just before release.
2014-02-22 15:06:07 +01:00
491ed3dda2 [trutube] Support multiple formats (#2433) 2014-02-22 15:05:30 +01:00
af284c6d1b Merge remote-tracking branch 'JohnyMoSwag/master' 2014-02-22 14:38:42 +01:00
41d3ec5fba [savefrom] Add extractor (Fixes #2434) 2014-02-22 14:36:16 +01:00
0568c352f3 [canalc2] Modernize 2014-02-22 14:27:09 +01:00
2e7b4cb714 [spankwire] Fix uploader id regex 2014-02-22 16:50:08 +07:00
9767726b66 [spankwire] Improve and modernize 2014-02-22 16:45:03 +07:00
9ddfd84e41 added trutubeIE 2014-02-22 00:11:57 -08:00
1cf563d84b release 2014.02.21.1 2014-02-21 18:19:48 +01:00
7928024f57 [BR] Add basic test 2014-02-21 18:00:05 +01:00
3eb38acb43 [BR] Add "BR" extractor
Extractor for videos from the Bayerischer Rundfunk Mediathek[1]. Currently only
supports videos. Audio and podcasts do not work yet with this extractor.

1: http://br.de/mediathek
2014-02-21 17:58:52 +01:00
f7300c5c90 [generic] Fix on python 2.6
`ParseError` is not available, it raises `xml.parsers.expat.ExpatError`.
The webpage needs to be encoded.
2014-02-21 16:59:10 +01:00
3489b7d26c [youtube] Simplify the decryption process for the manifest urls and add a test (closes #2422) 2014-02-21 15:15:58 +01:00
acd2bcc384 Merge branch 'youtube-dash' of github.com:m0vie/youtube-dl 2014-02-21 15:02:47 +01:00
43e77ca455 release 2014.02.21 2014-02-21 12:16:03 +01:00
da36297988 [wimp] Modernize and replace test 2014-02-21 17:57:19 +07:00
dbb94fb044 [youtube] Fix playlist extraction (Closes #2423, #2424, #2425) 2014-02-21 17:19:55 +07:00
d68f0cdb23 [youtube] decrypt signature when downloading dash manifest 2014-02-21 03:24:56 +01:00
eae16eb67b release 2014.02.20 2014-02-20 13:14:21 +01:00
4fc946b546 [generic] Add support for RSS feeds (Fixes #667) 2014-02-20 13:14:09 +01:00
280bc5dad6 [bbccouk] Add friendly contry filter error message (#2184) 2014-02-20 18:50:34 +07:00
f43770d8c9 Merge pull request #2413 from bentley/optypo
Fix minor typo: “to to” → “to”.
2014-02-20 08:02:54 +01:00
98c4b8fa1b Fix minor typo: “to to” → “to”. 2014-02-19 20:02:29 -07:00
ccb079ee67 [xhamster] Fix and improve 2014-02-20 02:37:44 +07:00
2ea237472c Merge pull request #2408 from pulpe/_readme
[README.md] correct the test command
2014-02-19 16:45:14 +01:00
0d4b4865cc [README.md] correct the test command 2014-02-19 16:13:45 +01:00
fe52f9f956 Document prefered config location (#2407) 2014-02-19 11:35:35 +01:00
882907a818 release 2014.02.19.1 2014-02-19 01:27:22 +01:00
572a89cc4e [liveleak] Add support for prochan embeds (Fixes #2406) 2014-02-19 01:27:12 +01:00
c377110539 release 2014.02.19 2014-02-19 01:08:16 +01:00
a9c7198a0b [testurl] Add extractor
This is a pseudo extractor that can be used to quickly look up test URLs, or test without the test harness.
2014-02-19 01:06:16 +01:00
f6f01ea17b [space] modernize 2014-02-19 01:04:24 +01:00
f2d0fc6823 [bbccouk] Replace test
This older episode is from 1994 and hopefully won't get deleted.
2014-02-19 06:46:14 +07:00
f7000f3a1b [youtube] Add support for yourepeat.com URLs (Closes #2397) 2014-02-19 02:00:54 +07:00
c7f0177fa7 [bbccouk] Skip test 2014-02-18 00:26:12 +07:00
09c4d50944 Fix indenting in README 2014-02-17 14:58:39 +01:00
2eb5d315d4 [youtube] Match more truncated URLs (Closes #2402) 2014-02-17 14:56:21 +01:00
ad5976b4d9 [vimeo] Modernize test definition 2014-02-17 11:44:24 +01:00
a0dfcdce5e release 2014.02.17 2014-02-17 11:33:13 +01:00
96d1637082 Credit @Nikerabbit for helsinki 2014-02-17 11:33:01 +01:00
960f317171 [helsinki] Simplify 2014-02-17 11:32:30 +01:00
4412ca751d Merge remote-tracking branch 'Nikerabbit/hki' 2014-02-17 11:26:09 +01:00
cbffec0c95 Credit @patheticpat for 4tube.com (#2398) 2014-02-17 09:08:38 +07:00
0cea52cc18 Credit @pulpe for play.iprima.cz and stream.cz 2014-02-17 09:07:36 +07:00
6d784e87f4 Credit @prutz1311 for normalboots.com (#2279) 2014-02-17 09:03:28 +07:00
ae6cae78f1 [4tube] Minor changes and extract more metadata 2014-02-17 03:51:03 +07:00
0f99566c01 Add one more format in unified_strdate 2014-02-17 03:47:03 +07:00
2db806b4aa Improve parse_duration 2014-02-17 03:46:26 +07:00
3f32c0ba4c Merge branch '4tube' of https://github.com/patheticpat/youtube-dl into patheticpat-4tube 2014-02-17 02:21:45 +07:00
541cb26c0d [smotri] Add entry for netrc authentication 2014-02-17 02:19:55 +07:00
5544e038ab [vk] Add entry for netrc authentication 2014-02-17 02:17:10 +07:00
9032dc28a6 [vk] Add login feature (Closes #2206) 2014-02-17 02:05:15 +07:00
03635e2a71 Add support for 4tube.com. 2014-02-16 18:10:39 +01:00
00cf938aa5 [nfb] Add rtmp app field to format 2014-02-16 06:11:38 +07:00
a5f707c495 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-15 20:45:12 +01:00
1824b48169 [f4m] Download only the first fragment with the --test option 2014-02-15 17:53:23 +01:00
07ad22b8af [youtube:search] Mark "no results found" error as expected 2014-02-15 16:30:11 +01:00
b53466e168 Fix f4m downloading on Python 2.6 2014-02-15 16:24:43 +01:00
6a7a389679 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-15 15:34:17 +01:00
4edff78531 Merge remote-tracking branch 'jaimeMF/f4m'
Conflicts:
	youtube_dl/extractor/__init__.py
2014-02-15 15:32:13 +01:00
99043c2ea5 Replace test for dailymotion users 2014-02-15 13:17:31 +01:00
e68abba910 [sohu] Skip test
Only available from China
2014-02-15 13:12:41 +01:00
3165dc4d9f [france2.fr:generation-quoi] Skip test
The videos seem to not be available outside France
2014-02-15 13:04:31 +01:00
66c43a53e4 Add support for video.helsinki.fi archives 2014-02-14 18:14:28 +02:00
463b334616 [ndr] Replace 404 test 2014-02-14 23:12:15 +07:00
b71dbc57c4 [vesti] Fix player regex (Closes #2382) 2014-02-14 22:26:13 +07:00
72ca1d7f45 [vesti] Skip test 2 due to geo restrictions
At least that's how I interpret the error message "Просмотр вид��о ограничен в вашем регионе."
2014-02-13 22:19:59 +01:00
76e461f395 release 2014.02.13 2014-02-13 19:13:05 +01:00
1074982e6e [vesti] Add support for vesti.ru videos and live streams (Closes #2376) 2014-02-13 23:23:48 +07:00
29b2aaf035 [jadorecettepub] Remove unused import 2014-02-13 16:33:12 +01:00
6f90d098c5 [ecapist] modernize and fix id property 2014-02-13 16:32:42 +01:00
0715161450 Merge pull request #2373 from pulpe/_description_fixes
[collegehumor, chilloutzone] changed description in tests
2014-02-12 06:22:03 -08:00
896583517f [collegehumor, chilloutzone] changed description in tests 2014-02-12 15:11:57 +01:00
713d31fac8 [gametrailers] Fix gametrailers test 2014-02-12 01:50:53 +07:00
96cb10a5f5 [mtv] Improve title extraction 2014-02-12 01:07:30 +07:00
c207c1044e Merge pull request #2372 from pulpe/dropbox_fix
[dropbox] replace not working test
2014-02-11 09:34:49 -08:00
79629ec717 [dropbox] replace not working test 2014-02-11 17:27:36 +01:00
008fda0f08 [ndr] Replace 404 video test 2014-02-11 21:21:05 +07:00
0ae6b01937 [cnn] Add an extractor for blogs (closes #2361) 2014-02-11 14:38:17 +01:00
def630e523 [xtube] Fix uploader extraction 2014-02-11 14:20:41 +01:00
c5ba203e23 [xtube] use unicode_literals 2014-02-11 13:51:37 +01:00
2317e6b2b3 [yahoo] use unicode_literals 2014-02-11 13:51:23 +01:00
cb38928974 [firsttv] Skip test 2014-02-11 10:26:52 +07:00
fa78f13302 [streamcz] Minor changes 2014-02-11 10:19:02 +07:00
18395217c4 Merge branch '_stream' of https://github.com/pulpe/youtube-dl into pulpe-_stream 2014-02-11 09:18:46 +07:00
34bd987811 [freesound] Modernize 2014-02-10 21:03:14 +01:00
af6ba6a1c4 [exfm] Modernize 2014-02-10 21:00:37 +01:00
85409a0c69 [dotsub] Modernize 2014-02-10 20:52:53 +01:00
ebfe352b62 [breakcom] Modernize 2014-02-10 20:48:46 +01:00
fde56d2f17 [howcast] Modernize 2014-02-10 20:45:17 +01:00
3501423dfe [googleplus] Modernize and simplify 2014-02-10 20:36:11 +01:00
0de668af51 [instagram] Modernize 2014-02-10 20:24:12 +01:00
2a584ea90a [firsttv] Fix video URL regex 2014-02-11 00:49:37 +07:00
0f6ed94a15 [firsttv] Add support for 1tv.ru videoarchive 2014-02-11 00:20:41 +07:00
bcb891e82b [lifenews] Minor improvements 2014-02-10 21:07:41 +07:00
ac6e4ca1ed [brightcove] Unescape html entities from the 'og:video' url property (fixes #2360) 2014-02-10 07:50:10 +01:00
2e20bba708 release 2014.02.10 2014-02-10 02:01:11 +01:00
e70dc1d14b [youtube] Correct a minor regex typo 2014-02-10 01:30:47 +01:00
0793a7b3c7 [StreamCZ] Add support for stream.cz 2014-02-09 18:37:12 +01:00
026fcc0495 Fix #2355 (date parsing with dashes) 2014-02-09 18:09:57 +01:00
81c2f20b53 [youtube] Correct invalid JSON (Fixes #2353) 2014-02-09 17:56:10 +01:00
1afe753462 [slideshare] Fix description extraction and modernize
The ‘og:description’  property doesn’t contain the full description
2014-02-09 14:23:19 +01:00
524c2c716a [bloomberg] Fix extraction of ooyala embed code 2014-02-09 14:11:45 +01:00
b542d4bbd7 [kontrtube] Add support for kontrtube.ru (Closes #2354) 2014-02-09 19:53:11 +07:00
cf1eb45153 Add a downloader for f4m manifests 2014-02-09 12:24:54 +01:00
a97bcd80ba Add an extractor for syfy.com
It uses theplatfrom.com, which has been updated to work with f4m manifests
2014-02-08 22:30:00 +01:00
17968e444c [bbc.co.uk] Fix TV episode test 2014-02-09 04:04:21 +07:00
2e3fd9ec2f [bbc.co.uk] Improve overall extractor structure, add subtitles support
(#2184)

Everything from http://www.bbc.co.uk/iplayer/ should be downloadable
now.
2014-02-09 04:00:49 +07:00
d6a283b025 release 2014.02.08.2 2014-02-08 19:20:35 +01:00
9766538124 [jadorecettepub] Add extractor (Fixes #2148) 2014-02-08 19:20:23 +01:00
98dbee8681 [jeuxvideo] Modernize 2014-02-08 18:43:12 +01:00
e421491b3b release 2014.02.08.1 2014-02-08 18:38:05 +01:00
6828d37c41 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-08 18:37:53 +01:00
bf5f610099 [pbs] Add support for viralplayer links (Fixes #2350) 2014-02-08 18:37:33 +01:00
8b7f73404a [bbc.co.uk] Fix regex 2014-02-08 22:55:43 +07:00
85cacb2f51 [bbc.co.uk] Add one more link format 2014-02-08 22:54:05 +07:00
b3fa3917e2 release 2014.02.08 2014-02-08 16:25:03 +01:00
082c6c867a [bbc.co.uk] Add support for bbc.co.uk radio programmes (Closes #2184) 2014-02-08 21:55:28 +07:00
03fcf1ab57 Merge pull request #2342 from MikeCol/tube8
[Tube8] Extended valid urls schema
2014-02-08 04:00:50 +01:00
3b00dea5eb Extended valid urls schema 2014-02-08 00:09:26 +01:00
8bc6c8e3c0 [chilloutzone] Add additional tests (#2340) 2014-02-07 15:42:31 +01:00
79bc27b53a [channel9] Simplify 2014-02-07 19:41:18 +07:00
84dd703199 [ivi] Simplify 2014-02-07 19:36:50 +07:00
c6fdba23a6 [nfb] Add workaround for python2.6 2014-02-07 19:23:53 +07:00
b19fe521a9 Merge pull request #2340 from Fnordlab/master
[chilloutzone] Fixes refactoring bug
2014-02-07 12:46:56 +01:00
c1e672d121 [chilloutzone] fixes bug with youtube extraction
the id used for extracting the video from youtube is stored in
native_video_id not video_id. This id is only used on chilloutzone.net
2014-02-07 12:29:58 +01:00
f4371f4784 Merge remote-tracking branch 'upstream/master' 2014-02-07 12:20:58 +01:00
d914d9d187 [chilloutzone] Add import 2014-02-07 12:03:19 +01:00
845d14d377 credit @Fnordlab for chilloutzone 2014-02-07 12:00:58 +01:00
4a9540b6d2 [chilloutzone] Simplify (#2338) 2014-02-07 12:00:25 +01:00
9f31be7000 Merge remote-tracking branch 'Fnordlab/chilloutzone' 2014-02-07 11:50:26 +01:00
41fa1b627d release 2014.02.06.3 2014-02-07 01:41:01 +01:00
c0c4e66b29 Merge branch 'chilloutzone' 2014-02-06 21:33:16 +01:00
cd8662de22 [chilloutzone] Bug fix, runs against tests
Fixes a bug with python3.3 and made the extractor run successfully
against tox
2014-02-06 21:31:04 +01:00
3587159614 [nfb] Add encode POST data 2014-02-07 02:13:04 +07:00
d67cc9fa7c [youtube:playlist] Recognize ‘top tracks’ urls (closes #2332)
The list parameter starts with ‘MC’ and can have more characters after it, including dots
2014-02-06 19:46:26 +01:00
bf3a2fe923 [elpais] Fix typo 2014-02-07 00:38:29 +07:00
e9ea0bf123 [ndr] Add support for ndr.de (Closes #2325) 2014-02-07 00:35:26 +07:00
63424b6233 release 2014.02.06.2 2014-02-06 15:45:47 +01:00
0bf35c5cf5 [nfb] Add support for onf.ca URLs 2014-02-06 21:41:31 +07:00
95c29381eb [mooshare] Fix bogus video page URL 2014-02-06 21:26:12 +07:00
94c4abce7f [nfb] Add support for nfb.ca (Closes #2069) 2014-02-06 21:19:13 +07:00
f2dffe55f8 Merge branch 'chilloutzone' 2014-02-06 11:49:38 +01:00
46a073bfac [chilloutzone] Added support for chilloutzone.net
Added support for chilloutzone.net videos including embedded youtube
and vimeo movies. In case you find a not working movie, drop me an
email.
2014-02-06 11:44:44 +01:00
df872ec4e7 release 2014.02.06.1 2014-02-06 11:30:00 +01:00
5de90176d9 [elpais] Add extractor 2014-02-06 11:29:46 +01:00
dcf3eec47a [test_download] Skip over BadStatusLine errors
An error like https://travis-ci.org/rg3/youtube-dl/jobs/18317799#L449 is almost certainly the server's fault.
2014-02-06 04:19:57 +01:00
e9e4f30d26 [pbs] Remove unused import 2014-02-06 04:19:43 +01:00
83cebd73d4 [collegehumor] We only get shortened descriptions now 2014-02-06 04:16:22 +01:00
1df4229bd7 [mtv/gametrailers] Change order of title preference
It looks like the plain title is better again
2014-02-06 04:15:12 +01:00
3c995527e9 release 2014.02.06 2014-02-06 03:30:30 +01:00
7c62b568a2 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-06 03:30:18 +01:00
ccf9114e84 [googlesearch] Fix start, and skip playlists (Fixes #2329) 2014-02-06 03:29:10 +01:00
d8061908bb [ina] Improve _VALID_URL regex (fixes #2328)
Accept all letters in upper case and don’t require anything after the id
2014-02-05 23:01:24 +01:00
211e17dd43 release 2014.02.05 2014-02-05 21:23:28 +01:00
6cb38a9994 [firstpost] Add extractor (Fixes #2324) 2014-02-05 21:23:21 +01:00
fa7df757a7 [thisav] Simplify and use unicode literals 2014-02-05 19:13:06 +07:00
8c82077619 [toutv] Use unicode literals 2014-02-05 19:02:03 +07:00
e5d1f9e50a [m6] Add support for m6.fr (Closes #2313) 2014-02-05 17:38:17 +07:00
7ee50ae7b5 release 2014.02.04.1 2014-02-04 23:26:55 +01:00
de563c9da0 [ina] Simplify
Download the feed with ‘_download_xml’ to make the extraction easier
2014-02-04 23:15:36 +01:00
50451f2a18 [vbox7] simplify 2014-02-04 23:02:53 +01:00
9bc70948e1 [statigram] Simplify 2014-02-04 22:52:27 +01:00
5dc733f071 [vine] Simplify 2014-02-04 22:02:15 +01:00
bc4850908c [test/youtube_signature] Add a test with the last player
To verify it correctly handles function with “$” in their names.
2014-02-04 21:56:17 +01:00
20650c8654 [youtube] signatures: Recognize javascript functions that contain “$” (fixes #2304) 2014-02-04 21:38:50 +01:00
56dced2670 remove accidentally duplicated test file 2014-02-04 16:35:22 +01:00
eef726c04b release 2014.02.04 2014-02-04 16:33:19 +01:00
acf1555d76 Merge remote-tracking branch 'origin/master' 2014-02-04 16:33:06 +01:00
22e7f1a6ec [pbs] Add support for article pages (Fixes #870) 2014-02-04 16:31:00 +01:00
3c49325658 [lifenews] Fix video URL extraction (Closes #2302) 2014-02-04 21:31:25 +07:00
bb1cd2bea1 [mooshare] Add support for mooshare.biz (Closes #2149) 2014-02-04 20:53:46 +07:00
fdf1f8d4ce [collegehumor] Adapt test to changed video description 2014-02-04 10:37:01 +01:00
117c8c6b97 [bliptv] Remove unused imports 2014-02-04 10:25:19 +01:00
5cef4ff09b [subtittles] Check that the result is not empty 2014-02-04 10:24:17 +01:00
91264ce572 [iprima] Use centralized format sorting 2014-02-04 10:24:00 +01:00
c79ef8e1ae Merge remote-tracking branch 'pulpe/_iprima' 2014-02-04 10:21:42 +01:00
58d915df51 [traileraddict] mark as broken
traileraddict has changed their URL encoding scheme.
I'm working on restoring support, but that may take some time.
2014-02-04 10:13:52 +01:00
7881a64499 [iprima] Add support for play.iprima.cz 2014-02-04 07:45:41 +01:00
90159f5561 release 2014.02.03.1 2014-02-03 15:20:41 +01:00
99877772d0 [generic] Add support for multiple brightcove URLs (Fixes #2283) 2014-02-03 15:19:40 +01:00
b0268cb6ce [vimeo] Remove superfluous whitespace 2014-02-03 20:24:11 +07:00
4edff4cfa8 [vimeo] Add subtitle tests 2014-02-03 20:19:23 +07:00
1eac553e7e [vimeo] Add support for subtitles (Closes #2239) 2014-02-03 20:02:58 +07:00
9d3ac7444d release 2014.02.03 2014-02-03 06:54:37 +01:00
588128d054 Add --ignore-config option (Fixes #633) 2014-02-03 06:54:27 +01:00
8e93b9b9aa Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/extractor/bliptv.py
2014-02-03 05:19:28 +01:00
b4bcffefa3 [blip.tv] Add support for subtitles (#2274) 2014-02-03 05:18:30 +01:00
2b39af9b4f [BlipTV] Add a test case w/ subtitles (#2274) 2014-02-03 02:41:59 +01:00
23fe495feb Merge pull request #2274 from z00nx/master
[bliptv] Filter out SRT files
2014-02-02 17:31:57 -08:00
b5dbe89bba Merge branch 'master' of https://github.com/rg3/youtube-dl 2014-02-03 01:22:41 +07:00
dbe80ca7ad [tinypic] Add support for tinypic.com videos (Closes #2210) 2014-02-03 01:20:03 +07:00
009a3408f5 [cspan] Fix extraction (fixes #2291)
The webpage urls have changed.
The title and thumbnail are now extracted from an xml.
2014-02-02 18:24:20 +01:00
dst
b58e3c8918 [vube] Use 'id' and 'ext' instead of 'file' 2014-02-02 20:04:44 +07:00
56b6faf91e [traileraddict] Fix extraction 2014-02-02 12:52:47 +01:00
7ac1f877a7 [collegehumor] Fix test
The description simply changed, our code is working fine
2014-02-02 12:43:09 +01:00
d55433bbfd Remove unused imports and simplify 2014-02-02 12:03:36 +01:00
f0ce2bc1c5 Merge remote-tracking branch 'dstftw/vube' 2014-02-02 11:54:23 +01:00
c3bc00b90e [Normalboots] Update test video description 2014-02-02 07:17:48 +01:00
ff6b7b049b Merge pull request #2279 from prutz1311/master
Added support for normalboots.com (#2237)
2014-02-01 22:16:37 -08:00
dst
f46359121f [vube] Make video description optional as it may be missing 2014-02-02 12:03:55 +07:00
dst
37c1525c17 [vube] Remove unnecessary coding cookie 2014-02-02 10:49:38 +07:00
dst
c85e4cf7b4 [vube] Add support for vube.com (Closes #2285) 2014-02-02 08:33:24 +07:00
c66dcda287 Merge pull request #2282 from dstftw/lifenews
[lifenews] Add support for lifenews.ru and fix og content extraction regex
2014-01-31 10:23:46 -08:00
dst
6d845922ab [lifenews] Fix test title 2014-02-01 01:10:15 +07:00
2949cbe036 Update normalboots.py
fixed
2014-01-31 16:51:34 +03:00
c3309a7774 [collegehumor] fix test description 2014-01-31 14:48:49 +01:00
7aed837595 [ro220] Simplify and use unicode_literals 2014-01-31 14:07:58 +01:00
0eb799bae9 [ustream] Simplify and use unicode_literals 2014-01-31 14:05:33 +01:00
4baff4a4ae [spiegel] Simplify and use unicode_literals 2014-01-31 14:00:55 +01:00
45d7bc2f8b [vevo] Simplify and use unicode_literals 2014-01-31 13:56:45 +01:00
c0c2ddddcd Merge pull request #2281 from matthewfranglen/master
Fix #2280: Antigen now links to python script
2014-01-30 19:24:43 -08:00
a96ed91610 Add tutorial for adding a new IE 2014-01-31 04:23:39 +01:00
dst
c1206423c4 Fix extraction of og content in single quotes 2014-01-31 03:57:33 +07:00
dst
659aa21ba1 [lifenews] Add support for lifenews.ru 2014-01-31 03:48:00 +07:00
efd02e858a Fix #2280: Antigen now links to python script 2014-01-30 20:44:16 +00:00
3bf8bc7f37 Update normalboots.py
_TEST added
2014-01-30 23:01:35 +03:00
8ccda826d5 release 2014.01.30.2 2014-01-30 19:33:02 +01:00
b9381e43c2 Fix the extraction of full-episodes urls from southpark.com (fixes #2278)
Added an additional regex to the generic _real_extract method of MTVServicesInfoExtractor
2014-01-30 19:04:33 +01:00
fcdea2666d [collegehumor] Add support for embedded youtube videos (fixes #2277) 2014-01-30 18:33:49 +01:00
c4db377cbb [collegehumor] The video may not contain any file in webm format (#2277)
For example http://www.collegehumor.com/video/5812266
2014-01-30 18:33:49 +01:00
90dc5e8693 Merge pull request #2252 from matthewfranglen/master
Add antigen compatible plugin description
2014-01-30 09:28:10 -08:00
c81a855b0f Added support for normalboots.com 2014-01-30 21:26:50 +04:00
c8d8ec8567 Add requested documentation 2014-01-30 15:09:09 +00:00
4f879a5be0 [bliptv] Filter out SRT files 2014-01-30 20:44:53 +11:00
1a0648b4a9 [malemotion] Disable test case
I am not going to look for an alternative one, but feel free to suggest one.
2014-01-30 06:15:50 +01:00
3c1b4669d0 [francetv] Use unicode_literals 2014-01-30 06:13:57 +01:00
24b3d5e538 [francetvinfo.fr] Support more ID suffixes 2014-01-30 06:12:56 +01:00
ab083b08ab [generic] remove testcase
The video seems to have been removed from the site.
2014-01-30 06:10:57 +01:00
89acb96927 [liveleak] Support old and new URLs 2014-01-30 06:09:06 +01:00
79752e18b1 release 2014.01.30.1 2014-01-30 05:33:31 +01:00
55b41c723c Merge branch 'master' of github.com:rg3/youtube-dl 2014-01-30 05:30:16 +01:00
9f8928d032 [generic] Match JWPlayerOptions
This adds support for The Guardian, among others
Closes #2271, fixes #2267
2014-01-30 05:29:10 +01:00
3effa7ceaa Merge pull request #2273 from dstftw/crunchyroll
[crunchyroll] Add support for mobile URLs and use unicode literals
2014-01-29 20:15:38 -08:00
ed9cc2f1e0 release 2014.01.30 2014-01-30 04:52:54 +01:00
975fa541c2 [liveleak] Support multiple formats (Fixes #2262) 2014-01-30 04:52:50 +01:00
251974e44c Merge pull request #2272 from dstftw/master
Improve some regexes
2014-01-29 14:58:14 -08:00
dst
38a40276ec [crunchyroll] Add support for mobile URLs and use unicode literals 2014-01-30 05:23:44 +07:00
dst
57b6288358 [comedycentral] Improve regexes 2014-01-30 04:33:00 +07:00
dst
c3f51436bf Improve some regexes for embedded players 2014-01-30 04:26:46 +07:00
0c708f11cb [bloomberg] Fix ooyala url extraction
Added a helper method to InfoExtractor for searching the ‘twitter:player’ meta property.
Now the OoyalaIE also recognizes the ‘ec’ parameter in the url as the embed code.
2014-01-29 18:03:32 +01:00
fb2a706d11 [myspass] Simplify and use unicode_literals 2014-01-29 16:59:22 +01:00
0b76600deb [youjizz] Simplify and use unicode_literals 2014-01-29 16:59:21 +01:00
245b612a36 [rbmaradio] Simplify and use unicode_literals 2014-01-29 16:59:10 +01:00
d882161d5a [infoq] Simplify and use unicode_literals 2014-01-29 15:34:35 +01:00
d4a21e0b49 [tutv] Simplify and use unicode_literals 2014-01-29 15:22:41 +01:00
26a78d4bbf [nba] Simplify and use unicode_literals
Remove the commented parts for extracting the upload date
2014-01-29 15:16:18 +01:00
8db69786c2 release 2014.01.29 2014-01-29 11:16:28 +01:00
b11cec4162 [youtube:user] Fix id key (Fixes #1745) 2014-01-29 11:16:12 +01:00
7eeb5bef24 [liveleak] Simplify 2014-01-28 21:57:38 +01:00
9d2032932c Merge remote-tracking branch 'dstftw/ivi' 2014-01-28 21:47:05 +01:00
6490306017 Merge remote-tracking branch 'dstftw/channel9' 2014-01-28 21:46:42 +01:00
dst
ceb2b7d257 [ivi] Fix test and use unicode literals 2014-01-29 02:20:48 +07:00
dst
459a53c2c2 [channel9] Remove unnecessary coding cookie 2014-01-29 02:07:29 +07:00
dst
adc267eebf [channel9] Use unicode literals 2014-01-29 02:00:56 +07:00
dst
ffe8f62d27 [smotri] Simplify login and use unicode literals 2014-01-29 01:52:57 +07:00
ed85007039 [ninegag] Use unicode_literals 2014-01-28 18:55:06 +01:00
5aaca50d60 [keek] Simplify and use unicode_literals 2014-01-28 18:47:31 +01:00
869baf3565 [funnyordie] Simplify and use unicode_literals 2014-01-28 18:41:39 +01:00
e299f6d27f [pornhd] Fix 2014-01-28 03:53:00 +01:00
4a192f817e release 2014.01.28.1 2014-01-28 03:44:19 +01:00
bc1d1a5a71 release 2014.01.28 2014-01-28 03:37:42 +01:00
456895d9cf [tumblr] Test new URL format (#2255) 2014-01-28 03:37:38 +01:00
218c15ab59 Merge remote-tracking branch 'mike/tumblr-url' 2014-01-28 03:35:52 +01:00
17ab4d3b5e [brightcove] Move test to generic 2014-01-28 03:35:32 +01:00
31ef0ff038 Merge remote-tracking branch 'dstftw/rutube-channel' 2014-01-28 03:32:22 +01:00
37e3b90d59 [rutube] Simplify 2014-01-28 03:32:07 +01:00
dst
00ff8f92a5 [rutube] Update test 2014-01-28 09:31:14 +07:00
4857beba3a Merge remote-tracking branch 'dstftw/rutube-channel' 2014-01-28 03:30:21 +01:00
c1e60cc2bf Merge remote-tracking branch 'dstftw/master' 2014-01-28 03:29:10 +01:00
dst
98669ed79c [imdb] Fix playlist test 2014-01-28 09:13:08 +07:00
dst
a3978a6159 [imdb] Fix duplicated entries bug 2014-01-28 09:12:23 +07:00
dst
e3a9f32f52 [rutube] Add support for user videos 2014-01-28 08:47:17 +07:00
dst
87fac3238d [rutube] Add channel test 2014-01-28 08:25:56 +07:00
dst
a2fb2a2134 [rutube] Improve video extractor 2014-01-28 08:19:45 +07:00
9e8ee54553 VALID_URL changed to match different kinds of Tumblr-URLs 2014-01-28 01:41:18 +01:00
117bec936c [brightcove] Parse URL from meta element if available (Fixes #2253) 2014-01-28 01:01:23 +01:00
dst
1547c8cc88 [rutube] Add support for channels and movies 2014-01-28 06:56:09 +07:00
075911d48e [la7] Skip test on travis 2014-01-27 23:47:22 +01:00
b21a918984 release 2014.01.27.2 2014-01-27 19:22:45 +01:00
f9b8549609 [ard] Support multiple formats (Closes #2247) 2014-01-27 18:40:10 +01:00
d1b30713fb Add antigen compatible plugin description 2014-01-27 15:33:16 +00:00
e2ba07024f Merge remote-tracking branch 'origin/master' 2014-01-27 12:45:59 +01:00
9b05bd42e5 [discovery] Extract more info and simplify 2014-01-27 12:41:30 +01:00
b6d3a99678 [cliphunter] Simplify (#2233) 2014-01-27 12:39:39 +01:00
96d7b8873a Merge remote-tracking branch 'sahutd/master' 2014-01-27 12:21:00 +01:00
efc867775e [cliphunter] Simplify 2014-01-27 07:55:30 +01:00
5ab772f09c Merge branch 'cliphunter' of https://github.com/pornophage/youtube-dl 2014-01-27 07:48:51 +01:00
2a89386232 Credit @MikeCol for malemotion IE 2014-01-27 07:43:41 +01:00
4d9be98dbc Malemotion extractor 2014-01-27 07:43:02 +01:00
6737907826 [tumblr] Fix thumbnail extraction
Signed-off-by: Philipp Hagemeister <phihag@phihag.de>
2014-01-27 07:38:55 +01:00
c060b77446 [tumblr] Use unicode_literals 2014-01-27 07:36:18 +01:00
7e8caf30c0 Throw an error if no video formats are found 2014-01-27 07:31:54 +01:00
ca3e054750 release 2014.01.27.1 2014-01-27 07:09:55 +01:00
1da1558f46 [la7] Support more URLs 2014-01-27 07:08:01 +01:00
25c67d257c release 2014.01.27 2014-01-27 07:05:39 +01:00
a17d16d59c [la7] Add support 2014-01-27 07:05:28 +01:00
d16076ff3e [huffpost] Fix extractor 2014-01-27 06:55:35 +01:00
6c57e8a063 [setup.py] Only print a warning if documentation files are missing (Fixes #780) 2014-01-27 06:22:15 +01:00
db1f388878 [huffpost] Add support 2014-01-27 05:47:38 +01:00
0f2999fe2b Merge pull request #2221 from Rudloff/master
Removed websurg extractor
2014-01-26 18:03:26 -08:00
53bfd6b24c Added support for Discovery Issue #2227 2014-01-26 14:05:34 +05:30
5700e7792a [youtube] Encode the data when submitting the form for confirming the age
Needed on python 3
2014-01-25 17:22:41 +01:00
38c2e5b8d5 [youtube] Use https: in more urls 2014-01-25 17:11:55 +01:00
48f9678a32 [test/youtube_lists] Change the list used for testing the Top Lists extractor
The ‘Top tracks’ list is not always present in the channel page
2014-01-25 17:02:32 +01:00
beddbc2ad1 [youtube:toplist] Make the regex for finding the playlist link more flexible
`title={foo}` may not be at the end of the `href` string.
2014-01-25 15:47:03 +01:00
f89197d73e Some pep8 style fixes 2014-01-25 15:33:23 +01:00
944d65c762 [extractor/common] Encode the url when calculating the md5 with —write-pages option
This doesn’t cause any problem in python 2.*, but on python 3 the `md5` function only accepts bytes.
2014-01-25 15:32:56 +01:00
f945612bd0 [rtlnow] Simplify 2014-01-25 14:18:54 +01:00
59188de113 Properly escape ‘.’ in some _VALID_URL properties 2014-01-25 11:48:08 +01:00
352d08e3e5 Add an extractor for freespeech.org (closes #2234) 2014-01-25 11:31:30 +01:00
bacb5e4f44 Minor fixes
Remove empty description
Set correct md5 test
2014-01-25 02:34:08 +01:00
008af8660b Add cliphunter extractor 2014-01-25 01:46:52 +01:00
886fa72324 release 2014.01.23.4 2014-01-24 00:06:55 +01:00
2c5bae429a [youtube] Fix new formats 2014-01-24 00:06:26 +01:00
f265fc1238 release 2014.01.23.3 2014-01-23 23:55:53 +01:00
1394ce65b4 [youtube] Add new formats (Fixes #2221) 2014-01-23 23:54:06 +01:00
67ccb77197 Removed websurg extractor 2014-01-23 23:42:34 +01:00
63ef36e8d8 Add build instructions (Fixes #2218) 2014-01-23 23:28:29 +01:00
0b65e5d40f [youtube] Do not break upon unknown formats 2014-01-23 23:21:42 +01:00
629be17af4 release 2014.01.23.2 2014-01-23 19:05:05 +01:00
fd28827864 Do not count unmatched videos for --max-downloads (Fixes #2211) 2014-01-23 19:04:22 +01:00
8c61d9a9b1 Mention default for -f (Fixes #2215) 2014-01-23 18:50:04 +01:00
975d35dbab [youtube:truncated_url] Also match mail subscription links (#2214) 2014-01-23 16:14:54 +01:00
8b769664c4 [sina] Recognize http://video.sina.com.cn/v/b/{id}-*.html urls (fixes #2212) 2014-01-23 14:03:14 +01:00
76f270a46a [sina] use unicode_literals 2014-01-23 14:00:29 +01:00
133 changed files with 5383 additions and 1836 deletions

View File

@ -20,7 +20,7 @@ which means you can modify it, redistribute it or use it however you like.
sure that you have sufficient permissions
(run with sudo if needed)
-i, --ignore-errors continue on download errors, for example to
to skip unavailable videos in a playlist
skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error
occurs
@ -53,6 +53,12 @@ which means you can modify it, redistribute it or use it however you like.
from google videos for youtube-dl "large
apple". By default (with value "auto")
youtube-dl guesses.
--ignore-config Do not read configuration files. When given
in the global configuration file /etc
/youtube-dl.conf: do not read the user
configuration in ~/.config/youtube-dl.conf
(%APPDATA%/youtube-dl/config.txt on
Windows)
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@ -181,7 +187,9 @@ which means you can modify it, redistribute it or use it however you like.
preference using slashes: "-f 22/17/18".
"-f mp4" and "-f flv" are also supported.
You can also use the special names "best",
"bestaudio", "worst", and "worstaudio"
"bestaudio", "worst", and "worstaudio". By
default, youtube-dl will pick the best
quality.
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific
one is requested
@ -238,7 +246,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION
You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl.conf`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<Yourname>\youtube-dl.conf`.
You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<Yourname>\youtube-dl.conf`.
# OUTPUT TEMPLATE
@ -273,10 +281,12 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb
Examples:
$ # Download only the videos uploaded in the last 6 months
# Download only the videos uploaded in the last 6 months
$ youtube-dl --dateafter now-6months
$ # Download only the videos uploaded on January 1, 1970
# Download only the videos uploaded on January 1, 1970
$ youtube-dl --date 19700101
$ # will only download the videos uploaded in the 200x decade
$ youtube-dl --dateafter 20000101 --datebefore 20091231
@ -323,11 +333,31 @@ Since June 2012 (#342) youtube-dl is packed as an executable zipfile, simply unz
To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
# COPYRIGHT
# DEVELOPER INSTRUCTIONS
youtube-dl is released into the public domain by the copyright holders.
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
python -m youtube_dl
To run the test, simply invoke your favorite test runner, or execute a test file directly; any of the following work:
python -m unittest discover
python test/test_download.py
nosetests
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* pandoc
* zip
* nosetests
### Adding support for a new site
If you want to add support for a new site, copy *any* [recently modified](https://github.com/rg3/youtube-dl/commits/master/youtube_dl/extractor) file in `youtube_dl/extractor`, add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Don't forget to run the tests with `python test/test_download.py TestDownload.test_YourExtractor`! For a detailed tutorial, refer to [this blog post](http://filippo.io/add-support-for-a-new-video-site-to-youtube-dl/).
# BUGS
@ -386,3 +416,9 @@ Only post features that you (or an incapicated friend you can personally talk to
### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT
youtube-dl is released into the public domain by the copyright holders.
This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.

View File

@ -14,9 +14,9 @@
set -e
skip_tests=false
if [ "$1" = '--skip-test' ]; then
skip_tests=true
skip_tests=true
if [ "$1" = '--run-tests' ]; then
skip_tests=false
shift
fi

View File

@ -3,7 +3,9 @@
from __future__ import print_function
import os.path
import pkg_resources
import warnings
import sys
try:
@ -44,12 +46,24 @@ py2exe_params = {
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params
else:
params = {
'data_files': [ # Installing system-wide would require sudo...
files_spec = [
('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']),
('share/man/man1', ['youtube-dl.1'])
]
root = os.path.dirname(os.path.abspath(__file__))
data_files = []
for dirname, files in files_spec:
resfiles = []
for fn in files:
if not os.path.exists(fn):
warnings.warn('Skipping file %s since it is not present. Type make to build all automatically generated files.' % fn)
else:
resfiles.append(fn)
data_files.append((dirname, resfiles))
params = {
'data_files': data_files,
}
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}

View File

@ -1,5 +1,7 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
@ -13,6 +15,7 @@ from youtube_dl.extractor import (
FacebookIE,
gen_extractors,
JustinTVIE,
PBSIE,
YoutubeIE,
)
@ -29,18 +32,20 @@ class TestAllURLsMatching(unittest.TestCase):
def test_youtube_playlist_matching(self):
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist(u'PL63F0C78739B09958')
assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
assertPlaylist('ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist('PL63F0C78739B09958')
assertPlaylist('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M'))
# Top tracks
assertPlaylist('https://www.youtube.com/playlist?list=MCUS.20142101')
def test_youtube_matching(self):
self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertTrue(YoutubeIE.suitable('PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
@ -63,6 +68,9 @@ class TestAllURLsMatching(unittest.TestCase):
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_youtube_truncated(self):
self.assertMatch('http://www.youtube.com/watch?', ['youtube:truncated_url'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
self.assertTrue(JustinTVIE.suitable(u"twitch.tv/vanillatv"))
@ -80,7 +88,7 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
@ -89,7 +97,7 @@ class TestAllURLsMatching(unittest.TestCase):
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_facebook_matching(self):
self.assertTrue(FacebookIE.suitable(u'https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
def test_no_duplicates(self):
ies = gen_extractors()
@ -120,5 +128,13 @@ class TestAllURLsMatching(unittest.TestCase):
def test_soundcloud_not_matching_sets(self):
self.assertMatch('http://soundcloud.com/floex/sets/gone-ep', ['soundcloud:set'])
def test_tumblr(self):
self.assertMatch('http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes', ['Tumblr'])
self.assertMatch('http://tatianamaslanydaily.tumblr.com/post/54196191430', ['Tumblr'])
def test_pbs(self):
# https://github.com/rg3/youtube-dl/issues/2350
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
if __name__ == '__main__':
unittest.main()

View File

@ -18,10 +18,12 @@ from test.helper import (
import hashlib
import io
import json
import re
import socket
import youtube_dl.YoutubeDL
from youtube_dl.utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_HTTPError,
@ -110,7 +112,7 @@ def generator(test_case):
ydl.download([test_case['url']])
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise
if try_num == RETRIES:
@ -136,6 +138,15 @@ def generator(test_case):
with io.open(info_json_fn, encoding='utf-8') as infof:
info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('re:'):
got = info_dict.get(info_field)
match_str = expected[len('re:'):]
match_rex = re.compile(match_str)
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(info_dict.get(info_field))
else:

View File

@ -33,6 +33,9 @@ from youtube_dl.extractor import (
ImdbListIE,
KhanAcademyIE,
EveryonesMixtapeIE,
RutubeChannelIE,
GoogleSearchIE,
GenericIE,
)
@ -52,10 +55,10 @@ class TestPlaylists(unittest.TestCase):
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
result = ie.extract('https://www.dailymotion.com/user/nqtv')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
self.assertEqual(result['title'], 'Rémi Gaillard')
self.assertTrue(len(result['entries']) >= 100)
def test_vimeo_channel(self):
dl = FakeYDL()
@ -195,11 +198,11 @@ class TestPlaylists(unittest.TestCase):
def test_imdb_list(self):
dl = FakeYDL()
ie = ImdbListIE(dl)
result = ie.extract('http://www.imdb.com/list/sMjedvGDd8U')
result = ie.extract('http://www.imdb.com/list/JFs9NWw6XI0')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'sMjedvGDd8U')
self.assertEqual(result['title'], 'Animated and Family Films')
self.assertTrue(len(result['entries']) >= 48)
self.assertEqual(result['id'], 'JFs9NWw6XI0')
self.assertEqual(result['title'], 'March 23, 2012 Releases')
self.assertEqual(len(result['entries']), 7)
def test_khanacademy_topic(self):
dl = FakeYDL()
@ -220,6 +223,41 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'Driving')
self.assertEqual(len(result['entries']), 24)
def test_rutube_channel(self):
dl = FakeYDL()
ie = RutubeChannelIE(dl)
result = ie.extract('http://rutube.ru/tags/video/1409')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '1409')
self.assertTrue(len(result['entries']) >= 34)
def test_multiple_brightcove_videos(self):
# https://github.com/rg3/youtube-dl/issues/2283
dl = FakeYDL()
ie = GenericIE(dl)
result = ie.extract('http://www.newyorker.com/online/blogs/newsdesk/2014/01/always-never-nuclear-command-and-control.html')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'always-never-nuclear-command-and-control')
self.assertEqual(result['title'], 'Always/Never: A Little-Seen Movie About Nuclear Command and Control : The New Yorker')
self.assertEqual(len(result['entries']), 3)
def test_GoogleSearch(self):
dl = FakeYDL()
ie = GoogleSearchIE(dl)
result = ie.extract('gvsearch15:python language')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'python language')
self.assertEqual(result['title'], 'python language')
self.assertTrue(len(result['entries']) == 15)
def test_generic_rss_feed(self):
dl = FakeYDL()
ie = GenericIE(dl)
result = ie.extract('http://www.escapistmagazine.com/rss/videos/list/1.xml')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'http://www.escapistmagazine.com/rss/videos/list/1.xml')
self.assertEqual(result['title'], 'Zero Punctuation')
self.assertTrue(len(result['entries']) > 10)
if __name__ == '__main__':
unittest.main()

View File

@ -10,9 +10,11 @@ from test.helper import FakeYDL, md5
from youtube_dl.extractor import (
BlipTVIE,
YoutubeIE,
DailymotionIE,
TEDIE,
VimeoIE,
)
@ -202,5 +204,80 @@ class TestTedSubtitles(BaseTestSubtitles):
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
class TestBlipTVSubtitles(BaseTestSubtitles):
url = 'http://blip.tv/a/a-6603250'
IE = BlipTVIE
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_allsubtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), '5b75c300af65fe4476dff79478bb93e4')
class TestVimeoSubtitles(BaseTestSubtitles):
url = 'http://vimeo.com/76979871'
IE = VimeoIE
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '8062383cf4dec168fc40a088aa6d5888')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), 'b6191146a6c5d3a452244d853fde6dc8')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['de', 'en', 'es', 'fr']))
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://vimeo.com/56015672'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

View File

@ -25,6 +25,7 @@ from youtube_dl.utils import (
shell_quote,
smuggle_url,
str_to_int,
struct_unpack,
timeconvert,
unescapeHTML,
unified_strdate,
@ -127,6 +128,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('8/7/2009'), '20090708')
self.assertEqual(unified_strdate('Dec 14, 2012'), '20121214')
self.assertEqual(unified_strdate('2012/10/11 01:56:38 +0000'), '20121011')
self.assertEqual(unified_strdate('1968-12-10'), '19681210')
def test_find_xpath_attr(self):
testxml = u'''<root>
@ -200,7 +202,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('1'), 1)
self.assertEqual(parse_duration('1337:12'), 80232)
self.assertEqual(parse_duration('9:12:43'), 33163)
self.assertEqual(parse_duration('12:00'), 720)
self.assertEqual(parse_duration('00:01:01'), 61)
self.assertEqual(parse_duration('x:y'), None)
self.assertEqual(parse_duration('3h11m53s'), 11513)
self.assertEqual(parse_duration('62m45s'), 3765)
self.assertEqual(parse_duration('6m59s'), 419)
self.assertEqual(parse_duration('49s'), 49)
self.assertEqual(parse_duration('0h0m0s'), 0)
self.assertEqual(parse_duration('0m0s'), 0)
self.assertEqual(parse_duration('0s'), 0)
def test_fix_xml_ampersands(self):
self.assertEqual(
@ -236,5 +247,8 @@ class TestUtil(unittest.TestCase):
testPL(5, 2, (2, 99), [2, 3, 4])
testPL(5, 2, (20, 99), [])
def test_struct_unpack(self):
self.assertEqual(struct_unpack(u'!B', b'\x00'), (0,))
if __name__ == '__main__':
unittest.main()

View File

@ -30,7 +30,7 @@ class TestYoutubeLists(unittest.TestCase):
result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], 'ytdl test PL')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
ytie_results = [YoutubeIE().extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
@ -39,7 +39,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
self.assertEqual(YoutubeIE().extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
@ -59,7 +59,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
ytie_results = [YoutubeIE().extract_id(url['url']) for url in result['entries']]
self.assertFalse('pElCt5oNDuI' in ytie_results)
self.assertFalse('KdPEApIVdWM' in ytie_results)
@ -76,9 +76,9 @@ class TestYoutubeLists(unittest.TestCase):
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = result['entries']
self.assertEqual(YoutubeIE()._extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE()._extract_id(entries[-1]['url']), 'rYefUsYuEp0')
self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')
def test_youtube_channel(self):
dl = FakeYDL()
@ -117,10 +117,17 @@ class TestYoutubeLists(unittest.TestCase):
original_video = entries[0]
self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
def test_youtube_toptracks(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=MCUS')
entries = result['entries']
self.assertEqual(len(entries), 100)
def test_youtube_toplist(self):
dl = FakeYDL()
ie = YoutubeTopListIE(dl)
result = ie.extract('yttoplist:music:Top Tracks')
result = ie.extract('yttoplist:music:Trending')
entries = result['entries']
self.assertTrue(len(entries) >= 5)

View File

@ -27,6 +27,12 @@ _TESTS = [
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfle-mVwz.js',
u'js',
90,
u']\\[@?>=<;:/.-,+*)(\'&%$#"hZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjiagfedcb39876',
),
]

24
youtube-dl.plugin.zsh Normal file
View File

@ -0,0 +1,24 @@
# This allows the youtube-dl command to be installed in ZSH using antigen.
# Antigen is a bundle manager. It allows you to enhance the functionality of
# your zsh session by installing bundles and themes easily.
# Antigen documentation:
# http://antigen.sharats.me/
# https://github.com/zsh-users/antigen
# Install youtube-dl:
# antigen bundle rg3/youtube-dl
# Bundles installed by antigen are available for use immediately.
# Update youtube-dl (and all other antigen bundles):
# antigen update
# The antigen command will download the git repository to a folder and then
# execute an enabling script (this file). The complete process for loading the
# code is documented here:
# https://github.com/zsh-users/antigen#notes-on-writing-plugins
# This specific script just aliases youtube-dl to the python script that this
# library provides. This requires updating the PYTHONPATH to ensure that the
# full set of code can be located.
alias youtube-dl="PYTHONPATH=$(dirname $0) $(dirname $0)/bin/youtube-dl"

View File

@ -396,10 +396,6 @@ class YoutubeDL(object):
except UnicodeEncodeError:
self.to_screen('[download] The file has already been downloaded')
def increment_downloads(self):
"""Increment the ordinal that assigns a number to each file."""
self._num_downloads += 1
def prepare_filename(self, info_dict):
"""Generate the output filename."""
try:
@ -773,8 +769,11 @@ class YoutubeDL(object):
"""Process a single resolved IE result."""
assert info_dict.get('_type', 'video') == 'video'
#We increment the download the download count here to match the previous behaviour.
self.increment_downloads()
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
info_dict['fulltitle'] = info_dict['title']
if len(info_dict['title']) > 200:
@ -791,10 +790,7 @@ class YoutubeDL(object):
self.to_screen('[download] ' + reason)
return
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads > int(max_downloads):
raise MaxDownloadsReached()
self._num_downloads += 1
filename = self.prepare_filename(info_dict)
@ -1098,9 +1094,15 @@ class YoutubeDL(object):
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:
res += '%4dk ' % fdict['tbr']
if fdict.get('container') is not None:
if res:
res += ', '
res += '%s container' % fdict['container']
if (fdict.get('vcodec') is not None and
fdict.get('vcodec') != 'none'):
res += '%-5s' % fdict['vcodec']
if res:
res += ', '
res += fdict['vcodec']
if fdict.get('vbr') is not None:
res += '@'
elif fdict.get('vbr') is not None and fdict.get('abr') is not None:
@ -1110,6 +1112,9 @@ class YoutubeDL(object):
if fdict.get('acodec') is not None:
if res:
res += ', '
if fdict['acodec'] == 'none':
res += 'video only'
else:
res += '%-5s' % fdict['acodec']
elif fdict.get('abr') is not None:
if res:

View File

@ -40,6 +40,13 @@ __authors__ = (
'Michael Orlitzky',
'Chris Gahan',
'Saimadhav Heblikar',
'Mike Col',
'Oleg Prutz',
'pulpe',
'Andreas Schmitz',
'Michael Kaiser',
'Niklas Laxström',
'David Triendl',
)
__license__ = 'Public Domain'
@ -99,6 +106,43 @@ def parseOpts(overrideArguments=None):
optionf.close()
return res
def _readUserConf():
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConf = _readOptions(userConfFile, None)
if userConf is None:
appdata_dir = os.environ.get('appdata')
if appdata_dir:
userConf = _readOptions(
os.path.join(appdata_dir, 'youtube-dl', 'config'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(appdata_dir, 'youtube-dl', 'config.txt'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(os.path.expanduser('~'), 'youtube-dl.conf'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(os.path.expanduser('~'), 'youtube-dl.conf.txt'),
default=None)
if userConf is None:
userConf = []
return userConf
def _format_option_string(option):
''' ('-o', '--option') -> -o, --format METAVAR'''
@ -165,7 +209,7 @@ def parseOpts(overrideArguments=None):
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
@ -202,6 +246,11 @@ def parseOpts(overrideArguments=None):
general.add_option('--default-search',
dest='default_search', metavar='PREFIX',
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". By default (with value "auto") youtube-dl guesses.')
general.add_option(
'--ignore-config',
action='store_true',
help='Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: do not read the user configuration in ~/.config/youtube-dl.conf (%APPDATA%/youtube-dl/config.txt on Windows)')
selection.add_option(
'--playlist-start',
@ -261,7 +310,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', default=None,
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio"')
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio". By default, youtube-dl will pick the best quality.')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@ -456,44 +505,18 @@ def parseOpts(overrideArguments=None):
if opts.verbose:
write_string(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
else:
systemConf = _readOptions('/etc/youtube-dl.conf')
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConf = _readOptions(userConfFile, None)
if userConf is None:
appdata_dir = os.environ.get('appdata')
if appdata_dir:
userConf = _readOptions(
os.path.join(appdata_dir, 'youtube-dl', 'config'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(appdata_dir, 'youtube-dl', 'config.txt'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(os.path.expanduser('~'), 'youtube-dl.conf'),
default=None)
if userConf is None:
userConf = _readOptions(
os.path.join(os.path.expanduser('~'), 'youtube-dl.conf.txt'),
default=None)
if userConf is None:
userConf = []
commandLineConf = sys.argv[1:]
if '--ignore-config' in commandLineConf:
systemConf = []
userConf = []
else:
systemConf = _readOptions('/etc/youtube-dl.conf')
if '--ignore-config' in systemConf:
userConf = []
else:
userConf = _readUserConf()
argv = systemConf + userConf + commandLineConf
opts, args = parser.parse_args(argv)
if opts.verbose:
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')

View File

@ -1,23 +1,29 @@
from __future__ import unicode_literals
from .common import FileDownloader
from .hls import HlsFD
from .http import HttpFD
from .mplayer import MplayerFD
from .rtmp import RtmpFD
from .f4m import F4mFD
from ..utils import (
determine_ext,
)
def get_suitable_downloader(info_dict):
"""Get the downloader class that can handle the info dict."""
url = info_dict['url']
protocol = info_dict.get('protocol')
if url.startswith('rtmp'):
return RtmpFD
if determine_ext(url) == u'm3u8':
if (protocol == 'm3u8') or (protocol is None and determine_ext(url) == 'm3u8'):
return HlsFD
if url.startswith('mms') or url.startswith('rtsp'):
return MplayerFD
if determine_ext(url) == 'f4m':
return F4mFD
else:
return HttpFD

View File

@ -314,4 +314,3 @@ class FileDownloader(object):
if the download is successful.
"""
self._progress_hooks.append(ph)

View File

@ -0,0 +1,314 @@
from __future__ import unicode_literals
import base64
import io
import itertools
import os
import time
import xml.etree.ElementTree as etree
from .common import FileDownloader
from .http import HttpFD
from ..utils import (
struct_pack,
struct_unpack,
compat_urlparse,
format_bytes,
encodeFilename,
sanitize_open,
)
class FlvReader(io.BytesIO):
"""
Reader for Flv files
The file format is documented in https://www.adobe.com/devnet/f4v.html
"""
# Utility functions for reading numbers and strings
def read_unsigned_long_long(self):
return struct_unpack('!Q', self.read(8))[0]
def read_unsigned_int(self):
return struct_unpack('!I', self.read(4))[0]
def read_unsigned_char(self):
return struct_unpack('!B', self.read(1))[0]
def read_string(self):
res = b''
while True:
char = self.read(1)
if char == b'\x00':
break
res += char
return res
def read_box_info(self):
"""
Read a box and return the info as a tuple: (box_size, box_type, box_data)
"""
real_size = size = self.read_unsigned_int()
box_type = self.read(4)
header_end = 8
if size == 1:
real_size = self.read_unsigned_long_long()
header_end = 16
return real_size, box_type, self.read(real_size-header_end)
def read_asrt(self):
# version
self.read_unsigned_char()
# flags
self.read(3)
quality_entry_count = self.read_unsigned_char()
# QualityEntryCount
for i in range(quality_entry_count):
self.read_string()
segment_run_count = self.read_unsigned_int()
segments = []
for i in range(segment_run_count):
first_segment = self.read_unsigned_int()
fragments_per_segment = self.read_unsigned_int()
segments.append((first_segment, fragments_per_segment))
return {
'segment_run': segments,
}
def read_afrt(self):
# version
self.read_unsigned_char()
# flags
self.read(3)
# time scale
self.read_unsigned_int()
quality_entry_count = self.read_unsigned_char()
# QualitySegmentUrlModifiers
for i in range(quality_entry_count):
self.read_string()
fragments_count = self.read_unsigned_int()
fragments = []
for i in range(fragments_count):
first = self.read_unsigned_int()
first_ts = self.read_unsigned_long_long()
duration = self.read_unsigned_int()
if duration == 0:
discontinuity_indicator = self.read_unsigned_char()
else:
discontinuity_indicator = None
fragments.append({
'first': first,
'ts': first_ts,
'duration': duration,
'discontinuity_indicator': discontinuity_indicator,
})
return {
'fragments': fragments,
}
def read_abst(self):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_unsigned_int() # BootstrapinfoVersion
# Profile,Live,Update,Reserved
self.read(1)
# time scale
self.read_unsigned_int()
# CurrentMediaTime
self.read_unsigned_long_long()
# SmpteTimeCodeOffset
self.read_unsigned_long_long()
self.read_string() # MovieIdentifier
server_count = self.read_unsigned_char()
# ServerEntryTable
for i in range(server_count):
self.read_string()
quality_count = self.read_unsigned_char()
# QualityEntryTable
for i in range(quality_count):
self.read_string()
# DrmData
self.read_string()
# MetaData
self.read_string()
segments_count = self.read_unsigned_char()
segments = []
for i in range(segments_count):
box_size, box_type, box_data = self.read_box_info()
assert box_type == b'asrt'
segment = FlvReader(box_data).read_asrt()
segments.append(segment)
fragments_run_count = self.read_unsigned_char()
fragments = []
for i in range(fragments_run_count):
box_size, box_type, box_data = self.read_box_info()
assert box_type == b'afrt'
fragments.append(FlvReader(box_data).read_afrt())
return {
'segments': segments,
'fragments': fragments,
}
def read_bootstrap_info(self):
total_size, box_type, box_data = self.read_box_info()
assert box_type == b'abst'
return FlvReader(box_data).read_abst()
def read_bootstrap_info(bootstrap_bytes):
return FlvReader(bootstrap_bytes).read_bootstrap_info()
def build_fragments_list(boot_info):
""" Return a list of (segment, fragment) for each fragment in the video """
res = []
segment_run_table = boot_info['segments'][0]
# I've only found videos with one segment
segment_run_entry = segment_run_table['segment_run'][0]
n_frags = segment_run_entry[1]
fragment_run_entry_table = boot_info['fragments'][0]['fragments']
first_frag_number = fragment_run_entry_table[0]['first']
for (i, frag_number) in zip(range(1, n_frags+1), itertools.count(first_frag_number)):
res.append((1, frag_number))
return res
def write_flv_header(stream, metadata):
"""Writes the FLV header and the metadata to stream"""
# FLV header
stream.write(b'FLV\x01')
stream.write(b'\x05')
stream.write(b'\x00\x00\x00\x09')
# FLV File body
stream.write(b'\x00\x00\x00\x00')
# FLVTAG
# Script data
stream.write(b'\x12')
# Size of the metadata with 3 bytes
stream.write(struct_pack('!L', len(metadata))[1:])
stream.write(b'\x00\x00\x00\x00\x00\x00\x00')
stream.write(metadata)
# Magic numbers extracted from the output files produced by AdobeHDS.php
#(https://github.com/K-S-V/Scripts)
stream.write(b'\x00\x00\x01\x73')
def _add_ns(prop):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop
class HttpQuietDownloader(HttpFD):
def to_screen(self, *args, **kargs):
pass
class F4mFD(FileDownloader):
"""
A downloader for f4m manifests or AdobeHDS.
"""
def real_download(self, filename, info_dict):
man_url = info_dict['url']
self.to_screen('[download] Downloading f4m manifest')
manifest = self.ydl.urlopen(man_url).read()
self.report_destination(filename)
http_dl = HttpQuietDownloader(self.ydl,
{
'continuedl': True,
'quiet': True,
'noprogress': True,
'test': self.params.get('test', False),
})
doc = etree.fromstring(manifest)
formats = [(int(f.attrib.get('bitrate', -1)), f) for f in doc.findall(_add_ns('media'))]
formats = sorted(formats, key=lambda f: f[0])
rate, media = formats[-1]
base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap = base64.b64decode(doc.find(_add_ns('bootstrapInfo')).text)
metadata = base64.b64decode(media.find(_add_ns('metadata')).text)
boot_info = read_bootstrap_info(bootstrap)
fragments_list = build_fragments_list(boot_info)
if self.params.get('test', False):
# We only download the first fragment
fragments_list = fragments_list[:1]
total_frags = len(fragments_list)
tmpfilename = self.temp_name(filename)
(dest_stream, tmpfilename) = sanitize_open(tmpfilename, 'wb')
write_flv_header(dest_stream, metadata)
# This dict stores the download progress, it's updated by the progress
# hook
state = {
'downloaded_bytes': 0,
'frag_counter': 0,
}
start = time.time()
def frag_progress_hook(status):
frag_total_bytes = status.get('total_bytes', 0)
estimated_size = (state['downloaded_bytes'] +
(total_frags - state['frag_counter']) * frag_total_bytes)
if status['status'] == 'finished':
state['downloaded_bytes'] += frag_total_bytes
state['frag_counter'] += 1
progress = self.calc_percent(state['frag_counter'], total_frags)
byte_counter = state['downloaded_bytes']
else:
frag_downloaded_bytes = status['downloaded_bytes']
byte_counter = state['downloaded_bytes'] + frag_downloaded_bytes
frag_progress = self.calc_percent(frag_downloaded_bytes,
frag_total_bytes)
progress = self.calc_percent(state['frag_counter'], total_frags)
progress += frag_progress / float(total_frags)
eta = self.calc_eta(start, time.time(), estimated_size, byte_counter)
self.report_progress(progress, format_bytes(estimated_size),
status.get('speed'), eta)
http_dl.add_progress_hook(frag_progress_hook)
frags_filenames = []
for (seg_i, frag_i) in fragments_list:
name = 'Seg%d-Frag%d' % (seg_i, frag_i)
url = base_url + name
frag_filename = '%s-%s' % (tmpfilename, name)
success = http_dl.download(frag_filename, {'url': url})
if not success:
return False
with open(frag_filename, 'rb') as down:
down_data = down.read()
reader = FlvReader(down_data)
while True:
_, box_type, box_data = reader.read_box_info()
if box_type == b'mdat':
dest_stream.write(box_data)
break
frags_filenames.append(frag_filename)
self.report_finish(format_bytes(state['downloaded_bytes']), time.time() - start)
self.try_rename(tmpfilename, filename)
for frag_file in frags_filenames:
os.remove(frag_file)
fsize = os.path.getsize(encodeFilename(filename))
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
return True

View File

@ -27,7 +27,7 @@ class HttpFD(FileDownloader):
request = compat_urllib_request.Request(url, None, headers)
if self.params.get('test', False):
request.add_header('Range','bytes=0-10240')
request.add_header('Range', 'bytes=0-10240')
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
@ -39,7 +39,7 @@ class HttpFD(FileDownloader):
if resume_len != 0:
if self.params.get('continuedl', False):
self.report_resuming_byte(resume_len)
request.add_header('Range','bytes=%d-' % resume_len)
request.add_header('Range', 'bytes=%d-' % resume_len)
open_mode = 'ab'
else:
resume_len = 0

View File

@ -18,7 +18,7 @@ class MplayerFD(FileDownloader):
try:
subprocess.call(['mplayer', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error(u'MMS or RTSP download detected but "%s" could not be run' % args[0] )
self.report_error(u'MMS or RTSP download detected but "%s" could not be run' % args[0])
return False
# Download using mplayer.

View File

@ -87,8 +87,10 @@ class RtmpFD(FileDownloader):
url = info_dict['url']
player_url = info_dict.get('player_url', None)
page_url = info_dict.get('page_url', None)
app = info_dict.get('app', None)
play_path = info_dict.get('play_path', None)
tc_url = info_dict.get('tc_url', None)
flash_version = info_dict.get('flash_version', None)
live = info_dict.get('rtmp_live', False)
conn = info_dict.get('rtmp_conn', None)
@ -111,12 +113,16 @@ class RtmpFD(FileDownloader):
basic_args += ['--swfVfy', player_url]
if page_url is not None:
basic_args += ['--pageUrl', page_url]
if app is not None:
basic_args += ['--app', app]
if play_path is not None:
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
if flash_version is not None:
basic_args += ['--flashVer', flash_version]
if live:
basic_args += ['--live']
if conn:

View File

@ -15,9 +15,11 @@ from .arte import (
from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbccouk import BBCCoUkIE
from .blinkx import BlinkxIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .br import BRIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .c56 import C56IE
@ -25,11 +27,16 @@ from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cbs import CBSIE
from .channel9 import Channel9IE
from .chilloutzone import ChilloutzoneIE
from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .cmt import CMTIE
from .cnn import CNNIE
from .cnn import (
CNNIE,
CNNBlogsIE,
)
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
@ -47,22 +54,27 @@ from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .dropbox import DropboxIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .elpais import ElPaisIE
from .escapist import EscapistIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .fourtube import FourTubeIE
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
@ -72,6 +84,7 @@ from .francetv import (
CultureboxIE,
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .funnyordie import FunnyOrDieIE
from .gamekings import GamekingsIE
from .gamespot import GameSpotIE
@ -80,8 +93,10 @@ from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .helsinki import HelsinkiIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .ign import IGNIE, OneUPIE
from .imdb import (
@ -92,10 +107,12 @@ from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .ivi import (
IviIE,
IviCompilationIE
)
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
@ -105,13 +122,18 @@ from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .kontrtube import KontrTubeIE
from .la7 import LA7IE
from .lifenews import LifeNewsIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
from .lynda import (
LyndaIE,
LyndaCourseIE
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .malemotion import MalemotionIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
@ -119,6 +141,7 @@ from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE
from .mpora import MporaIE
from .mofosex import MofosexIE
from .mooshare import MooshareIE
from .mtv import (
MTVIE,
MTVIggyIE,
@ -130,11 +153,14 @@ from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .ndr import NDRIE
from .ndtv import NDTVIE
from .newgrounds import NewgroundsIE
from .nfb import NFBIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
from .ninegag import NineGagIE
from .normalboots import NormalbootsIE
from .novamov import NovamovIE
from .nowness import NownessIE
from .nowvideo import NowVideoIE
@ -155,7 +181,13 @@ from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
RutubeMovieIE,
RutubePersonIE,
)
from .savefrom import SaveFromIE
from .servingsys import ServingSysIE
from .sina import SinaIE
from .slashdot import SlashdotIE
@ -180,16 +212,21 @@ from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theplatform import ThePlatformIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .toutv import TouTvIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE
from .tudou import TudouIE
from .tumblr import TumblrIE
@ -200,9 +237,11 @@ from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videobam import VideoBamIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
@ -217,8 +256,8 @@ from .vimeo import (
from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .vube import VubeIE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE

View File

@ -1,22 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
)
class ARDIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TITLE = r'<h1(?: class="boxTopHeadline")?>(?P<title>.*)</h1>'
_MEDIA_STREAM = r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[^/\?]+)(?:\?.*)?'
_TEST = {
u'url': u'http://www.ardmediathek.de/das-erste/tagesschau-in-100-sek?documentId=14077640',
u'file': u'14077640.mp4',
u'md5': u'6ca8824255460c787376353f9e20bbd8',
u'info_dict': {
u"title": u"11.04.2013 09:23 Uhr - Tagesschau in 100 Sekunden"
'url': 'http://www.ardmediathek.de/das-erste/guenther-jauch/edward-snowden-im-interview-held-oder-verraeter?documentId=19288786',
'file': '19288786.mp4',
'md5': '515bf47ce209fb3f5a61b7aad364634c',
'info_dict': {
'title': 'Edward Snowden im Interview - Held oder Verräter?',
'description': 'Edward Snowden hat alles aufs Spiel gesetzt, um die weltweite \xdcberwachung durch die Geheimdienste zu enttarnen. Nun stellt sich der ehemalige NSA-Mitarbeiter erstmals weltweit in einem TV-Interview den Fragen eines NDR-Journalisten. Die Sendung vom Sonntagabend.',
'thumbnail': 'http://www.ardmediathek.de/ard/servlet/contentblob/19/28/87/90/19288790/bild/2250037',
},
u'skip': u'Requires rtmpdump'
'skip': 'Blocked outside of Germany',
}
def _real_extract(self, url):
@ -29,26 +35,49 @@ class ARDIE(InfoExtractor):
else:
video_id = m.group('video_id')
# determine title and media streams from webpage
html = self._download_webpage(url, video_id)
title = re.search(self._TITLE, html).group('title')
streams = [mo.groupdict() for mo in re.finditer(self._MEDIA_STREAM, html)]
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
streams = [
mo.groupdict()
for mo in re.finditer(
r'mediaCollection\.addMediaStream\((?P<media_type>\d+), (?P<quality>\d+), "(?P<rtmp_url>[^"]*)", "(?P<video_url>[^"]*)", "[^"]*"\)', webpage)]
if not streams:
assert '"fsk"' in html
raise ExtractorError(u'This video is only available after 8:00 pm')
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
# choose default media type and highest quality for now
stream = max([s for s in streams if int(s["media_type"]) == 0],
key=lambda s: int(s["quality"]))
# there's two possibilities: RTMP stream or HTTP download
info = {'id': video_id, 'title': title, 'ext': 'mp4'}
if stream['rtmp_url']:
self.to_screen(u'RTMP download detected')
assert stream['video_url'].startswith('mp4:')
info["url"] = stream["rtmp_url"]
info["play_path"] = stream['video_url']
formats = []
for s in streams:
format = {
'quality': int(s['quality']),
}
if s.get('rtmp_url'):
format['protocol'] = 'rtmp'
format['url'] = s['rtmp_url']
format['playpath'] = s['video_url']
else:
assert stream["video_url"].endswith('.mp4')
info["url"] = stream["video_url"]
return [info]
format['url'] = s['video_url']
quality_name = self._search_regex(
r'[,.]([a-zA-Z0-9_-]+),?\.mp4', format['url'],
'quality name', default='NA')
format['format_id'] = '%s-%s-%s-%s' % (
determine_ext(format['url']), quality_name, s['media_type'],
s['quality'])
formats.append(format)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
'thumbnail': thumbnail,
}

View File

@ -0,0 +1,223 @@
from __future__ import unicode_literals
import re
from .subtitles import SubtitlesInfoExtractor
from ..utils import ExtractorError
class BBCCoUkIE(SubtitlesInfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:programmes|iplayer/episode)/(?P<id>[\da-z]{8})'
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope: Leonard Cohen',
'description': 'md5:db4755d7a665ae72343779f7dacb402c',
'duration': 1740,
},
'params': {
# rtmp download
'skip_download': True,
}
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b00yng5w/The_Man_in_Black_Series_3_The_Printed_Name/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Man in Black: Series 3: The Printed Name',
'description': "Mark Gatiss introduces Nicholas Pierpan's chilling tale of a writer's devilish pact with a mysterious man. Stars Ewan Bailey.",
'duration': 1800,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'duration': 5100,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
}
]
def _extract_asx_playlist(self, connection, programme_id):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
def _extract_medias(self, media_selection):
return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
def _extract_connections(self, media):
return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int(media.get('width'))
height = int(media.get('height'))
file_size = int(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
})
formats.extend(conn_formats)
return formats
def _extract_captions(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
captions = self._download_xml(connection.get('href'), programme_id, 'Downloading captions')
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
ps = captions.findall('./{0}body/{0}div/{0}p'.format('{http://www.w3.org/2006/10/ttaf1}'))
srt = ''
for pos, p in enumerate(ps):
srt += '%s\r\n%s --> %s\r\n%s\r\n\r\n' % (str(pos), p.get('begin'), p.get('end'),
p.text.strip() if p.text is not None else '')
subtitles[lang] = srt
return subtitles
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
group_id = mobj.group('id')
webpage = self._download_webpage(url, group_id, 'Downloading video page')
if re.search(r'id="emp-error" class="notinuk">', webpage):
raise ExtractorError('Currently BBC iPlayer TV programmes are available to play in the UK only',
expected=True)
playlist = self._download_xml('http://www.bbc.co.uk/iplayer/playlist/%s' % group_id, group_id,
'Downloading playlist XML')
no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
if no_items is not None:
reason = no_items.get('reason')
if reason == 'preAvailability':
msg = 'Episode %s is not yet available' % group_id
elif reason == 'postAvailability':
msg = 'Episode %s is no longer available' % group_id
else:
msg = 'Episode %s is not available: %s' % (group_id, reason)
raise ExtractorError(msg, expected=True)
formats = []
subtitles = None
for item in self._extract_items(playlist):
kind = item.get('kind')
if kind != 'programme' and kind != 'radioProgramme':
continue
title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text
programme_id = item.get('identifier')
duration = int(item.get('duration'))
media_selection = self._download_xml(
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s' % programme_id,
programme_id, 'Downloading media selection XML')
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
elif kind == 'captions':
subtitles = self._extract_captions(media, programme_id)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(programme_id, subtitles)
return
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}

View File

@ -1,84 +1,88 @@
from __future__ import unicode_literals
import datetime
import json
import re
import socket
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_request,
ExtractorError,
unescapeHTML,
)
class BlipTVIE(InfoExtractor):
class BlipTVIE(SubtitlesInfoExtractor):
"""Information extractor for blip.tv"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(.+)$'
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/((.+/)|(play/)|(api\.swf#))(?P<presumptive_id>.+)$'
_TEST = {
_TESTS = [{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'file': '5779306.mov',
'md5': 'c6934ad0b6acf2bd920720ec888eb812',
'info_dict': {
'id': '5779306',
'ext': 'mov',
'upload_date': '20111205',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'uploader': 'Comic Book Resources - CBR TV',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
}
}, {
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'uploader': 'Red vs. Blue',
'description': 'One-Zero-One',
'upload_date': '20130614',
'title': 'Red vs. Blue Season 11 Episode 1',
}
def report_direct_download(self, title):
"""Report information extraction."""
self.to_screen('%s: Direct download detected' % title)
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError('Invalid URL: %s' % url)
presumptive_id = mobj.group('presumptive_id')
# See https://github.com/rg3/youtube-dl/issues/857
embed_mobj = re.search(r'^(?:https?://)?(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', url)
embed_mobj = re.match(r'https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', url)
if embed_mobj:
info_url = 'http://blip.tv/play/%s.x?p=1' % embed_mobj.group(1)
info_page = self._download_webpage(info_url, embed_mobj.group(1))
video_id = self._search_regex(r'data-episode-id="(\d+)', info_page, 'video_id')
video_id = self._search_regex(
r'data-episode-id="([0-9]+)', info_page, 'video_id')
return self.url_result('http://blip.tv/a/a-' + video_id, 'BlipTV')
if '?' in url:
cchar = '&'
else:
cchar = '?'
cchar = '&' if '?' in url else '?'
json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
request = compat_urllib_request.Request(json_url)
request.add_header('User-Agent', 'iTunes/10.6.1')
self.report_extraction(mobj.group(1))
urlh = self._request_webpage(request, None, False,
'unable to download video info webpage')
try:
json_code_bytes = urlh.read()
json_code = json_code_bytes.decode('utf-8')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError('Unable to read video info webpage: %s' % compat_str(err))
json_data = self._download_json(request, video_id=presumptive_id)
try:
json_data = json.loads(json_code)
if 'Post' in json_data:
data = json_data['Post']
else:
data = json_data
video_id = compat_str(data['item_id'])
upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
subtitles = {}
formats = []
if 'additionalMedia' in data:
for f in sorted(data['additionalMedia'], key=lambda f: int(f['media_height'])):
for f in data['additionalMedia']:
if f.get('file_type_srt') == 1:
LANGS = {
'english': 'en',
}
lang = f['role'].rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles[langcode] = f['url']
continue
if not int(f['media_width']): # filter m3u8
continue
formats.append({
@ -93,11 +97,16 @@ class BlipTVIE(InfoExtractor):
'width': int(data['media']['width']),
'height': int(data['media']['height']),
})
self._sort_formats(formats)
# subtitles
video_subtitles = self.extract_subtitles(video_id, subtitles)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)
return
return {
'id': compat_str(data['item_id']),
'id': video_id,
'uploader': data['display_name'],
'upload_date': upload_date,
'title': data['title'],
@ -105,24 +114,24 @@ class BlipTVIE(InfoExtractor):
'description': data['description'],
'user_agent': 'iTunes/10.6.1',
'formats': formats,
'subtitles': video_subtitles,
}
except (ValueError, KeyError) as err:
raise ExtractorError('Unable to parse video information: %s' % repr(err))
def _download_subtitle_url(self, sub_lang, url):
# For some weird reason, blip.tv serves a video instead of subtitles
# when we request with a common UA
req = compat_urllib_request.Request(url)
req.add_header('Youtubedl-user-agent', 'youtube-dl')
return self._download_webpage(req, None, note=False)
class BlipTVUserIE(InfoExtractor):
"""Information Extractor for blip.tv users."""
_VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?blip\.tv/)|bliptvuser:)([^/]+)/*$'
_PAGE_SIZE = 12
IE_NAME = 'blip.tv:user'
def _real_extract(self, url):
# Extract username
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError('Invalid URL: %s' % url)
username = mobj.group(1)
page_base = 'http://m.blip.tv/pr/show_get_full_episode_list?users_id=%s&lite=0&esi=1'
@ -131,7 +140,6 @@ class BlipTVUserIE(InfoExtractor):
mobj = re.search(r'data-users-id="([^"]+)"', page)
page_base = page_base % mobj.group(1)
# Download video ids using BlipTV Ajax calls. Result size per
# query is limited (currently to 12 videos) so we need to query
# page by page until there are no video ids - it means we got
@ -142,8 +150,8 @@ class BlipTVUserIE(InfoExtractor):
while True:
url = page_base + "&page=" + str(pagenum)
page = self._download_webpage(url, username,
'Downloading video ids from page %d' % pagenum)
page = self._download_webpage(
url, username, 'Downloading video ids from page %d' % pagenum)
# Extract video identifiers
ids_in_page = []
@ -167,4 +175,4 @@ class BlipTVUserIE(InfoExtractor):
urls = ['http://blip.tv/%s' % video_id for video_id in video_ids]
url_entries = [self.url_result(vurl, 'BlipTV') for vurl in urls]
return [self.playlist_result(url_entries, playlist_title = username)]
return [self.playlist_result(url_entries, playlist_title=username)]

View File

@ -24,5 +24,7 @@ class BloombergIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
ooyala_code = self._search_regex(r'<source src="http://player.ooyala.com/player/[^/]+/([^".]+)', webpage, u'ooyala url')
return OoyalaIE._build_url_result(ooyala_code)
embed_code = self._search_regex(
r'<source src="https?://[^/]+/[^/]+/[^/]+/([^/]+)', webpage,
'embed code')
return OoyalaIE._build_url_result(embed_code)

View File

@ -0,0 +1,80 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
_TEST = {
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
page = self._download_webpage(url, display_id)
xml_url = self._search_regex(
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = [{
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"uploader": xml_video.find("author").text,
"upload_date": "".join(reversed(xml_video.find("broadcastDate").text.split("."))),
"webpage_url": xml_video.find("permalink").text,
} for xml_video in xml.findall("video")]
if len(videos) > 1:
self._downloader.report_warning(
'found multiple videos; please '
'report this with the video URL to http://yt-dl.org/bug')
if not videos:
raise ExtractorError('No video entries found')
return videos[0]
def _extract_formats(self, assets):
formats = [{
"url": asset.find("downloadUrl").text,
"ext": asset.find("mediaType").text,
"format_id": asset.get("type"),
"width": int(asset.find("frameWidth").text),
"height": int(asset.find("frameHeight").text),
"tbr": int(asset.find("bitrateVideo").text),
"abr": int(asset.find("bitrateAudio").text),
"vcodec": asset.find("codecVideo").text,
"container": asset.find("mediaType").text,
"filesize": int(asset.find("size").text),
} for asset in assets.findall("asset")
if asset.find("downloadUrl") is not None]
self._sort_formats(formats)
return formats
def _extract_thumbnails(self, variants):
thumbnails = [{
"url": self._BASE_URL + variant.find("url").text,
"width": int(variant.find("width").text),
"height": int(variant.find("height").text),
} for variant in variants.findall("variant")]
thumbnails.sort(key=lambda x: x["width"] * x["height"], reverse=True)
return thumbnails

View File

@ -1,18 +1,20 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class BreakIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?break\.com/video/([^/]+)'
_VALID_URL = r'http://(?:www\.)?break\.com/video/([^/]+)'
_TEST = {
u'url': u'http://www.break.com/video/when-girls-act-like-guys-2468056',
u'file': u'2468056.mp4',
u'md5': u'a3513fb1547fba4fb6cfac1bffc6c46b',
u'info_dict': {
u"title": u"When Girls Act Like D-Bags"
'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
'md5': 'a3513fb1547fba4fb6cfac1bffc6c46b',
'info_dict': {
'id': '2468056',
'ext': 'mp4',
'title': 'When Girls Act Like D-Bags',
}
}
@ -21,18 +23,17 @@ class BreakIE(InfoExtractor):
video_id = mobj.group(1).split("-")[-1]
embed_url = 'http://www.break.com/embed/%s' % video_id
webpage = self._download_webpage(embed_url, video_id)
info_json = self._search_regex(r'var embedVars = ({.*?});', webpage,
u'info json', flags=re.DOTALL)
info_json = self._search_regex(r'var embedVars = ({.*})\s*?</script>',
webpage, 'info json', flags=re.DOTALL)
info = json.loads(info_json)
video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
final_url = video_url + '?' + info['AuthToken']
return [{
return {
'id': video_id,
'url': final_url,
'ext': determine_ext(final_url),
'title': info['contentName'],
'thumbnail': info['thumbUri'],
}]
}

View File

@ -17,13 +17,13 @@ from ..utils import (
ExtractorError,
unsmuggle_url,
unescapeHTML,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
_TESTS = [
{
@ -70,7 +70,7 @@ class BrightcoveIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada',
},
},
}
]
@classmethod
@ -128,20 +128,28 @@ class BrightcoveIE(InfoExtractor):
@classmethod
def _extract_brightcove_url(cls, webpage):
"""Try to extract the brightcove url from the wepbage, returns None
"""Try to extract the brightcove url from the webpage, returns None
if it can't be found
"""
m_brightcove = re.search(
urls = cls._extract_brightcove_urls(webpage)
return urls[0] if urls else None
@classmethod
def _extract_brightcove_urls(cls, webpage):
"""Return a list of all Brightcove URLs from the webpage """
url_m = re.search(r'<meta\s+property="og:video"\s+content="(http://c.brightcove.com/[^"]+)"', webpage)
if url_m:
return [unescapeHTML(url_m.group(1))]
matches = re.findall(
r'''(?sx)<object
(?:
[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1 |
[^>]+?class=[\'"][^>]*?BrightcoveExperience.*?[\'"] |
[^>]*?>\s*<param\s+name="movie"\s+value="https?://[^/]*brightcove\.com/
).+?</object>''',
webpage)
if m_brightcove is not None:
return cls._build_brighcove_url(m_brightcove.group())
else:
return None
return [cls._build_brighcove_url(m) for m in matches]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
@ -183,8 +191,9 @@ class BrightcoveIE(InfoExtractor):
return self._extract_video_info(video_info)
def _get_playlist_info(self, player_key):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, 'Downloading playlist information')
info_url = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s' % player_key
playlist_info = self._download_webpage(
info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:

View File

@ -1,4 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -9,11 +11,12 @@ class Canalc2IE(InfoExtractor):
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
_TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
u'file': u'12163.mp4',
u'md5': u'060158428b650f896c542dfbb3d6487f',
u'info_dict': {
u'title': u'Terrasses du Numérique'
'url': 'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
'md5': '060158428b650f896c542dfbb3d6487f',
'info_dict': {
'id': '12163',
'ext': 'mp4',
'title': 'Terrasses du Numérique'
}
}
@ -28,9 +31,10 @@ class Canalc2IE(InfoExtractor):
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title')
r'class="evenement8">(.*?)</a>', webpage, 'title')
return {'id': video_id,
return {
'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,

View File

@ -1,4 +1,4 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
@ -13,36 +13,38 @@ class Channel9IE(InfoExtractor):
meta Search.PageType from web page HTML rather than URL itself, as it is
not always possible to do.
'''
IE_DESC = u'Channel 9'
IE_NAME = u'channel9'
_VALID_URL = r'^https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
IE_DESC = 'Channel 9'
IE_NAME = 'channel9'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
_TESTS = [
{
u'url': u'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
u'file': u'Events_TechEd_Australia_2013_KOS002.mp4',
u'md5': u'bbd75296ba47916b754e73c3a4bbdf10',
u'info_dict': {
u'title': u'Developer Kick-Off Session: Stuff We Love',
u'description': u'md5:c08d72240b7c87fcecafe2692f80e35f',
u'duration': 4576,
u'thumbnail': u'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
u'session_code': u'KOS002',
u'session_day': u'Day 1',
u'session_room': u'Arena 1A',
u'session_speakers': [ u'Ed Blankenship', u'Andrew Coates', u'Brady Gaster', u'Patrick Klug', u'Mads Kristensen' ],
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'info_dict': {
'id': 'Events/TechEd/Australia/2013/KOS002',
'ext': 'mp4',
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'duration': 4576,
'thumbnail': 'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': [ 'Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen' ],
},
},
{
u'url': u'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
u'file': u'posts_Self-service-BI-with-Power-BI-nuclear-testing.mp4',
u'md5': u'b43ee4529d111bc37ba7ee4f34813e68',
u'info_dict': {
u'title': u'Self-service BI with Power BI - nuclear testing',
u'description': u'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
u'duration': 1540,
u'thumbnail': u'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
u'authors': [ u'Mike Wilmot' ],
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540,
'thumbnail': 'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
'authors': [ 'Mike Wilmot' ],
},
}
]
@ -60,7 +62,7 @@ class Channel9IE(InfoExtractor):
return 0
units = m.group('units')
try:
exponent = [u'B', u'KB', u'MB', u'GB', u'TB', u'PB', u'EB', u'ZB', u'YB'].index(units.upper())
exponent = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'].index(units.upper())
except ValueError:
return 0
size = float(m.group('size'))
@ -80,7 +82,7 @@ class Channel9IE(InfoExtractor):
'url': x.group('url'),
'format_id': x.group('quality'),
'format_note': x.group('note'),
'format': u'%s (%s)' % (x.group('quality'), x.group('note')),
'format': '%s (%s)' % (x.group('quality'), x.group('note')),
'filesize': self._restore_bytes(x.group('filesize')), # File size is approximate
'preference': self._known_formats.index(x.group('quality')),
'vcodec': 'none' if x.group('note') == 'Audio only' else None,
@ -91,10 +93,10 @@ class Channel9IE(InfoExtractor):
return formats
def _extract_title(self, html):
title = self._html_search_meta(u'title', html, u'title')
title = self._html_search_meta('title', html, 'title')
if title is None:
title = self._og_search_title(html)
TITLE_SUFFIX = u' (Channel 9)'
TITLE_SUFFIX = ' (Channel 9)'
if title is not None and title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
return title
@ -110,7 +112,7 @@ class Channel9IE(InfoExtractor):
m = re.search(DESCRIPTION_REGEX, html)
if m is not None:
return m.group('description')
return self._html_search_meta(u'description', html, u'description')
return self._html_search_meta('description', html, 'description')
def _extract_duration(self, html):
m = re.search(r'data-video_duration="(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"', html)
@ -172,7 +174,7 @@ class Channel9IE(InfoExtractor):
# Nothing to download
if len(formats) == 0 and slides is None and zip_ is None:
self._downloader.report_warning(u'None of recording, slides or zip are available for %s' % content_path)
self._downloader.report_warning('None of recording, slides or zip are available for %s' % content_path)
return
# Extract meta
@ -244,7 +246,7 @@ class Channel9IE(InfoExtractor):
return contents
def _extract_list(self, content_path):
rss = self._download_xml(self._RSS_URL % content_path, content_path, u'Downloading RSS')
rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text
@ -254,11 +256,11 @@ class Channel9IE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath')
webpage = self._download_webpage(url, content_path, u'Downloading web page')
webpage = self._download_webpage(url, content_path, 'Downloading web page')
page_type_m = re.search(r'<meta name="Search.PageType" content="(?P<pagetype>[^"]+)"/>', webpage)
if page_type_m is None:
raise ExtractorError(u'Search.PageType not found, don\'t know how to process this page', expected=True)
raise ExtractorError('Search.PageType not found, don\'t know how to process this page', expected=True)
page_type = page_type_m.group('pagetype')
if page_type == 'List': # List page, may contain list of 'item'-like objects
@ -268,4 +270,4 @@ class Channel9IE(InfoExtractor):
elif page_type == 'Session': # Event session page, may contain downloadable content
return self._extract_session(webpage, content_path)
else:
raise ExtractorError(u'Unexpected Search.PageType %s' % page_type, expected=True)
raise ExtractorError('Unexpected Search.PageType %s' % page_type, expected=True)

View File

@ -0,0 +1,97 @@
from __future__ import unicode_literals
import re
import base64
import json
from .common import InfoExtractor
from ..utils import (
clean_html,
ExtractorError
)
class ChilloutzoneIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?chilloutzone\.net/video/(?P<id>[\w|-]+)\.html'
_TESTS = [{
'url': 'http://www.chilloutzone.net/video/enemene-meck-alle-katzen-weg.html',
'md5': 'a76f3457e813ea0037e5244f509e66d1',
'info_dict': {
'id': 'enemene-meck-alle-katzen-weg',
'ext': 'mp4',
'title': 'Enemene Meck - Alle Katzen weg',
'description': 'Ist das der Umkehrschluss des Niesenden Panda-Babys?',
},
}, {
'note': 'Video hosted at YouTube',
'url': 'http://www.chilloutzone.net/video/eine-sekunde-bevor.html',
'info_dict': {
'id': '1YVQaAgHyRU',
'ext': 'mp4',
'title': '16 Photos Taken 1 Second Before Disaster',
'description': 'md5:58a8fcf6a459fe0a08f54140f0ad1814',
'uploader': 'BuzzFeedVideo',
'uploader_id': 'BuzzFeedVideo',
'upload_date': '20131105',
},
}, {
'note': 'Video hosted at Vimeo',
'url': 'http://www.chilloutzone.net/video/icon-blending.html',
'md5': '2645c678b8dc4fefcc0e1b60db18dac1',
'info_dict': {
'id': '85523671',
'ext': 'mp4',
'title': 'The Sunday Times - Icons',
'description': 'md5:3e1c0dc6047498d6728dcdaad0891762',
'uploader': 'Us',
'uploader_id': 'usfilms',
'upload_date': '20140131'
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
base64_video_info = self._html_search_regex(
r'var cozVidData = "(.+?)";', webpage, 'video data')
decoded_video_info = base64.b64decode(base64_video_info).decode("utf-8")
video_info_dict = json.loads(decoded_video_info)
# get video information from dict
video_url = video_info_dict['mediaUrl']
description = clean_html(video_info_dict.get('description'))
title = video_info_dict['title']
native_platform = video_info_dict['nativePlatform']
native_video_id = video_info_dict['nativeVideoId']
source_priority = video_info_dict['sourcePriority']
# If nativePlatform is None a fallback mechanism is used (i.e. youtube embed)
if native_platform is None:
youtube_url = self._html_search_regex(
r'<iframe.* src="((?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
webpage, 'fallback video URL', default=None)
if youtube_url is not None:
return self.url_result(youtube_url, ie='Youtube')
# Non Fallback: Decide to use native source (e.g. youtube or vimeo) or
# the own CDN
if source_priority == 'native':
if native_platform == 'youtube':
return self.url_result(native_video_id, ie='Youtube')
if native_platform == 'vimeo':
return self.url_result(
'http://vimeo.com/' + native_video_id, ie='Vimeo')
if not video_url:
raise ExtractorError('No video found')
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
}

View File

@ -0,0 +1,56 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
translation_table = {
'a': 'h', 'd': 'e', 'e': 'v', 'f': 'o', 'g': 'f', 'i': 'd', 'l': 'n',
'm': 'a', 'n': 'm', 'p': 'u', 'q': 't', 'r': 's', 'v': 'p', 'x': 'r',
'y': 'l', 'z': 'i',
'$': ':', '&': '.', '(': '=', '^': '&', '=': '/',
}
class CliphunterIE(InfoExtractor):
IE_NAME = 'cliphunter'
_VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
(?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?])
'''
_TEST = {
'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
'file': '1012420.flv',
'md5': '15e7740f30428abf70f4223478dc1225',
'info_dict': {
'title': 'Fun Jynx Maze solo',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
pl_fiji = self._search_regex(
r'pl_fiji = \'([^\']+)\'', webpage, 'video data')
pl_c_qual = self._search_regex(
r'pl_c_qual = "(.)"', webpage, 'video quality')
video_title = self._search_regex(
r'mediaTitle = "([^"]+)"', webpage, 'title')
video_url = ''.join(translation_table.get(c, c) for c in pl_fiji)
formats = [{
'url': video_url,
'format_id': pl_c_qual,
}]
return {
'id': video_id,
'title': video_title,
'formats': formats,
}

View File

@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
url_basename,
)
@ -98,3 +99,28 @@ class CNNIE(InfoExtractor):
'duration': duration,
'upload_date': upload_date,
}
class CNNBlogsIE(InfoExtractor):
_VALID_URL = r'https?://[^\.]+\.blogs\.cnn\.com/.+'
_TEST = {
'url': 'http://reliablesources.blogs.cnn.com/2014/02/09/criminalizing-journalism/',
'md5': '3e56f97b0b6ffb4b79f4ea0749551084',
'info_dict': {
'id': 'bestoftv/2014/02/09/criminalizing-journalism.cnn',
'ext': 'mp4',
'title': 'Criminalizing journalism?',
'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.',
'upload_date': '20140209',
},
'add_ie': ['CNN'],
}
def _real_extract(self, url):
webpage = self._download_webpage(url, url_basename(url))
cnn_url = self._html_search_regex(r'data-url="(.+?)"', webpage, 'cnn url')
return {
'_type': 'url',
'url': cnn_url,
'ie_key': CNNIE.ie_key(),
}

View File

@ -4,6 +4,7 @@ import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class CollegeHumorIE(InfoExtractor):
@ -11,24 +12,45 @@ class CollegeHumorIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
'file': '6902724.mp4',
'md5': 'dcc0f5c1c8be98dc33889a191f4c26bd',
'info_dict': {
'id': '6902724',
'ext': 'mp4',
'title': 'Comic-Con Cosplay Catastrophe',
'description': 'Fans get creative this year at San Diego. Too',
'description': 'Fans get creative this year',
'age_limit': 13,
},
},
{
'url': 'http://www.collegehumor.com/video/3505939/font-conference',
'file': '3505939.mp4',
'md5': '72fa701d8ef38664a4dbb9e2ab721816',
'info_dict': {
'id': '3505939',
'ext': 'mp4',
'title': 'Font Conference',
'description': 'This video wasn\'t long enough, so we made it double-spaced.',
'description': 'This video wasn\'t long enough,',
'age_limit': 10,
'duration': 179,
},
}]
},
# embedded youtube video
{
'url': 'http://www.collegehumor.com/embed/6950457',
'info_dict': {
'id': 'W5gMp3ZjYg4',
'ext': 'mp4',
'title': 'Funny Dogs Protecting Babies Compilation 2014 [NEW HD]',
'uploader': 'Funnyplox TV',
'uploader_id': 'funnyploxtv',
'description': 'md5:7ded37421526d54afdf005e25bc2b7a3',
'upload_date': '20140128',
},
'params': {
'skip_download': True,
},
'add_ie': ['Youtube'],
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -38,6 +60,12 @@ class CollegeHumorIE(InfoExtractor):
data = json.loads(self._download_webpage(
jsonUrl, video_id, 'Downloading info JSON'))
vdata = data['video']
if vdata.get('youtubeId') is not None:
return {
'_type': 'url',
'url': vdata['youtubeId'],
'ie_key': 'Youtube',
}
AGE_LIMITS = {'nc17': 18, 'r': 18, 'pg13': 13, 'pg': 10, 'g': 0}
rating = vdata.get('rating')
@ -49,7 +77,7 @@ class CollegeHumorIE(InfoExtractor):
PREFS = {'high_quality': 2, 'low_quality': 0}
formats = []
for format_key in ('mp4', 'webm'):
for qname, qurl in vdata[format_key].items():
for qname, qurl in vdata.get(format_key, {}).items():
formats.append({
'format_id': format_key + '_' + qname,
'url': qurl,
@ -58,6 +86,8 @@ class CollegeHumorIE(InfoExtractor):
})
self._sort_formats(formats)
duration = int_or_none(vdata.get('duration'), 1000)
return {
'id': video_id,
'title': vdata['title'],
@ -65,4 +95,5 @@ class CollegeHumorIE(InfoExtractor):
'thumbnail': vdata.get('thumbnail'),
'formats': formats,
'age_limit': age_limit,
'duration': duration,
}

View File

@ -14,7 +14,7 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www.)?comedycentral.com/
_VALID_URL = r'''(?x)https?://(?:www\.)?comedycentral\.com/
(video-clips|episodes|cc-studios|video-collections)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
@ -86,7 +86,7 @@ class ComedyCentralShowsIE(InfoExtractor):
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError('Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'

View File

@ -66,11 +66,12 @@ class InfoExtractor(object):
* asr Audio sampling rate in Hertz
* vbr Average video bitrate in KBit/s
* vcodec Name of the video codec in use
* container Name of the container format
* filesize The number of bytes, if known in advance
* player_url SWF Player URL (used for rtmpdump).
* protocol The protocol that will be used for the actual
download, lower-case.
"http", "https", "rtsp", "rtmp" or so.
"http", "https", "rtsp", "rtmp", "m3u8" or so.
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field.
@ -239,7 +240,7 @@ class InfoExtractor(object):
except AttributeError:
url = url_or_request
if len(url) > 200:
h = u'___' + hashlib.md5(url).hexdigest()
h = u'___' + hashlib.md5(url.encode('utf-8')).hexdigest()
url = url[:200 - len(h)] + h
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
@ -270,8 +271,11 @@ class InfoExtractor(object):
def _download_json(self, url_or_request, video_id,
note=u'Downloading JSON metadata',
errnote=u'Unable to download JSON metadata'):
errnote=u'Unable to download JSON metadata',
transform_source=None):
json_string = self._download_webpage(url_or_request, video_id, note, errnote)
if transform_source:
json_string = transform_source(json_string)
try:
return json.loads(json_string)
except ValueError as ve:
@ -398,7 +402,7 @@ class InfoExtractor(object):
# Helper functions for extracting OpenGraph info
@staticmethod
def _og_regexes(prop):
content_re = r'content=(?:"([^>]+?)"|\'(.+?)\')'
content_re = r'content=(?:"([^>]+?)"|\'([^>]+?)\')'
property_re = r'(?:name|property)=[\'"]og:%s[\'"]' % re.escape(prop)
template = r'<meta[^>]+?%s[^>]+?%s'
return [
@ -464,7 +468,14 @@ class InfoExtractor(object):
}
return RATING_TABLE.get(rating.lower(), None)
def _twitter_search_player(self, html):
return self._html_search_meta('twitter:player', html,
'twitter card player')
def _sort_formats(self, formats):
if not formats:
raise ExtractorError(u'No video formats found')
def _formats_key(f):
# TODO remove the following workaround
from ..utils import determine_ext

View File

@ -30,7 +30,7 @@ class CondeNastIE(InfoExtractor):
'vanityfair': 'Vanity Fair',
}
_VALID_URL = r'http://(video|www).(?P<site>%s).com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
_VALID_URL = r'http://(video|www)\.(?P<site>%s)\.com/(?P<type>watch|series|video)/(?P<id>.+)' % '|'.join(_SITES.keys())
IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
_TEST = {

View File

@ -1,4 +1,6 @@
# encoding: utf-8
from __future__ import unicode_literals
import re, base64, zlib
from hashlib import sha1
from math import pow, sqrt, floor
@ -18,29 +20,29 @@ from ..aes import (
)
class CrunchyrollIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?(?P<url>crunchyroll\.com/[^/]*/[^/?&]*?(?P<video_id>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'(?:https?://)?(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{
u'url': u'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
u'file': u'645513.flv',
#u'md5': u'b1639fd6ddfaa43788c85f6d1dddd412',
u'info_dict': {
u'title': u'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
u'description': u'md5:2d17137920c64f2f49981a7797d275ef',
u'thumbnail': u'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
u'uploader': u'Yomiuri Telecasting Corporation (YTV)',
u'upload_date': u'20131013',
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'file': '645513.flv',
#'md5': 'b1639fd6ddfaa43788c85f6d1dddd412',
'info_dict': {
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
'uploader': 'Yomiuri Telecasting Corporation (YTV)',
'upload_date': '20131013',
},
u'params': {
'params': {
# rtmp
u'skip_download': True,
'skip_download': True,
},
}]
_FORMAT_IDS = {
u'360': (u'60', u'106'),
u'480': (u'61', u'106'),
u'720': (u'62', u'106'),
u'1080': (u'80', u'108'),
'360': ('60', '106'),
'480': ('61', '106'),
'720': ('62', '106'),
'1080': ('80', '108'),
}
def _decrypt_subtitles(self, data, iv, id):
@ -63,7 +65,7 @@ class CrunchyrollIE(InfoExtractor):
num3 = key ^ num1
num4 = num3 ^ (num3 >> 3) ^ num2
prefix = intlist_to_bytes(obfuscate_key_aux(20, 97, (1, 2)))
shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode(u'ascii')).digest())
shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode('ascii')).digest())
# Extend 160 Bit hash to 256 Bit
return shaHash + [0] * 12
@ -79,93 +81,98 @@ class CrunchyrollIE(InfoExtractor):
def _convert_subtitles_to_srt(self, subtitles):
i=1
output = u''
output = ''
for start, end, text in re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles):
start = start.replace(u'.', u',')
end = end.replace(u'.', u',')
start = start.replace('.', ',')
end = end.replace('.', ',')
text = clean_html(text)
text = text.replace(u'\\N', u'\n')
text = text.replace('\\N', '\n')
if not text:
continue
output += u'%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
i+=1
return output
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage_url = u'http://www.' + mobj.group('url')
video_id = mobj.group(u'video_id')
webpage = self._download_webpage(webpage_url, video_id)
note_m = self._html_search_regex(r'<div class="showmedia-trailer-notice">(.+?)</div>', webpage, u'trailer-notice', default=u'')
if mobj.group('prefix') == 'm':
mobile_webpage = self._download_webpage(url, video_id, 'Downloading mobile webpage')
webpage_url = self._search_regex(r'<link rel="canonical" href="([^"]+)" />', mobile_webpage, 'webpage_url')
else:
webpage_url = 'http://www.' + mobj.group('url')
webpage = self._download_webpage(webpage_url, video_id, 'Downloading webpage')
note_m = self._html_search_regex(r'<div class="showmedia-trailer-notice">(.+?)</div>', webpage, 'trailer-notice', default='')
if note_m:
raise ExtractorError(note_m)
video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, u'video_title', flags=re.DOTALL)
video_title = re.sub(r' {2,}', u' ', video_title)
video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, u'video_description', default=u'')
video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, 'video_title', flags=re.DOTALL)
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, 'video_description', default='')
if not video_description:
video_description = None
video_upload_date = self._html_search_regex(r'<div>Availability for free users:(.+?)</div>', webpage, u'video_upload_date', fatal=False, flags=re.DOTALL)
video_upload_date = self._html_search_regex(r'<div>Availability for free users:(.+?)</div>', webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, u'video_uploader', fatal=False, flags=re.DOTALL)
video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, 'video_uploader', fatal=False, flags=re.DOTALL)
playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, u'playerdata_url'))
playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = compat_urllib_request.Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({u'current_page': webpage_url})
playerdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note=u'Downloading media info')
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, u'stream_id')
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, u'thumbnail', fatal=False)
stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, 'stream_id')
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False)
formats = []
for fmt in re.findall(r'\?p([0-9]{3,4})=1', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt+u'p'
streamdata_req = compat_urllib_request.Request(u'http://www.crunchyroll.com/xml/')
video_format = fmt+'p'
streamdata_req = compat_urllib_request.Request('http://www.crunchyroll.com/xml/')
# urlencode doesn't work!
streamdata_req.data = u'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality='+stream_quality+u'&media%5Fid='+stream_id+u'&video%5Fformat='+stream_format
streamdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
streamdata_req.add_header(u'Content-Length', str(len(streamdata_req.data)))
streamdata = self._download_webpage(streamdata_req, video_id, note=u'Downloading media info for '+video_format)
video_url = self._search_regex(r'<host>([^<]+)', streamdata, u'video_url')
video_play_path = self._search_regex(r'<file>([^<]+)', streamdata, u'video_play_path')
streamdata_req.data = 'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality='+stream_quality+'&media%5Fid='+stream_id+'&video%5Fformat='+stream_format
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata_req.add_header('Content-Length', str(len(streamdata_req.data)))
streamdata = self._download_webpage(streamdata_req, video_id, note='Downloading media info for '+video_format)
video_url = self._search_regex(r'<host>([^<]+)', streamdata, 'video_url')
video_play_path = self._search_regex(r'<file>([^<]+)', streamdata, 'video_play_path')
formats.append({
u'url': video_url,
u'play_path': video_play_path,
u'ext': 'flv',
u'format': video_format,
u'format_id': video_format,
'url': video_url,
'play_path': video_play_path,
'ext': 'flv',
'format': video_format,
'format_id': video_format,
})
subtitles = {}
for sub_id, sub_name in re.findall(r'\?ssid=([0-9]+)" title="([^"]+)', webpage):
sub_page = self._download_webpage(u'http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id='+sub_id,\
video_id, note=u'Downloading subtitles for '+sub_name)
id = self._search_regex(r'id=\'([0-9]+)', sub_page, u'subtitle_id', fatal=False)
iv = self._search_regex(r'<iv>([^<]+)', sub_page, u'subtitle_iv', fatal=False)
data = self._search_regex(r'<data>([^<]+)', sub_page, u'subtitle_data', fatal=False)
sub_page = self._download_webpage('http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id='+sub_id,\
video_id, note='Downloading subtitles for '+sub_name)
id = self._search_regex(r'id=\'([0-9]+)', sub_page, 'subtitle_id', fatal=False)
iv = self._search_regex(r'<iv>([^<]+)', sub_page, 'subtitle_iv', fatal=False)
data = self._search_regex(r'<data>([^<]+)', sub_page, 'subtitle_data', fatal=False)
if not id or not iv or not data:
continue
id = int(id)
iv = base64.b64decode(iv)
data = base64.b64decode(data)
subtitle = self._decrypt_subtitles(data, iv, id).decode(u'utf-8')
lang_code = self._search_regex(r'lang_code=\'([^\']+)', subtitle, u'subtitle_lang_code', fatal=False)
subtitle = self._decrypt_subtitles(data, iv, id).decode('utf-8')
lang_code = self._search_regex(r'lang_code=\'([^\']+)', subtitle, 'subtitle_lang_code', fatal=False)
if not lang_code:
continue
subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle)
return {
u'id': video_id,
u'title': video_title,
u'description': video_description,
u'thumbnail': video_thumbnail,
u'uploader': video_uploader,
u'upload_date': video_upload_date,
u'subtitles': subtitles,
u'formats': formats,
'id': video_id,
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'subtitles': subtitles,
'formats': formats,
}

View File

@ -1,49 +1,60 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
unescapeHTML,
find_xpath_attr,
)
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?c-spanvideo\.org/program/(?P<name>.*)'
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>\d+)'
IE_DESC = 'C-SPAN'
_TEST = {
'url': 'http://www.c-spanvideo.org/program/HolderonV',
'file': '315139.mp4',
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
'md5': '8e44ce11f0f725527daccc453f553eb0',
'info_dict': {
'id': '315139',
'ext': 'mp4',
'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in [Shelby County v. Holder] in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
},
'skip': 'Regularly fails on travis, for unknown reasons',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
prog_name = mobj.group('name')
webpage = self._download_webpage(url, prog_name)
video_id = self._search_regex(r'prog(?:ram)?id=(.*?)&', webpage, 'video id')
page_id = mobj.group('id')
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex(r'data-progid=\'(\d+)\'>', webpage, 'video id')
title = self._html_search_regex(
r'<!-- title -->\n\s*<h1[^>]*>(.*?)</h1>', webpage, 'title')
description = self._og_search_description(webpage)
description = self._html_search_regex(
[
# The full description
r'<div class=\'expandable\'>(.*?)<a href=\'#\'',
# If the description is small enough the other div is not
# present, otherwise this is a stripped version
r'<p class=\'initial\'>(.*?)</p>'
],
webpage, 'description', flags=re.DOTALL)
info_url = 'http://c-spanvideo.org/videoLibrary/assets/player/ajax-player.php?os=android&html5=program&id=' + video_id
data_json = self._download_webpage(
info_url, video_id, 'Downloading video info')
data = json.loads(data_json)
data = self._download_json(info_url, video_id)
url = unescapeHTML(data['video']['files'][0]['path']['#text'])
doc = self._download_xml('http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
video_id)
def find_string(s):
return find_xpath_attr(doc, './/string', 'name', s).text
return {
'id': video_id,
'title': title,
'title': find_string('title'),
'url': url,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': find_string('poster'),
}

View File

@ -0,0 +1,46 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://dsc\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://dsc.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'file': '614784.mp4',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'info_dict': {
'title': 'MythBusters: Mission Impossible Outtakes',
'description': ('Watch Jamie Hyneman and Adam Savage practice being'
' each other -- to the point of confusing Jamie\'s dog -- and '
'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
' back.'),
'duration': 156,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_list_json = self._search_regex(r'var videoListJSON = ({.*?});',
webpage, 'video list', flags=re.DOTALL)
video_list = json.loads(video_list_json)
info = video_list['clips'][0]
formats = []
for f in info['mp4']:
formats.append(
{'url': f['src'], r'ext': r'mp4', 'tbr': int(f['bitrate'][:-1])})
return {
'id': info['contentId'],
'title': video_list['name'],
'formats': formats,
'description': info['videoCaption'],
'thumbnail': info.get('videoStillURL') or info.get('thumbnailURL'),
'duration': info['duration'],
}

View File

@ -1,34 +1,35 @@
from __future__ import unicode_literals
import re
import json
import time
from .common import InfoExtractor
class DotsubIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?dotsub\.com/view/([^/]+)'
_VALID_URL = r'http://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = {
u'url': u'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
u'file': u'aed3b8b2-1889-4df5-ae63-ad85f5572f27.flv',
u'md5': u'0914d4d69605090f623b7ac329fea66e',
u'info_dict': {
u"title": u"Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary",
u"uploader": u"4v4l0n42",
u'description': u'Pyramids of Waste (2010) also known as "The lightbulb conspiracy" is a documentary about how our economic system based on consumerism and planned obsolescence is breaking our planet down.\r\n\r\nSolutions to this can be found at:\r\nhttp://robotswillstealyourjob.com\r\nhttp://www.federicopistono.org\r\n\r\nhttp://opensourceecology.org\r\nhttp://thezeitgeistmovement.com',
u'thumbnail': u'http://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
u'upload_date': u'20101213',
'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'md5': '0914d4d69605090f623b7ac329fea66e',
'info_dict': {
'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'ext': 'flv',
'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary',
'uploader': '4v4l0n42',
'description': 'Pyramids of Waste (2010) also known as "The lightbulb conspiracy" is a documentary about how our economic system based on consumerism and planned obsolescence is breaking our planet down.\r\n\r\nSolutions to this can be found at:\r\nhttp://robotswillstealyourjob.com\r\nhttp://www.federicopistono.org\r\n\r\nhttp://opensourceecology.org\r\nhttp://thezeitgeistmovement.com',
'thumbnail': 'http://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
'upload_date': '20101213',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
info_url = "https://dotsub.com/api/media/%s/metadata" %(video_id)
webpage = self._download_webpage(info_url, video_id)
info = json.loads(webpage)
video_id = mobj.group('id')
info_url = "https://dotsub.com/api/media/%s/metadata" % video_id
info = self._download_json(info_url, video_id)
date = time.gmtime(info['dateCreated']/1000) # The timestamp is in miliseconds
return [{
return {
'id': video_id,
'url': info['mediaURI'],
'ext': 'flv',
@ -37,5 +38,5 @@ class DotsubIE(InfoExtractor):
'description': info['description'],
'uploader': info['user'],
'view_count': info['numberOfViews'],
'upload_date': u'%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday),
}]
'upload_date': '%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday),
}

View File

@ -10,11 +10,12 @@ from .common import InfoExtractor
class DropboxIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dropbox[.]com/s/(?P<id>[a-zA-Z0-9]{15})/(?P<title>[^?#]*)'
_TEST = {
'url': 'https://www.dropbox.com/s/mcnzehi9wo55th4/20131219_085616.mp4',
'file': 'mcnzehi9wo55th4.mp4',
'md5': 'f6d65b1b326e82fd7ab7720bea3dacae',
'url': 'https://www.dropbox.com/s/0qr9sai2veej4f8/THE_DOCTOR_GAMES.mp4',
'md5': '8ae17c51172fb7f93bdd6a214cc8c896',
'info_dict': {
'title': '20131219_085616'
'id': '0qr9sai2veej4f8',
'ext': 'mp4',
'title': 'THE_DOCTOR_GAMES'
}
}

View File

@ -0,0 +1,58 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import unified_strdate
class ElPaisIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?elpais\.com/.*/(?P<id>[^/#?]+)\.html(?:$|[?#])'
IE_DESC = 'El País'
_TEST = {
'url': 'http://blogs.elpais.com/la-voz-de-inaki/2014/02/tiempo-nuevo-recetas-viejas.html',
'md5': '98406f301f19562170ec071b83433d55',
'info_dict': {
'id': 'tiempo-nuevo-recetas-viejas',
'ext': 'mp4',
'title': 'Tiempo nuevo, recetas viejas',
'description': 'De lunes a viernes, a partir de las ocho de la mañana, Iñaki Gabilondo nos cuenta su visión de la actualidad nacional e internacional.',
'upload_date': '20140206',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
prefix = self._html_search_regex(
r'var url_cache = "([^"]+)";', webpage, 'URL prefix')
video_suffix = self._search_regex(
r"URLMediaFile = url_cache \+ '([^']+)'", webpage, 'video URL')
video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex(
r"URLMediaStill = url_cache \+ '([^']+)'", webpage, 'thumbnail URL',
fatal=False)
thumbnail = (
None if thumbnail_suffix is None
else prefix + thumbnail_suffix)
title = self._html_search_regex(
'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
webpage, 'title')
date_str = self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', fatal=False)
upload_date = (None if date_str is None else unified_strdate(date_str))
return {
'id': video_id,
'url': video_url,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
'upload_date': upload_date,
}

View File

@ -1,9 +1,9 @@
import json
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
ExtractorError,
@ -11,70 +11,68 @@ from ..utils import (
class EscapistIE(InfoExtractor):
_VALID_URL = r'^https?://?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<episode>[^/?]+)[/?]?.*$'
_VALID_URL = r'^https?://?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<id>[0-9]+)-'
_TEST = {
u'url': u'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
u'file': u'6618-Breaking-Down-Baldurs-Gate.mp4',
u'md5': u'ab3a706c681efca53f0a35f1415cf0d1',
u'info_dict': {
u"description": u"Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
u"uploader": u"the-escapist-presents",
u"title": u"Breaking Down Baldur's Gate"
'url': 'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
'md5': 'ab3a706c681efca53f0a35f1415cf0d1',
'info_dict': {
'id': '6618',
'ext': 'mp4',
'description': "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
'uploader': 'the-escapist-presents',
'title': "Breaking Down Baldur's Gate",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
showName = mobj.group('showname')
videoId = mobj.group('episode')
video_id = mobj.group('id')
self.report_extraction(videoId)
webpage = self._download_webpage(url, videoId)
self.report_extraction(video_id)
webpage = self._download_webpage(url, video_id)
videoDesc = self._html_search_regex(
r'<meta name="description" content="([^"]*)"',
webpage, u'description', fatal=False)
webpage, 'description', fatal=False)
playerUrl = self._og_search_video_url(webpage, name=u'player URL')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*)"',
webpage, u'title').split(' : ')[-1]
webpage, 'title').split(' : ')[-1]
configUrl = self._search_regex('config=(.*)$', playerUrl, u'config URL')
configUrl = self._search_regex('config=(.*)$', playerUrl, 'config URL')
configUrl = compat_urllib_parse.unquote(configUrl)
formats = []
def _add_format(name, cfgurl):
configJSON = self._download_webpage(
cfgurl, videoId,
u'Downloading ' + name + ' configuration',
u'Unable to download ' + name + ' configuration')
def _add_format(name, cfgurl, quality):
config = self._download_json(
cfgurl, video_id,
'Downloading ' + name + ' configuration',
'Unable to download ' + name + ' configuration',
transform_source=lambda s: s.replace("'", '"'))
# Technically, it's JavaScript, not JSON
configJSON = configJSON.replace("'", '"')
try:
config = json.loads(configJSON)
except (ValueError,) as err:
raise ExtractorError(u'Invalid JSON in configuration file: ' + compat_str(err))
playlist = config['playlist']
formats.append({
'url': playlist[1]['url'],
'format_id': name,
'quality': quality,
})
_add_format(u'normal', configUrl)
_add_format('normal', configUrl, quality=0)
hq_url = (configUrl +
('&hq=1' if '?' in configUrl else configUrl + '?hq=1'))
try:
_add_format(u'hq', hq_url)
_add_format('hq', hq_url, quality=1)
except ExtractorError:
pass # That's fine, we'll just use normal quality
self._sort_formats(formats)
return {
'id': videoId,
'id': video_id,
'formats': formats,
'uploader': showName,
'title': title,

View File

@ -1,56 +1,58 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = u'exfm'
IE_DESC = u'ex.fm'
_VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
IE_NAME = 'exfm'
IE_DESC = 'ex.fm'
_VALID_URL = r'http://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{
u'url': u'http://ex.fm/song/eh359',
u'file': u'44216187.mp3',
u'md5': u'e45513df5631e6d760970b14cc0c11e7',
u'info_dict': {
u"title": u"Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive",
u"uploader": u"deadjournalist",
u'upload_date': u'20120424',
u'description': u'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
'url': 'http://ex.fm/song/eh359',
'md5': 'e45513df5631e6d760970b14cc0c11e7',
'info_dict': {
'id': '44216187',
'ext': 'mp3',
'title': 'Test House "Love Is Not Enough" (Extended Mix) DeadJournalist Exclusive',
'uploader': 'deadjournalist',
'upload_date': '20120424',
'description': 'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
},
u'note': u'Soundcloud song',
u'skip': u'The site is down too often',
'note': 'Soundcloud song',
'skip': 'The site is down too often',
},
{
u'url': u'http://ex.fm/song/wddt8',
u'file': u'wddt8.mp3',
u'md5': u'966bd70741ac5b8570d8e45bfaed3643',
u'info_dict': {
u'title': u'Safe and Sound',
u'uploader': u'Capital Cities',
'url': 'http://ex.fm/song/wddt8',
'md5': '966bd70741ac5b8570d8e45bfaed3643',
'info_dict': {
'id': 'wddt8',
'ext': 'mp3',
'title': 'Safe and Sound',
'uploader': 'Capital Cities',
},
u'skip': u'The site is down too often',
'skip': 'The site is down too often',
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group(1)
info_url = "http://ex.fm/api/v3/song/%s" %(song_id)
webpage = self._download_webpage(info_url, song_id)
info = json.loads(webpage)
song_url = info['song']['url']
song_id = mobj.group('id')
info_url = "http://ex.fm/api/v3/song/%s" % song_id
info = self._download_json(info_url, song_id)['song']
song_url = info['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream',''), 'Soundcloud')
return [{
return self.url_result(song_url.replace('/stream', ''), 'Soundcloud')
return {
'id': song_id,
'url': song_url,
'ext': 'mp3',
'title': info['song']['title'],
'thumbnail': info['song']['image']['large'],
'uploader': info['song']['artist'],
'view_count': info['song']['loved_count'],
}]
'title': info['title'],
'thumbnail': info['image']['large'],
'uploader': info['artist'],
'view_count': info['loved_count'],
}

View File

@ -0,0 +1,38 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
IE_NAME = 'Firstpost.com'
_VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',
'md5': 'ee9114957692f01fb1263ed87039112a',
'info_dict': {
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'Its flight deck is over twice the size of a football field, its power unit can light up the entire Kochi city and the cabling is enough to cover the distance between here to Delhi.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'<div.*?name="div_video".*?flashvars="([^"]+)">',
webpage, 'video URL')
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -0,0 +1,60 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class FirstTVIE(InfoExtractor):
IE_NAME = 'firsttv'
IE_DESC = 'Видеоархив - Первый канал'
_VALID_URL = r'http://(?:www\.)?1tv\.ru/videoarchive/(?P<id>\d+)'
_TEST = {
'url': 'http://www.1tv.ru/videoarchive/73390',
'md5': '3de6390cf0cca4a5eae1d1d83895e5ad',
'info_dict': {
'id': '73390',
'ext': 'mp4',
'title': 'Олимпийские канатные дороги',
'description': 'md5:cc730d2bf4215463e37fff6a1e277b13',
'thumbnail': 'http://img1.1tv.ru/imgsize640x360/PR20140210114657.JPG',
'duration': 149,
},
'skip': 'Only works from Russia',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id, 'Downloading page')
video_url = self._html_search_regex(
r'''(?s)jwplayer\('flashvideoportal_1'\)\.setup\({.*?'file': '([^']+)'.*?}\);''', webpage, 'video URL')
title = self._html_search_regex(
r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>', webpage, 'title')
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>', webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
duration = self._og_search_property('video:duration', webpage, 'video duration', fatal=False)
like_count = self._html_search_regex(r'title="Понравилось".*?/></label> \[(\d+)\]',
webpage, 'like count', fatal=False)
dislike_count = self._html_search_regex(r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', fatal=False)
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'duration': int_or_none(duration),
'like_count': int_or_none(like_count),
'dislike_count': int_or_none(dislike_count),
}

View File

@ -0,0 +1,95 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
unified_strdate,
str_to_int,
parse_duration,
)
from youtube_dl.utils import clean_html
class FourTubeIE(InfoExtractor):
IE_NAME = '4tube'
_VALID_URL = r'https?://(?:www\.)?4tube\.com/videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '209733',
'ext': 'mp4',
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'duration': 583,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.4tube.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
self.report_extraction(video_id)
playlist_json = self._html_search_regex(r'var playerConfigPlaylist\s+=\s+([^;]+)', webpage, 'Playlist')
media_id = self._search_regex(r'idMedia:\s*(\d+)', playlist_json, 'Media Id')
sources = self._search_regex(r'sources:\s*\[([^\]]*)\]', playlist_json, 'Sources').split(',')
title = self._search_regex(r'title:\s*"([^"]*)', playlist_json, 'Title')
thumbnail_url = self._search_regex(r'image:\s*"([^"]*)', playlist_json, 'Thumbnail', fatal=False)
uploader_str = self._search_regex(r'<span>Uploaded by</span>(.*?)<span>', webpage, 'uploader', fatal=False)
mobj = re.search(r'<a href="/sites/(?P<id>[^"]+)"><strong>(?P<name>[^<]+)</strong></a>', uploader_str)
(uploader, uploader_id) = (mobj.group('name'), mobj.group('id')) if mobj else (clean_html(uploader_str), None)
upload_date = None
view_count = None
duration = None
description = self._html_search_meta('description', webpage, 'description')
if description:
upload_date = self._search_regex(r'Published Date: (\d{2} [a-zA-Z]{3} \d{4})', description, 'upload date',
fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
view_count = self._search_regex(r'Views: ([\d,\.]+)', description, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
duration = parse_duration(self._search_regex(r'Length: (\d+m\d+s)', description, 'duration', fatal=False))
token_url = "http://tkn.4tube.com/{0}/desktop/{1}".format(media_id, "+".join(sources))
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
token_req = compat_urllib_request.Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail_url,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'view_count': view_count,
'duration': duration,
'age_limit': 18,
'webpage_url': webpage_url,
}

View File

@ -1,4 +1,7 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
@ -30,7 +33,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
IE_NAME = 'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
# Can't use tests, videos expire in 7 days
@ -44,17 +47,17 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
u'file': u'84981923.mp4',
u'info_dict': {
u'title': u'Soir 3',
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'file': '84981923.mp4',
'info_dict': {
'title': 'Soir 3',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
}
@ -62,13 +65,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
video_id = self._search_regex(r'id-video=(\d+?)[@"]', webpage, 'video id')
return self._extract_video(video_id)
class FranceTVIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetv'
IE_DESC = u'France 2, 3, 4, 5 and Ô'
IE_NAME = 'francetv'
IE_DESC = 'France 2, 3, 4, 5 and Ô'
_VALID_URL = r'''(?x)https?://www\.france[2345o]\.fr/
(?:
emissions/.*?/(videos|emissions)/(?P<id>[^/?]+)
@ -78,73 +81,73 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
_TESTS = [
# france2
{
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
'url': 'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
'file': '75540104.mp4',
'info_dict': {
'title': '13h15, le samedi...',
'description': 'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
'params': {
# m3u8 download
u'skip_download': True,
'skip_download': True,
},
},
# france3
{
u'url': u'http://www.france3.fr/emissions/pieces-a-conviction/diffusions/13-11-2013_145575',
u'info_dict': {
u'id': u'000702326_CAPP_PicesconvictionExtrait313022013_120220131722_Au',
u'ext': u'flv',
u'title': u'Le scandale du prix des médicaments',
u'description': u'md5:1384089fbee2f04fc6c9de025ee2e9ce',
'url': 'http://www.france3.fr/emissions/pieces-a-conviction/diffusions/13-11-2013_145575',
'info_dict': {
'id': '000702326_CAPP_PicesconvictionExtrait313022013_120220131722_Au',
'ext': 'flv',
'title': 'Le scandale du prix des médicaments',
'description': 'md5:1384089fbee2f04fc6c9de025ee2e9ce',
},
u'params': {
'params': {
# rtmp download
u'skip_download': True,
'skip_download': True,
},
},
# france4
{
u'url': u'http://www.france4.fr/emissions/hero-corp/videos/rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
u'info_dict': {
u'id': u'rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
u'ext': u'flv',
u'title': u'Hero Corp Making of - Extrait 1',
u'description': u'md5:c87d54871b1790679aec1197e73d650a',
'url': 'http://www.france4.fr/emissions/hero-corp/videos/rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
'info_dict': {
'id': 'rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
'ext': 'flv',
'title': 'Hero Corp Making of - Extrait 1',
'description': 'md5:c87d54871b1790679aec1197e73d650a',
},
u'params': {
'params': {
# rtmp download
u'skip_download': True,
'skip_download': True,
},
},
# france5
{
u'url': u'http://www.france5.fr/emissions/c-a-dire/videos/92837968',
u'info_dict': {
u'id': u'92837968',
u'ext': u'mp4',
u'title': u'C à dire ?!',
u'description': u'md5:fb1db1cbad784dcce7c7a7bd177c8e2f',
'url': 'http://www.france5.fr/emissions/c-a-dire/videos/92837968',
'info_dict': {
'id': '92837968',
'ext': 'mp4',
'title': 'C à dire ?!',
'description': 'md5:fb1db1cbad784dcce7c7a7bd177c8e2f',
},
u'params': {
'params': {
# m3u8 download
u'skip_download': True,
'skip_download': True,
},
},
# franceo
{
u'url': u'http://www.franceo.fr/jt/info-afrique/04-12-2013',
u'info_dict': {
u'id': u'92327925',
u'ext': u'mp4',
u'title': u'Infô-Afrique',
u'description': u'md5:ebf346da789428841bee0fd2a935ea55',
'url': 'http://www.franceo.fr/jt/info-afrique/04-12-2013',
'info_dict': {
'id': '92327925',
'ext': 'mp4',
'title': 'Infô-Afrique',
'description': 'md5:ebf346da789428841bee0fd2a935ea55',
},
u'params': {
'params': {
# m3u8 download
u'skip_download': True,
'skip_download': True,
},
u'skip': u'The id changes frequently',
'skip': 'The id changes frequently',
},
]
@ -160,27 +163,28 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
'\.fr/\?id-video=([^"/&]+)'),
(r'<a class="video" id="ftv_player_(.+?)"'),
]
video_id = self._html_search_regex(id_res, webpage, u'video ID')
video_id = self._html_search_regex(id_res, webpage, 'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
IE_NAME = 'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
'url': 'http://generation-quoi.france2.fr/portrait/garde-a-vous',
'file': 'k7FJX8VBcvvLmX4wA5Q.mp4',
'info_dict': {
'title': 'Génération Quoi - Garde à Vous',
'uploader': 'Génération Quoi',
},
u'params': {
'params': {
# It uses Dailymotion
u'skip_download': True,
'skip_download': True,
},
'skip': 'Only available from France',
}
def _real_extract(self, url):
@ -194,20 +198,20 @@ class GenerationQuoiIE(InfoExtractor):
class CultureboxIE(FranceTVBaseInfoExtractor):
IE_NAME = u'culturebox.francetvinfo.fr'
IE_NAME = 'culturebox.francetvinfo.fr'
_VALID_URL = r'https?://culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_TEST = {
u'url': u'http://culturebox.francetvinfo.fr/einstein-on-the-beach-au-theatre-du-chatelet-146813',
u'info_dict': {
u'id': u'EV_6785',
u'ext': u'mp4',
u'title': u'Einstein on the beach au Théâtre du Châtelet',
u'description': u'md5:9ce2888b1efefc617b5e58b3f6200eeb',
'url': 'http://culturebox.francetvinfo.fr/einstein-on-the-beach-au-theatre-du-chatelet-146813',
'info_dict': {
'id': 'EV_6785',
'ext': 'mp4',
'title': 'Einstein on the beach au Théâtre du Châtelet',
'description': 'md5:9ce2888b1efefc617b5e58b3f6200eeb',
},
u'params': {
'params': {
# m3u8 download
u'skip_download': True,
'skip_download': True,
},
}
@ -215,5 +219,5 @@ class CultureboxIE(FranceTVBaseInfoExtractor):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'"http://videos\.francetv\.fr/video/(.*?)"', webpage, u'video id')
video_id = self._search_regex(r'"http://videos\.francetv\.fr/video/(.*?)"', webpage, 'video id')
return self._extract_video(video_id)

View File

@ -1,18 +1,21 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
class FreesoundIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
_TEST = {
u'url': u'http://www.freesound.org/people/miklovan/sounds/194503/',
u'file': u'194503.mp3',
u'md5': u'12280ceb42c81f19a515c745eae07650',
u'info_dict': {
u"title": u"gulls in the city.wav",
u"uploader" : u"miklovan",
u'description': u'the sounds of seagulls in the city',
'url': 'http://www.freesound.org/people/miklovan/sounds/194503/',
'md5': '12280ceb42c81f19a515c745eae07650',
'info_dict': {
'id': '194503',
'ext': 'mp3',
'title': 'gulls in the city.wav',
'uploader': 'miklovan',
'description': 'the sounds of seagulls in the city',
}
}
@ -20,17 +23,17 @@ class FreesoundIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
title = self._html_search_regex(
r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
webpage, 'music title', flags=re.DOTALL)
music_url = self._og_search_property('audio', webpage, 'music url')
description = self._html_search_regex(r'<div id="sound_description">(.*?)</div>',
webpage, 'description', fatal=False, flags=re.DOTALL)
description = self._html_search_regex(
r'<div id="sound_description">(.*?)</div>', webpage, 'description',
fatal=False, flags=re.DOTALL)
return [{
return {
'id': music_id,
'title': title,
'url': music_url,
'url': self._og_search_property('audio', webpage, 'music url'),
'uploader': self._og_search_property('audio:artist', webpage, 'music uploader'),
'ext': determine_ext(music_url),
'description': description,
}]
}

View File

@ -0,0 +1,37 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class FreespeechIE(InfoExtractor):
IE_NAME = 'freespeech.org'
_VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
_TEST = {
'add_ie': ['Youtube'],
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',
'info_dict': {
'id': 'poKsVCZ64uU',
'ext': 'mp4',
'title': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'description': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'uploader': 'freespeechtv',
'uploader_id': 'freespeechtv',
'upload_date': '20121002',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
webpage = self._download_webpage(url, title)
info_json = self._search_regex(r'jQuery.extend\(Drupal.settings, ({.*?})\);', webpage, 'info')
info = json.loads(info_json)
return {
'_type': 'url',
'url': info['jw_player']['basic_video_node_player']['file'],
'ie_key': 'Youtube',
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -6,13 +8,16 @@ from .common import InfoExtractor
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?funnyordie\.com/videos/(?P<id>[0-9a-f]+)/.*$'
_TEST = {
u'url': u'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
u'file': u'0732f586d7.mp4',
u'md5': u'f647e9e90064b53b6e046e75d0241fbd',
u'info_dict': {
u"description": u"Lyrics changed to match the video. Spoken cameo by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a concept by Dustin McLean (DustFilms.com). Performed, edited, and written by David A. Scott.",
u"title": u"Heart-Shaped Box: Literal Video Version"
}
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'file': '0732f586d7.mp4',
'md5': 'f647e9e90064b53b6e046e75d0241fbd',
'info_dict': {
'description': ('Lyrics changed to match the video. Spoken cameo '
'by Obscurus Lupa (from ThatGuyWithTheGlasses.com). Based on a '
'concept by Dustin McLean (DustFilms.com). Performed, edited, '
'and written by David A. Scott.'),
'title': 'Heart-Shaped Box: Literal Video Version',
},
}
def _real_extract(self, url):
@ -23,13 +28,12 @@ class FunnyOrDieIE(InfoExtractor):
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
webpage, 'video URL', flags=re.DOTALL)
info = {
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
return [info]

View File

@ -7,10 +7,11 @@ class GametrailersIE(MTVServicesInfoExtractor):
_VALID_URL = r'http://www\.gametrailers\.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
_TEST = {
'url': 'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
'file': '70e9a5d7-cf25-4a10-9104-6f3e7342ae0d.mp4',
'md5': '4c8e67681a0ea7ec241e8c09b3ea8cf7',
'info_dict': {
'title': 'Mirror\'s Edge 2|E3 2013: Debut Trailer',
'id': '70e9a5d7-cf25-4a10-9104-6f3e7342ae0d',
'ext': 'mp4',
'title': 'E3 2013: Debut Trailer',
'description': 'Faith is back! Check out the World Premiere trailer for Mirror\'s Edge 2 straight from the EA Press Conference at E3 2013!',
},
}

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import os
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from .youtube import YoutubeIE
@ -12,6 +13,7 @@ from ..utils import (
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
ExtractorError,
HEADRequest,
@ -38,18 +40,6 @@ class GenericIE(InfoExtractor):
'title': 'R\u00e9gis plante sa Jeep',
}
},
# embedded vimeo video
{
'add_ie': ['Vimeo'],
'url': 'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
'file': '22444065.mp4',
'md5': '2903896e23df39722c33f015af0666e2',
'info_dict': {
'title': 'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
'uploader_id': 'skillsmatter',
'uploader': 'Skills Matter',
}
},
# bandcamp page with custom domain
{
'add_ie': ['Bandcamp'],
@ -78,6 +68,18 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
{
# https://github.com/rg3/youtube-dl/issues/2253
'url': 'http://bcove.me/i6nfkrc3',
'file': '3101154703001.mp4',
'md5': '0ba9446db037002366bab3b3eb30c88c',
'info_dict': {
'title': 'Still no power',
'uploader': 'thestar.com',
'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
},
'add_ie': ['Brightcove'],
},
# Direct link to a video
{
'url': 'http://media.w3.org/2010/05/sintel/trailer.mp4',
@ -159,6 +161,25 @@ class GenericIE(InfoExtractor):
raise ExtractorError('Invalid URL protocol')
return response
def _extract_rss(self, url, video_id, doc):
playlist_title = doc.find('./channel/title').text
playlist_desc_el = doc.find('./channel/description')
playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text
entries = [{
'_type': 'url',
'url': e.find('link').text,
'title': e.find('title').text,
} for e in doc.findall('./channel/item')]
return {
'_type': 'playlist',
'id': url,
'title': playlist_title,
'description': playlist_desc,
'entries': entries,
}
def _real_extract(self, url):
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
@ -219,6 +240,14 @@ class GenericIE(InfoExtractor):
self.report_extraction(video_id)
# Is it an RSS feed?
try:
doc = xml.etree.ElementTree.fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
except compat_xml_parse_error:
pass
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
@ -234,15 +263,25 @@ class GenericIE(InfoExtractor):
r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
# Look for BrightCove:
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
if bc_url is not None:
bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
if bc_urls:
self.to_screen('Brightcove video detected.')
surl = smuggle_url(bc_url, {'Referer': url})
return self.url_result(surl, 'Brightcove')
entries = [{
'_type': 'url',
'url': smuggle_url(bc_url, {'Referer': url}),
'ie_key': 'Brightcove'
} for bc_url in bc_urls]
return {
'_type': 'playlist',
'title': video_title,
'id': video_id,
'entries': entries,
}
# Look for embedded (iframe) Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="((?:https?:)?//player.vimeo.com/video/.+?)"', webpage)
r'<iframe[^>]+?src="((?:https?:)?//player\.vimeo\.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
@ -250,7 +289,7 @@ class GenericIE(InfoExtractor):
# Look for embedded (swf embed) Vimeo player
mobj = re.search(
r'<embed[^>]+?src="(https?://(?:www\.)?vimeo.com/moogaloop.swf.+?)"', webpage)
r'<embed[^>]+?src="(https?://(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)
if mobj:
return self.url_result(mobj.group(1), 'Vimeo')
@ -320,7 +359,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group(1), 'Aparat')
# Look for MPORA videos
mobj = re.search(r'<iframe .*?src="(http://mpora\.com/videos/[^"]+)"', webpage)
mobj = re.search(r'<iframe .*?src="(http://mpora\.(?:com|de)/videos/[^"]+)"', webpage)
if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora')
@ -332,15 +371,21 @@ class GenericIE(InfoExtractor):
# Look for embedded Facebook player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https://www.facebook.com/video/embed.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Facebook')
# Look for embedded Huffington Post player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed\.live\.huffingtonpost\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'HuffPost')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
# Look for gorilla-vid style embedding
mobj = re.search(r'(?s)jw_plugins.*?file:\s*["\'](.*?)["\']', webpage)
mobj = re.search(r'(?s)(?:jw_plugins|JWPlayerOptions).*?file\s*:\s*["\'](.*?)["\']', webpage)
if mobj is None:
# Broaden the search a little bit
mobj = re.search(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)

View File

@ -1,4 +1,5 @@
# coding: utf-8
from __future__ import unicode_literals
import datetime
import re
@ -10,32 +11,28 @@ from ..utils import (
class GooglePlusIE(InfoExtractor):
IE_DESC = u'Google Plus'
_VALID_URL = r'(?:https://)?plus\.google\.com/(?:[^/]+/)*?posts/(\w+)'
IE_NAME = u'plus.google'
IE_DESC = 'Google Plus'
_VALID_URL = r'https://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
IE_NAME = 'plus.google'
_TEST = {
u"url": u"https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH",
u"file": u"ZButuJc6CtH.flv",
u"info_dict": {
u"upload_date": u"20120613",
u"uploader": u"井上ヨシマサ",
u"title": u"嘆きの天使 降臨"
'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',
'info_dict': {
'id': 'ZButuJc6CtH',
'ext': 'flv',
'upload_date': '20120613',
'uploader': '井上ヨシマサ',
'title': '嘆きの天使 降臨',
}
}
def _real_extract(self, url):
# Extract id from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
post_url = mobj.group(0)
video_id = mobj.group(1)
video_extension = 'flv'
video_id = mobj.group('id')
# Step 1, Retrieve post webpage to extract further information
webpage = self._download_webpage(post_url, video_id, u'Downloading entry webpage')
webpage = self._download_webpage(url, video_id, 'Downloading entry webpage')
self.report_extraction(video_id)
@ -43,7 +40,7 @@ class GooglePlusIE(InfoExtractor):
upload_date = self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
webpage, 'upload date', fatal=False, flags=re.VERBOSE)
if upload_date:
# Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")
@ -51,28 +48,27 @@ class GooglePlusIE(InfoExtractor):
# Extract uploader
uploader = self._html_search_regex(r'rel\="author".*?>(.*?)</a>',
webpage, u'uploader', fatal=False)
webpage, 'uploader', fatal=False)
# Extract title
# Get the first line for title
video_title = self._html_search_regex(r'<meta name\=\"Description\" content\=\"(.*?)[\n<"]',
webpage, 'title', default=u'NA')
webpage, 'title', default='NA')
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, u'video page URL')
webpage, 'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page
webpage = self._download_webpage(video_page, video_id, u'Downloading video page')
webpage = self._download_webpage(video_page, video_id, 'Downloading video page')
# Extract video links on video page
"""Extract video links of all sizes"""
# Extract video links all sizes
pattern = r'\d+,\d+,(\d+),"(http\://redirector\.googlevideo\.com.*?)"'
mobj = re.findall(pattern, webpage)
if len(mobj) == 0:
raise ExtractorError(u'Unable to extract video links')
raise ExtractorError('Unable to extract video links')
# Sort in resolution
links = sorted(mobj)
@ -87,12 +83,11 @@ class GooglePlusIE(InfoExtractor):
except AttributeError: # Python 3
video_url = bytes(video_url, 'ascii').decode('unicode-escape')
return [{
return {
'id': video_id,
'url': video_url,
'uploader': uploader,
'upload_date': upload_date,
'title': video_title,
'ext': video_extension,
}]
'ext': 'flv',
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import itertools
import re
@ -8,32 +10,42 @@ from ..utils import (
class GoogleSearchIE(SearchInfoExtractor):
IE_DESC = u'Google Video search'
_MORE_PAGES_INDICATOR = r'id="pnnext" class="pn"'
IE_DESC = 'Google Video search'
_MAX_RESULTS = 1000
IE_NAME = u'video.google:search'
IE_NAME = 'video.google:search'
_SEARCH_KEY = 'gvsearch'
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
entries = []
res = {
'_type': 'playlist',
'id': query,
'entries': []
'title': query,
}
for pagenum in itertools.count(1):
result_url = u'http://www.google.com/search?tbm=vid&q=%s&start=%s&hl=en' % (compat_urllib_parse.quote_plus(query), pagenum*10)
webpage = self._download_webpage(result_url, u'gvsearch:' + query,
note='Downloading result page ' + str(pagenum))
for pagenum in itertools.count():
result_url = (
'http://www.google.com/search?tbm=vid&q=%s&start=%s&hl=en'
% (compat_urllib_parse.quote_plus(query), pagenum * 10))
for mobj in re.finditer(r'<h3 class="r"><a href="([^"]+)"', webpage):
e = {
webpage = self._download_webpage(
result_url, 'gvsearch:' + query,
note='Downloading result page ' + str(pagenum + 1))
for hit_idx, mobj in enumerate(re.finditer(
r'<h3 class="r"><a href="([^"]+)"', webpage)):
# Skip playlists
if not re.search(r'id="vidthumb%d"' % (hit_idx + 1), webpage):
continue
entries.append({
'_type': 'url',
'url': mobj.group(1)
}
res['entries'].append(e)
})
if (pagenum * 10 > n) or not re.search(self._MORE_PAGES_INDICATOR, webpage):
if (len(entries) >= n) or not re.search(r'class="pn" id="pnnext"', webpage):
res['entries'] = entries[:n]
return res

View File

@ -0,0 +1,62 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class HelsinkiIE(InfoExtractor):
IE_DESC = 'helsinki.fi'
_VALID_URL = r'https?://video\.helsinki\.fi/Arkisto/flash\.php\?id=(?P<id>\d+)'
_TEST = {
'url': 'http://video.helsinki.fi/Arkisto/flash.php?id=20258',
'info_dict': {
'id': '20258',
'ext': 'mp4',
'title': 'Tietotekniikkafoorumi-iltapäivä',
'description': 'md5:f5c904224d43c133225130fe156a5ee0',
},
'params': {
'skip_download': True, # RTMP
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
formats = []
mobj = re.search(r'file=((\w+):[^&]+)', webpage)
if mobj:
formats.append({
'ext': mobj.group(2),
'play_path': mobj.group(1),
'url': 'rtmp://flashvideo.it.helsinki.fi/vod/',
'player_url': 'http://video.helsinki.fi/player.swf',
'format_note': 'sd',
'quality': 0,
})
mobj = re.search(r'hd\.file=((\w+):[^&]+)', webpage)
if mobj:
formats.append({
'ext': mobj.group(2),
'play_path': mobj.group(1),
'url': 'rtmp://flashvideo.it.helsinki.fi/vod/',
'player_url': 'http://video.helsinki.fi/player.swf',
'format_note': 'hd',
'quality': 1,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': self._og_search_title(webpage).replace('Video: ', ''),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
}

View File

@ -13,7 +13,7 @@ from ..utils import (
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_VALID_URL = r'http://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_TEST = {
'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
'file': '1435540.mp3',

View File

@ -1,17 +1,20 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class HowcastIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
u'file': u'390161.mp4',
u'md5': u'8b743df908c42f60cf6496586c7f12c3',
u'info_dict': {
u"description": u"The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here's the proper way to tie a square knot.",
u"title": u"How to Tie a Square Knot Properly"
'url': 'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
'md5': '8b743df908c42f60cf6496586c7f12c3',
'info_dict': {
'id': '390161',
'ext': 'mp4',
'description': 'The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here\'s the proper way to tie a square knot.',
'title': 'How to Tie a Square Knot Properly',
}
}
@ -24,22 +27,15 @@ class HowcastIE(InfoExtractor):
self.report_extraction(video_id)
video_url = self._search_regex(r'\'?file\'?: "(http://mobile-media\.howcast\.com/[0-9]+\.mp4)',
webpage, u'video URL')
video_title = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') property=\'og:title\'',
webpage, u'title')
webpage, 'video URL')
video_description = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') name=\'description\'',
webpage, u'description', fatal=False)
webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(r'<meta content=\'(.+?)\' property=\'og:image\'',
webpage, u'thumbnail', fatal=False)
return [{
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'title': self._og_search_title(webpage),
'description': video_description,
'thumbnail': thumbnail,
}]
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -0,0 +1,82 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
)
class HuffPostIE(InfoExtractor):
IE_DESC = 'Huffington Post'
_VALID_URL = r'''(?x)
https?://(embed\.)?live\.huffingtonpost\.com/
(?:
r/segment/[^/]+/|
HPLEmbedPlayer/\?segmentId=
)
(?P<id>[0-9a-f]+)'''
_TEST = {
'url': 'http://live.huffingtonpost.com/r/segment/legalese-it/52dd3e4b02a7602131000677',
'file': '52dd3e4b02a7602131000677.mp4',
'md5': '55f5e8981c1c80a64706a44b74833de8',
'info_dict': {
'title': 'Legalese It! with @MikeSacksHP',
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,
'upload_date': '20140124',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
api_url = 'http://embed.live.huffingtonpost.com/api/segments/%s.json' % video_id
data = self._download_json(api_url, video_id)['data']
video_title = data['title']
duration = parse_duration(data['running_time'])
upload_date = unified_strdate(data['schedule']['starts_at'])
description = data.get('description')
thumbnails = []
for url in data['images'].values():
m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m:
continue
thumbnails.append({
'url': url,
'resolution': m.group(1),
})
formats = [{
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data['sources']['live'].items()]
if data.get('fivemin_id'):
fid = data['fivemin_id']
fcat = str(int(fid) // 100 + 1)
furl = 'http://avideos.5min.com/2/' + fcat[-3:] + '/' + fcat + '/' + fid + '.mp4'
formats.append({
'format': 'fivemin',
'url': furl,
'preference': 1,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': description,
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnails': thumbnails,
}

View File

@ -69,12 +69,9 @@ class ImdbListIE(InfoExtractor):
list_id = mobj.group('id')
webpage = self._download_webpage(url, list_id)
list_code = self._search_regex(
r'(?s)<div\s+class="list\sdetail">(.*?)class="see-more"',
webpage, 'list code')
entries = [
self.url_result('http://www.imdb.com' + m, 'Imdb')
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"', webpage)]
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"\s+data-type="playlist"', webpage)]
list_title = self._html_search_regex(
r'<h1 class="header">(.*?)</h1>', webpage, 'list title')

View File

@ -1,39 +1,36 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class InaIE(InfoExtractor):
"""Information Extractor for Ina.fr"""
_VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I?[A-F0-9]+)/.*'
_VALID_URL = r'http://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)'
_TEST = {
u'url': u'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
u'file': u'I12055569.mp4',
u'md5': u'a667021bf2b41f8dc6049479d9bb38a3',
u'info_dict': {
u"title": u"Fran\u00e7ois Hollande \"Je crois que c'est clair\""
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3',
'info_dict': {
'id': 'I12055569',
'ext': 'mp4',
'title': 'François Hollande "Je crois que c\'est clair"',
}
}
def _real_extract(self,url):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
mrss_url='http://player.ina.fr/notices/%s.mrss' % video_id
video_extension = 'mp4'
webpage = self._download_webpage(mrss_url, video_id)
mrss_url = 'http://player.ina.fr/notices/%s.mrss' % video_id
info_doc = self._download_xml(mrss_url, video_id)
self.report_extraction(video_id)
video_url = self._html_search_regex(r'<media:player url="(?P<mp4url>http://mp4.ina.fr/[^"]+\.mp4)',
webpage, u'video URL')
video_url = info_doc.find('.//{http://search.yahoo.com/mrss/}player').attrib['url']
video_title = self._search_regex(r'<title><!\[CDATA\[(?P<titre>.*?)]]></title>',
webpage, u'title')
return [{
return {
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
}]
'title': info_doc.find('.//title').text,
}

View File

@ -1,62 +1,55 @@
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
ExtractorError,
)
class InfoQIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?infoq\.com/[^/]+/[^/]+$'
_VALID_URL = r'https?://(?:www\.)?infoq\.com/[^/]+/(?P<id>[^/]+)$'
_TEST = {
u"name": u"InfoQ",
u"url": u"http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
u"file": u"12-jan-pythonthings.mp4",
u"info_dict": {
u"description": u"Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
u"title": u"A Few of My Favorite [Python] Things"
"name": "InfoQ",
"url": "http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things",
"file": "12-jan-pythonthings.mp4",
"info_dict": {
"description": "Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.",
"title": "A Few of My Favorite [Python] Things",
},
"params": {
"skip_download": True,
},
u"params": {
u"skip_download": True
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id=url)
self.report_extraction(url)
webpage = self._download_webpage(url, video_id)
# Extract video URL
mobj = re.search(r"jsclassref ?= ?'([^']*)'", webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract video url')
real_id = compat_urllib_parse.unquote(base64.b64decode(mobj.group(1).encode('ascii')).decode('utf-8'))
encoded_id = self._search_regex(r"jsclassref ?= ?'([^']*)'", webpage, 'encoded id')
real_id = compat_urllib_parse.unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
video_url = 'rtmpe://video.infoq.com/cfx/st/' + real_id
# Extract title
video_title = self._search_regex(r'contentTitle = "(.*?)";',
webpage, u'title')
webpage, 'title')
# Extract description
video_description = self._html_search_regex(r'<meta name="description" content="(.*)"(?:\s*/)?>',
webpage, u'description', fatal=False)
webpage, 'description', fatal=False)
video_filename = video_url.split('/')[-1]
video_id, extension = video_filename.split('.')
info = {
return {
'id': video_id,
'url': video_url,
'uploader': None,
'upload_date': None,
'title': video_title,
'ext': extension, # Extension is always(?) mp4, but seems to be flv
'thumbnail': None,
'description': video_description,
}
return [info]

View File

@ -1,35 +1,39 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class InstagramIE(InfoExtractor):
_VALID_URL = r'(?:http://)?instagram\.com/p/(.*?)/'
_VALID_URL = r'http://instagram\.com/p/(?P<id>.*?)/'
_TEST = {
u'url': u'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
u'file': u'aye83DjauH.mp4',
u'md5': u'0d2da106a9d2631273e192b372806516',
u'info_dict': {
u"uploader_id": u"naomipq",
u"title": u"Video by naomipq",
u'description': u'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'url': 'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
'info_dict': {
'id': 'aye83DjauH',
'ext': 'mp4',
'uploader_id': 'naomipq',
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
webpage, u'uploader id', fatal=False)
desc = self._search_regex(r'"caption":"(.*?)"', webpage, u'description',
webpage, 'uploader id', fatal=False)
desc = self._search_regex(r'"caption":"(.*?)"', webpage, 'description',
fatal=False)
return [{
return {
'id': video_id,
'url': self._og_search_video_url(webpage, secure=False),
'ext': 'mp4',
'title': u'Video by %s' % uploader_id,
'title': 'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id' : uploader_id,
'uploader_id': uploader_id,
'description': desc,
}]
}

View File

@ -0,0 +1,85 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from random import random
from math import floor
from .common import InfoExtractor
from ..utils import compat_urllib_request
class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?P<videogroup>.+)/(?P<videoid>.+)'
_TESTS = [{
'url': 'http://play.iprima.cz/particka/particka-92',
'info_dict': {
'id': '39152',
'ext': 'flv',
'title': 'Partička (92)',
'description': 'md5:3740fda51464da35a2d4d0670b8e4fd6',
'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg',
},
'params': {
'skip_download': True,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
webpage = self._download_webpage(url, video_id)
player_url = 'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' % (
floor(random()*1073741824),
floor(random()*1073741824))
req = compat_urllib_request.Request(player_url)
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id)
base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1])
zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO')
if zoneGEO != '0':
base_url = base_url.replace('token', 'token_'+zoneGEO)
formats = []
for format_id in ['lq', 'hq', 'hd']:
filename = self._html_search_regex(r'"%s_id":(.+?),' % format_id, webpage, 'filename')
if filename == 'null':
continue
real_id = self._search_regex(r'Prima-[0-9]{10}-([0-9]+)_', filename, 'real video id')
if format_id == 'lq':
quality = 0
elif format_id == 'hq':
quality = 1
elif format_id == 'hd':
quality = 2
filename = 'hq/'+filename
formats.append({
'format_id': format_id,
'url': base_url,
'quality': quality,
'play_path': 'mp4:'+filename.replace('"', '')[:-4],
'rtmp_live': True,
'ext': 'flv',
})
self._sort_formats(formats)
return {
'id': real_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
}

View File

@ -1,4 +1,5 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
@ -11,35 +12,37 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = u'ivi.ru'
IE_NAME = u'ivi'
_VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
_TESTS = [
# Single movie
{
u'url': u'http://www.ivi.ru/watch/53141',
u'file': u'53141.mp4',
u'md5': u'6ff5be2254e796ed346251d117196cf4',
u'info_dict': {
u'title': u'Иван Васильевич меняет профессию',
u'description': u'md5:14d8eda24e9d93d29b5857012c6d6346',
u'duration': 5498,
u'thumbnail': u'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
'url': 'http://www.ivi.ru/watch/53141',
'md5': '6ff5be2254e796ed346251d117196cf4',
'info_dict': {
'id': '53141',
'ext': 'mp4',
'title': 'Иван Васильевич меняет профессию',
'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
'duration': 5498,
'thumbnail': 'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
},
u'skip': u'Only works from Russia',
'skip': 'Only works from Russia',
},
# Serial's serie
{
u'url': u'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
u'file': u'74791.mp4',
u'md5': u'3e6cc9a848c1d2ebcc6476444967baa9',
u'info_dict': {
u'title': u'Дежурный ангел - 1 серия',
u'duration': 2490,
u'thumbnail': u'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
'url': 'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
'md5': '3e6cc9a848c1d2ebcc6476444967baa9',
'info_dict': {
'id': '74791',
'ext': 'mp4',
'title': 'Дежурный ангел - 1 серия',
'duration': 2490,
'thumbnail': 'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
},
u'skip': u'Only works from Russia',
'skip': 'Only works from Russia',
}
]
@ -54,7 +57,7 @@ class IviIE(InfoExtractor):
return m.group('description') if m is not None else None
def _extract_comment_count(self, html):
m = re.search(u'(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
m = re.search('(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
return int(m.group('commentcount')) if m is not None else 0
def _real_extract(self, url):
@ -63,49 +66,49 @@ class IviIE(InfoExtractor):
api_url = 'http://api.digitalaccess.ru/api/json/'
data = {u'method': u'da.content.get',
u'params': [video_id, {u'site': u's183',
u'referrer': u'http://www.ivi.ru/watch/%s' % video_id,
u'contentid': video_id
data = {'method': 'da.content.get',
'params': [video_id, {'site': 's183',
'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
'contentid': video_id
}
]
}
request = compat_urllib_request.Request(api_url, json.dumps(data))
video_json_page = self._download_webpage(request, video_id, u'Downloading video JSON')
video_json_page = self._download_webpage(request, video_id, 'Downloading video JSON')
video_json = json.loads(video_json_page)
if u'error' in video_json:
error = video_json[u'error']
if error[u'origin'] == u'NoRedisValidData':
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
raise ExtractorError(u'Unable to download video %s: %s' % (video_id, error[u'message']), expected=True)
if 'error' in video_json:
error = video_json['error']
if error['origin'] == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError('Unable to download video %s: %s' % (video_id, error['message']), expected=True)
result = video_json[u'result']
result = video_json['result']
formats = [{
'url': x[u'url'],
'format_id': x[u'content_format'],
'preference': self._known_formats.index(x[u'content_format']),
} for x in result[u'files'] if x[u'content_format'] in self._known_formats]
'url': x['url'],
'format_id': x['content_format'],
'preference': self._known_formats.index(x['content_format']),
} for x in result['files'] if x['content_format'] in self._known_formats]
self._sort_formats(formats)
if not formats:
raise ExtractorError(u'No media links available for %s' % video_id)
raise ExtractorError('No media links available for %s' % video_id)
duration = result[u'duration']
compilation = result[u'compilation']
title = result[u'title']
duration = result['duration']
compilation = result['compilation']
title = result['title']
title = '%s - %s' % (compilation, title) if compilation is not None else title
previews = result[u'preview']
previews = result['preview']
previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
thumbnail = previews[-1][u'url'] if len(previews) > 0 else None
thumbnail = previews[-1]['url'] if len(previews) > 0 else None
video_page = self._download_webpage(url, video_id, u'Downloading video page')
video_page = self._download_webpage(url, video_id, 'Downloading video page')
description = self._extract_description(video_page)
comment_count = self._extract_comment_count(video_page)
@ -121,9 +124,9 @@ class IviIE(InfoExtractor):
class IviCompilationIE(InfoExtractor):
IE_DESC = u'ivi.ru compilations'
IE_NAME = u'ivi:compilation'
_VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch/(?!\d+)(?P<compilationid>[a-z\d_-]+)(?:/season(?P<seasonid>\d+))?$'
IE_DESC = 'ivi.ru compilations'
IE_NAME = 'ivi:compilation'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/watch/(?!\d+)(?P<compilationid>[a-z\d_-]+)(?:/season(?P<seasonid>\d+))?$'
def _extract_entries(self, html, compilation_id):
return [self.url_result('http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), 'Ivi')
@ -135,22 +138,23 @@ class IviCompilationIE(InfoExtractor):
season_id = mobj.group('seasonid')
if season_id is not None: # Season link
season_page = self._download_webpage(url, compilation_id, u'Downloading season %s web page' % season_id)
season_page = self._download_webpage(url, compilation_id, 'Downloading season %s web page' % season_id)
playlist_id = '%s/season%s' % (compilation_id, season_id)
playlist_title = self._html_search_meta(u'title', season_page, u'title')
playlist_title = self._html_search_meta('title', season_page, 'title')
entries = self._extract_entries(season_page, compilation_id)
else: # Compilation link
compilation_page = self._download_webpage(url, compilation_id, u'Downloading compilation web page')
compilation_page = self._download_webpage(url, compilation_id, 'Downloading compilation web page')
playlist_id = compilation_id
playlist_title = self._html_search_meta(u'title', compilation_page, u'title')
playlist_title = self._html_search_meta('title', compilation_page, 'title')
seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
if len(seasons) == 0: # No seasons in this compilation
entries = self._extract_entries(compilation_page, compilation_id)
else:
entries = []
for season_id in seasons:
season_page = self._download_webpage('http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
compilation_id, u'Downloading season %s web page' % season_id)
season_page = self._download_webpage(
'http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
compilation_id, 'Downloading season %s web page' % season_id)
entries.extend(self._extract_entries(season_page, compilation_id))
return self.playlist_result(entries, playlist_id, playlist_title)

View File

@ -0,0 +1,48 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}

View File

@ -1,5 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
@ -10,12 +12,13 @@ class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4',
u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
'md5': '046e491afb32a8aaac1f44dd4ddd54ee',
'info_dict': {
'id': '5182',
'ext': 'mp4',
'title': 'GC 2013 : Tearaway nous présente ses papiers d\'identité',
'description': 'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
},
}
@ -25,14 +28,14 @@ class JeuxVideoIE(InfoExtractor):
webpage = self._download_webpage(url, title)
xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
webpage, u'config URL')
webpage, 'config URL')
video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_link, 'video ID')
config = self._download_xml(
xml_link, title, u'Downloading XML config')
xml_link, title, 'Downloading XML config')
info_json = config.find('format.json').text
info = json.loads(info_json)['versions'][0]

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -5,36 +7,34 @@ from .common import InfoExtractor
class KeekIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<videoID>\w+)'
IE_NAME = u'keek'
IE_NAME = 'keek'
_TEST = {
u'url': u'https://www.keek.com/ytdl/keeks/NODfbab',
u'file': u'NODfbab.mp4',
u'md5': u'9b0636f8c0f7614afa4ea5e4c6e57e83',
u'info_dict': {
u"uploader": u"ytdl",
u"title": u"test chars: \"'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de ."
}
'url': 'https://www.keek.com/ytdl/keeks/NODfbab',
'file': 'NODfbab.mp4',
'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
'info_dict': {
'uploader': 'ytdl',
'title': 'test chars: "\'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de .',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('videoID')
video_url = u'http://cdn.keek.com/keek/video/%s' % video_id
thumbnail = u'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
video_url = 'http://cdn.keek.com/keek/video/%s' % video_id
thumbnail = 'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage)
uploader = self._html_search_regex(
r'<div class="user-name-and-bio">[\S\s]+?<h2>(?P<uploader>.+?)</h2>',
webpage, 'uploader', fatal=False)
uploader = self._html_search_regex(r'<div class="user-name-and-bio">[\S\s]+?<h2>(?P<uploader>.+?)</h2>',
webpage, u'uploader', fatal=False)
info = {
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'title': self._og_search_title(webpage),
'thumbnail': thumbnail,
'uploader': uploader
}
return [info]

View File

@ -0,0 +1,66 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class KontrTubeIE(InfoExtractor):
IE_NAME = 'kontrtube'
IE_DESC = 'KontrTube.ru - Труба зовёт'
_VALID_URL = r'http://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/.+'
_TEST = {
'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',
'md5': '975a991a4926c9a85f383a736a2e6b80',
'info_dict': {
'id': '2678',
'ext': 'mp4',
'title': 'Над олимпийской деревней в Сочи поднят российский флаг',
'description': 'md5:80edc4c613d5887ae8ccf1d59432be41',
'thumbnail': 'http://www.kontrtube.ru/contents/videos_screenshots/2000/2678/preview.mp4.jpg',
'duration': 270,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id, 'Downloading page')
video_url = self._html_search_regex(r"video_url: '(.+?)/?',", webpage, 'video URL')
thumbnail = self._html_search_regex(r"preview_url: '(.+?)/?',", webpage, 'video thumbnail', fatal=False)
title = self._html_search_regex(r'<title>(.+?) - Труба зовёт - Интересный видеохостинг</title>', webpage,
'video title')
description = self._html_search_meta('description', webpage, 'video description')
mobj = re.search(r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>',
webpage)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
view_count = self._html_search_regex(r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage,
'view count', fatal=False)
view_count = int(view_count) if view_count is not None else None
comment_count = None
comment_str = self._html_search_regex(r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count',
fatal=False)
if comment_str.startswith('комментариев нет'):
comment_count = 0
else:
mobj = re.search(r'\d+ из (?P<total>\d+) комментариев', comment_str)
if mobj:
comment_count = int(mobj.group('total'))
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
}

View File

@ -0,0 +1,63 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
)
class LA7IE(InfoExtractor):
IE_NAME = 'la7.tv'
_VALID_URL = r'''(?x)
https?://(?:www\.)?la7\.tv/
(?:
richplayer/\?assetid=|
\?contentId=
)
(?P<id>[0-9]+)'''
_TEST = {
'url': 'http://www.la7.tv/richplayer/?assetid=50355319',
'file': '50355319.mp4',
'md5': 'ec7d1f0224d20ba293ab56cf2259651f',
'info_dict': {
'title': 'IL DIVO',
'description': 'Un film di Paolo Sorrentino con Toni Servillo, Anna Bonaiuto, Giulio Bosetti e Flavio Bucci',
'duration': 6254,
},
'skip': 'Blocked in the US',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
xml_url = 'http://www.la7.tv/repliche/content/index.php?contentId=%s' % video_id
doc = self._download_xml(xml_url, video_id)
video_title = doc.find('title').text
description = doc.find('description').text
duration = parse_duration(doc.find('duration').text)
thumbnail = doc.find('img').text
view_count = int(doc.find('views').text)
prefix = doc.find('.//fqdn').text.strip().replace('auto:', 'http:')
formats = [{
'format': vnode.find('quality').text,
'tbr': int(vnode.find('quality').text),
'url': vnode.find('fms').text.strip().replace('mp4:', prefix),
} for vnode in doc.findall('.//videos/video')]
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'view_count': view_count,
}

View File

@ -0,0 +1,69 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate
)
class LifeNewsIE(InfoExtractor):
IE_NAME = 'lifenews'
IE_DESC = 'LIFE | NEWS'
_VALID_URL = r'http://lifenews\.ru/(?:mobile/)?news/(?P<id>\d+)'
_TEST = {
'url': 'http://lifenews.ru/news/126342',
'md5': 'e1b50a5c5fb98a6a544250f2e0db570a',
'info_dict': {
'id': '126342',
'ext': 'mp4',
'title': 'МВД разыскивает мужчин, оставивших в IKEA сумку с автоматом',
'description': 'Камеры наблюдения гипермаркета зафиксировали троих мужчин, спрятавших оружейный арсенал в камере хранения.',
'thumbnail': 'http://lifenews.ru/static/posts/2014/1/126342/.video.jpg',
'upload_date': '20140130',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage('http://lifenews.ru/mobile/news/%s' % video_id, video_id, 'Downloading page')
video_url = self._html_search_regex(
r'<video.*?src="([^"]+)".*?></video>', webpage, 'video URL')
thumbnail = self._html_search_regex(
r'<video.*?poster="([^"]+)".*?"></video>', webpage, 'video thumbnail')
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' - Первый по срочным новостям — LIFE | NEWS'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
description = self._og_search_description(webpage)
view_count = self._html_search_regex(
r'<div class=\'views\'>(\d+)</div>', webpage, 'view count', fatal=False)
comment_count = self._html_search_regex(
r'<div class=\'comments\'>(\d+)</div>', webpage, 'comment count', fatal=False)
upload_date = self._html_search_regex(
r'<time datetime=\'([^\']+)\'>', webpage, 'upload date',fatal=False)
if upload_date is not None:
upload_date = unified_strdate(upload_date)
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
'upload_date': upload_date,
}

View File

@ -1,52 +1,98 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'liveleak'
_TEST = {
u'url': u'http://www.liveleak.com/view?i=757_1364311680',
u'file': u'757_1364311680.mp4',
u'md5': u'0813c2430bea7a46bf13acf3406992f4',
u'info_dict': {
u"description": u"extremely bad day for this guy..!",
u"uploader": u"ljfriel2",
u"title": u"Most unlucky car accident"
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident'
}
},
{
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
}
},
{
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'file: "(.*?)",',
webpage, u'video URL')
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_uploader = self._html_search_regex(r'By:.*?(\w+)</a>',
webpage, u'uploader', fatal=False)
info = {
sources_raw = self._search_regex(
r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None)
if sources_raw is None:
alt_source = self._search_regex(
r'(file: ".*?"),', webpage, 'video URL', default=None)
if alt_source:
sources_raw = '[{ %s}]' % alt_source
else:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="(http://www.prochan.com/embed\?[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': video_title,
'description': video_description,
'uploader': video_uploader
'uploader': video_uploader,
'age_limit': age_limit,
}
return [info]
sources_json = re.sub(r'\s([a-z]+):\s', r'"\1": ', sources_raw)
sources = json.loads(sources_json)
formats = [{
'format_note': s.get('label'),
'url': s['file'],
} for s in sources]
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'formats': formats,
'age_limit': age_limit,
}

View File

@ -0,0 +1,56 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class M6IE(InfoExtractor):
IE_NAME = 'm6'
_VALID_URL = r'http://(?:www\.)?m6\.fr/[^/]+/videos/(?P<id>\d+)-[^\.]+\.html'
_TEST = {
'url': 'http://www.m6.fr/emission-les_reines_du_shopping/videos/11323908-emeline_est_la_reine_du_shopping_sur_le_theme_ma_fete_d_8217_anniversaire.html',
'md5': '242994a87de2c316891428e0176bcb77',
'info_dict': {
'id': '11323908',
'ext': 'mp4',
'title': 'Emeline est la Reine du Shopping sur le thème « Ma fête danniversaire ! »',
'description': 'md5:1212ae8fb4b7baa4dc3886c5676007c2',
'duration': 100,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
rss = self._download_xml('http://ws.m6.fr/v1/video/info/m6/bonus/%s' % video_id, video_id,
'Downloading video RSS')
title = rss.find('./channel/item/title').text
description = rss.find('./channel/item/description').text
thumbnail = rss.find('./channel/item/visuel_clip_big').text
duration = int(rss.find('./channel/item/duration').text)
view_count = int(rss.find('./channel/item/nombre_vues').text)
formats = []
for format_id in ['lq', 'sd', 'hq', 'hd']:
video_url = rss.find('./channel/item/url_video_%s' % format_id)
if video_url is None:
continue
formats.append({
'url': video_url.text,
'format_id': format_id,
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'formats': formats,
}

View File

@ -0,0 +1,59 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
)
class MalemotionIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_TEST = {
'url': 'http://malemotion.com/video/bien-dur.10ew',
'file': '10ew.mp4',
'md5': 'b3cc49f953b107e4a363cdff07d100ce',
'info_dict': {
"title": "Bien dur",
"age_limit": 18,
},
'skip': 'This video has been deleted.'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group("id")
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
# Extract video URL
video_url = compat_urllib_parse.unquote(
self._search_regex(r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
# Extract title
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
# Extract video thumbnail
video_thumbnail = self._search_regex(
r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
formats = [{
'url': video_url,
'ext': 'mp4',
'format_id': 'mp4',
'preference': 1,
}]
return {
'id': video_id,
'formats': formats,
'uploader': None,
'upload_date': None,
'title': video_title,
'thumbnail': video_thumbnail,
'description': None,
'age_limit': 18,
}

View File

@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
compat_urllib_request,
compat_urllib_parse,
)
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'http://mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}
request = compat_urllib_request.Request(
'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.to_screen('%s: Waiting for timeout' % video_id)
time.sleep(5)
video_page = self._download_webpage(request, video_id, 'Downloading video page')
thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None
formats = []
# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})
# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})
# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -86,6 +86,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
title_el = itemdoc.find('.//{http://search.yahoo.com/mrss/}title')
if title_el is None:
title_el = itemdoc.find('.//title')
if title_el.text is None:
title_el = None
title = title_el.text
if title is None:
raise ExtractorError('Could not find video title')
@ -119,7 +122,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
if mgid.endswith('.swf'):
mgid = mgid[:-4]
except RegexNotFoundError:
mgid = self._search_regex(r'data-mgid="(.*?)"', webpage, u'mgid')
mgid = self._search_regex(
[r'data-mgid="(.*?)"', r'swfobject.embedSWF\(".*?(mgid:.*?)"'],
webpage, u'mgid')
return self._get_videos_info(mgid)

View File

@ -1,3 +1,4 @@
from __future__ import unicode_literals
import os.path
from .common import InfoExtractor
@ -11,13 +12,13 @@ from ..utils import (
class MySpassIE(InfoExtractor):
_VALID_URL = r'http://www\.myspass\.de/.*'
_TEST = {
u'url': u'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
u'file': u'11741.mp4',
u'md5': u'0b49f4844a068f8b33f4b7c88405862b',
u'info_dict': {
u"description": u"Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
u"title": u"Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2"
}
'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
'file': '11741.mp4',
'md5': '0b49f4844a068f8b33f4b7c88405862b',
'info_dict': {
"description": "Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
"title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2",
},
}
def _real_extract(self, url):
@ -37,12 +38,11 @@ class MySpassIE(InfoExtractor):
# extract values from metadata
url_flv_el = metadata.find('url_flv')
if url_flv_el is None:
raise ExtractorError(u'Unable to extract download url')
raise ExtractorError('Unable to extract download url')
video_url = url_flv_el.text
extension = os.path.splitext(video_url)[1][1:]
title_el = metadata.find('title')
if title_el is None:
raise ExtractorError(u'Unable to extract title')
raise ExtractorError('Unable to extract title')
title = title_el.text
format_id_el = metadata.find('format_id')
if format_id_el is None:
@ -59,13 +59,12 @@ class MySpassIE(InfoExtractor):
thumbnail = imagePreview_el.text
else:
thumbnail = None
info = {
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': extension,
'format': format,
'thumbnail': thumbnail,
'description': description
'description': description,
}
return [info]

View File

@ -1,48 +1,39 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class NBAIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:watch\.|www\.)?nba\.com/(?:nba/)?video(/[^?]*?)(?:/index\.html)?(?:\?.*)?$'
_TEST = {
u'url': u'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
u'file': u'0021200253-okc-bkn-recap.nba.mp4',
u'md5': u'c0edcfc37607344e2ff8f13c378c88a4',
u'info_dict': {
u"description": u"Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.",
u"title": u"Thunder vs. Nets"
}
'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
'file': u'0021200253-okc-bkn-recap.nba.mp4',
'md5': u'c0edcfc37607344e2ff8f13c378c88a4',
'info_dict': {
'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
'title': 'Thunder vs. Nets',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
video_url = u'http://ht-mobile.cdn.turner.com/nba/big' + video_id + '_nba_1280x720.mp4'
video_url = 'http://ht-mobile.cdn.turner.com/nba/big' + video_id + '_nba_1280x720.mp4'
shortened_video_id = video_id.rpartition('/')[2]
title = self._og_search_title(webpage, default=shortened_video_id).replace('NBA.com: ', '')
# It isn't there in the HTML it returns to us
# uploader_date = self._html_search_regex(r'<b>Date:</b> (.*?)</div>', webpage, 'upload_date', fatal=False)
description = self._html_search_regex(r'<meta name="description" (?:content|value)="(.*?)" />', webpage, 'description', fatal=False)
info = {
return {
'id': shortened_video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
# 'uploader_date': uploader_date,
'description': description,
}
return [info]

View File

@ -0,0 +1,89 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class NDRIE(InfoExtractor):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Mediathek'
_VALID_URL = r'https?://www\.ndr\.de/.+?(?P<id>\d+)\.html'
_TESTS = [
{
'url': 'http://www.ndr.de/fernsehen/sendungen/markt/markt7959.html',
'md5': 'e7a6079ca39d3568f4996cb858dd6708',
'note': 'Video file',
'info_dict': {
'id': '7959',
'ext': 'mp4',
'title': 'Markt - die ganze Sendung',
'description': 'md5:af9179cf07f67c5c12dc6d9997e05725',
'duration': 2655,
},
},
{
'url': 'http://www.ndr.de/info/audio51535.html',
'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
'note': 'Audio file',
'info_dict': {
'id': '51535',
'ext': 'mp3',
'title': 'La Valette entgeht der Hinrichtung',
'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
'duration': 884,
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
title = self._og_search_title(page)
description = self._og_search_description(page)
mobj = re.search(
r'<div class="duration"><span class="min">(?P<minutes>\d+)</span>:<span class="sec">(?P<seconds>\d+)</span></div>',
page)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
formats = []
mp3_url = re.search(r'''{src:'(?P<audio>[^']+)', type:"audio/mp3"},''', page)
if mp3_url:
formats.append({
'url': mp3_url.group('audio'),
'format_id': 'mp3',
})
thumbnail = None
video_url = re.search(r'''3: {src:'(?P<video>.+?)\.hi\.mp4', type:"video/mp4"},''', page)
if video_url:
thumbnail = self._html_search_regex(r'(?m)title: "NDR PLAYER",\s*poster: "([^"]+)",',
page, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = 'http://www.ndr.de' + thumbnail
for format_id in ['lo', 'hi', 'hq']:
formats.append({
'url': '%s.%s.mp4' % (video_url.group('video'), format_id),
'format_id': format_id,
})
if not formats:
raise ExtractorError('No media links available for %s' % video_id)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -4,18 +4,18 @@ import json
import re
from .common import InfoExtractor
from ..utils import determine_ext
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?newgrounds\.com/audio/listen/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.newgrounds.com/audio/listen/549479',
'file': '549479.mp3',
'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': {
"title": "B7 - BusMode",
"uploader": "Burn7",
'id': '549479',
'ext': 'mp3',
'title': 'B7 - BusMode',
'uploader': 'Burn7',
}
}

View File

@ -0,0 +1,94 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_request,
compat_urllib_parse,
)
class NFBIE(InfoExtractor):
IE_NAME = 'nfb'
IE_DESC = 'National Film Board of Canada'
_VALID_URL = r'https?://(?:www\.)?(nfb|onf)\.ca/film/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
'info_dict': {
'id': 'qallunaat_why_white_people_are_funny',
'ext': 'mp4',
'title': 'Qallunaat! Why White People Are Funny ',
'description': 'md5:836d8aff55e087d04d9f6df554d4e038',
'duration': 3128,
'uploader': 'Mark Sandiford',
'uploader_id': 'mark-sandiford',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage('https://www.nfb.ca/film/%s' % video_id, video_id, 'Downloading film page')
uploader_id = self._html_search_regex(r'<a class="director-link" href="/explore-all-directors/([^/]+)/"',
page, 'director id', fatal=False)
uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
page, 'director name', fatal=False)
request = compat_urllib_request.Request('https://www.nfb.ca/film/%s/player_config' % video_id,
compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
config = self._download_xml(request, video_id, 'Downloading player config XML')
title = None
description = None
thumbnail = None
duration = None
formats = []
def extract_thumbnail(media):
thumbnails = {}
for asset in media.findall('assets/asset'):
thumbnails[asset.get('quality')] = asset.find('default/url').text
if not thumbnails:
return None
if 'high' in thumbnails:
return thumbnails['high']
return list(thumbnails.values())[0]
for media in config.findall('./player/stream/media'):
if media.get('type') == 'posterImage':
thumbnail = extract_thumbnail(media)
elif media.get('type') == 'video':
duration = int(media.get('duration'))
title = media.find('title').text
description = media.find('description').text
# It seems assets always go from lower to better quality, so no need to sort
formats = [{
'url': x.find('default/streamerURI').text,
'app': x.find('default/streamerURI').text.split('/', 3)[3],
'play_path': x.find('default/url').text,
'rtmp_live': False,
'ext': 'mp4',
'format_id': x.get('quality'),
} for x in media.findall('assets/asset')]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import json
import re
@ -9,13 +11,13 @@ class NineGagIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
_TEST = {
u"url": u"http://9gag.tv/v/1912",
u"file": u"1912.mp4",
u"info_dict": {
u"description": u"This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
u"title": u"\"People Are Awesome 2013\" Is Absolutely Awesome"
"url": "http://9gag.tv/v/1912",
"file": "1912.mp4",
"info_dict": {
"description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome"
},
u'add_ie': [u'Youtube']
'add_ie': ['Youtube']
}
def _real_extract(self, url):
@ -25,7 +27,7 @@ class NineGagIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_json = self._html_search_regex(r'''(?x)
<div\s*id="tv-video"\s*data-video-source="youtube"\s*
data-video-meta="([^"]+)"''', webpage, u'video metadata')
data-video-meta="([^"]+)"''', webpage, 'video metadata')
data = json.loads(data_json)

View File

@ -0,0 +1,51 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
)
class NormalbootsIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<videoid>[0-9a-z-]*)/?$'
_TEST = {
'url': 'http://normalboots.com/video/home-alone-games-jontron/',
'md5': '8bf6de238915dd501105b44ef5f1e0f6',
'info_dict': {
'id': 'home-alone-games-jontron',
'ext': 'mp4',
'title': 'Home Alone Games - JonTron - NormalBoots',
'description': 'Jon is late for Christmas. Typical. Thanks to: Paul Ritchey for Co-Writing/Filming: http://www.youtube.com/user/ContinueShow Michael Azzi for Christmas Intro Animation: http://michafrar.tumblr.com/ Jerrod Waters for Christmas Intro Music: http://www.youtube.com/user/xXJerryTerryXx Casey Ormond for Tense Battle Theme:\xa0http://www.youtube.com/Kiamet/',
'uploader': 'JonTron',
'upload_date': '20140125',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
webpage = self._download_webpage(url, video_id)
video_uploader = self._html_search_regex(r'Posted\sby\s<a\shref="[A-Za-z0-9/]*">(?P<uploader>[A-Za-z]*)\s</a>',
webpage, 'uploader')
raw_upload_date = self._html_search_regex('<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
webpage, 'date')
video_upload_date = unified_strdate(raw_upload_date)
player_url = self._html_search_regex(r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"', webpage, 'url')
player_page = self._download_webpage(player_url, video_id)
video_url = self._html_search_regex(r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader': video_uploader,
'upload_date': video_upload_date,
}

View File

@ -5,7 +5,7 @@ from .common import InfoExtractor
from ..utils import unescapeHTML
class OoyalaIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.ooyala\.com/.*?embedCode=(?P<id>.+?)(&|$)'
_VALID_URL = r'https?://.+?\.ooyala\.com/.*?(?:embedCode|ec)=(?P<id>.+?)(&|$)'
_TEST = {
# From http://it.slashdot.org/story/13/04/25/178216/recovering-data-from-broken-hard-drives-and-ssds-video

View File

@ -1,30 +1,64 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class PBSIE(InfoExtractor):
_VALID_URL = r'https?://video\.pbs\.org/video/(?P<id>\d+)/?'
_VALID_URL = r'''(?x)https?://
(?:
# Direct video URL
video\.pbs\.org/(?:viralplayer|video)/(?P<id>[0-9]+)/? |
# Article with embedded player
(?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+)/?(?:$|[?\#]) |
# Player
video\.pbs\.org/partnerplayer/(?P<player_id>[^/]+)/
)
'''
_TEST = {
u'url': u'http://video.pbs.org/video/2365006249/',
u'file': u'2365006249.mp4',
u'md5': 'ce1888486f0908d555a8093cac9a7362',
u'info_dict': {
u'title': u'A More Perfect Union',
u'description': u'md5:ba0c207295339c8d6eced00b7c363c6a',
u'duration': 3190,
'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/',
'md5': 'ce1888486f0908d555a8093cac9a7362',
'info_dict': {
'id': '2365006249',
'ext': 'mp4',
'title': 'A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'duration': 3190,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
presumptive_id = mobj.group('presumptive_id')
display_id = presumptive_id
if presumptive_id:
webpage = self._download_webpage(url, display_id)
url = self._search_regex(
r'<iframe\s+id=["\']partnerPlayer["\'].*?\s+src=["\'](.*?)["\']>',
webpage, 'player URL')
mobj = re.match(self._VALID_URL, url)
player_id = mobj.group('player_id')
if not display_id:
display_id = player_id
if player_id:
player_page = self._download_webpage(
url, display_id, note='Downloading player page',
errnote='Could not download player page')
video_id = self._search_regex(
r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
else:
video_id = mobj.group('id')
display_id = video_id
info_url = 'http://video.pbs.org/videoInfo/%s?format=json' % video_id
info_page = self._download_webpage(info_url, video_id)
info =json.loads(info_page)
return {'id': video_id,
info = self._download_json(info_url, display_id)
return {
'id': video_id,
'title': info['title'],
'url': info['alternate_encoding']['url'],
'ext': 'mp4',

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -7,12 +9,12 @@ from ..utils import compat_urllib_parse
class PornHdIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
_TEST = {
u'url': u'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
u'file': u'1962.flv',
u'md5': u'35272469887dca97abd30abecc6cdf75',
u'info_dict': {
u"title": u"sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
u"age_limit": 18,
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'file': '1962.flv',
'md5': '35272469887dca97abd30abecc6cdf75',
'info_dict': {
"title": "sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
"age_limit": 18,
}
}
@ -24,9 +26,13 @@ class PornHdIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, u'video URL')
video_url = compat_urllib_parse.unquote(video_url)
next_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, 'video URL')
next_url = compat_urllib_parse.unquote(next_url)
video_url = self._download_webpage(
next_url, video_id, note='Retrieving video URL',
errnote='Could not retrieve video URL')
age_limit = 18
return {

View File

@ -1,10 +1,11 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
ExtractorError,
)
@ -12,16 +13,17 @@ from ..utils import (
class RBMARadioIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<videoID>[^/]+)$'
_TEST = {
u'url': u'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
u'file': u'ford-lopatin-live-at-primavera-sound-2011.mp3',
u'md5': u'6bc6f9bcb18994b4c983bc3bf4384d95',
u'info_dict': {
u"uploader_id": u"ford-lopatin",
u"location": u"Spain",
u"description": u"Joel Ford and Daniel \u2019Oneohtrix Point Never\u2019 Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
u"uploader": u"Ford & Lopatin",
u"title": u"Live at Primavera Sound 2011"
}
'url': 'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
'md5': '6bc6f9bcb18994b4c983bc3bf4384d95',
'info_dict': {
'id': 'ford-lopatin-live-at-primavera-sound-2011',
'ext': 'mp3',
"uploader_id": "ford-lopatin",
"location": "Spain",
"description": "Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
"uploader": "Ford & Lopatin",
"title": "Live at Primavera Sound 2011",
},
}
def _real_extract(self, url):
@ -31,20 +33,18 @@ class RBMARadioIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
json_data = self._search_regex(r'window\.gon.*?gon\.show=(.+?);$',
webpage, u'json data', flags=re.MULTILINE)
webpage, 'json data', flags=re.MULTILINE)
try:
data = json.loads(json_data)
except ValueError as e:
raise ExtractorError(u'Invalid JSON: ' + str(e))
raise ExtractorError('Invalid JSON: ' + str(e))
video_url = data['akamai_url'] + '&cbr=256'
url_parts = compat_urllib_parse_urlparse(video_url)
video_ext = url_parts.path.rpartition('.')[2]
info = {
return {
'id': video_id,
'url': video_url,
'ext': video_ext,
'title': data['title'],
'description': data.get('teaser_text'),
'location': data.get('country_of_origin'),
@ -53,4 +53,3 @@ class RBMARadioIE(InfoExtractor):
'thumbnail': data.get('image', {}).get('large_url_2x'),
'duration': data.get('duration'),
}
return [info]

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -11,12 +13,12 @@ class Ro220IE(InfoExtractor):
IE_NAME = '220.ro'
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?220\.ro/(?P<category>[^/]+)/(?P<shorttitle>[^/]+)/(?P<video_id>[^/]+)'
_TEST = {
u"url": u"http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/",
u'file': u'LYV6doKo7f.mp4',
u'md5': u'03af18b73a07b4088753930db7a34add',
u'info_dict': {
u"title": u"Luati-le Banii sez 4 ep 1",
u"description": u"Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
"url": "http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/",
'file': 'LYV6doKo7f.mp4',
'md5': '03af18b73a07b4088753930db7a34add',
'info_dict': {
"title": "Luati-le Banii sez 4 ep 1",
"description": "Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
}
}
@ -27,10 +29,10 @@ class Ro220IE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
flashVars_str = self._search_regex(
r'<param name="flashVars" value="([^"]+)"',
webpage, u'flashVars')
webpage, 'flashVars')
flashVars = compat_parse_qs(flashVars_str)
info = {
return {
'_type': 'video',
'id': video_id,
'ext': 'mp4',
@ -39,4 +41,3 @@ class Ro220IE(InfoExtractor):
'description': clean_html(flashVars['desc'][0]),
'thumbnail': flashVars['preview'][0],
}
return info

View File

@ -1,4 +1,7 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -12,78 +15,77 @@ class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<domain>rtl-now\.rtl\.de|rtl2now\.rtl2\.de|(?:www\.)?voxnow\.de|(?:www\.)?rtlnitronow\.de|(?:www\.)?superrtlnow\.de|(?:www\.)?n-tvnow\.de)/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
u'info_dict': {
u'upload_date': u'20070416',
u'title': u'Ahornallee - Folge 1 - Der Einzug',
u'description': u'Folge 1 - Der Einzug',
'url': 'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
'file': '90419.flv',
'info_dict': {
'upload_date': '20070416',
'title': 'Ahornallee - Folge 1 - Der Einzug',
'description': 'Folge 1 - Der Einzug',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
},
{
u'url': u'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
u'file': u'69756.flv',
u'info_dict': {
u'upload_date': u'20120519',
u'title': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...',
u'description': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
u'thumbnail': u'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
'url': 'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
'file': '69756.flv',
'info_dict': {
'upload_date': '20120519',
'title': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...',
'description': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
'thumbnail': 'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
},
{
u'url': u'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
u'file': u'13883.flv',
u'info_dict': {
u'upload_date': u'20090627',
u'title': u'Voxtours - Südafrika-Reporter II',
u'description': u'Südafrika-Reporter II',
'url': 'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
'file': '13883.flv',
'info_dict': {
'upload_date': '20090627',
'title': 'Voxtours - Südafrika-Reporter II',
'description': 'Südafrika-Reporter II',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
},
{
u'url': u'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
u'file': u'99205.flv',
u'info_dict': {
u'upload_date': u'20080928',
u'title': u'Medicopter 117 - Angst!',
u'description': u'Angst!',
u'thumbnail': u'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
'url': 'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
'file': '99205.flv',
'info_dict': {
'upload_date': '20080928',
'title': 'Medicopter 117 - Angst!',
'description': 'Angst!',
'thumbnail': 'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
'url': 'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
'file': '124903.flv',
'info_dict': {
'upload_date': '20130101',
'title': 'Top Gear vom 01.01.2013',
'description': 'Episode 1',
},
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
u'skip': u'Only works from Germany',
'skip': 'Only works from Germany',
}]
def _real_extract(self,url):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
video_page_url = u'http://' + mobj.group('domain') + u'/'
video_id = mobj.group(u'video_id')
webpage_url = 'http://' + mobj.group('url')
video_page_url = 'http://' + mobj.group('domain') + '/'
video_id = mobj.group('video_id')
webpage = self._download_webpage(webpage_url, video_id)
@ -94,43 +96,45 @@ class RTLnowIE(InfoExtractor):
msg = clean_html(note_m.group(1))
raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url')
video_title = self._html_search_regex(
r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, 'title')
playerdata_url = self._html_search_regex(
r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, 'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj:
video_description = mobj.group(u'description')
video_description = mobj.group('description')
if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y')
elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y')
video_upload_date = '20' + mobj.group('upload_date_y')
else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
video_upload_date += mobj.group('upload_date_m') + mobj.group('upload_date_d')
else:
video_description = None
video_upload_date = None
self._downloader.report_warning(u'Unable to extract description and upload date')
self._downloader.report_warning('Unable to extract description and upload date')
# Thumbnail: not every video has an thumbnail
mobj = re.search(r'<meta property="og:image" content="(?P<thumbnail>[^"]+)">', webpage)
if mobj:
video_thumbnail = mobj.group(u'thumbnail')
video_thumbnail = mobj.group('thumbnail')
else:
video_thumbnail = None
mobj = re.search(r'<filename [^>]+><!\[CDATA\[(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>[^\]]+)\]\]></filename>', playerdata)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
video_url = mobj.group(u'url')
video_play_path = u'mp4:' + mobj.group(u'play_path')
video_player_url = video_page_url + u'includes/vodplayer.swf'
raise ExtractorError('Unable to extract media URL')
video_url = mobj.group('url')
video_play_path = 'mp4:' + mobj.group('play_path')
video_player_url = video_page_url + 'includes/vodplayer.swf'
return [{
return {
'id': video_id,
'url': video_url,
'play_path': video_play_path,
@ -141,4 +145,4 @@ class RTLnowIE(InfoExtractor):
'description': video_description,
'upload_date': video_upload_date,
'thumbnail': video_thumbnail,
}]
}

View File

@ -1,58 +1,124 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
import itertools
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_str,
unified_strdate,
ExtractorError,
)
class RutubeIE(InfoExtractor):
_VALID_URL = r'https?://rutube\.ru/video/(?P<long_id>\w+)'
IE_NAME = 'rutube'
IE_DESC = 'Rutube videos'
_VALID_URL = r'https?://rutube\.ru/video/(?P<id>[\da-z]{32})'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
u'file': u'3eac3b4561676c17df9132a9a1e62e3e.mp4',
u'info_dict': {
u'title': u'Раненный кенгуру забежал в аптеку',
u'uploader': u'NTDRussian',
u'uploader_id': u'29790',
'url': 'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
'file': '3eac3b4561676c17df9132a9a1e62e3e.mp4',
'info_dict': {
'title': 'Раненный кенгуру забежал в аптеку',
'description': 'http://www.ntdtv.ru ',
'duration': 80,
'uploader': 'NTDRussian',
'uploader_id': '29790',
'upload_date': '20131016',
},
u'params': {
'params': {
# It requires ffmpeg (m3u8 download)
u'skip_download': True,
'skip_download': True,
},
}
def _get_api_response(self, short_id, subpath):
api_url = 'http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id)
response_json = self._download_webpage(api_url, short_id,
u'Downloading %s json' % subpath)
return json.loads(response_json)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
long_id = mobj.group('long_id')
webpage = self._download_webpage(url, long_id)
og_video = self._og_search_video_url(webpage)
short_id = compat_urlparse.urlparse(og_video).path[1:]
options = self._get_api_response(short_id, 'options')
trackinfo = self._get_api_response(short_id, 'trackinfo')
video_id = mobj.group('id')
api_response = self._download_webpage('http://rutube.ru/api/video/%s/?format=json' % video_id,
video_id, 'Downloading video JSON')
video = json.loads(api_response)
api_response = self._download_webpage('http://rutube.ru/api/play/trackinfo/%s/?format=json' % video_id,
video_id, 'Downloading trackinfo JSON')
trackinfo = json.loads(api_response)
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError(u'Couldn\'t find m3u8 manifest url')
raise ExtractorError('Couldn\'t find m3u8 manifest url')
return {
'id': trackinfo['id'],
'title': trackinfo['title'],
'id': video['id'],
'title': video['title'],
'description': video['description'],
'duration': video['duration'],
'view_count': video['hits'],
'url': m3u8_url,
'ext': 'mp4',
'thumbnail': options['thumbnail_url'],
'thumbnail': video['thumbnail_url'],
'uploader': author.get('name'),
'uploader_id': compat_str(author['id']) if author else None,
'upload_date': unified_strdate(video['created_ts']),
'age_limit': 18 if video['is_adult'] else 0,
}
class RutubeChannelIE(InfoExtractor):
IE_NAME = 'rutube:channel'
IE_DESC = 'Rutube channels'
_VALID_URL = r'http://rutube\.ru/tags/video/(?P<id>\d+)'
_PAGE_TEMPLATE = 'http://rutube.ru/api/tags/video/%s/?page=%s&format=json'
def _extract_videos(self, channel_id, channel_title=None):
entries = []
for pagenum in itertools.count(1):
api_response = self._download_webpage(
self._PAGE_TEMPLATE % (channel_id, pagenum),
channel_id, 'Downloading page %s' % pagenum)
page = json.loads(api_response)
results = page['results']
if not results:
break
entries.extend(self.url_result(result['video_url'], 'Rutube') for result in results)
if not page['has_next']:
break
return self.playlist_result(entries, channel_id, channel_title)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
return self._extract_videos(channel_id)
class RutubeMovieIE(RutubeChannelIE):
IE_NAME = 'rutube:movie'
IE_DESC = 'Rutube movies'
_VALID_URL = r'http://rutube\.ru/metainfo/tv/(?P<id>\d+)'
_MOVIE_TEMPLATE = 'http://rutube.ru/api/metainfo/tv/%s/?format=json'
_PAGE_TEMPLATE = 'http://rutube.ru/api/metainfo/tv/%s/video?page=%s&format=json'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie_id = mobj.group('id')
api_response = self._download_webpage(
self._MOVIE_TEMPLATE % movie_id, movie_id,
'Downloading movie JSON')
movie = json.loads(api_response)
movie_name = movie['name']
return self._extract_videos(movie_id, movie_name)
class RutubePersonIE(RutubeChannelIE):
IE_NAME = 'rutube:person'
IE_DESC = 'Rutube person videos'
_VALID_URL = r'http://rutube\.ru/video/person/(?P<id>\d+)'
_PAGE_TEMPLATE = 'http://rutube.ru/api/video/person/%s/?page=%s&format=json'

View File

@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
class SaveFromIE(InfoExtractor):
IE_NAME = 'savefrom.net'
_VALID_URL = r'https?://[^.]+\.savefrom\.net/\#url=(?P<url>.*)$'
_TEST = {
'url': 'http://en.savefrom.net/#url=http://youtube.com/watch?v=UlVRAPW2WJY&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com',
'info_dict': {
'id': 'UlVRAPW2WJY',
'ext': 'mp4',
'title': 'About Team Radical MMA | MMA Fighting',
'upload_date': '20120816',
'uploader': 'Howcast',
'uploader_id': 'Howcast',
'description': 'md5:4f0aac94361a12e1ce57d74f85265175',
},
'params': {
'skip_download': True
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = os.path.splitext(url.split('/')[-1])[0]
return {
'_type': 'url',
'id': video_id,
'url': mobj.group('url'),
}

View File

@ -1,4 +1,5 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@ -12,21 +13,31 @@ from ..utils import (
class SinaIE(InfoExtractor):
_VALID_URL = r'''https?://(.*?\.)?video\.sina\.com\.cn/
(
(.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=))(?P<id>\d+?)($|&))))
(.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=)|b/)(?P<id>\d+?)($|&|\-))))
|
# This is used by external sites like Weibo
(api/sinawebApi/outplay.php/(?P<token>.+?)\.swf)
)
'''
_TEST = {
u'url': u'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
u'file': u'110028898.flv',
u'md5': u'd65dd22ddcf44e38ce2bf58a10c3e71f',
u'info_dict': {
u'title': u'《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
}
_TESTS = [
{
'url': 'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
'file': '110028898.flv',
'md5': 'd65dd22ddcf44e38ce2bf58a10c3e71f',
'info_dict': {
'title': '《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
}
},
{
'url': 'http://video.sina.com.cn/v/b/101314253-1290078633.html',
'info_dict': {
'id': '101314253',
'ext': 'flv',
'title': '军方提高对朝情报监视级别',
},
},
]
@classmethod
def suitable(cls, url):
@ -35,10 +46,10 @@ class SinaIE(InfoExtractor):
def _extract_video(self, video_id):
data = compat_urllib_parse.urlencode({'vid': video_id})
url_doc = self._download_xml('http://v.iask.com/v_play.php?%s' % data,
video_id, u'Downloading video url')
video_id, 'Downloading video url')
image_page = self._download_webpage(
'http://interface.video.sina.com.cn/interface/common/getVideoImage.php?%s' % data,
video_id, u'Downloading thumbnail info')
video_id, 'Downloading thumbnail info')
return {'id': video_id,
'url': url_doc.find('./durl/url').text,
@ -52,7 +63,7 @@ class SinaIE(InfoExtractor):
video_id = mobj.group('id')
if mobj.group('token') is not None:
# The video id is in the redirected url
self.to_screen(u'Getting video id')
self.to_screen('Getting video id')
request = compat_urllib_request.Request(url)
request.get_method = lambda: 'HEAD'
(_, urlh) = self._download_webpage_handle(request, 'NA', False)
@ -60,6 +71,6 @@ class SinaIE(InfoExtractor):
elif video_id is None:
pseudo_id = mobj.group('pseudo_id')
webpage = self._download_webpage(url, pseudo_id)
video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, u'video id')
video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, 'video id')
return self._extract_video(video_id)

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
import json
@ -12,11 +14,12 @@ class SlideshareIE(InfoExtractor):
_VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
_TEST = {
u'url': u'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
u'file': u'25665706.mp4',
u'info_dict': {
u'title': u'Managing Scale and Complexity',
u'description': u'This was a keynote presentation at the NoSQL Now! 2013 Conference & Expo (http://www.nosqlnow.com). This presentation was given by Adrian Cockcroft from Netflix',
'url': 'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
'info_dict': {
'id': '25665706',
'ext': 'mp4',
'title': 'Managing Scale and Complexity',
'description': 'This was a keynote presentation at the NoSQL Now! 2013 Conference & Expo (http://www.nosqlnow.com). This presentation was given by Adrian Cockcroft from Netflix.',
},
}
@ -26,15 +29,17 @@ class SlideshareIE(InfoExtractor):
webpage = self._download_webpage(url, page_title)
slideshare_obj = self._search_regex(
r'var slideshare_object = ({.*?}); var user_info =',
webpage, u'slideshare object')
webpage, 'slideshare object')
info = json.loads(slideshare_obj)
if info['slideshow']['type'] != u'video':
raise ExtractorError(u'Webpage type is "%s": only video extraction is supported for Slideshare' % info['slideshow']['type'], expected=True)
if info['slideshow']['type'] != 'video':
raise ExtractorError('Webpage type is "%s": only video extraction is supported for Slideshare' % info['slideshow']['type'], expected=True)
doc = info['doc']
bucket = info['jsplayer']['video_bucket']
ext = info['jsplayer']['video_extension']
video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
description = self._html_search_regex(
r'<p class="description.*?"[^>]*>(.*?)</p>', webpage, 'description')
return {
'_type': 'video',
@ -43,5 +48,5 @@ class SlideshareIE(InfoExtractor):
'ext': ext,
'url': video_url,
'thumbnail': info['slideshow']['pin_image_url'],
'description': self._og_search_description(webpage),
'description': description,
}

View File

@ -1,4 +1,5 @@
# encoding: utf-8
from __future__ import unicode_literals
import os.path
import re
@ -16,72 +17,73 @@ from ..utils import (
class SmotriIE(InfoExtractor):
IE_DESC = u'Smotri.com'
IE_NAME = u'smotri'
IE_DESC = 'Smotri.com'
IE_NAME = 'smotri'
_VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/video/view/\?id=(?P<videoid>v(?P<realvideoid>[0-9]+)[a-z0-9]{4}))'
_NETRC_MACHINE = 'smotri'
_TESTS = [
# real video id 2610366
{
u'url': u'http://smotri.com/video/view/?id=v261036632ab',
u'file': u'v261036632ab.mp4',
u'md5': u'2a7b08249e6f5636557579c368040eb9',
u'info_dict': {
u'title': u'катастрофа с камер видеонаблюдения',
u'uploader': u'rbc2008',
u'uploader_id': u'rbc08',
u'upload_date': u'20131118',
u'description': u'катастрофа с камер видеонаблюдения, видео катастрофа с камер видеонаблюдения',
u'thumbnail': u'http://frame6.loadup.ru/8b/a9/2610366.3.3.jpg',
'url': 'http://smotri.com/video/view/?id=v261036632ab',
'file': 'v261036632ab.mp4',
'md5': '2a7b08249e6f5636557579c368040eb9',
'info_dict': {
'title': 'катастрофа с камер видеонаблюдения',
'uploader': 'rbc2008',
'uploader_id': 'rbc08',
'upload_date': '20131118',
'description': 'катастрофа с камер видеонаблюдения, видео катастрофа с камер видеонаблюдения',
'thumbnail': 'http://frame6.loadup.ru/8b/a9/2610366.3.3.jpg',
},
},
# real video id 57591
{
u'url': u'http://smotri.com/video/view/?id=v57591cb20',
u'file': u'v57591cb20.flv',
u'md5': u'830266dfc21f077eac5afd1883091bcd',
u'info_dict': {
u'title': u'test',
u'uploader': u'Support Photofile@photofile',
u'uploader_id': u'support-photofile',
u'upload_date': u'20070704',
u'description': u'test, видео test',
u'thumbnail': u'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
'url': 'http://smotri.com/video/view/?id=v57591cb20',
'file': 'v57591cb20.flv',
'md5': '830266dfc21f077eac5afd1883091bcd',
'info_dict': {
'title': 'test',
'uploader': 'Support Photofile@photofile',
'uploader_id': 'support-photofile',
'upload_date': '20070704',
'description': 'test, видео test',
'thumbnail': 'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
},
},
# video-password
{
u'url': u'http://smotri.com/video/view/?id=v1390466a13c',
u'file': u'v1390466a13c.mp4',
u'md5': u'f6331cef33cad65a0815ee482a54440b',
u'info_dict': {
u'title': u'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
u'uploader': u'timoxa40',
u'uploader_id': u'timoxa40',
u'upload_date': u'20100404',
u'thumbnail': u'http://frame7.loadup.ru/af/3f/1390466.3.3.jpg',
u'description': u'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1, видео TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'file': 'v1390466a13c.mp4',
'md5': 'f6331cef33cad65a0815ee482a54440b',
'info_dict': {
'title': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'uploader': 'timoxa40',
'uploader_id': 'timoxa40',
'upload_date': '20100404',
'thumbnail': 'http://frame7.loadup.ru/af/3f/1390466.3.3.jpg',
'description': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1, видео TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
},
u'params': {
u'videopassword': u'qwerty',
'params': {
'videopassword': 'qwerty',
},
},
# age limit + video-password
{
u'url': u'http://smotri.com/video/view/?id=v15408898bcf',
u'file': u'v15408898bcf.flv',
u'md5': u'91e909c9f0521adf5ee86fbe073aad70',
u'info_dict': {
u'title': u'этот ролик не покажут по ТВ',
u'uploader': u'zzxxx',
u'uploader_id': u'ueggb',
u'upload_date': u'20101001',
u'thumbnail': u'http://frame3.loadup.ru/75/75/1540889.1.3.jpg',
u'age_limit': 18,
u'description': u'этот ролик не покажут по ТВ, видео этот ролик не покажут по ТВ',
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'file': 'v15408898bcf.flv',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
'info_dict': {
'title': 'этот ролик не покажут по ТВ',
'uploader': 'zzxxx',
'uploader_id': 'ueggb',
'upload_date': '20101001',
'thumbnail': 'http://frame3.loadup.ru/75/75/1540889.1.3.jpg',
'age_limit': 18,
'description': 'этот ролик не покажут по ТВ, видео этот ролик не покажут по ТВ',
},
u'params': {
u'videopassword': u'333'
'params': {
'videopassword': '333'
}
}
]
@ -106,26 +108,26 @@ class SmotriIE(InfoExtractor):
# Download video JSON data
video_json_url = 'http://smotri.com/vt.php?id=%s' % real_video_id
video_json_page = self._download_webpage(video_json_url, video_id, u'Downloading video JSON')
video_json_page = self._download_webpage(video_json_url, video_id, 'Downloading video JSON')
video_json = json.loads(video_json_page)
status = video_json['status']
if status == self._VIDEO_NOT_FOUND:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
elif status == self._PASSWORD_DETECTED: # The video is protected by a password, retry with
# video-password set
video_password = self._downloader.params.get('videopassword', None)
if not video_password:
raise ExtractorError(u'This video is protected by a password, use the --video-password option', expected=True)
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
video_json_url += '&md5pass=%s' % hashlib.md5(video_password.encode('utf-8')).hexdigest()
video_json_page = self._download_webpage(video_json_url, video_id, u'Downloading video JSON (video-password set)')
video_json_page = self._download_webpage(video_json_url, video_id, 'Downloading video JSON (video-password set)')
video_json = json.loads(video_json_page)
status = video_json['status']
if status == self._PASSWORD_NOT_VERIFIED:
raise ExtractorError(u'Video password is invalid', expected=True)
raise ExtractorError('Video password is invalid', expected=True)
if status != self._SUCCESS:
raise ExtractorError(u'Unexpected status value %s' % status)
raise ExtractorError('Unexpected status value %s' % status)
# Extract the URL of the video
video_url = video_json['file_data']
@ -133,44 +135,44 @@ class SmotriIE(InfoExtractor):
# Video JSON does not provide enough meta data
# We will extract some from the video web page instead
video_page_url = 'http://' + mobj.group('url')
video_page = self._download_webpage(video_page_url, video_id, u'Downloading video page')
video_page = self._download_webpage(video_page_url, video_id, 'Downloading video page')
# Warning if video is unavailable
warning = self._html_search_regex(
r'<div class="videoUnModer">(.*?)</div>', video_page,
u'warning message', default=None)
'warning message', default=None)
if warning is not None:
self._downloader.report_warning(
u'Video %s may not be available; smotri said: %s ' %
'Video %s may not be available; smotri said: %s ' %
(video_id, warning))
# Adult content
if re.search(u'EroConfirmText">', video_page) is not None:
if re.search('EroConfirmText">', video_page) is not None:
self.report_age_confirmation()
confirm_string = self._html_search_regex(
r'<a href="/video/view/\?id=%s&confirm=([^"]+)" title="[^"]+">' % video_id,
video_page, u'confirm string')
video_page, 'confirm string')
confirm_url = video_page_url + '&confirm=%s' % confirm_string
video_page = self._download_webpage(confirm_url, video_id, u'Downloading video page (age confirmed)')
video_page = self._download_webpage(confirm_url, video_id, 'Downloading video page (age confirmed)')
adult_content = True
else:
adult_content = False
# Extract the rest of meta data
video_title = self._search_meta(u'name', video_page, u'title')
video_title = self._search_meta('name', video_page, 'title')
if not video_title:
video_title = os.path.splitext(url_basename(video_url))[0]
video_description = self._search_meta(u'description', video_page)
END_TEXT = u' на сайте Smotri.com'
video_description = self._search_meta('description', video_page)
END_TEXT = ' на сайте Smotri.com'
if video_description and video_description.endswith(END_TEXT):
video_description = video_description[:-len(END_TEXT)]
START_TEXT = u'Смотреть онлайн ролик '
START_TEXT = 'Смотреть онлайн ролик '
if video_description and video_description.startswith(START_TEXT):
video_description = video_description[len(START_TEXT):]
video_thumbnail = self._search_meta(u'thumbnail', video_page)
video_thumbnail = self._search_meta('thumbnail', video_page)
upload_date_str = self._search_meta(u'uploadDate', video_page, u'upload date')
upload_date_str = self._search_meta('uploadDate', video_page, 'upload date')
if upload_date_str:
upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
video_upload_date = (
@ -184,7 +186,7 @@ class SmotriIE(InfoExtractor):
else:
video_upload_date = None
duration_str = self._search_meta(u'duration', video_page)
duration_str = self._search_meta('duration', video_page)
if duration_str:
duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
video_duration = (
@ -199,16 +201,16 @@ class SmotriIE(InfoExtractor):
video_duration = None
video_uploader = self._html_search_regex(
u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
video_page, u'uploader', fatal=False, flags=re.MULTILINE|re.DOTALL)
'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
video_page, 'uploader', fatal=False, flags=re.MULTILINE|re.DOTALL)
video_uploader_id = self._html_search_regex(
u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info\\(.*?\'([^\']+)\'\\);">',
video_page, u'uploader id', fatal=False, flags=re.MULTILINE|re.DOTALL)
'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info\\(.*?\'([^\']+)\'\\);">',
video_page, 'uploader id', fatal=False, flags=re.MULTILINE|re.DOTALL)
video_view_count = self._html_search_regex(
u'Общее количество просмотров.*?<span class="Number">(\\d+)</span>',
video_page, u'view count', fatal=False, flags=re.MULTILINE|re.DOTALL)
'Общее количество просмотров.*?<span class="Number">(\\d+)</span>',
video_page, 'view count', fatal=False, flags=re.MULTILINE|re.DOTALL)
return {
'id': video_id,
@ -227,8 +229,8 @@ class SmotriIE(InfoExtractor):
class SmotriCommunityIE(InfoExtractor):
IE_DESC = u'Smotri.com community videos'
IE_NAME = u'smotri:community'
IE_DESC = 'Smotri.com community videos'
IE_NAME = 'smotri:community'
_VALID_URL = r'^https?://(?:www\.)?smotri\.com/community/video/(?P<communityid>[0-9A-Za-z_\'-]+)'
def _real_extract(self, url):
@ -236,21 +238,21 @@ class SmotriCommunityIE(InfoExtractor):
community_id = mobj.group('communityid')
url = 'http://smotri.com/export/rss/video/by/community/-/%s/video.xml' % community_id
rss = self._download_xml(url, community_id, u'Downloading community RSS')
rss = self._download_xml(url, community_id, 'Downloading community RSS')
entries = [self.url_result(video_url.text, 'Smotri')
for video_url in rss.findall('./channel/item/link')]
description_text = rss.find('./channel/description').text
community_title = self._html_search_regex(
u'^Видео сообщества "([^"]+)"$', description_text, u'community title')
'^Видео сообщества "([^"]+)"$', description_text, 'community title')
return self.playlist_result(entries, community_id, community_title)
class SmotriUserIE(InfoExtractor):
IE_DESC = u'Smotri.com user videos'
IE_NAME = u'smotri:user'
IE_DESC = 'Smotri.com user videos'
IE_NAME = 'smotri:user'
_VALID_URL = r'^https?://(?:www\.)?smotri\.com/user/(?P<userid>[0-9A-Za-z_\'-]+)'
def _real_extract(self, url):
@ -258,22 +260,22 @@ class SmotriUserIE(InfoExtractor):
user_id = mobj.group('userid')
url = 'http://smotri.com/export/rss/user/video/-/%s/video.xml' % user_id
rss = self._download_xml(url, user_id, u'Downloading user RSS')
rss = self._download_xml(url, user_id, 'Downloading user RSS')
entries = [self.url_result(video_url.text, 'Smotri')
for video_url in rss.findall('./channel/item/link')]
description_text = rss.find('./channel/description').text
user_nickname = self._html_search_regex(
u'^Видео режиссера (.*)$', description_text,
u'user nickname')
'^Видео режиссера (.*)$', description_text,
'user nickname')
return self.playlist_result(entries, user_id, user_nickname)
class SmotriBroadcastIE(InfoExtractor):
IE_DESC = u'Smotri.com broadcasts'
IE_NAME = u'smotri:broadcast'
IE_DESC = 'Smotri.com broadcasts'
IE_NAME = 'smotri:broadcast'
_VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<broadcastid>[^/]+))/?.*'
def _real_extract(self, url):
@ -281,46 +283,40 @@ class SmotriBroadcastIE(InfoExtractor):
broadcast_id = mobj.group('broadcastid')
broadcast_url = 'http://' + mobj.group('url')
broadcast_page = self._download_webpage(broadcast_url, broadcast_id, u'Downloading broadcast page')
broadcast_page = self._download_webpage(broadcast_url, broadcast_id, 'Downloading broadcast page')
if re.search(u'>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
raise ExtractorError(u'Broadcast %s does not exist' % broadcast_id, expected=True)
if re.search('>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
raise ExtractorError('Broadcast %s does not exist' % broadcast_id, expected=True)
# Adult content
if re.search(u'EroConfirmText">', broadcast_page) is not None:
if re.search('EroConfirmText">', broadcast_page) is not None:
(username, password) = self._get_login_info()
if username is None:
raise ExtractorError(u'Erotic broadcasts allowed only for registered users, '
u'use --username and --password options to provide account credentials.', expected=True)
raise ExtractorError('Erotic broadcasts allowed only for registered users, '
'use --username and --password options to provide account credentials.', expected=True)
# Log in
login_form_strs = {
u'login-hint53': '1',
u'confirm_erotic': '1',
u'login': username,
u'password': password,
login_form = {
'login-hint53': '1',
'confirm_erotic': '1',
'login': username,
'password': password,
}
# Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
# chokes on unicode
login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
login_url = broadcast_url + '/?no_redirect=1'
request = compat_urllib_request.Request(login_url, login_data)
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(
request, broadcast_id, note=u'Logging in and confirming age')
if re.search(u'>Неверный логин или пароль<', broadcast_page) is not None:
raise ExtractorError(u'Unable to log in: bad username or password', expected=True)
request = compat_urllib_request.Request(broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(request, broadcast_id, 'Logging in and confirming age')
if re.search('>Неверный логин или пароль<', broadcast_page) is not None:
raise ExtractorError('Unable to log in: bad username or password', expected=True)
adult_content = True
else:
adult_content = False
ticket = self._html_search_regex(
u'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
broadcast_page, u'broadcast ticket')
'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
broadcast_page, 'broadcast ticket')
url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
@ -328,22 +324,22 @@ class SmotriBroadcastIE(InfoExtractor):
if broadcast_password:
url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
broadcast_json_page = self._download_webpage(url, broadcast_id, u'Downloading broadcast JSON')
broadcast_json_page = self._download_webpage(url, broadcast_id, 'Downloading broadcast JSON')
try:
broadcast_json = json.loads(broadcast_json_page)
protected_broadcast = broadcast_json['_pass_protected'] == 1
if protected_broadcast and not broadcast_password:
raise ExtractorError(u'This broadcast is protected by a password, use the --video-password option', expected=True)
raise ExtractorError('This broadcast is protected by a password, use the --video-password option', expected=True)
broadcast_offline = broadcast_json['is_play'] == 0
if broadcast_offline:
raise ExtractorError(u'Broadcast %s is offline' % broadcast_id, expected=True)
raise ExtractorError('Broadcast %s is offline' % broadcast_id, expected=True)
rtmp_url = broadcast_json['_server']
if not rtmp_url.startswith('rtmp://'):
raise ExtractorError(u'Unexpected broadcast rtmp URL')
raise ExtractorError('Unexpected broadcast rtmp URL')
broadcast_playpath = broadcast_json['_streamName']
broadcast_thumbnail = broadcast_json['_imgURL']
@ -354,8 +350,8 @@ class SmotriBroadcastIE(InfoExtractor):
rtmp_conn = 'S:%s' % uuid.uuid4().hex
except KeyError:
if protected_broadcast:
raise ExtractorError(u'Bad broadcast password', expected=True)
raise ExtractorError(u'Unexpected broadcast JSON')
raise ExtractorError('Bad broadcast password', expected=True)
raise ExtractorError('Unexpected broadcast JSON')
return {
'id': broadcast_id,

View File

@ -17,6 +17,7 @@ class SohuIE(InfoExtractor):
u'info_dict': {
u'title': u'MVFar East Movement《The Illest》',
},
u'skip': u'Only available from China',
}
def _real_extract(self, url):

View File

@ -1,34 +1,36 @@
import re
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
class SouthParkStudiosIE(MTVServicesInfoExtractor):
IE_NAME = u'southparkstudios.com'
_VALID_URL = r'(https?://)?(www\.)?(?P<url>southparkstudios\.com/(clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
IE_NAME = 'southparkstudios.com'
_VALID_URL = r'https?://(www\.)?(?P<url>southparkstudios\.com/(clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
_TESTS = [{
u'url': u'http://www.southparkstudios.com/clips/104437/bat-daded#tab=featured',
u'file': u'a7bff6c2-ed00-11e0-aca6-0026b9414f30.mp4',
u'info_dict': {
u'title': u'Bat Daded',
u'description': u'Randy disqualifies South Park by getting into a fight with Bat Dad.',
'url': 'http://www.southparkstudios.com/clips/104437/bat-daded#tab=featured',
'info_dict': {
'id': 'a7bff6c2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Bat Daded',
'description': 'Randy disqualifies South Park by getting into a fight with Bat Dad.',
},
}]
class SouthparkDeIE(SouthParkStudiosIE):
IE_NAME = u'southpark.de'
_VALID_URL = r'(https?://)?(www\.)?(?P<url>southpark\.de/(clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
IE_NAME = 'southpark.de'
_VALID_URL = r'https?://(www\.)?(?P<url>southpark\.de/(clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southpark.de/feeds/video-player/mrss/'
_TESTS = [{
u'url': u'http://www.southpark.de/clips/uygssh/the-government-wont-respect-my-privacy#tab=featured',
u'file': u'85487c96-b3b9-4e39-9127-ad88583d9bf2.mp4',
u'info_dict': {
u'title': u'The Government Won\'t Respect My Privacy',
u'description': u'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
'url': 'http://www.southpark.de/clips/uygssh/the-government-wont-respect-my-privacy#tab=featured',
'info_dict': {
'id': '85487c96-b3b9-4e39-9127-ad88583d9bf2',
'ext': 'mp4',
'title': 'The Government Won\'t Respect My Privacy',
'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
},
}]

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -8,14 +10,14 @@ from ..utils import RegexNotFoundError, ExtractorError
class SpaceIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
u'info_dict': {
u'id': u'2780937028001',
u'ext': u'mp4',
u'title': u'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
u'description': u'md5:db81cf7f3122f95ed234b631a6ea1e61',
u'uploader': u'TechMedia Networks',
'add_ie': ['Brightcove'],
'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
'info_dict': {
'id': '2780937028001',
'ext': 'mp4',
'title': 'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
'description': 'md5:db81cf7f3122f95ed234b631a6ea1e61',
'uploader': 'TechMedia Networks',
},
}

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals
import os
import re
from .common import InfoExtractor
@ -8,23 +7,27 @@ from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unified_strdate,
str_to_int,
int_or_none,
)
from ..aes import (
aes_decrypt_text
)
from ..aes import aes_decrypt_text
class SpankwireIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
_VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
_TEST = {
'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
'file': '103545.mp4',
'md5': '1b3f55e345500552dbc252a3e9c1af43',
'md5': '8bbfde12b101204b39e4b9fe7eb67095',
'info_dict': {
"uploader": "oreusz",
"title": "Buckcherry`s X Rated Music Video Crazy Bitch",
"description": "Crazy Bitch X rated music video.",
"age_limit": 18,
'id': '103545',
'ext': 'mp4',
'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
'description': 'Crazy Bitch X rated music video.',
'uploader': 'oreusz',
'uploader_id': '124697',
'upload_date': '20070508',
'age_limit': 18,
}
}
@ -37,13 +40,26 @@ class SpankwireIE(InfoExtractor):
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>', webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(
r'flashvars\.image_url = "([^"]+)', webpage, 'thumbnail', fatal=False)
title = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
description = self._html_search_regex(
r'<div\s+id="descriptionContent">([^<]+)<', webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'flashvars\.image_url = "([^"]+)', webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>', webpage, 'uploader', fatal=False)
uploader_id = self._html_search_regex(
r'by:\s*<a href="/Profile\.aspx\?.*?UserId=(\d+).*?"', webpage, 'uploader id', fatal=False)
upload_date = self._html_search_regex(r'</a> on (.+?) at \d+:\d+', webpage, 'upload date', fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
view_count = self._html_search_regex(
r'<div id="viewsCounter"><span>([^<]+)</span> views</div>', webpage, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
comment_count = int_or_none(self._html_search_regex(
r'<span id="spCommentCount">\s*(\d+)</span> Comments</div>', webpage, 'comment count', fatal=False))
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'flashvars\.quality_[0-9]{3}p = "([^"]+)', webpage)))
if webpage.find('flashvars\.encrypted = "true"') != -1:
@ -53,16 +69,13 @@ class SpankwireIE(InfoExtractor):
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
resolution, bitrate_str = format
format = "-".join(format)
height = int(resolution.rstrip('P'))
tbr = int(bitrate_str.rstrip('K'))
height = int(resolution.rstrip('Pp'))
tbr = int(bitrate_str.rstrip('Kk'))
formats.append({
'url': video_url,
'ext': extension,
'resolution': resolution,
'format': format,
'tbr': tbr,
@ -75,10 +88,14 @@ class SpankwireIE(InfoExtractor):
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'age_limit': age_limit,
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -6,20 +8,20 @@ from .common import InfoExtractor
class SpiegelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?spiegel\.de/video/[^/]*-(?P<videoID>[0-9]+)(?:\.html)?(?:#.*)?$'
_TESTS = [{
u'url': u'http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html',
u'file': u'1259285.mp4',
u'md5': u'2c2754212136f35fb4b19767d242f66e',
u'info_dict': {
u"title": u"Vulkanausbruch in Ecuador: Der \"Feuerschlund\" ist wieder aktiv"
}
'url': 'http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html',
'file': '1259285.mp4',
'md5': '2c2754212136f35fb4b19767d242f66e',
'info_dict': {
'title': 'Vulkanausbruch in Ecuador: Der "Feuerschlund" ist wieder aktiv',
},
},
{
u'url': u'http://www.spiegel.de/video/schach-wm-videoanalyse-des-fuenften-spiels-video-1309159.html',
u'file': u'1309159.mp4',
u'md5': u'f2cdf638d7aa47654e251e1aee360af1',
u'info_dict': {
u'title': u'Schach-WM in der Videoanalyse: Carlsen nutzt die Fehlgriffe des Titelverteidigers'
}
'url': 'http://www.spiegel.de/video/schach-wm-videoanalyse-des-fuenften-spiels-video-1309159.html',
'file': '1309159.mp4',
'md5': 'f2cdf638d7aa47654e251e1aee360af1',
'info_dict': {
'title': 'Schach-WM in der Videoanalyse: Carlsen nutzt die Fehlgriffe des Titelverteidigers',
},
}]
def _real_extract(self, url):
@ -29,17 +31,17 @@ class SpiegelIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(
r'<div class="module-title">(.*?)</div>', webpage, u'title')
r'<div class="module-title">(.*?)</div>', webpage, 'title')
xml_url = u'http://video2.spiegel.de/flash/' + video_id + u'.xml'
xml_url = 'http://video2.spiegel.de/flash/' + video_id + '.xml'
idoc = self._download_xml(
xml_url, video_id,
note=u'Downloading XML', errnote=u'Failed to download XML')
note='Downloading XML', errnote='Failed to download XML')
formats = [
{
'format_id': n.tag.rpartition('type')[2],
'url': u'http://video2.spiegel.de/flash/' + n.find('./filename').text,
'url': 'http://video2.spiegel.de/flash/' + n.find('./filename').text,
'width': int(n.find('./width').text),
'height': int(n.find('./height').text),
'abr': int(n.find('./audiobitrate').text),
@ -55,10 +57,9 @@ class SpiegelIE(InfoExtractor):
self._sort_formats(formats)
info = {
return {
'id': video_id,
'title': video_title,
'duration': duration,
'formats': formats,
}
return info

View File

@ -1,36 +1,38 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class StatigramIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?statigr\.am/p/([^/]+)'
_VALID_URL = r'https?://(www\.)?statigr\.am/p/(?P<id>[^/]+)'
_TEST = {
u'url': u'http://statigr.am/p/522207370455279102_24101272',
u'file': u'522207370455279102_24101272.mp4',
u'md5': u'6eb93b882a3ded7c378ee1d6884b1814',
u'info_dict': {
u'uploader_id': u'aguynamedpatrick',
u'title': u'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
'url': 'http://statigr.am/p/522207370455279102_24101272',
'md5': '6eb93b882a3ded7c378ee1d6884b1814',
'info_dict': {
'id': '522207370455279102_24101272',
'ext': 'mp4',
'uploader_id': 'aguynamedpatrick',
'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
html_title = self._html_search_regex(
r'<title>(.+?)</title>',
webpage, u'title')
webpage, 'title')
title = re.sub(r'(?: *\(Videos?\))? \| Statigram$', '', html_title)
uploader_id = self._html_search_regex(
r'@([^ ]+)', title, u'uploader name', fatal=False)
ext = 'mp4'
r'@([^ ]+)', title, 'uploader name', fatal=False)
return [{
return {
'id': video_id,
'url': self._og_search_video_url(webpage),
'ext': ext,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id' : uploader_id
}]
'uploader_id': uploader_id
}

Some files were not shown because too many files have changed in this diff Show More