Compare commits

118 Commits

Author SHA1 Message Date
Philipp Hagemeister
cda008cff1 release 2014.02.27.1 2014-02-27 16:09:58 +01:00
Sergey M.
1877a14049 [lifenews] Switch to non-mobile webpage version (Fixes #2476) 2014-02-27 21:45:34 +07:00
Sergey M.
a9ab8855e4 [prosiebensat1] Fix typo 2014-02-27 17:53:09 +07:00
Sergey M.
8a44ef6868 [prosiebensat1] Add rtmpe support 2014-02-27 17:52:52 +07:00
Sergey M.
0c7214c404 [prosiebensat1] Add support for ProSiebenSat.1 Digital sites (Closes
#2346 #2469)
2014-02-27 17:44:29 +07:00
Sergey M.
4cf9654693 Add one more format to unified_strdate 2014-02-27 17:44:05 +07:00
Philipp Hagemeister
91346358b0 release 2014.02.27 2014-02-27 07:22:34 +01:00
Philipp Hagemeister
f3783d4b77 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-27 07:22:22 +01:00
Philipp Hagemeister
89ef304bed [generic] Add support for <meta redirect>
Fixes #413
2014-02-27 07:22:02 +01:00
Sergey M.
2acea5c03d [mit] Fix MITIE test 2014-02-26 18:09:43 +07:00
Sergey M.
978177527e [rtlnow] Remove unused import 2014-02-26 18:02:17 +07:00
Sergey M.
2648c436f3 Merge pull request #2464 from rzhxeo/xhamster
[XHamsterIE] Make hd video search more robust
2014-02-26 02:53:54 -08:00
Sergey M.
33f1f2c455 [rtlnow] Fix duration extraction 2014-02-26 17:49:49 +07:00
Sergey M.
995befe0e9 [rtlnow] Replace n-tvnow.de test 2014-02-26 17:43:56 +07:00
Sergey M.
1bb92aff55 [rtlnow] Modernize and add f4m support 2014-02-26 17:36:16 +07:00
rzhxeo
b8e1471d3a [XHamsterIE] Make hd video search more robust 2014-02-26 10:01:44 +01:00
Philipp Hagemeister
a83a3139d1 [mit] Add import 2014-02-26 00:41:13 +01:00
Philipp Hagemeister
fdb7ca3b8d release 2014.02.26 2014-02-26 00:32:22 +01:00
Philipp Hagemeister
0d7caf5cdf Merge remote-tracking branch 'ruuk/master' 2014-02-26 00:31:08 +01:00
Philipp Hagemeister
a339d7ba91 Credit @amlweems for ocw.mit (#2460) 2014-02-26 00:30:47 +01:00
Philipp Hagemeister
7216de55d6 [mit] Fix ocw tests 2014-02-26 00:29:45 +01:00
Philipp Hagemeister
2437fbca64 [tests] Raise an exception if test definition is invalid (Found in #2460) 2014-02-26 00:12:02 +01:00
Philipp Hagemeister
7d75d06b78 Merge branch 'ocw-mit-edu' of https://github.com/amlweems/youtube-dl 2014-02-26 00:09:42 +01:00
Philipp Hagemeister
13ef5648c4 Merge branch 'master' of github.com:rg3/youtube-dl 2014-02-26 00:07:45 +01:00
Philipp Hagemeister
5b2478e2ba [mit] Modernize 2014-02-26 00:06:31 +01:00
Jaime Marquínez Ferrándiz
8b286571c3 [mixcloud] Fix _VALID_RE (fixes #2462)
Accept any character except `/` for uploader and the name, caused problems with non ASCII characters
2014-02-26 00:04:03 +01:00
Jaime Marquínez Ferrándiz
f3ac523794 Merge pull request #2461 from niebles/master
Update __init__.py

`io` wasn't imported.
2014-02-26 00:00:57 +01:00
Jaime Marquínez Ferrándiz
020cf5ebfd [nbc] Add an extractor for the main nbc.com site
Some of the videos are encrypted, the f4m downloader doesn’t support them.
2014-02-25 23:57:54 +01:00
ruuk
54ab193970 Extract thumbnail with _og_search_thumbnail 2014-02-25 14:41:36 -08:00
niebles
8f563f32ab Update __init__.py 2014-02-25 17:31:16 -05:00
Anthony Weems
151bae3566 Add support for ocw.mit.edu video lectures 2014-02-25 14:44:34 -06:00
ruuk
76df418cba Add thumbnail for metacafe 2014-02-25 12:04:44 -08:00
Jaime Marquínez Ferrándiz
d0a72674c6 [crunchyroll] Use enumerate 2014-02-25 20:51:51 +01:00
Sergey M.
1d430674c7 [crunchyroll] Handle error message 2014-02-25 20:30:17 +07:00
Sergey M
70cb73922b [crunchyroll] Fix subtitle lang code extraction 2014-02-25 20:29:53 +07:00
Sergey M
344400951c [crunchyroll] Tidy and modernize 2014-02-25 20:29:53 +07:00
Jaime Marquínez Ferrándiz
ea5a0be811 Skip youtube toptracks test
All the playlists return 500 errors.
2014-02-25 14:11:01 +01:00
Philipp Hagemeister
3c7fd0bdb2 release 2014.02.25.1 2014-02-25 11:15:55 +01:00
Philipp Hagemeister
6cadf8c858 [vevo] Add age_limit support 2014-02-25 11:15:34 +01:00
Philipp Hagemeister
27579b9e4c [vevo] Add suppot for v3 SMIL URLs (Fixes #2409) 2014-02-25 11:06:47 +01:00
Philipp Hagemeister
4d756a9cc0 [testurl] Fix case when only one IE matches 2014-02-25 10:43:34 +01:00
Philipp Hagemeister
3e668e05be Merge pull request #2456 from AGSPhoenix/master
[YT] Fix incorrect format code descriptions
2014-02-25 10:24:02 +01:00
AGSPhoenix
60d3a2e0f8 Fix incorrect format codes
Corrects the descriptions for the DASH video format codes 264 and 138
(1440p and 2160p, respectively).
2014-02-24 21:29:37 -05:00
Philipp Hagemeister
cc3a3b6b47 release 2014.02.25 2014-02-25 01:45:10 +01:00
Philipp Hagemeister
eda1d49a62 Merge remote-tracking branch 'origin/master' 2014-02-25 01:45:00 +01:00
Philipp Hagemeister
62e609ab77 Ignore BOM in batch files (Fixes #2450) 2014-02-25 01:43:17 +01:00
Jaime Marquínez Ferrándiz
2bfe4ead4b [veoh] Allow to download videos with age protection (fixes #2455) 2014-02-24 22:01:34 +01:00
Sergey M.
b1c6c32f78 [generic] Add support for nowvideo embedded videos 2014-02-24 23:37:42 +07:00
Philipp Hagemeister
f6acbdecf4 [podomatic] Use unicode_literals 2014-02-24 17:31:09 +01:00
Sergey M.
f1c9dfcc01 [nowvideo] Rewrite based on novamov extractor 2014-02-24 23:30:58 +07:00
Sergey M.
ce78943ae1 [novamov] Generalize extractor 2014-02-24 23:30:09 +07:00
Sergey M.
d6f0d86649 [novamov] Improve _VALID_URL 2014-02-24 22:01:19 +07:00
Jaime Marquínez Ferrándiz
5bb67dbfea [cinemassacre] Modernize 2014-02-24 14:44:29 +01:00
Jaime Marquínez Ferrándiz
47610c4d3e [cinemassacre] Fix extraction
Now we download over http, we don't need rtmpdump.
2014-02-24 14:35:26 +01:00
Jaime Marquínez Ferrándiz
b732f3581f [academicearth] Remove debug print 2014-02-24 14:20:17 +01:00
Jaime Marquínez Ferrándiz
9e57ce716f [academicearth] Fix extraction
The courses seems to be no longer available, changed the test to a playlist.
2014-02-24 14:18:12 +01:00
Jaime Marquínez Ferrándiz
cd7ee7aa44 [nbc] Modernize 2014-02-24 14:00:31 +01:00
Jaime Marquínez Ferrándiz
3cfe791473 [iprima] Add missing ) 2014-02-24 13:50:53 +01:00
Philipp Hagemeister
973f2532f5 [iprima] Add support for -WEB URLs (Closes #2449) 2014-02-24 10:12:36 +01:00
Philipp Hagemeister
bc3be21d59 [iprima] Clean up a little bit 2014-02-24 09:53:48 +01:00
Philipp Hagemeister
0bf5cf9886 release 2014.02.24 2014-02-24 09:44:22 +01:00
Sergey M.
919052d094 [zdf] Fix podcast extraction and use unicode literals (Closes #2446) 2014-02-24 13:47:47 +07:00
Sergey M.
a2dafe2887 [youtube] Fix mix video regex
Attributes' order in <li> is arbitrary and changes every time playlist
page is fetched, so we can't rely on `data-index` to be before
`data-video-username`.
2014-02-24 12:52:02 +07:00
Jaime Marquínez Ferrándiz
92661c994b [normalboots] Modernize and simplify 2014-02-23 18:28:22 +01:00
Jaime Marquínez Ferrándiz
ffe8fe356a [normalboots] Fix video url extraction 2014-02-23 18:06:51 +01:00
Jaime Marquínez Ferrándiz
bc2f773b4f [youtube:playlist] Fix mixes extraction (fixes #2444) 2014-02-23 17:17:36 +01:00
Sergey M.
f919201ecc [vine] Extract more metadata and support low format 2014-02-23 19:02:31 +07:00
Sergey M.
7ff5d5c2e2 Add one more format to unified_strdate 2014-02-23 19:00:51 +07:00
Jaime Marquínez Ferrándiz
9b77f951c7 [breakcom] Fix error when calling _search_regex
I passed `’webpage’` instead of the variable `webpage`.
2014-02-23 12:28:44 +01:00
Jaime Marquínez Ferrándiz
a25f2f990a [breakcom] Fix info json extraction 2014-02-23 12:20:58 +01:00
Jaime Marquínez Ferrándiz
78b373975d [vine] Fix uploader extraction 2014-02-23 12:08:30 +01:00
Philipp Hagemeister
2fcc873c4c release 2014.02.22.1 2014-02-22 23:17:56 +01:00
Philipp Hagemeister
23c2baadb3 [videobam] Set age_limit to 18
From [their ToS](http://videobam.com/terms): "User must be eighteen 18[sic] years of age or older to use or access this web site."
2014-02-22 23:15:41 +01:00
Philipp Hagemeister
521ee82334 Fix imports 2014-02-22 23:03:12 +01:00
Philipp Hagemeister
1df96e59ce [f4m] Clean up 2014-02-22 23:03:00 +01:00
Sergey M.
3e123c1e28 [videobam] Add support for videobam.com (Closes #2411) 2014-02-23 04:50:05 +07:00
Philipp Hagemeister
f38da66731 Credit @soult for br 2014-02-22 20:19:41 +01:00
Philipp Hagemeister
06aabfc422 [br] Simplify 2014-02-22 20:17:26 +01:00
Philipp Hagemeister
1052d2bfec Merge remote-tracking branch 'soult/br' 2014-02-22 17:14:47 +01:00
Philipp Hagemeister
5e0b652344 release 2014.02.22 2014-02-22 15:07:25 +01:00
Philipp Hagemeister
0f8f097183 [release.sh] Do not run tests by default
We are at the point that testing takes waay too long for a release cycle, and fails way too often.
Tests through travis are a better indicator than testing just before release.
2014-02-22 15:06:07 +01:00
Philipp Hagemeister
491ed3dda2 [trutube] Support multiple formats (#2433) 2014-02-22 15:05:30 +01:00
Philipp Hagemeister
af284c6d1b Merge remote-tracking branch 'JohnyMoSwag/master' 2014-02-22 14:38:42 +01:00
Philipp Hagemeister
41d3ec5fba [savefrom] Add extractor (Fixes #2434) 2014-02-22 14:36:16 +01:00
Philipp Hagemeister
0568c352f3 [canalc2] Modernize 2014-02-22 14:27:09 +01:00
Sergey M.
2e7b4cb714 [spankwire] Fix uploader id regex 2014-02-22 16:50:08 +07:00
Sergey M.
9767726b66 [spankwire] Improve and modernize 2014-02-22 16:45:03 +07:00
Johny Mo Swag
9ddfd84e41 added trutubeIE 2014-02-22 00:11:57 -08:00
Philipp Hagemeister
1cf563d84b release 2014.02.21.1 2014-02-21 18:19:48 +01:00
David Triendl
7928024f57 [BR] Add basic test 2014-02-21 18:00:05 +01:00
David Triendl
3eb38acb43 [BR] Add "BR" extractor
Extractor for videos from the Bayerischer Rundfunk Mediathek[1]. Currently only
supports videos. Audio and podcasts do not work yet with this extractor.

1: http://br.de/mediathek
2014-02-21 17:58:52 +01:00
Jaime Marquínez Ferrándiz
f7300c5c90 [generic] Fix on python 2.6
`ParseError` is not available, it raises `xml.parsers.expat.ExpatError`.
The webpage needs to be encoded.
2014-02-21 16:59:10 +01:00
Jaime Marquínez Ferrándiz
3489b7d26c [youtube] Simplify the decryption process for the manifest urls and add a test (closes #2422) 2014-02-21 15:15:58 +01:00
Jaime Marquínez Ferrándiz
acd2bcc384 Merge branch 'youtube-dash' of github.com:m0vie/youtube-dl 2014-02-21 15:02:47 +01:00
Philipp Hagemeister
43e77ca455 release 2014.02.21 2014-02-21 12:16:03 +01:00
Sergey M.
da36297988 [wimp] Modernize and replace test 2014-02-21 17:57:19 +07:00
Sergey M.
dbb94fb044 [youtube] Fix playlist extraction (Closes #2423, #2424, #2425) 2014-02-21 17:19:55 +07:00
m0viefreak
d68f0cdb23 [youtube] decrypt signature when downloading dash manifest 2014-02-21 03:24:56 +01:00
Philipp Hagemeister
eae16eb67b release 2014.02.20 2014-02-20 13:14:21 +01:00
Philipp Hagemeister
4fc946b546 [generic] Add support for RSS feeds (Fixes #667) 2014-02-20 13:14:09 +01:00
Sergey M.
280bc5dad6 [bbccouk] Add friendly contry filter error message (#2184) 2014-02-20 18:50:34 +07:00
Jaime Marquínez Ferrándiz
f43770d8c9 Merge pull request #2413 from bentley/optypo
Fix minor typo: “to to” → “to”.
2014-02-20 08:02:54 +01:00
Anthony J. Bentley
98c4b8fa1b Fix minor typo: “to to” → “to”. 2014-02-19 20:02:29 -07:00
Sergey M.
ccb079ee67 [xhamster] Fix and improve 2014-02-20 02:37:44 +07:00
Jaime Marquínez Ferrándiz
2ea237472c Merge pull request #2408 from pulpe/_readme
[README.md] correct the test command
2014-02-19 16:45:14 +01:00
pulpe
0d4b4865cc [README.md] correct the test command 2014-02-19 16:13:45 +01:00
Philipp Hagemeister
fe52f9f956 Document prefered config location (#2407) 2014-02-19 11:35:35 +01:00
Philipp Hagemeister
882907a818 release 2014.02.19.1 2014-02-19 01:27:22 +01:00
Philipp Hagemeister
572a89cc4e [liveleak] Add support for prochan embeds (Fixes #2406) 2014-02-19 01:27:12 +01:00
Philipp Hagemeister
c377110539 release 2014.02.19 2014-02-19 01:08:16 +01:00
Philipp Hagemeister
a9c7198a0b [testurl] Add extractor
This is a pseudo extractor that can be used to quickly look up test URLs, or test without the test harness.
2014-02-19 01:06:16 +01:00
Philipp Hagemeister
f6f01ea17b [space] modernize 2014-02-19 01:04:24 +01:00
Sergey M.
f2d0fc6823 [bbccouk] Replace test
This older episode is from 1994 and hopefully won't get deleted.
2014-02-19 06:46:14 +07:00
Sergey M.
f7000f3a1b [youtube] Add support for yourepeat.com URLs (Closes #2397) 2014-02-19 02:00:54 +07:00
Sergey M.
c7f0177fa7 [bbccouk] Skip test 2014-02-18 00:26:12 +07:00
Philipp Hagemeister
09c4d50944 Fix indenting in README 2014-02-17 14:58:39 +01:00
Philipp Hagemeister
2eb5d315d4 [youtube] Match more truncated URLs (Closes #2402) 2014-02-17 14:56:21 +01:00
Philipp Hagemeister
ad5976b4d9 [vimeo] Modernize test definition 2014-02-17 11:44:24 +01:00
49 changed files with 1659 additions and 618 deletions

View File

@@ -20,7 +20,7 @@ which means you can modify it, redistribute it or use it however you like.
                                sure that you have sufficient permissions
                                (run with sudo if needed)
     -i, --ignore-errors        continue on download errors, for example to
-                               to skip unavailable videos in a playlist
+                               skip unavailable videos in a playlist
     --abort-on-error           Abort downloading of further videos (in the
                                playlist or the command line) if an error
                                occurs
@@ -246,7 +246,7 @@ which means you can modify it, redistribute it or use it however you like.
 # CONFIGURATION
-You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl.conf`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<Yourname>\youtube-dl.conf`.
+You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<Yourname>\youtube-dl.conf`.
 # OUTPUT TEMPLATE
@@ -281,10 +281,12 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb
 Examples:
-    $ # Download only the videos uploaded in the last 6 months
+    # Download only the videos uploaded in the last 6 months
     $ youtube-dl --dateafter now-6months
-    $ # Download only the videos uploaded on January 1, 1970
+
+    # Download only the videos uploaded on January 1, 1970
     $ youtube-dl --date 19700101
+
     $ # will only download the videos uploaded in the 200x decade
     $ youtube-dl --dateafter 20000101 --datebefore 20091231
@@ -355,7 +357,7 @@ If you want to create a build of youtube-dl yourself, you'll need
 ### Adding support for a new site
-If you want to add support for a new site, copy *any* [recently modified](https://github.com/rg3/youtube-dl/commits/master/youtube_dl/extractor) file in `youtube_dl/extractor`, add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Don't forget to run the tests with `python test/test_download.py Test_Download.test_YourExtractor`! For a detailed tutorial, refer to [this blog post](http://filippo.io/add-support-for-a-new-video-site-to-youtube-dl/).
+If you want to add support for a new site, copy *any* [recently modified](https://github.com/rg3/youtube-dl/commits/master/youtube_dl/extractor) file in `youtube_dl/extractor`, add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Don't forget to run the tests with `python test/test_download.py TestDownload.test_YourExtractor`! For a detailed tutorial, refer to [this blog post](http://filippo.io/add-support-for-a-new-video-site-to-youtube-dl/).
 # BUGS

View File

@@ -14,9 +14,9 @@
 set -e
-skip_tests=false
-if [ "$1" = '--skip-test' ]; then
-    skip_tests=true
+skip_tests=true
+if [ "$1" = '--run-tests' ]; then
+    skip_tests=false
     shift
 fi

View File

@@ -68,6 +68,9 @@ class TestAllURLsMatching(unittest.TestCase):
     def test_youtube_show_matching(self):
         self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
+    def test_youtube_truncated(self):
+        self.assertMatch('http://www.youtube.com/watch?', ['youtube:truncated_url'])
+
     def test_justin_tv_channelid_matching(self):
         self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
         self.assertTrue(JustinTVIE.suitable(u"twitch.tv/vanillatv"))

View File

@@ -18,6 +18,7 @@ from test.helper import (
 import hashlib
 import io
 import json
+import re
 import socket
 import youtube_dl.YoutubeDL
@@ -72,9 +73,7 @@ def generator(test_case):
         if 'playlist' not in test_case:
             info_dict = test_case.get('info_dict', {})
             if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
-                print_skipping('The output file cannot be know, the "file" '
-                    'key is missing or the info_dict is incomplete')
-                return
+                raise Exception('Test definition incorrect. The output file cannot be known. Are both \'id\' and \'ext\' keys present?')
         if 'skip' in test_case:
             print_skipping(test_case['skip'])
             return
@@ -137,6 +136,15 @@ def generator(test_case):
                 with io.open(info_json_fn, encoding='utf-8') as infof:
                     info_dict = json.load(infof)
                 for (info_field, expected) in tc.get('info_dict', {}).items():
+                    if isinstance(expected, compat_str) and expected.startswith('re:'):
+                        got = info_dict.get(info_field)
+                        match_str = expected[len('re:'):]
+                        match_rex = re.compile(match_str)
+
+                        self.assertTrue(
+                            isinstance(got, compat_str) and match_rex.match(got),
+                            u'field %s (value: %r) should match %r' % (info_field, got, match_str))
+                    else:
                         if isinstance(expected, compat_str) and expected.startswith('md5:'):
                             got = 'md5:' + md5(info_dict.get(info_field))
                         else:
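
The hunk above lets a test definition give an expected `info_dict` value as `re:<pattern>` as well as `md5:<digest>`. As a rough, standalone sketch of that comparison logic (the helper name `field_matches` is hypothetical and is not part of the test harness; it uses Python 3 `str` where the harness uses `compat_str`):

```python
import hashlib
import re


def field_matches(expected, got):
    """Hypothetical helper: does `got` satisfy the `expected` spec?"""
    if isinstance(expected, str) and expected.startswith('re:'):
        pattern = expected[len('re:'):]
        # re.match only anchors at the start; add `$` to the pattern
        # if the whole value has to match.
        return isinstance(got, str) and re.match(pattern, got) is not None
    if isinstance(expected, str) and expected.startswith('md5:'):
        if not isinstance(got, str):
            return False
        digest = hashlib.md5(got.encode('utf-8')).hexdigest()
        return expected == 'md5:' + digest
    # No prefix: plain equality.
    return expected == got
```

A non-string value (for example a missing field that comes back as `None`) never matches an `re:` spec, which mirrors the `isinstance(got, compat_str)` guard in the diff.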

View File

@@ -170,12 +170,12 @@ class TestPlaylists(unittest.TestCase):
     def test_AcademicEarthCourse(self):
         dl = FakeYDL()
         ie = AcademicEarthCourseIE(dl)
-        result = ie.extract('http://academicearth.org/courses/building-dynamic-websites/')
+        result = ie.extract('http://academicearth.org/playlists/laws-of-nature/')
         self.assertIsPlaylist(result)
-        self.assertEqual(result['id'], 'building-dynamic-websites')
-        self.assertEqual(result['title'], 'Building Dynamic Websites')
-        self.assertEqual(result['description'], u"Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.")
-        self.assertEqual(len(result['entries']), 10)
+        self.assertEqual(result['id'], 'laws-of-nature')
+        self.assertEqual(result['title'], 'Laws of Nature')
+        self.assertEqual(result['description'],u'Introduce yourself to the laws of nature with these free online college lectures from Yale, Harvard, and MIT.')# u"Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.")
+        self.assertEqual(len(result['entries']), 4)
     def test_ivi_compilation(self):
         dl = FakeYDL()
@@ -250,5 +250,14 @@ class TestPlaylists(unittest.TestCase):
         self.assertEqual(result['title'], 'python language')
         self.assertTrue(len(result['entries']) == 15)
+    def test_generic_rss_feed(self):
+        dl = FakeYDL()
+        ie = GenericIE(dl)
+        result = ie.extract('http://www.escapistmagazine.com/rss/videos/list/1.xml')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['id'], 'http://www.escapistmagazine.com/rss/videos/list/1.xml')
+        self.assertEqual(result['title'], 'Zero Punctuation')
+        self.assertTrue(len(result['entries']) > 10)
+
 if __name__ == '__main__':
     unittest.main()

View File

@@ -9,6 +9,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 # Various small unit tests
+import io
 import xml.etree.ElementTree
 #from youtube_dl.utils import htmlentity_transform
@@ -21,6 +22,7 @@ from youtube_dl.utils import (
     orderedSet,
     PagedList,
     parse_duration,
+    read_batch_urls,
     sanitize_filename,
     shell_quote,
     smuggle_url,
@@ -250,5 +252,14 @@ class TestUtil(unittest.TestCase):
     def test_struct_unpack(self):
         self.assertEqual(struct_unpack(u'!B', b'\x00'), (0,))
+    def test_read_batch_urls(self):
+        f = io.StringIO(u'''\xef\xbb\xbf foo
+            bar\r
+            baz
+            # More after this line\r
+            ; or after this
+            bam''')
+        self.assertEqual(read_batch_urls(f), [u'foo', u'bar', u'baz', u'bam'])
+
 if __name__ == '__main__':
     unittest.main()
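
The body of `read_batch_urls` is not part of this diff excerpt; only its test is. A minimal sketch of a function that would satisfy that test could look like the following. The name `read_batch_urls_sketch` and the exact set of comment prefixes are assumptions (the prefixes are taken from the `^[#/;]` filter that the old `_real_main` code used), so this is an illustration of the behaviour, not the project's implementation:

```python
import io


def read_batch_urls_sketch(batch_fd):
    """Hypothetical stand-in for youtube_dl.utils.read_batch_urls."""
    # A UTF-8 BOM can show up either decoded (U+FEFF) or as three
    # mis-decoded characters, as in the test string above.
    boms = (u'\ufeff', u'\xef\xbb\xbf')
    urls = []
    for line in batch_fd:
        for bom in boms:
            if line.startswith(bom):
                line = line[len(bom):]
        line = line.strip()  # also removes a trailing '\r'
        # Skip blank lines and comment lines.
        if not line or line.startswith(('#', ';', '/')):
            continue
        urls.append(line)
    return urls
```

Fed the same input as `test_read_batch_urls`, this keeps only the four URL lines and drops the BOM, the carriage returns, and both comment lines.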

View File

@@ -118,6 +118,8 @@ class TestYoutubeLists(unittest.TestCase):
         self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
     def test_youtube_toptracks(self):
+        print('Skipping: The playlist page gives error 500')
+        return
         dl = FakeYDL()
         ie = YoutubePlaylistIE(dl)
         result = ie.extract('https://www.youtube.com/playlist?list=MCUS')

View File

@@ -46,12 +46,15 @@ __authors__ = (
     'Andreas Schmitz',
     'Michael Kaiser',
     'Niklas Laxström',
+    'David Triendl',
+    'Anthony Weems',
 )
 __license__ = 'Public Domain'
 import codecs
 import getpass
+import io
 import locale
 import optparse
 import os
@@ -70,6 +73,7 @@ from .utils import (
     get_cachedir,
     MaxDownloadsReached,
     preferredencoding,
+    read_batch_urls,
     SameFileError,
     setproctitle,
     std_headers,
@@ -208,7 +212,7 @@ def parseOpts(overrideArguments=None):
     general.add_option('-U', '--update',
             action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
     general.add_option('-i', '--ignore-errors',
-            action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
+            action='store_true', dest='ignoreerrors', help='continue on download errors, for example to skip unavailable videos in a playlist', default=False)
     general.add_option('--abort-on-error',
             action='store_false', dest='ignoreerrors',
             help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
@@ -551,21 +555,19 @@ def _real_main(argv=None):
         sys.exit(0)
     # Batch file verification
-    batchurls = []
+    batch_urls = []
     if opts.batchfile is not None:
         try:
             if opts.batchfile == '-':
                 batchfd = sys.stdin
             else:
-                batchfd = open(opts.batchfile, 'r')
-            batchurls = batchfd.readlines()
-            batchurls = [x.strip() for x in batchurls]
-            batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
+                batchfd = io.open(opts.batchfile, 'r', encoding='utf-8', errors='ignore')
+            batch_urls = read_batch_urls(batchfd)
             if opts.verbose:
-                write_string(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
+                write_string(u'[debug] Batch file urls: ' + repr(batch_urls) + u'\n')
         except IOError:
             sys.exit(u'ERROR: batch file could not be read')
-    all_urls = batchurls + args
+    all_urls = batch_urls + args
     all_urls = [url.strip() for url in all_urls]
     _enc = preferredencoding()
     all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]

View File

@@ -12,7 +12,6 @@ from .http import HttpFD
 from ..utils import (
     struct_pack,
     struct_unpack,
-    compat_urllib_request,
     compat_urlparse,
     format_bytes,
     encodeFilename,
@@ -117,8 +116,8 @@ class FlvReader(io.BytesIO):
         self.read_unsigned_char()
         # flags
         self.read(3)
-        # BootstrapinfoVersion
-        bootstrap_info_version = self.read_unsigned_int()
+
+        self.read_unsigned_int()  # BootstrapinfoVersion
         # Profile,Live,Update,Reserved
         self.read(1)
         # time scale
@@ -127,15 +126,15 @@ class FlvReader(io.BytesIO):
         self.read_unsigned_long_long()
         # SmpteTimeCodeOffset
         self.read_unsigned_long_long()
-        # MovieIdentifier
-        movie_identifier = self.read_string()
+
+        self.read_string()  # MovieIdentifier
         server_count = self.read_unsigned_char()
         # ServerEntryTable
         for i in range(server_count):
             self.read_string()
         quality_count = self.read_unsigned_char()
         # QualityEntryTable
-        for i in range(server_count):
+        for i in range(quality_count):
             self.read_string()
         # DrmData
         self.read_string()
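
The last change in this file fixes the loop over the quality table, which previously iterated `server_count` times. The sketch below shows why each count-prefixed table has to be read with its own count. It is a simplified, assumed layout (one count byte followed by NUL-terminated strings), and the helpers `read_cstring`, `read_table`, and `parse_tables` are hypothetical names, not `FlvReader` methods:

```python
import io
import struct


def read_cstring(buf):
    """Read bytes up to (and consuming) the next NUL byte."""
    out = b''
    while True:
        ch = buf.read(1)
        if ch in (b'', b'\x00'):
            return out
        out += ch


def read_table(buf):
    """Read a one-byte count, then that many NUL-terminated strings."""
    (count,) = struct.unpack('!B', buf.read(1))
    return [read_cstring(buf) for _ in range(count)]


def parse_tables(data):
    buf = io.BytesIO(data)
    servers = read_table(buf)    # uses the server count
    qualities = read_table(buf)  # uses its own count, not the server count
    drm = read_cstring(buf)
    return servers, qualities, drm
```

With one server entry and two quality entries, reusing the server count for the second table would consume only one quality string, and the second quality string would then be misread as the DRM field.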

View File

@@ -19,6 +19,7 @@ from .bbccouk import BBCCoUkIE
 from .blinkx import BlinkxIE
 from .bliptv import BlipTVIE, BlipTVUserIE
 from .bloomberg import BloombergIE
+from .br import BRIE
 from .breakcom import BreakIE
 from .brightcove import BrightcoveIE
 from .c56 import C56IE
@@ -136,7 +137,7 @@ from .malemotion import MalemotionIE
 from .mdr import MDRIE
 from .metacafe import MetacafeIE
 from .metacritic import MetacriticIE
-from .mit import TechTVMITIE, MITIE
+from .mit import TechTVMITIE, MITIE, OCWMITIE
 from .mixcloud import MixcloudIE
 from .mpora import MporaIE
 from .mofosex import MofosexIE
@@ -151,7 +152,10 @@ from .myspass import MySpassIE
 from .myvideo import MyVideoIE
 from .naver import NaverIE
 from .nba import NBAIE
-from .nbc import NBCNewsIE
+from .nbc import (
+    NBCIE,
+    NBCNewsIE,
+)
 from .ndr import NDRIE
 from .ndtv import NDTVIE
 from .newgrounds import NewgroundsIE
@@ -160,7 +164,7 @@ from .nhl import NHLIE, NHLVideocenterIE
 from .niconico import NiconicoIE
 from .ninegag import NineGagIE
 from .normalboots import NormalbootsIE
-from .novamov import NovamovIE
+from .novamov import NovaMovIE
 from .nowness import NownessIE
 from .nowvideo import NowVideoIE
 from .ooyala import OoyalaIE
@@ -171,6 +175,7 @@ from .podomatic import PodomaticIE
 from .pornhd import PornHdIE
 from .pornhub import PornHubIE
 from .pornotube import PornotubeIE
+from .prosiebensat1 import ProSiebenSat1IE
 from .pyvideo import PyvideoIE
 from .radiofrance import RadioFranceIE
 from .rbmaradio import RBMARadioIE
@@ -186,6 +191,7 @@ from .rutube import (
     RutubeMovieIE,
     RutubePersonIE,
 )
+from .savefrom import SaveFromIE
 from .servingsys import ServingSysIE
 from .sina import SinaIE
 from .slashdot import SlashdotIE
@@ -216,6 +222,7 @@ from .sztvhu import SztvHuIE
 from .teamcoco import TeamcocoIE
 from .techtalks import TechTalksIE
 from .ted import TEDIE
+from .testurl import TestURLIE
 from .tf1 import TF1IE
 from .theplatform import ThePlatformIE
 from .thisav import ThisAVIE
@@ -223,6 +230,7 @@ from .tinypic import TinyPicIE
 from .toutv import TouTvIE
 from .traileraddict import TrailerAddictIE
 from .trilulilu import TriluliluIE
+from .trutube import TruTubeIE
 from .tube8 import Tube8IE
 from .tudou import TudouIE
 from .tumblr import TumblrIE
@@ -237,6 +245,7 @@ from .vesti import VestiIE
 from .vevo import VevoIE
 from .vice import ViceIE
 from .viddler import ViddlerIE
+from .videobam import VideoBamIE
 from .videodetective import VideoDetectiveIE
 from .videofyme import VideofyMeIE
 from .videopremium import VideoPremiumIE

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class AcademicEarthCourseIE(InfoExtractor): class AcademicEarthCourseIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?academicearth\.org/(?:courses|playlists)/(?P<id>[^?#/]+)' _VALID_URL = r'^https?://(?:www\.)?academicearth\.org/playlists/(?P<id>[^?#/]+)'
IE_NAME = 'AcademicEarth:Course' IE_NAME = 'AcademicEarth:Course'
def _real_extract(self, url): def _real_extract(self, url):
@@ -14,12 +14,12 @@ class AcademicEarthCourseIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex( title = self._html_search_regex(
r'<h1 class="playlist-name">(.*?)</h1>', webpage, u'title') r'<h1 class="playlist-name"[^>]*?>(.*?)</h1>', webpage, u'title')
description = self._html_search_regex( description = self._html_search_regex(
r'<p class="excerpt">(.*?)</p>', r'<p class="excerpt"[^>]*?>(.*?)</p>',
webpage, u'description', fatal=False) webpage, u'description', fatal=False)
urls = re.findall( urls = re.findall(
r'<h3 class="lecture-title"><a target="_blank" href="([^"]+)">', r'<li class="lecture-preview">\s*?<a target="_blank" href="([^"]+)">',
webpage) webpage)
entries = [self.url_result(u) for u in urls] entries = [self.url_result(u) for u in urls]
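Review note: the revised title and description patterns add `[^>]*?` after the class attribute, so extra attributes on the tag should no longer break the match. A minimal sketch with made-up markup (the `itemprop` attribute and the sample title are assumptions, not taken from the site):

```python
import re

# Hypothetical markup; the extra itemprop attribute is an assumption.
SAMPLE = '<h1 class="playlist-name" itemprop="name">Intro to Logic</h1>'

OLD_TITLE_RE = r'<h1 class="playlist-name">(.*?)</h1>'
NEW_TITLE_RE = r'<h1 class="playlist-name"[^>]*?>(.*?)</h1>'

# The old pattern expects '>' directly after the class attribute.
old_match = re.search(OLD_TITLE_RE, SAMPLE)
# The new pattern skips any further attributes before the closing '>'.
new_match = re.search(NEW_TITLE_RE, SAMPLE)

print(old_match)           # None
print(new_match.group(1))  # Intro to Logic
```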

@@ -13,13 +13,13 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.bbc.co.uk/programmes/p01q7wz1', 'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': { 'info_dict': {
'id': 'p01q7wz4', 'id': 'b039d07m',
'ext': 'flv', 'ext': 'flv',
'title': 'Friction: Blu Mar Ten guest mix: Blu Mar Ten - Guest Mix', 'title': 'Kaleidoscope: Leonard Cohen',
'description': 'Blu Mar Ten deliver a Guest Mix for Friction.', 'description': 'md5:db4755d7a665ae72343779f7dacb402c',
'duration': 1936, 'duration': 1740,
}, },
'params': { 'params': {
# rtmp download # rtmp download
@@ -38,7 +38,8 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
'params': { 'params': {
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
} },
'skip': 'Episode is no longer available on BBC iPlayer Radio',
}, },
{ {
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/', 'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
@@ -161,6 +162,11 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
group_id = mobj.group('id') group_id = mobj.group('id')
webpage = self._download_webpage(url, group_id, 'Downloading video page')
if re.search(r'id="emp-error" class="notinuk">', webpage):
raise ExtractorError('Currently BBC iPlayer TV programmes are available to play in the UK only',
expected=True)
playlist = self._download_xml('http://www.bbc.co.uk/iplayer/playlist/%s' % group_id, group_id, playlist = self._download_xml('http://www.bbc.co.uk/iplayer/playlist/%s' % group_id, group_id,
'Downloading playlist XML') 'Downloading playlist XML')

@@ -0,0 +1,80 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
_TEST = {
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
page = self._download_webpage(url, display_id)
xml_url = self._search_regex(
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = [{
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"uploader": xml_video.find("author").text,
"upload_date": "".join(reversed(xml_video.find("broadcastDate").text.split("."))),
"webpage_url": xml_video.find("permalink").text,
} for xml_video in xml.findall("video")]
if len(videos) > 1:
self._downloader.report_warning(
'found multiple videos; please '
'report this with the video URL to http://yt-dl.org/bug')
if not videos:
raise ExtractorError('No video entries found')
return videos[0]
def _extract_formats(self, assets):
formats = [{
"url": asset.find("downloadUrl").text,
"ext": asset.find("mediaType").text,
"format_id": asset.get("type"),
"width": int(asset.find("frameWidth").text),
"height": int(asset.find("frameHeight").text),
"tbr": int(asset.find("bitrateVideo").text),
"abr": int(asset.find("bitrateAudio").text),
"vcodec": asset.find("codecVideo").text,
"container": asset.find("mediaType").text,
"filesize": int(asset.find("size").text),
} for asset in assets.findall("asset")
if asset.find("downloadUrl") is not None]
self._sort_formats(formats)
return formats
def _extract_thumbnails(self, variants):
thumbnails = [{
"url": self._BASE_URL + variant.find("url").text,
"width": int(variant.find("width").text),
"height": int(variant.find("height").text),
} for variant in variants.findall("variant")]
thumbnails.sort(key=lambda x: x["width"] * x["height"], reverse=True)
return thumbnails
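Review note: two one-liners in this new extractor are easy to misread, so here is a small sketch of what they do. It assumes `broadcastDate` arrives as `DD.MM.YYYY` (inferred from the split on `"."` and the expected `upload_date` in the test); the thumbnail entries are invented.

```python
def to_upload_date(broadcast_date):
    """Turn 'DD.MM.YYYY' into 'YYYYMMDD', using the same expression as the extractor."""
    return "".join(reversed(broadcast_date.split(".")))


# Hypothetical thumbnail variants; URLs and sizes are made up.
thumbnails = [
    {"url": "/a.jpg", "width": 256, "height": 144},
    {"url": "/b.jpg", "width": 1024, "height": 576},
    {"url": "/c.jpg", "width": 512, "height": 288},
]
# Largest pixel area first, as in _extract_thumbnails.
thumbnails.sort(key=lambda x: x["width"] * x["height"], reverse=True)

print(to_upload_date("01.03.2014"))    # 20140301
print([t["url"] for t in thumbnails])  # ['/b.jpg', '/c.jpg', '/a.jpg']
```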

@@ -23,8 +23,8 @@ class BreakIE(InfoExtractor):
video_id = mobj.group(1).split("-")[-1] video_id = mobj.group(1).split("-")[-1]
embed_url = 'http://www.break.com/embed/%s' % video_id embed_url = 'http://www.break.com/embed/%s' % video_id
webpage = self._download_webpage(embed_url, video_id) webpage = self._download_webpage(embed_url, video_id)
info_json = self._search_regex(r'var embedVars = ({.*?});', webpage, info_json = self._search_regex(r'var embedVars = ({.*})\s*?</script>',
'info json', flags=re.DOTALL) webpage, 'info json', flags=re.DOTALL)
info = json.loads(info_json) info = json.loads(info_json)
video_url = info['videoUri'] video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url) m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
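Review note: the old pattern stops at the first `};`, which can fall inside a JSON string; the new one is greedy and anchors on the closing `</script>`. A sketch with an invented page fragment (the `note` field is an assumption used only to trigger the difference):

```python
import json
import re

# Hypothetical embed page; the "note" value contains '};' on purpose.
SAMPLE = (
    '<script>var embedVars = {"videoUri": "http://example.com/v.mp4", '
    '"note": "a};b"}\n</script>'
)

OLD_RE = r'var embedVars = ({.*?});'
NEW_RE = r'var embedVars = ({.*})\s*?</script>'

old_capture = re.search(OLD_RE, SAMPLE, flags=re.DOTALL).group(1)
new_capture = re.search(NEW_RE, SAMPLE, flags=re.DOTALL).group(1)


def parses(text):
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(parses(old_capture))  # False: cut off inside the "note" string
print(parses(new_capture))  # True
```

One caveat worth keeping in mind: because `.*` is greedy under `re.DOTALL`, a page with several script blocks could make the capture run to a later `}` that precedes a `</script>`.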

@@ -1,4 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -9,11 +11,12 @@ class Canalc2IE(InfoExtractor):
 _VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
 _TEST = {
-u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
-u'file': u'12163.mp4',
-u'md5': u'060158428b650f896c542dfbb3d6487f',
-u'info_dict': {
-u'title': u'Terrasses du Numérique'
+'url': 'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
+'md5': '060158428b650f896c542dfbb3d6487f',
+'info_dict': {
+'id': '12163',
+'ext': 'mp4',
+'title': 'Terrasses du Numérique'
 }
 }
@@ -28,9 +31,10 @@ class Canalc2IE(InfoExtractor):
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex( title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title') r'class="evenement8">(.*?)</a>', webpage, 'title')
return {'id': video_id, return {
'id': video_id,
'ext': 'mp4', 'ext': 'mp4',
'url': video_url, 'url': video_url,
'title': title, 'title': title,

@@ -1,4 +1,5 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -8,73 +9,63 @@ from ..utils import (
 class CinemassacreIE(InfoExtractor):
-_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
-_TESTS = [{
-u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
-u'file': u'19911.flv',
-u'info_dict': {
-u'upload_date': u'20121110',
-u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
-u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
-},
-u'params': {
-# rtmp download
-u'skip_download': True,
+_VALID_URL = r'http://(?:www\.)?cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?'
+_TESTS = [
+{
+'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
+'file': '19911.mp4',
+'md5': 'fde81fbafaee331785f58cd6c0d46190',
+'info_dict': {
+'upload_date': '20121110',
+'title': '“Angry Video Game Nerd: The Movie” Trailer',
+'description': 'md5:fb87405fcb42a331742a0dce2708560b',
 },
 },
 {
-u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
-u'file': u'521be8ef82b16.flv',
-u'info_dict': {
-u'upload_date': u'20131002',
-u'title': u'The Mummys Hand (1940)',
+'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
+'file': '521be8ef82b16.mp4',
+'md5': 'd72f10cd39eac4215048f62ab477a511',
+'info_dict': {
+'upload_date': '20131002',
+'title': 'The Mummys Hand (1940)',
 },
-u'params': {
-# rtmp download
-u'skip_download': True,
-},
-}]
+}
+]
 def _real_extract(self, url):
 mobj = re.match(self._VALID_URL, url)
-webpage_url = u'http://' + mobj.group('url')
-webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
+webpage = self._download_webpage(url, None) # Don't know video id yet
 video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
 mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
 if not mobj:
-raise ExtractorError(u'Can\'t extract embed url and video id')
-playerdata_url = mobj.group(u'embed_url')
-video_id = mobj.group(u'video_id')
+raise ExtractorError('Can\'t extract embed url and video id')
+playerdata_url = mobj.group('embed_url')
+video_id = mobj.group('video_id')
 video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
-webpage, u'title')
+webpage, 'title')
 video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
-webpage, u'description', flags=re.DOTALL, fatal=False)
+webpage, 'description', flags=re.DOTALL, fatal=False)
 if len(video_description) == 0:
 video_description = None
 playerdata = self._download_webpage(playerdata_url, video_id)
-url = self._html_search_regex(r'\'streamer\': \'(?P<url>[^\']+)\'', playerdata, u'url')
-sd_file = self._html_search_regex(r'\'file\': \'(?P<sd_file>[^\']+)\'', playerdata, u'sd_file')
-hd_file = self._html_search_regex(r'\'?file\'?: "(?P<hd_file>[^"]+)"', playerdata, u'hd_file')
-video_thumbnail = self._html_search_regex(r'\'image\': \'(?P<thumbnail>[^\']+)\'', playerdata, u'thumbnail', fatal=False)
+sd_url = self._html_search_regex(r'file: \'(?P<sd_file>[^\']+)\', label: \'SD\'', playerdata, 'sd_file')
+hd_url = self._html_search_regex(r'file: \'(?P<hd_file>[^\']+)\', label: \'HD\'', playerdata, 'hd_file')
+video_thumbnail = self._html_search_regex(r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
 formats = [
 {
-'url': url,
-'play_path': 'mp4:' + sd_file,
-'rtmp_live': True, # workaround
-'ext': 'flv',
+'url': sd_url,
+'ext': 'mp4',
 'format': 'sd',
 'format_id': 'sd',
 },
 {
-'url': url,
-'play_path': 'mp4:' + hd_file,
-'rtmp_live': True, # workaround
-'ext': 'flv',
+'url': hd_url,
+'ext': 'mp4',
 'format': 'hd',
 'format_id': 'hd',
}, },
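Review note: the simplified `_VALID_URL` drops the outer `url` group but keeps the three date groups that `_real_extract` concatenates into the upload date. A quick check against the first test URL from this file:

```python
import re

# New _VALID_URL as it appears in this file's diff.
VALID_URL = (
    r'http://(?:www\.)?cinemassacre\.com/'
    r'(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?'
)

mobj = re.match(VALID_URL, 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/')
# Same concatenation as in _real_extract.
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')

print(video_date)  # 20121110
```

Since the scheme is no longer optional, a URL without `http://` would not match this pattern.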

@@ -1,7 +1,11 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re, base64, zlib import re
import json
import base64
import zlib
from hashlib import sha1 from hashlib import sha1
from math import pow, sqrt, floor from math import pow, sqrt, floor
from .common import InfoExtractor from .common import InfoExtractor
@@ -19,13 +23,15 @@ from ..aes import (
inc, inc,
) )
class CrunchyrollIE(InfoExtractor): class CrunchyrollIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{ _TEST = {
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513', 'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'file': '645513.flv',
#'md5': 'b1639fd6ddfaa43788c85f6d1dddd412', #'md5': 'b1639fd6ddfaa43788c85f6d1dddd412',
'info_dict': { 'info_dict': {
'id': '645513',
'ext': 'flv',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!', 'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef', 'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg', 'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
@@ -36,7 +42,7 @@ class CrunchyrollIE(InfoExtractor):
# rtmp # rtmp
'skip_download': True, 'skip_download': True,
}, },
}] }
_FORMAT_IDS = { _FORMAT_IDS = {
'360': ('60', '106'), '360': ('60', '106'),
@@ -80,9 +86,8 @@ class CrunchyrollIE(InfoExtractor):
return zlib.decompress(decrypted_data) return zlib.decompress(decrypted_data)
def _convert_subtitles_to_srt(self, subtitles): def _convert_subtitles_to_srt(self, subtitles):
i=1
output = '' output = ''
for start, end, text in re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles): for i, (start, end, text) in enumerate(re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles), 1):
start = start.replace('.', ',') start = start.replace('.', ',')
end = end.replace('.', ',') end = end.replace('.', ',')
text = clean_html(text) text = clean_html(text)
@@ -90,7 +95,6 @@ class CrunchyrollIE(InfoExtractor):
if not text: if not text:
continue continue
output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text) output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
i+=1
return output return output
def _real_extract(self,url): def _real_extract(self,url):
@@ -108,6 +112,12 @@ class CrunchyrollIE(InfoExtractor):
if note_m: if note_m:
raise ExtractorError(note_m) raise ExtractorError(note_m)
mobj = re.search(r'Page\.messaging_box_controller\.addItems\(\[(?P<msg>{.+?})\]\)', webpage)
if mobj:
msg = json.loads(mobj.group('msg'))
if msg.get('type') == 'error':
raise ExtractorError('crunchyroll returned error: %s' % msg['message_body'], expected=True)
video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, 'video_title', flags=re.DOTALL) video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, 'video_title', flags=re.DOTALL)
video_title = re.sub(r' {2,}', ' ', video_title) video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, 'video_description', default='') video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, 'video_description', default='')
@@ -161,7 +171,7 @@ class CrunchyrollIE(InfoExtractor):
data = base64.b64decode(data) data = base64.b64decode(data)
subtitle = self._decrypt_subtitles(data, iv, id).decode('utf-8') subtitle = self._decrypt_subtitles(data, iv, id).decode('utf-8')
lang_code = self._search_regex(r'lang_code=\'([^\']+)', subtitle, 'subtitle_lang_code', fatal=False) lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False)
if not lang_code: if not lang_code:
continue continue
subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle) subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle)
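Review note: a sketch of the rewritten subtitle loop, to show how the SRT counter now comes from `enumerate(..., 1)`. The sample payload is invented, and `text.strip()` stands in for `clean_html()`, so treat both as assumptions.

```python
import re

EVENT_RE = (
    r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" '
    r'[^>]*?text="([^"]+)"[^>]*?>'
)

# Hypothetical subtitle payload; the attribute layout is inferred from the regex.
SAMPLE = (
    '<event id="1" start="0:00:01.00" end="0:00:02.00" style="s" text="Hello" />'
    '<event id="2" start="0:00:03.00" end="0:00:04.00" style="s" text=" " />'
    '<event id="3" start="0:00:05.00" end="0:00:06.00" style="s" text="Bye" />'
)


def convert_to_srt(subtitles):
    output = ''
    for i, (start, end, text) in enumerate(re.findall(EVENT_RE, subtitles), 1):
        start = start.replace('.', ',')
        end = end.replace('.', ',')
        text = text.strip()  # stand-in for clean_html(), an assumption
        if not text:
            continue  # the skipped event still consumes an index
        output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
    return output


srt = convert_to_srt(SAMPLE)
print(srt)
```

If I read the hunk correctly, this is a small behaviour change: with the manual counter, `i` only advanced after a cue was written, so cues were numbered consecutively. With `enumerate`, an empty event leaves a gap (here the cues are numbered 1 and 3).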

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import os import os
import re import re
import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE from .youtube import YoutubeIE
@@ -12,6 +13,7 @@ from ..utils import (
compat_urllib_parse, compat_urllib_parse,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
compat_xml_parse_error,
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
@@ -81,10 +83,10 @@ class GenericIE(InfoExtractor):
# Direct link to a video # Direct link to a video
{ {
'url': 'http://media.w3.org/2010/05/sintel/trailer.mp4', 'url': 'http://media.w3.org/2010/05/sintel/trailer.mp4',
'file': 'trailer.mp4',
'md5': '67d406c2bcb6af27fa886f31aa934bbe', 'md5': '67d406c2bcb6af27fa886f31aa934bbe',
'info_dict': { 'info_dict': {
'id': 'trailer', 'id': 'trailer',
'ext': 'mp4',
'title': 'trailer', 'title': 'trailer',
'upload_date': '20100513', 'upload_date': '20100513',
} }
@@ -92,7 +94,6 @@ class GenericIE(InfoExtractor):
# ooyala video # ooyala video
{ {
'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219', 'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
'file': 'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ.mp4',
'md5': '5644c6ca5d5782c1d0d350dad9bd840c', 'md5': '5644c6ca5d5782c1d0d350dad9bd840c',
'info_dict': { 'info_dict': {
'id': 'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ', 'id': 'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ',
@@ -100,6 +101,22 @@ class GenericIE(InfoExtractor):
'title': '2cc213299525360.mov', # that's what we get 'title': '2cc213299525360.mov', # that's what we get
}, },
}, },
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
'info_dict': {
'id': 'cmQHVoWB5FY',
'ext': 'mp4',
'upload_date': '20130224',
'uploader_id': 'TheVerge',
'description': 'Chris Ziegler takes a look at the Alcatel OneTouch Fire and the ZTE Open; two of the first Firefox OS handsets to be officially announced.',
'uploader': 'The Verge',
'title': 'First Firefox OS phones side-by-side',
},
'params': {
'skip_download': False,
}
}
] ]
def report_download_webpage(self, video_id): def report_download_webpage(self, video_id):
@@ -159,6 +176,25 @@ class GenericIE(InfoExtractor):
raise ExtractorError('Invalid URL protocol') raise ExtractorError('Invalid URL protocol')
return response return response
def _extract_rss(self, url, video_id, doc):
playlist_title = doc.find('./channel/title').text
playlist_desc_el = doc.find('./channel/description')
playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text
entries = [{
'_type': 'url',
'url': e.find('link').text,
'title': e.find('title').text,
} for e in doc.findall('./channel/item')]
return {
'_type': 'playlist',
'id': url,
'title': playlist_title,
'description': playlist_desc,
'entries': entries,
}
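Review note: a standalone sketch of the RSS path added in this file, using only the standard library. `ET.ParseError` stands in for `compat_xml_parse_error`, the feed is invented, and `try_rss` is a hypothetical helper name, not part of the extractor.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed used only for illustration.
FEED = '''<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <description>Two items</description>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>'''

NOT_XML = '<html><body><p>unclosed<br></body></html>'


def extract_rss(url, doc):
    """Mirror of _extract_rss: one 'url' entry per <item>."""
    desc_el = doc.find('./channel/description')
    return {
        '_type': 'playlist',
        'id': url,
        'title': doc.find('./channel/title').text,
        'description': None if desc_el is None else desc_el.text,
        'entries': [{
            '_type': 'url',
            'url': e.find('link').text,
            'title': e.find('title').text,
        } for e in doc.findall('./channel/item')],
    }


def try_rss(url, webpage):
    """Return a playlist dict if the page parses as RSS, else None."""
    try:
        doc = ET.fromstring(webpage.encode('utf-8'))
    except ET.ParseError:
        return None  # not well-formed XML, so fall through to other checks
    return extract_rss(url, doc) if doc.tag == 'rss' else None


playlist = try_rss('http://example.com/feed', FEED)
print(playlist['title'], len(playlist['entries']))
print(try_rss('http://example.com/page', NOT_XML))
```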
def _real_extract(self, url): def _real_extract(self, url):
parsed_url = compat_urlparse.urlparse(url) parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme: if not parsed_url.scheme:
@@ -219,6 +255,14 @@ class GenericIE(InfoExtractor):
self.report_extraction(video_id) self.report_extraction(video_id)
# Is it an RSS feed?
try:
doc = xml.etree.ElementTree.fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
except compat_xml_parse_error:
pass
# it's tempting to parse this further, but you would # it's tempting to parse this further, but you would
# have to take into account all the variations like # have to take into account all the variations like
# Video Title - Site Name # Video Title - Site Name
@@ -334,11 +378,17 @@ class GenericIE(InfoExtractor):
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora') return self.url_result(mobj.group(1), 'Mpora')
# Look for embedded Novamov player # Look for embedded NovaMov player
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?novamov\.com/embed\.php.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?novamov\.com/embed\.php.+?)\1', webpage)
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'Novamov') return self.url_result(mobj.group('url'), 'NovaMov')
# Look for embedded NowVideo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?nowvideo\.(?:ch|sx|eu)/embed\.php.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'NowVideo')
# Look for embedded Facebook player # Look for embedded Facebook player
mobj = re.search( mobj = re.search(
@@ -376,6 +426,18 @@ class GenericIE(InfoExtractor):
if mobj is None: if mobj is None:
# HTML5 video # HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL) mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
mobj = re.search(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="[0-9]{,2};url=\'([^\']+)\'"',
webpage)
if mobj:
new_url = mobj.group(1)
self.report_following_redirect(new_url)
return {
'_type': 'url',
'url': new_url,
}
if mobj is None: if mobj is None:
raise ExtractorError('Unsupported URL: %s' % url) raise ExtractorError('Unsupported URL: %s' % url)
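Review note: the meta-refresh pattern is dense, so here is a sketch that runs it against a few invented tags. The lookahead lets `http-equiv` appear before or after `content`, while the target must be written as `url='...'` in single quotes. `find_meta_redirect` is a hypothetical helper name.

```python
import re

# Pattern copied from the hunk above.
META_REFRESH_RE = (
    r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
    r'(?:[a-z-]+="[^"]+"\s+)*?content="[0-9]{,2};url=\'([^\']+)\'"'
)


def find_meta_redirect(webpage):
    """Return the redirect target, or None if no matching <meta> is found."""
    mobj = re.search(META_REFRESH_RE, webpage)
    return mobj.group(1) if mobj else None


# Hypothetical tags for illustration.
first = find_meta_redirect(
    '<meta http-equiv="refresh" content="0;url=\'http://example.com/video\'">')
swapped = find_meta_redirect(
    '<meta content="5;url=\'http://example.com/other\'" http-equiv="refresh">')
unquoted = find_meta_redirect(
    '<meta http-equiv="refresh" content="0;url=http://example.com/plain">')

print(first)     # http://example.com/video
print(swapped)   # http://example.com/other
print(unquoted)  # None
```

The last case suggests that redirects written without quotes around the URL are not picked up by this pattern.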

@@ -10,7 +10,7 @@ from ..utils import compat_urllib_request
class IPrimaIE(InfoExtractor): class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?P<videogroup>.+)/(?P<videoid>.+)' _VALID_URL = r'https?://play\.iprima\.cz/[^?#]+/(?P<id>[^?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://play.iprima.cz/particka/particka-92', 'url': 'http://play.iprima.cz/particka/particka-92',
@@ -22,20 +22,32 @@ class IPrimaIE(InfoExtractor):
'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg', 'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True, # requires rtmpdump
}, },
}, {
'url': 'http://play.iprima.cz/particka/tchibo-particka-jarni-moda',
'info_dict': {
'id': '9718337',
'ext': 'flv',
'title': 'Tchibo Partička - Jarní móda',
'description': 'md5:589f8f59f414220621ff8882eb3ce7be',
'thumbnail': 're:^http:.*\.jpg$',
}, },
] 'params': {
'skip_download': True, # requires rtmpdump
},
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid') video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
player_url = 'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' % ( player_url = (
floor(random()*1073741824), 'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
floor(random()*1073741824)) (floor(random()*1073741824), floor(random()*1073741824))
)
req = compat_urllib_request.Request(player_url) req = compat_urllib_request.Request(player_url)
req.add_header('Referer', url) req.add_header('Referer', url)
@@ -44,18 +56,20 @@ class IPrimaIE(InfoExtractor):
base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1]) base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1])
zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO') zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO')
if zoneGEO != '0': if zoneGEO != '0':
base_url = base_url.replace('token', 'token_' + zoneGEO) base_url = base_url.replace('token', 'token_' + zoneGEO)
formats = [] formats = []
for format_id in ['lq', 'hq', 'hd']: for format_id in ['lq', 'hq', 'hd']:
filename = self._html_search_regex(r'"%s_id":(.+?),' % format_id, webpage, 'filename') filename = self._html_search_regex(
r'"%s_id":(.+?),' % format_id, webpage, 'filename')
if filename == 'null': if filename == 'null':
continue continue
real_id = self._search_regex(r'Prima-[0-9]{10}-([0-9]+)_', filename, 'real video id') real_id = self._search_regex(
r'Prima-(?:[0-9]{10}|WEB)-([0-9]+)[-_]',
filename, 'real video id')
if format_id == 'lq': if format_id == 'lq':
quality = 0 quality = 0
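Review note: the widened "real video id" pattern accepts a `WEB` marker in place of the ten-digit block and a `-` as well as `_` after the id. A sketch with invented filenames (the exact naming scheme is an assumption; `real_id` is a hypothetical helper):

```python
import re

OLD_RE = r'Prima-[0-9]{10}-([0-9]+)_'
NEW_RE = r'Prima-(?:[0-9]{10}|WEB)-([0-9]+)[-_]'

# Hypothetical filenames; only their general shape is inferred from the regexes.
CLASSIC = 'Prima-1402270000-123456_1000.mp4'
WEB = 'Prima-WEB-9718337-1000.mp4'


def real_id(pattern, filename):
    """Return the captured id, or None when the pattern does not match."""
    mobj = re.search(pattern, filename)
    return mobj.group(1) if mobj else None


print(real_id(OLD_RE, CLASSIC), real_id(OLD_RE, WEB))  # 123456 None
print(real_id(NEW_RE, CLASSIC), real_id(NEW_RE, WEB))  # 123456 9718337
```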

@@ -32,7 +32,7 @@ class LifeNewsIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
webpage = self._download_webpage('http://lifenews.ru/mobile/news/%s' % video_id, video_id, 'Downloading page') webpage = self._download_webpage('http://lifenews.ru/news/%s' % video_id, video_id, 'Downloading page')
video_url = self._html_search_regex( video_url = self._html_search_regex(
r'<video.*?src="([^"]+)".*?></video>', webpage, 'video URL') r'<video.*?src="([^"]+)".*?></video>', webpage, 'video URL')
@@ -50,7 +50,7 @@ class LifeNewsIE(InfoExtractor):
view_count = self._html_search_regex( view_count = self._html_search_regex(
r'<div class=\'views\'>(\d+)</div>', webpage, 'view count', fatal=False) r'<div class=\'views\'>(\d+)</div>', webpage, 'view count', fatal=False)
comment_count = self._html_search_regex( comment_count = self._html_search_regex(
r'<div class=\'comments\'>(\d+)</div>', webpage, 'comment count', fatal=False) r'<div class=\'comments\'>\s*<span class=\'counter\'>(\d+)</span>', webpage, 'comment count', fatal=False)
upload_date = self._html_search_regex( upload_date = self._html_search_regex(
r'<time datetime=\'([^\']+)\'>', webpage, 'upload date',fatal=False) r'<time datetime=\'([^\']+)\'>', webpage, 'upload date',fatal=False)

@@ -4,15 +4,17 @@ import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor): class LiveLeakIE(InfoExtractor):
_VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)' _VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680', 'url': 'http://www.liveleak.com/view?i=757_1364311680',
'file': '757_1364311680.mp4',
'md5': '0813c2430bea7a46bf13acf3406992f4', 'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': { 'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!', 'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2', 'uploader': 'ljfriel2',
'title': 'Most unlucky car accident' 'title': 'Most unlucky car accident'
@@ -20,25 +22,62 @@ class LiveLeakIE(InfoExtractor):
}, },
{ {
'url': 'http://www.liveleak.com/view?i=f93_1390833151', 'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'file': 'f93_1390833151.mp4',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf', 'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': { 'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.', 'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt', 'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)', 'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
} }
},
{
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
}
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
sources_raw = self._search_regex( sources_raw = self._search_regex(
r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None) r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None)
if sources_raw is None: if sources_raw is None:
sources_raw = '[{ %s}]' % ( alt_source = self._search_regex(
self._search_regex(r'(file: ".*?"),', webpage, 'video URL')) r'(file: ".*?"),', webpage, 'video URL', default=None)
if alt_source:
sources_raw = '[{ %s}]' % alt_source
else:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="(http://www.prochan.com/embed\?[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
}
sources_json = re.sub(r'\s([a-z]+):\s', r'"\1": ', sources_raw) sources_json = re.sub(r'\s([a-z]+):\s', r'"\1": ', sources_raw)
sources = json.loads(sources_json) sources = json.loads(sources_json)
@@ -49,15 +88,11 @@ class LiveLeakIE(InfoExtractor):
} for s in sources] } for s in sources]
self._sort_formats(formats) self._sort_formats(formats)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
return { return {
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'description': video_description, 'description': video_description,
'uploader': video_uploader, 'uploader': video_uploader,
'formats': formats, 'formats': formats,
'age_limit': age_limit,
} }
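
The `sources:` handling in this hunk turns a JavaScript object literal into JSON by quoting bare keys before calling `json.loads`. A minimal standalone sketch of that step follows. The `sources_raw` value and its URLs are made up for illustration and are not taken from a real LiveLeak page.

```python
import json
import re

# Hypothetical player snippet; real pages may be formatted differently.
sources_raw = (
    '[{ file: "http://example.com/a.mp4", label: "360p"}, '
    '{ file: "http://example.com/b.mp4", label: "720p"}]'
)

# Quote bare lowercase keys that follow whitespace and are followed by ": ".
# The "http:" inside the URLs is not touched: it follows a quote, not whitespace.
sources_json = re.sub(r'\s([a-z]+):\s', r'"\1": ', sources_raw)
sources = json.loads(sources_json)

urls = [s['file'] for s in sources]
```

The substitution relies on the page formatting keys as `key: value` with surrounding whitespace; a literal with keys written as `key:"value"` would probably need a different pattern.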


@@ -166,6 +166,7 @@ class MetacafeIE(InfoExtractor):
video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, u'title') video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, u'title')
description = self._og_search_description(webpage) description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);', r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, u'uploader nickname', fatal=False) webpage, u'uploader nickname', fatal=False)
@@ -183,6 +184,7 @@ class MetacafeIE(InfoExtractor):
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': None, 'upload_date': None,
'title': video_title, 'title': video_title,
'thumbnail': thumbnail,
'ext': video_ext, 'ext': video_ext,
'age_limit': age_limit, 'age_limit': age_limit,
} }


@@ -1,24 +1,30 @@
from __future__ import unicode_literals
import re import re
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import ( from ..utils import (
compat_urlparse,
clean_html, clean_html,
ExtractorError,
get_element_by_id, get_element_by_id,
) )
class TechTVMITIE(InfoExtractor): class TechTVMITIE(InfoExtractor):
IE_NAME = u'techtv.mit.edu' IE_NAME = 'techtv.mit.edu'
_VALID_URL = r'https?://techtv\.mit\.edu/(videos|embeds)/(?P<id>\d+)' _VALID_URL = r'https?://techtv\.mit\.edu/(videos|embeds)/(?P<id>\d+)'
_TEST = { _TEST = {
u'url': u'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set', 'url': 'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set',
u'file': u'25418.mp4', 'md5': '1f8cb3e170d41fd74add04d3c9330e5f',
u'md5': u'1f8cb3e170d41fd74add04d3c9330e5f', 'info_dict': {
u'info_dict': { 'id': '25418',
u'title': u'MIT DNA Learning Center Set', 'ext': 'mp4',
u'description': u'md5:82313335e8a8a3f243351ba55bc1b474', 'title': 'MIT DNA Learning Center Set',
'description': 'md5:82313335e8a8a3f243351ba55bc1b474',
}, },
} }
@@ -27,12 +33,12 @@ class TechTVMITIE(InfoExtractor):
video_id = mobj.group('id') video_id = mobj.group('id')
raw_page = self._download_webpage( raw_page = self._download_webpage(
'http://techtv.mit.edu/videos/%s' % video_id, video_id) 'http://techtv.mit.edu/videos/%s' % video_id, video_id)
clean_page = re.compile(u'<!--.*?-->', re.S).sub(u'', raw_page) clean_page = re.compile(r'<!--.*?-->', re.S).sub('', raw_page)
base_url = self._search_regex(r'ipadUrl: \'(.+?cloudfront.net/)', base_url = self._search_regex(
raw_page, u'base url') r'ipadUrl: \'(.+?cloudfront.net/)', raw_page, 'base url')
formats_json = self._search_regex(r'bitrates: (\[.+?\])', raw_page, formats_json = self._search_regex(
u'video formats') r'bitrates: (\[.+?\])', raw_page, 'video formats')
formats_mit = json.loads(formats_json) formats_mit = json.loads(formats_json)
formats = [ formats = [
{ {
@@ -48,10 +54,12 @@ class TechTVMITIE(InfoExtractor):
title = get_element_by_id('edit-title', clean_page) title = get_element_by_id('edit-title', clean_page)
description = clean_html(get_element_by_id('edit-description', clean_page)) description = clean_html(get_element_by_id('edit-description', clean_page))
thumbnail = self._search_regex(r'playlist:.*?url: \'(.+?)\'', thumbnail = self._search_regex(
raw_page, u'thumbnail', flags=re.DOTALL) r'playlist:.*?url: \'(.+?)\'',
raw_page, 'thumbnail', flags=re.DOTALL)
return {'id': video_id, return {
'id': video_id,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'description': description, 'description': description,
@@ -60,16 +68,17 @@ class TechTVMITIE(InfoExtractor):
class MITIE(TechTVMITIE): class MITIE(TechTVMITIE):
IE_NAME = u'video.mit.edu' IE_NAME = 'video.mit.edu'
_VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)' _VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)'
_TEST = { _TEST = {
u'url': u'http://video.mit.edu/watch/the-government-is-profiling-you-13222/', 'url': 'http://video.mit.edu/watch/the-government-is-profiling-you-13222/',
u'file': u'21783.mp4', 'md5': '7db01d5ccc1895fc5010e9c9e13648da',
u'md5': u'7db01d5ccc1895fc5010e9c9e13648da', 'info_dict': {
u'info_dict': { 'id': '21783',
u'title': u'The Government is Profiling You', 'ext': 'mp4',
u'description': u'md5:ad5795fe1e1623b73620dbfd47df9afd', 'title': 'The Government is Profiling You',
'description': 'md5:ad5795fe1e1623b73620dbfd47df9afd',
}, },
} }
@@ -77,7 +86,73 @@ class MITIE(TechTVMITIE):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title') page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title) webpage = self._download_webpage(url, page_title)
self.to_screen('%s: Extracting %s url' % (page_title, TechTVMITIE.IE_NAME)) embed_url = self._search_regex(
embed_url = self._search_regex(r'<iframe .*?src="(.+?)"', webpage, r'<iframe .*?src="(.+?)"', webpage, 'embed url')
u'embed url')
return self.url_result(embed_url, ie='TechTVMIT') return self.url_result(embed_url, ie='TechTVMIT')
class OCWMITIE(InfoExtractor):
IE_NAME = 'ocw.mit.edu'
_VALID_URL = r'^http://ocw\.mit\.edu/courses/(?P<topic>[a-z0-9\-]+)'
_BASE_URL = 'http://ocw.mit.edu/'
_TESTS = [
{
'url': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/',
'info_dict': {
'id': 'EObHWIEKGjA',
'ext': 'mp4',
'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence',
'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.',
#'subtitles': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/MIT6_041F11_lec07_300k.mp4.srt'
}
},
{
'url': 'http://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/1.-differentiation/part-a-definition-and-basic-rules/session-1-introduction-to-derivatives/',
'info_dict': {
'id': '7K1sB05pE0A',
'ext': 'mp4',
'title': 'Session 1: Introduction to Derivatives',
'description': 'This section contains lecture video excerpts, lecture notes, an interactive mathlet with supporting documents, and problem solving videos.',
#'subtitles': 'http://ocw.mit.edu//courses/mathematics/18-01sc-single-variable-calculus-fall-2010/ocw-18.01-f07-lec01_300k.SRT'
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
topic = mobj.group('topic')
webpage = self._download_webpage(url, topic)
title = self._html_search_meta('WT.cg_s', webpage)
description = self._html_search_meta('Description', webpage)
# search for call to ocw_embed_chapter_media(container_id, media_url, provider, page_url, image_url, start, stop, captions_file)
embed_chapter_media = re.search(r'ocw_embed_chapter_media\((.+?)\)', webpage)
if embed_chapter_media:
metadata = re.sub(r'[\'"]', '', embed_chapter_media.group(1))
metadata = re.split(r', ?', metadata)
yt = metadata[1]
subs = compat_urlparse.urljoin(self._BASE_URL, metadata[7])
else:
# search for call to ocw_embed_media(container_id, media_url, provider, page_url, image_url, captions_file)
embed_media = re.search(r'ocw_embed_media\((.+?)\)', webpage)
if embed_media:
metadata = re.sub(r'[\'"]', '', embed_media.group(1))
metadata = re.split(r', ?', metadata)
yt = metadata[1]
subs = compat_urlparse.urljoin(self._BASE_URL, metadata[5])
else:
raise ExtractorError('Unable to find embedded YouTube video.')
video_id = YoutubeIE.extract_id(yt)
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'description': description,
'url': yt,
'subtitles': subs,
'ie_key': 'Youtube',
}
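
The OCW extractor above recovers the media URL and the captions file by position from the argument list of the embedding JavaScript call. A minimal sketch of that parsing, assuming a hypothetical page fragment (all argument values below are invented) and using the Python 3 standard library `urllib.parse.urljoin` as a stand-in for `compat_urlparse.urljoin`:

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://ocw.mit.edu/'

# Hypothetical page fragment; the argument values are made up.
webpage = (
    "ocw_embed_chapter_media('media_1', 'http://www.youtube.com/v/EObHWIEKGjA', "
    "'youtube', '/courses/x', 'http://img.example/0.jpg', 0, 0, "
    "'/courses/x/lec07.srt')"
)

match = re.search(r'ocw_embed_chapter_media\((.+?)\)', webpage)
# Strip quotes, then split on commas to get the positional arguments.
metadata = re.sub(r'[\'"]', '', match.group(1))
metadata = re.split(r', ?', metadata)

media_url = metadata[1]                      # second argument: media_url
captions = urljoin(BASE_URL, metadata[7])    # eighth argument: captions_file
```

Splitting on commas is fragile: an argument that itself contains a comma or a closing parenthesis would shift the positions, so this should be read as an illustration of the approach rather than a robust parser.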


@@ -10,7 +10,7 @@ from ..utils import (
class MixcloudIE(InfoExtractor): class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([\w\d-]+)/([\w\d-]+)' _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
IE_NAME = 'mixcloud' IE_NAME = 'mixcloud'
_TEST = { _TEST = {


@@ -1,19 +1,46 @@
from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import find_xpath_attr, compat_str from ..utils import find_xpath_attr, compat_str
class NBCIE(InfoExtractor):
_VALID_URL = r'http://www\.nbc\.com/[^/]+/video/[^/]+/(?P<id>n?\d+)'
_TEST = {
'url': 'http://www.nbc.com/chicago-fire/video/i-am-a-firefighter/2734188',
'md5': '54d0fbc33e0b853a65d7b4de5c06d64e',
'info_dict': {
'id': 'u1RInQZRN7QJ',
'ext': 'flv',
'title': 'I Am a Firefighter',
'description': 'An emergency puts Dawson\'sf irefighter skills to the ultimate test in this four-part digital series.',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
theplatform_url = self._search_regex('class="video-player video-player-full" data-mpx-url="(.*?)"', webpage, 'theplatform url')
if theplatform_url.startswith('//'):
theplatform_url = 'http:' + theplatform_url
return self.url_result(theplatform_url)
class NBCNewsIE(InfoExtractor): class NBCNewsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcnews\.com/video/.+?/(?P<id>\d+)' _VALID_URL = r'https?://www\.nbcnews\.com/video/.+?/(?P<id>\d+)'
_TEST = { _TEST = {
u'url': u'http://www.nbcnews.com/video/nbc-news/52753292', 'url': 'http://www.nbcnews.com/video/nbc-news/52753292',
u'file': u'52753292.flv', 'md5': '47abaac93c6eaf9ad37ee6c4463a5179',
u'md5': u'47abaac93c6eaf9ad37ee6c4463a5179', 'info_dict': {
u'info_dict': { 'id': '52753292',
u'title': u'Crew emerges after four-month Mars food study', 'ext': 'flv',
u'description': u'md5:24e632ffac72b35f8b67a12d1b6ddfc1', 'title': 'Crew emerges after four-month Mars food study',
'description': 'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
}, },
} }
@@ -23,7 +50,8 @@ class NBCNewsIE(InfoExtractor):
all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id) all_info = self._download_xml('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = all_info.find('video') info = all_info.find('video')
return {'id': video_id, return {
'id': video_id,
'title': info.find('headline').text, 'title': info.find('headline').text,
'ext': 'flv', 'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text, 'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,


@@ -1,61 +1,51 @@
# encoding: utf-8
from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError,
unified_strdate, unified_strdate,
) )
class NormalbootsIE(InfoExtractor): class NormalbootsIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?normalboots\.com/video/(?P<videoid>[0-9a-z-]*)/?$' _VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<videoid>[0-9a-z-]*)/?$'
_TEST = { _TEST = {
u'url': u'http://normalboots.com/video/home-alone-games-jontron/', 'url': 'http://normalboots.com/video/home-alone-games-jontron/',
u'file': u'home-alone-games-jontron.mp4', 'md5': '8bf6de238915dd501105b44ef5f1e0f6',
u'md5': u'8bf6de238915dd501105b44ef5f1e0f6', 'info_dict': {
u'info_dict': { 'id': 'home-alone-games-jontron',
u'title': u'Home Alone Games - JonTron - NormalBoots', 'ext': 'mp4',
u'description': u'Jon is late for Christmas. Typical. Thanks to: Paul Ritchey for Co-Writing/Filming: http://www.youtube.com/user/ContinueShow Michael Azzi for Christmas Intro Animation: http://michafrar.tumblr.com/ Jerrod Waters for Christmas Intro Music: http://www.youtube.com/user/xXJerryTerryXx Casey Ormond for \u2018Tense Battle Theme\u2019:\xa0http://www.youtube.com/Kiamet/', 'title': 'Home Alone Games - JonTron - NormalBoots',
u'uploader': u'JonTron', 'description': 'Jon is late for Christmas. Typical. Thanks to: Paul Ritchey for Co-Writing/Filming: http://www.youtube.com/user/ContinueShow Michael Azzi for Christmas Intro Animation: http://michafrar.tumblr.com/ Jerrod Waters for Christmas Intro Music: http://www.youtube.com/user/xXJerryTerryXx Casey Ormond for Tense Battle Theme:\xa0http://www.youtube.com/Kiamet/',
u'upload_date': u'20140125', 'uploader': 'JonTron',
'upload_date': '20140125',
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
video_id = mobj.group('videoid') video_id = mobj.group('videoid')
info = {
'id': video_id,
'uploader': None,
'upload_date': None,
}
if url[:4] != 'http':
url = 'http://' + url
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
video_thumbnail = self._og_search_thumbnail(webpage)
video_uploader = self._html_search_regex(r'Posted\sby\s<a\shref="[A-Za-z0-9/]*">(?P<uploader>[A-Za-z]*)\s</a>', video_uploader = self._html_search_regex(r'Posted\sby\s<a\shref="[A-Za-z0-9/]*">(?P<uploader>[A-Za-z]*)\s</a>',
webpage, 'uploader') webpage, 'uploader')
raw_upload_date = self._html_search_regex('<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>', raw_upload_date = self._html_search_regex('<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
webpage, 'date') webpage, 'date')
video_upload_date = unified_strdate(raw_upload_date) video_upload_date = unified_strdate(raw_upload_date)
video_upload_date = unified_strdate(raw_upload_date)
player_url = self._html_search_regex(r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"', webpage, 'url') player_url = self._html_search_regex(r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"', webpage, 'url')
player_page = self._download_webpage(player_url, video_id) player_page = self._download_webpage(player_url, video_id)
video_url = u'http://player.screenwavemedia.com/' + self._html_search_regex(r"'file':\s'(?P<file>[0-9A-Za-z-_\.]+)'", player_page, 'file') video_url = self._html_search_regex(r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
info['url'] = video_url return {
info['title'] = video_title 'id': video_id,
info['description'] = video_description 'url': video_url,
info['thumbnail'] = video_thumbnail 'title': self._og_search_title(webpage),
info['uploader'] = video_uploader 'description': self._og_search_description(webpage),
info['upload_date'] = video_upload_date 'thumbnail': self._og_search_thumbnail(webpage),
'uploader': video_uploader,
return info 'upload_date': video_upload_date,
}


@@ -9,14 +9,25 @@ from ..utils import (
) )
class NovamovIE(InfoExtractor): class NovaMovIE(InfoExtractor):
_VALID_URL = r'http://(?:(?:www\.)?novamov\.com/video/|(?:(?:embed|www)\.)novamov\.com/embed\.php\?v=)(?P<videoid>[a-z\d]{13})' IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': r'novamov\.com'}
_HOST = 'www.novamov.com'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!</h2>'
_FILEKEY_REGEX = r'flashvars\.filekey="(?P<filekey>[^"]+)";'
_TITLE_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>([^<]+)</h3>'
_DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
_TEST = { _TEST = {
'url': 'http://www.novamov.com/video/4rurhn9x446jj', 'url': 'http://www.novamov.com/video/4rurhn9x446jj',
'file': '4rurhn9x446jj.flv',
'md5': '7205f346a52bbeba427603ba10d4b935', 'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': { 'info_dict': {
'id': '4rurhn9x446jj',
'ext': 'flv',
'title': 'search engine optimization', 'title': 'search engine optimization',
'description': 'search engine optimization is used to rank the web page in the google search engine' 'description': 'search engine optimization is used to rank the web page in the google search engine'
}, },
@@ -27,31 +38,26 @@ class NovamovIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid') video_id = mobj.group('videoid')
page = self._download_webpage('http://www.novamov.com/video/%s' % video_id, page = self._download_webpage(
video_id, 'Downloading video page') 'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
if re.search(r'This file no longer exists on our servers!</h2>', page) is not None: if re.search(self._FILE_DELETED_REGEX, page) is not None:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True) raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
filekey = self._search_regex( filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
r'flashvars\.filekey="(?P<filekey>[^"]+)";', page, 'filekey')
title = self._html_search_regex( title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>([^<]+)</h3>',
page, 'title', fatal=False)
description = self._html_search_regex( description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>',
page, 'description', fatal=False)
api_response = self._download_webpage( api_response = self._download_webpage(
'http://www.novamov.com/api/player.api.php?key=%s&file=%s' % (filekey, video_id), 'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
video_id, 'Downloading video api response') 'Downloading video api response')
response = compat_urlparse.parse_qs(api_response) response = compat_urlparse.parse_qs(api_response)
if 'error_msg' in response: if 'error_msg' in response:
raise ExtractorError('novamov returned error: %s' % response['error_msg'][0], expected=True) raise ExtractorError('%s returned error: %s' % (self.IE_NAME, response['error_msg'][0]), expected=True)
video_url = response['url'][0] video_url = response['url'][0]
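
The API response handled above is a URL-encoded query string, so each field comes back from `parse_qs` as a list of values. A small sketch of both the success and the error path, assuming invented response bodies and using the Python 3 standard library `urllib.parse.parse_qs` in place of `compat_urlparse.parse_qs`:

```python
from urllib.parse import parse_qs

# Hypothetical API response bodies; the real service may return other fields.
ok_body = 'url=http%3A%2F%2Fexample.com%2Fvideo.flv&title=test&site_id=1'
error_body = 'error=1&error_msg=The+video+is+being+transfered'

response = parse_qs(ok_body)
# parse_qs maps every key to a list, hence the [0].
video_url = response['url'][0]

error = parse_qs(error_body)
error_msg = error['error_msg'][0] if 'error_msg' in error else None
```

Checking for `error_msg` before reading `url` matters because an error response of this shape carries no `url` key at all.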


@@ -1,46 +1,28 @@
import re from __future__ import unicode_literals
from .common import InfoExtractor from .novamov import NovaMovIE
from ..utils import compat_urlparse
class NowVideoIE(InfoExtractor): class NowVideoIE(NovaMovIE):
_VALID_URL = r'(?:https?://)?(?:www\.)?nowvideo\.(?:ch|sx)/video/(?P<id>\w+)' IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': r'nowvideo\.(?:ch|sx|eu)'}
_HOST = 'www.nowvideo.ch'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_FILEKEY_REGEX = r'var fkzd="([^"]+)";'
_TITLE_REGEX = r'<h4>([^<]+)</h4>'
_DESCRIPTION_REGEX = r'</h4>\s*<p>([^<]+)</p>'
_TEST = { _TEST = {
u'url': u'http://www.nowvideo.ch/video/0mw0yow7b6dxa', 'url': 'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
u'file': u'0mw0yow7b6dxa.flv', 'md5': 'f8fbbc8add72bd95b7850c6a02fc8817',
u'md5': u'f8fbbc8add72bd95b7850c6a02fc8817', 'info_dict': {
u'info_dict': { 'id': '0mw0yow7b6dxa',
u"title": u"youtubedl test video _BaW_jenozKc.mp4"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id
embed_url = 'http://embed.nowvideo.ch/embed.php?v=' + video_id
webpage = self._download_webpage(webpage_url, video_id)
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h4>(.*)</h4>',
webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";',
embed_page, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id,
u'Downloading API page')
video_url = compat_urlparse.parse_qs(api_response)[u'url'][0]
return [{
'id': video_id,
'url': video_url,
'ext': 'flv', 'ext': 'flv',
'title': video_title, 'title': 'youtubedl test video _BaW_jenozKc.mp4',
}] 'description': 'Description',
}
}
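
After this change NowVideoIE carries no extraction logic of its own: it inherits `_real_extract` from NovaMovIE and only overrides the host and the regular expressions as class attributes. A stripped-down sketch of that pattern, with hypothetical class names, hosts and helper methods that are not part of youtube-dl:

```python
import re


class BaseSite(object):
    # Site-specific knobs; subclasses override these instead of the methods.
    _HOST = 'www.example-a.test'
    _TITLE_REGEX = r'<h3>([^<]+)</h3>'

    def page_url(self, video_id):
        return 'http://%s/video/%s' % (self._HOST, video_id)

    def title(self, page):
        match = re.search(self._TITLE_REGEX, page)
        return match.group(1) if match else None


class SisterSite(BaseSite):
    # Same engine, different markup: only the attributes change.
    _HOST = 'www.example-b.test'
    _TITLE_REGEX = r'<h4>([^<]+)</h4>'
```

Because the methods read the attributes through `self`, a subclass changes behaviour without copying any code, which is what lets the diff delete the old NowVideoIE `_real_extract` wholesale.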


@@ -1,7 +1,10 @@
from __future__ import unicode_literals
import json import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none
class PodomaticIE(InfoExtractor): class PodomaticIE(InfoExtractor):
@@ -9,14 +12,14 @@ class PodomaticIE(InfoExtractor):
_VALID_URL = r'^(?P<proto>https?)://(?P<channel>[^.]+)\.podomatic\.com/entry/(?P<id>[^?]+)' _VALID_URL = r'^(?P<proto>https?)://(?P<channel>[^.]+)\.podomatic\.com/entry/(?P<id>[^?]+)'
_TEST = { _TEST = {
u"url": u"http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00", "url": "http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00",
u"file": u"2009-01-02T16_03_35-08_00.mp3", "file": "2009-01-02T16_03_35-08_00.mp3",
u"md5": u"84bb855fcf3429e6bf72460e1eed782d", "md5": "84bb855fcf3429e6bf72460e1eed782d",
u"info_dict": { "info_dict": {
u"uploader": u"Science Teaching Tips", "uploader": "Science Teaching Tips",
u"uploader_id": u"scienceteachingtips", "uploader_id": "scienceteachingtips",
u"title": u"64. When the Moon Hits Your Eye", "title": "64. When the Moon Hits Your Eye",
u"duration": 446, "duration": 446,
} }
} }
@@ -36,7 +39,7 @@ class PodomaticIE(InfoExtractor):
uploader = data['podcast'] uploader = data['podcast']
title = data['title'] title = data['title']
thumbnail = data['imageLocation'] thumbnail = data['imageLocation']
duration = int(data['length'] / 1000.0) duration = int_or_none(data.get('length'), 1000)
return { return {
'id': video_id, 'id': video_id,


@@ -0,0 +1,297 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from hashlib import sha1
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
unified_strdate,
clean_html,
RegexNotFoundError,
)
class ProSiebenSat1IE(InfoExtractor):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|ran|the-voice-of-germany)\.de|fem\.com)/(?P<id>.+)'
_TESTS = [
{
'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
'info_dict': {
'id': '2104602',
'ext': 'mp4',
'title': 'Staffel 2, Episode 18 - Jahresrückblick',
'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
'upload_date': '20131231',
'duration': 5845.04,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.prosieben.de/videokatalog/Gesellschaft/Leben/Trends/video-Lady-Umstyling-f%C3%BCr-Audrina-Rebekka-Audrina-Fergen-billig-aussehen-Battal-Modica-700544.html',
'info_dict': {
'id': '2570327',
'ext': 'mp4',
'title': 'Lady-Umstyling für Audrina',
'description': 'md5:4c16d0c17a3461a0d43ea4084e96319d',
'upload_date': '20131014',
'duration': 606.76,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Seems to be broken',
},
{
'url': 'http://www.prosiebenmaxx.de/yep/one-piece/video/148-folge-48-gold-rogers-heimat-ganze-folge',
'info_dict': {
'id': '2437108',
'ext': 'mp4',
'title': 'Folge 48: Gold Rogers Heimat',
'description': 'Ruffy erreicht die Insel, auf der der berühmte Gold Roger lebte und hingerichtet wurde.',
'upload_date': '20140226',
'duration': 1401.48,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.sixx.de/stars-style/video/sexy-laufen-in-ugg-boots-clip',
'info_dict': {
'id': '2904997',
'ext': 'mp4',
'title': 'Sexy laufen in Ugg Boots',
'description': 'md5:edf42b8bd5bc4e5da4db4222c5acb7d6',
'upload_date': '20140122',
'duration': 245.32,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.sat1.de/film/der-ruecktritt/video/im-interview-kai-wiesinger-clip',
'info_dict': {
'id': '2906572',
'ext': 'mp4',
'title': 'Im Interview: Kai Wiesinger',
'description': 'md5:e4e5370652ec63b95023e914190b4eb9',
'upload_date': '20140225',
'duration': 522.56,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.kabeleins.de/tv/rosins-restaurants/videos/jagd-auf-fertigkost-im-elsthal-teil-2-ganze-folge',
'info_dict': {
'id': '2992323',
'ext': 'mp4',
'title': 'Jagd auf Fertigkost im Elsthal - Teil 2',
'description': 'md5:2669cde3febe9bce13904f701e774eb6',
'upload_date': '20140225',
'duration': 2410.44,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ran.de/fussball/bundesliga/video/schalke-toennies-moechte-raul-zurueck-ganze-folge',
'info_dict': {
'id': '3004256',
'ext': 'mp4',
'title': 'Schalke: Tönnies möchte Raul zurück',
'description': 'md5:4b5b271d9bcde223b54390754c8ece3f',
'upload_date': '20140226',
'duration': 228.96,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
'info_dict': {
'id': '2572814',
'ext': 'mp4',
'title': 'Andreas Kümmert: Rocket Man',
'description': 'md5:6ddb02b0781c6adf778afea606652e38',
'upload_date': '20131017',
'duration': 469.88,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
'info_dict': {
'id': '2156342',
'ext': 'mp4',
'title': 'Kurztrips zum Valentinstag',
'description': 'md5:8ba6301e70351ae0bedf8da00f7ba528',
'upload_date': '20130206',
'duration': 307.24,
},
'params': {
# rtmp download
'skip_download': True,
},
},
]
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',
r'<header class="clearfix">\s*<h3>(.+?)</h3>',
r'<!-- start video -->\s*<h1>(.+?)</h1>',
r'<div class="ep-femvideos-pi4-video-txt">\s*<h2>(.+?)</h2>',
]
_DESCRIPTION_REGEXES = [
r'<p itemprop="description">\s*(.+?)</p>',
r'<div class="videoDecription">\s*<p><strong>Beschreibung</strong>: (.+?)</p>',
r'<div class="g-plusone" data-size="medium"></div>\s*</div>\s*</header>\s*(.+?)\s*<footer>',
r'<p>(.+?)</p>\s*<div class="ep-femvideos-pi4-video-footer">',
]
_UPLOAD_DATE_REGEXES = [
r'<meta property="og:published_time" content="(.+?)">',
r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
r'(\d{2}\.\d{2}\.\d{4}) \| \d{2}:\d{2} Min<br/>',
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
def extract(patterns, name, page, fatal=False):
for pattern in patterns:
mobj = re.search(pattern, page)
if mobj:
return clean_html(mobj.group(1))
if fatal:
raise RegexNotFoundError('Unable to extract %s' % name)
return None
clip_id = extract(self._CLIPID_REGEXES, 'clip id', page, fatal=True)
access_token = 'testclient'
client_name = 'kolibri-1.2.5'
client_location = url
videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse.urlencode({
'access_token': access_token,
'client_location': client_location,
'client_name': client_name,
'ids': clip_id,
})
videos = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')
duration = float(videos[0]['duration'])
source_ids = [source['id'] for source in videos[0]['sources']]
source_ids_str = ','.join(map(str, source_ids))
g = '01!8d8F_)r9]4s[qeuXfP%'
client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name])
.encode('utf-8')).hexdigest()
sources_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources?%s' % (clip_id, compat_urllib_parse.urlencode({
'access_token': access_token,
'client_id': client_id,
'client_location': client_location,
'client_name': client_name,
}))
sources = self._download_json(sources_api_url, clip_id, 'Downloading sources JSON')
server_id = sources['server_id']
client_id = g[:2] + sha1(''.join([g, clip_id, access_token, server_id,
client_location, source_ids_str, g, client_name])
.encode('utf-8')).hexdigest()
url_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url?%s' % (clip_id, compat_urllib_parse.urlencode({
'access_token': access_token,
'client_id': client_id,
'client_location': client_location,
'client_name': client_name,
'server_id': server_id,
'source_ids': source_ids_str,
}))
urls = self._download_json(url_api_url, clip_id, 'Downloading urls JSON')
title = extract(self._TITLE_REGEXES, 'title', page, fatal=True)
description = extract(self._DESCRIPTION_REGEXES, 'description', page)
thumbnail = self._og_search_thumbnail(page)
upload_date = extract(self._UPLOAD_DATE_REGEXES, 'upload date', page)
if upload_date:
upload_date = unified_strdate(upload_date)
formats = []
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
def fix_bitrate(bitrate):
return bitrate // 1000 if bitrate % 1000 == 0 else bitrate
for source in urls_sources:
protocol = source['protocol']
if protocol == 'rtmp' or protocol == 'rtmpe':
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', source['url'])
if not mobj:
continue
formats.append({
'url': mobj.group('url'),
'app': mobj.group('app'),
'play_path': mobj.group('playpath'),
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'vbr': fix_bitrate(source['bitrate']),
'ext': 'mp4',
'format_id': '%s_%s' % (source['cdn'], source['bitrate']),
})
else:
formats.append({
'url': source['url'],
'vbr': fix_bitrate(source['bitrate']),
})
self._sort_formats(formats)
return {
'id': clip_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
}
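
Both `client_id` values in the extractor above are built the same way: the first two characters of the salt `g`, followed by the hex SHA-1 digest of a concatenation of request fields. A minimal sketch of the first derivation; the helper name `make_client_id` and the `client_location` URL are hypothetical, while the salt, token and client name are copied from the diff:

```python
from hashlib import sha1


def make_client_id(salt, parts):
    """Return salt[:2] plus the hex SHA-1 of the joined parts."""
    digest = sha1(''.join(parts).encode('utf-8')).hexdigest()
    return salt[:2] + digest


salt = '01!8d8F_)r9]4s[qeuXfP%'
clip_id = '2104602'
access_token = 'testclient'
client_name = 'kolibri-1.2.5'
client_location = 'http://www.prosieben.de/example'  # hypothetical page URL

# Field order mirrors the first client_id expression in the extractor.
client_id = make_client_id(
    salt,
    [clip_id, salt, access_token, client_location, salt, client_name])
```

The second `client_id` in the extractor differs only in the list of parts (it adds `server_id` and the joined source ids and starts with the salt), so the same helper would cover it with a different `parts` argument.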


@@ -1,26 +1,44 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
ExtractorError, ExtractorError,
clean_html,
unified_strdate,
int_or_none,
) )
class RTLnowIE(InfoExtractor): class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW""" """Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<domain>rtl-now\.rtl\.de|rtl2now\.rtl2\.de|(?:www\.)?voxnow\.de|(?:www\.)?rtlnitronow\.de|(?:www\.)?superrtlnow\.de|(?:www\.)?n-tvnow\.de)/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)' _VALID_URL = r'''(?x)
_TESTS = [{ (?:https?://)?
(?P<url>
(?P<domain>
rtl-now\.rtl\.de|
rtl2now\.rtl2\.de|
(?:www\.)?voxnow\.de|
(?:www\.)?rtlnitronow\.de|
(?:www\.)?superrtlnow\.de|
(?:www\.)?n-tvnow\.de)
/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?
(?:container_id|film_id)=(?P<video_id>[0-9]+)&
player=1(?:&season=[0-9]+)?(?:&.*)?
)'''
_TESTS = [
{
'url': 'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1', 'url': 'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
'file': '90419.flv',
'info_dict': { 'info_dict': {
'upload_date': '20070416', 'id': '90419',
'ext': 'flv',
'title': 'Ahornallee - Folge 1 - Der Einzug', 'title': 'Ahornallee - Folge 1 - Der Einzug',
'description': 'Folge 1 - Der Einzug', 'description': 'md5:ce843b6b5901d9a7f7d04d1bbcdb12de',
'upload_date': '20070416',
'duration': 1685,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@@ -29,12 +47,14 @@ class RTLnowIE(InfoExtractor):
}, },
{ {
'url': 'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5', 'url': 'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
'file': '69756.flv',
'info_dict': { 'info_dict': {
'upload_date': '20120519', 'id': '69756',
'title': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...', 'ext': 'flv',
'description': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.', 'title': 'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
'description': 'md5:3fb247005ed21a935ffc82b7dfa70cf0',
'thumbnail': 'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg', 'thumbnail': 'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
'upload_date': '20120519',
'duration': 1245,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@@ -43,11 +63,13 @@ class RTLnowIE(InfoExtractor):
}, },
{ {
'url': 'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17', 'url': 'http://www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
'file': '13883.flv',
'info_dict': { 'info_dict': {
'upload_date': '20090627', 'id': '13883',
'ext': 'flv',
'title': 'Voxtours - Südafrika-Reporter II', 'title': 'Voxtours - Südafrika-Reporter II',
'description': 'Südafrika-Reporter II', 'description': 'md5:de7f8d56be6fd4fed10f10f57786db00',
'upload_date': '20090627',
'duration': 1800,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@@ -55,94 +77,89 @@ class RTLnowIE(InfoExtractor):
}, },
{ {
'url': 'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1', 'url': 'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
'file': '99205.flv',
'info_dict': { 'info_dict': {
'upload_date': '20080928', 'id': '99205',
'ext': 'flv',
'title': 'Medicopter 117 - Angst!', 'title': 'Medicopter 117 - Angst!',
'description': 'Angst!', 'description': 'md5:895b1df01639b5f61a04fc305a5cb94d',
'thumbnail': 'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg' 'thumbnail': 'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg',
'upload_date': '20080928',
'duration': 2691,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, },
{ {
'url': 'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10', 'url': 'http://www.n-tvnow.de/deluxe-alles-was-spass-macht/thema-ua-luxushotel-fuer-vierbeiner.php?container_id=153819&player=1&season=0',
'file': '124903.flv',
'info_dict': { 'info_dict': {
'upload_date': '20130101', 'id': '153819',
'title': 'Top Gear vom 01.01.2013', 'ext': 'flv',
'description': 'Episode 1', 'title': 'Deluxe - Alles was Spaß macht - Thema u.a.: Luxushotel für Vierbeiner',
}, 'description': 'md5:c3705e1bb32e1a5b2bcd634fc065c631',
'params': { 'thumbnail': 'http://autoimg.static-fra.de/ntvnow/383157/1500x1500/image2.jpg',
'skip_download': True, 'upload_date': '20140221',
'duration': 2429,
}, },
'skip': 'Only works from Germany', 'skip': 'Only works from Germany',
}] },
]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_page_url = 'http://%s/' % mobj.group('domain')
webpage_url = 'http://' + mobj.group('url')
video_page_url = 'http://' + mobj.group('domain') + '/'
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
webpage = self._download_webpage(webpage_url, video_id) webpage = self._download_webpage('http://' + mobj.group('url'), video_id)
note_m = re.search(r'''(?sx) mobj = re.search(r'(?s)<div style="margin-left: 20px; font-size: 13px;">(.*?)<div id="playerteaser">', webpage)
<div[ ]style="margin-left:[ ]20px;[ ]font-size:[ ]13px;">(.*?) if mobj:
<div[ ]id="playerteaser">''', webpage) raise ExtractorError(clean_html(mobj.group(1)), expected=True)
if note_m:
msg = clean_html(note_m.group(1)) title = self._og_search_title(webpage)
raise ExtractorError(msg) description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage, default=None)
upload_date = unified_strdate(self._html_search_meta('uploadDate', webpage, 'upload date'))
mobj = re.search(r'<meta itemprop="duration" content="PT(?P<seconds>\d+)S" />', webpage)
duration = int(mobj.group('seconds')) if mobj else None
video_title = self._html_search_regex(
r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, 'title')
playerdata_url = self._html_search_regex( playerdata_url = self._html_search_regex(
r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'', r"'playerdata': '(?P<playerdata_url>[^']+)'", webpage, 'playerdata_url')
webpage, 'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id) playerdata = self._download_xml(playerdata_url, video_id, 'Downloading player data XML')
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
videoinfo = playerdata.find('./playlist/videoinfo')
formats = []
for filename in videoinfo.findall('filename'):
mobj = re.search(r'(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>.+)', filename.text)
if mobj: if mobj:
video_description = mobj.group('description') fmt = {
if mobj.group('upload_date_Y'): 'url': mobj.group('url'),
video_upload_date = mobj.group('upload_date_Y') 'play_path': 'mp4:' + mobj.group('play_path'),
elif mobj.group('upload_date_y'): 'page_url': video_page_url,
video_upload_date = '20' + mobj.group('upload_date_y') 'player_url': video_page_url + 'includes/vodplayer.swf',
}
else: else:
video_upload_date = None fmt = {
if video_upload_date: 'url': filename.text,
video_upload_date += mobj.group('upload_date_m') + mobj.group('upload_date_d') }
else: fmt.update({
video_description = None 'width': int_or_none(filename.get('width')),
video_upload_date = None 'height': int_or_none(filename.get('height')),
self._downloader.report_warning('Unable to extract description and upload date') 'vbr': int_or_none(filename.get('bitrate')),
'ext': 'flv',
# Thumbnail: not every video has a thumbnail })
mobj = re.search(r'<meta property="og:image" content="(?P<thumbnail>[^"]+)">', webpage) formats.append(fmt)
if mobj:
video_thumbnail = mobj.group('thumbnail')
else:
video_thumbnail = None
mobj = re.search(r'<filename [^>]+><!\[CDATA\[(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>[^\]]+)\]\]></filename>', playerdata)
if mobj is None:
raise ExtractorError('Unable to extract media URL')
video_url = mobj.group('url')
video_play_path = 'mp4:' + mobj.group('play_path')
video_player_url = video_page_url + 'includes/vodplayer.swf'
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'title': title,
'play_path': video_play_path, 'description': description,
'page_url': video_page_url, 'thumbnail': thumbnail,
'player_url': video_player_url, 'upload_date': upload_date,
'ext': 'flv', 'duration': duration,
'title': video_title, 'formats': formats,
'description': video_description,
'upload_date': video_upload_date,
'thumbnail': video_thumbnail,
} }
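
The new RTLnow code reads the duration from an `itemprop="duration"` meta tag of the form `PT<seconds>S`. The following is a small sketch of that lookup in isolation. The `parse_duration_seconds` name and the sample markup are illustrative assumptions; only the regex is taken from the diff.

```python
import re


def parse_duration_seconds(webpage):
    """Return the duration in seconds, or None if the meta tag is absent."""
    mobj = re.search(
        r'<meta itemprop="duration" content="PT(?P<seconds>\d+)S" />',
        webpage)
    return int(mobj.group('seconds')) if mobj else None


if __name__ == '__main__':
    # Hypothetical page fragment for demonstration.
    sample = '<head><meta itemprop="duration" content="PT1685S" /></head>'
    print(parse_duration_seconds(sample))
```

Note that the pattern only accepts a seconds-only value; a page that used minutes or hours in the ISO 8601 duration would yield `None` under this sketch.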


@@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
class SaveFromIE(InfoExtractor):
IE_NAME = 'savefrom.net'
_VALID_URL = r'https?://[^.]+\.savefrom\.net/\#url=(?P<url>.*)$'
_TEST = {
'url': 'http://en.savefrom.net/#url=http://youtube.com/watch?v=UlVRAPW2WJY&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com',
'info_dict': {
'id': 'UlVRAPW2WJY',
'ext': 'mp4',
'title': 'About Team Radical MMA | MMA Fighting',
'upload_date': '20120816',
'uploader': 'Howcast',
'uploader_id': 'Howcast',
'description': 'md5:4f0aac94361a12e1ce57d74f85265175',
},
'params': {
'skip_download': True
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = os.path.splitext(url.split('/')[-1])[0]
return {
'_type': 'url',
'id': video_id,
'url': mobj.group('url'),
}


@@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -8,14 +10,14 @@ from ..utils import RegexNotFoundError, ExtractorError
class SpaceIE(InfoExtractor): class SpaceIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html' _VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
_TEST = { _TEST = {
u'add_ie': ['Brightcove'], 'add_ie': ['Brightcove'],
u'url': u'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html', 'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
u'info_dict': { 'info_dict': {
u'id': u'2780937028001', 'id': '2780937028001',
u'ext': u'mp4', 'ext': 'mp4',
u'title': u'Huge Martian Landforms\' Detail Revealed By European Probe | Video', 'title': 'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
u'description': u'md5:db81cf7f3122f95ed234b631a6ea1e61', 'description': 'md5:db81cf7f3122f95ed234b631a6ea1e61',
u'uploader': u'TechMedia Networks', 'uploader': 'TechMedia Networks',
}, },
} }


@@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@@ -8,23 +7,27 @@ from ..utils import (
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urllib_request, compat_urllib_request,
compat_urllib_parse, compat_urllib_parse,
unified_strdate,
str_to_int,
int_or_none,
) )
from ..aes import ( from ..aes import aes_decrypt_text
aes_decrypt_text
)
class SpankwireIE(InfoExtractor): class SpankwireIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)' _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
_TEST = { _TEST = {
'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/', 'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
'file': '103545.mp4', 'md5': '8bbfde12b101204b39e4b9fe7eb67095',
'md5': '1b3f55e345500552dbc252a3e9c1af43',
'info_dict': { 'info_dict': {
"uploader": "oreusz", 'id': '103545',
"title": "Buckcherry`s X Rated Music Video Crazy Bitch", 'ext': 'mp4',
"description": "Crazy Bitch X rated music video.", 'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
"age_limit": 18, 'description': 'Crazy Bitch X rated music video.',
'uploader': 'oreusz',
'uploader_id': '124697',
'upload_date': '20070508',
'age_limit': 18,
} }
} }
@@ -37,13 +40,26 @@ class SpankwireIE(InfoExtractor):
req.add_header('Cookie', 'age_verified=1') req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title') title = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>', webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(
r'flashvars\.image_url = "([^"]+)', webpage, 'thumbnail', fatal=False)
description = self._html_search_regex( description = self._html_search_regex(
r'<div\s+id="descriptionContent">([^<]+)<', webpage, 'description', fatal=False) r'<div\s+id="descriptionContent">([^<]+)<', webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'flashvars\.image_url = "([^"]+)', webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>', webpage, 'uploader', fatal=False)
uploader_id = self._html_search_regex(
r'by:\s*<a href="/Profile\.aspx\?.*?UserId=(\d+).*?"', webpage, 'uploader id', fatal=False)
upload_date = self._html_search_regex(r'</a> on (.+?) at \d+:\d+', webpage, 'upload date', fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
view_count = self._html_search_regex(
r'<div id="viewsCounter"><span>([^<]+)</span> views</div>', webpage, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
comment_count = int_or_none(self._html_search_regex(
r'<span id="spCommentCount">\s*(\d+)</span> Comments</div>', webpage, 'comment count', fatal=False))
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'flashvars\.quality_[0-9]{3}p = "([^"]+)', webpage))) video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'flashvars\.quality_[0-9]{3}p = "([^"]+)', webpage)))
if webpage.find('flashvars.encrypted = "true"') != -1: if webpage.find('flashvars.encrypted = "true"') != -1:
@@ -53,16 +69,13 @@ class SpankwireIE(InfoExtractor):
formats = [] formats = []
for video_url in video_urls: for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2] format = path.split('/')[4].split('_')[:2]
resolution, bitrate_str = format resolution, bitrate_str = format
format = "-".join(format) format = "-".join(format)
height = int(resolution.rstrip('P')) height = int(resolution.rstrip('Pp'))
tbr = int(bitrate_str.rstrip('K')) tbr = int(bitrate_str.rstrip('Kk'))
formats.append({ formats.append({
'url': video_url, 'url': video_url,
'ext': extension,
'resolution': resolution, 'resolution': resolution,
'format': format, 'format': format,
'tbr': tbr, 'tbr': tbr,
@@ -75,10 +88,14 @@ class SpankwireIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'uploader': video_uploader, 'title': title,
'title': video_title,
'thumbnail': thumbnail,
'description': description, 'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats, 'formats': formats,
'age_limit': age_limit, 'age_limit': age_limit,
} }
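
The Spankwire format loop derives the resolution and bitrate from the fifth component of the video URL path, and the change to `rstrip('Pp')` / `rstrip('Kk')` makes that parsing tolerate lowercase suffixes. The following is a rough sketch of that step on its own. The `parse_quality` name and the sample paths are hypothetical, and the path layout (a component such as `480P_600K_<id>.mp4` at index 4) is assumed from the code rather than confirmed.

```python
def parse_quality(path):
    """Return (format label, height, total bitrate) from a video URL path."""
    # Assumed layout: '/<a>/<b>/<c>/<RES>_<BITRATE>_<id>.<ext>'
    resolution, bitrate_str = path.split('/')[4].split('_')[:2]
    height = int(resolution.rstrip('Pp'))
    tbr = int(bitrate_str.rstrip('Kk'))
    return '-'.join((resolution, bitrate_str)), height, tbr


if __name__ == '__main__':
    # Hypothetical paths, upper- and lowercase suffixes.
    print(parse_quality('/201402/27/103545/480P_600K_103545.mp4'))
    print(parse_quality('/201402/27/103545/720p_1500k_103545.mp4'))
```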


@@ -0,0 +1,68 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class TestURLIE(InfoExtractor):
""" Allows addressing of the test cases as test:yout.*be_1 """
IE_DESC = False # Do not list
_VALID_URL = r'test(?:url)?:(?P<id>(?P<extractor>.+?)(?:_(?P<num>[0-9]+))?)$'
def _real_extract(self, url):
from ..extractor import gen_extractors
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
extractor_id = mobj.group('extractor')
all_extractors = gen_extractors()
rex = re.compile(extractor_id, flags=re.IGNORECASE)
matching_extractors = [
e for e in all_extractors if rex.search(e.IE_NAME)]
if len(matching_extractors) == 0:
raise ExtractorError(
'No extractors matching %r found' % extractor_id,
expected=True)
elif len(matching_extractors) > 1:
# Is it obvious which one to pick?
try:
extractor = next(
ie for ie in matching_extractors
if ie.IE_NAME.lower() == extractor_id.lower())
except StopIteration:
raise ExtractorError(
('Found multiple matching extractors: %s' %
' '.join(ie.IE_NAME for ie in matching_extractors)),
expected=True)
else:
extractor = matching_extractors[0]
num_str = mobj.group('num')
num = int(num_str) if num_str else 0
testcases = []
t = getattr(extractor, '_TEST', None)
if t:
testcases.append(t)
testcases.extend(getattr(extractor, '_TESTS', []))
try:
tc = testcases[num]
except IndexError:
raise ExtractorError(
('Test case %d not found, got only %d tests' %
(num, len(testcases))),
expected=True)
self.to_screen('Test URL: %s' % tc['url'])
return {
'_type': 'url',
'url': tc['url'],
'id': video_id,
}
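
The `test:` pseudo-URL carries an extractor name pattern and an optional numeric suffix that selects a test case. The following is a minimal sketch of just the parsing step, using the `_VALID_URL` pattern from the class above. The `parse_test_url` helper is an illustrative name, not part of the extractor.

```python
import re

# Pattern copied from TestURLIE._VALID_URL.
VALID_URL = r'test(?:url)?:(?P<id>(?P<extractor>.+?)(?:_(?P<num>[0-9]+))?)$'


def parse_test_url(url):
    """Return (extractor pattern, test index), or None if not a test URL."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    num_str = mobj.group('num')
    # A missing suffix selects the first test case.
    return mobj.group('extractor'), int(num_str) if num_str else 0


if __name__ == '__main__':
    print(parse_test_url('test:youtube_1'))
    print(parse_test_url('testurl:vimeo'))
```

Because the extractor group is lazy and the suffix must be all digits up to the end of the string, underscores inside the name are kept as part of the name.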


@@ -13,7 +13,7 @@ _x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language
class ThePlatformIE(InfoExtractor): class ThePlatformIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?:https?://(?:link|player)\.theplatform\.com/[sp]/[^/]+/ (?:https?://(?:link|player)\.theplatform\.com/[sp]/[^/]+/
(?P<config>[^/\?]+/(?:swf|config)/select/)? (?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/)?
|theplatform:)(?P<id>[^/\?&]+)''' |theplatform:)(?P<id>[^/\?&]+)'''
_TEST = { _TEST = {
@@ -54,10 +54,15 @@ class ThePlatformIE(InfoExtractor):
f4m_node = body.find(_x('smil:seq/smil:video')) f4m_node = body.find(_x('smil:seq/smil:video'))
if f4m_node is not None: if f4m_node is not None:
f4m_url = f4m_node.attrib['src']
if 'manifest.f4m?' not in f4m_url:
f4m_url += '?'
# the parameters are from syfy.com, other sites may use others,
# they also work for nbc.com
f4m_url += '&g=UXWGVKRWHFSP&hdcore=3.0.3'
formats = [{ formats = [{
'ext': 'flv', 'ext': 'flv',
# the parameters are from syfy.com, other sites may use others 'url': f4m_url,
'url': f4m_node.attrib['src'] + '?g=UXWGVKRWHFSP&hdcore=3.0.3',
}] }]
else: else:
base_url = head.find(_x('smil:meta')).attrib['base'] base_url = head.find(_x('smil:meta')).attrib['base']
@@ -95,9 +100,10 @@ class ThePlatformIE(InfoExtractor):
if mobj.group('config'): if mobj.group('config'):
config_url = url+ '&form=json' config_url = url+ '&form=json'
config_url = config_url.replace('swf/', 'config/') config_url = config_url.replace('swf/', 'config/')
config_url = config_url.replace('onsite/', 'onsite/config/')
config_json = self._download_webpage(config_url, video_id, u'Downloading config') config_json = self._download_webpage(config_url, video_id, u'Downloading config')
config = json.loads(config_json) config = json.loads(config_json)
smil_url = config['releaseUrl'] + '&format=SMIL&formats=MPEG4' smil_url = config['releaseUrl'] + '&format=SMIL&formats=MPEG4&manifest=f4m'
else: else:
smil_url = ('http://link.theplatform.com/s/dJ5BDC/{0}/meta.smil?' smil_url = ('http://link.theplatform.com/s/dJ5BDC/{0}/meta.smil?'
'format=smil&mbr=true'.format(video_id)) 'format=smil&mbr=true'.format(video_id))
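
The f4m change above appends the `g` and `hdcore` parameters whether or not the manifest URL already has a query string. The following is a small sketch of that logic as a standalone function. The `add_f4m_params` name and the sample URLs are assumptions for illustration; the parameter values are the ones in the diff.

```python
def add_f4m_params(f4m_url):
    """Append the hdcore parameters used in the diff to a manifest URL."""
    if 'manifest.f4m?' not in f4m_url:
        # No query string yet: start one so the '&'-joined part is valid.
        f4m_url += '?'
    f4m_url += '&g=UXWGVKRWHFSP&hdcore=3.0.3'
    return f4m_url


if __name__ == '__main__':
    # Hypothetical manifest URLs.
    print(add_f4m_params('http://example.invalid/a/manifest.f4m'))
    print(add_f4m_params('http://example.invalid/a/manifest.f4m?x=1'))
```

When no query string is present the result contains `?&g=...`; this mirrors what the diff's code produces.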


@@ -0,0 +1,44 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class TruTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trutube\.tv/video/(?P<id>[0-9]+)/.*'
_TEST = {
'url': 'http://trutube.tv/video/14880/Ramses-II-Proven-To-Be-A-Red-Headed-Caucasoid-',
'md5': 'c5b6e301b0a2040b074746cbeaa26ca1',
'info_dict': {
'id': '14880',
'ext': 'flv',
'title': 'Ramses II - Proven To Be A Red Headed Caucasoid',
'thumbnail': 're:^http:.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).strip()
thumbnail = self._search_regex(
r"var splash_img = '([^']+)';", webpage, 'thumbnail', fatal=False)
all_formats = re.finditer(
r"var (?P<key>[a-z]+)_video_file\s*=\s*'(?P<url>[^']+)';", webpage)
formats = [{
'format_id': m.group('key'),
'quality': -i,
'url': m.group('url'),
} for i, m in enumerate(all_formats)]
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'formats': formats,
'thumbnail': thumbnail,
}
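
The TruTube extractor builds its format list from JavaScript variable assignments and ranks them by order of appearance, so the first variable found gets the highest `quality`. The following is a minimal sketch of that step outside the extractor. The `formats_from_page` name and the sample page are made up; only the regex and the `-i` ranking come from the code above.

```python
import re


def formats_from_page(webpage):
    """Collect '<key>_video_file' assignments, earlier ones ranked higher."""
    matches = re.finditer(
        r"var (?P<key>[a-z]+)_video_file\s*=\s*'(?P<url>[^']+)';", webpage)
    return [{
        'format_id': m.group('key'),
        'quality': -i,
        'url': m.group('url'),
    } for i, m in enumerate(matches)]


if __name__ == '__main__':
    # Hypothetical page fragment.
    sample = (
        "var hd_video_file = 'http://example.invalid/hd.flv';\n"
        "var sd_video_file = 'http://example.invalid/sd.flv';\n")
    for fmt in formats_from_page(sample):
        print(fmt)
```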


@@ -4,6 +4,7 @@ import re
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import compat_urllib_request
class VeohIE(InfoExtractor): class VeohIE(InfoExtractor):
@@ -24,6 +25,13 @@ class VeohIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
age_limit = 0
if 'class="adultwarning-container"' in webpage:
self.report_age_confirmation()
age_limit = 18
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'confirmedAdult=true')
webpage = self._download_webpage(request, video_id)
m_youtube = re.search(r'http://www\.youtube\.com/v/(.*?)(\&|")', webpage) m_youtube = re.search(r'http://www\.youtube\.com/v/(.*?)(\&|")', webpage)
if m_youtube is not None: if m_youtube is not None:
@@ -44,4 +52,5 @@ class VeohIE(InfoExtractor):
'thumbnail': info.get('highResImage') or info.get('medResImage'), 'thumbnail': info.get('highResImage') or info.get('medResImage'),
'description': info['description'], 'description': info['description'],
'view_count': info['views'], 'view_count': info['views'],
'age_limit': age_limit,
} }


@@ -24,9 +24,10 @@ class VevoIE(InfoExtractor):
(?P<id>[^&?#]+)''' (?P<id>[^&?#]+)'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280', 'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
'file': 'GB1101300280.mp4',
"md5": "06bea460acb744eab74a9d7dcb4bfd61", "md5": "06bea460acb744eab74a9d7dcb4bfd61",
'info_dict': { 'info_dict': {
'id': 'GB1101300280',
'ext': 'mp4',
"upload_date": "20130624", "upload_date": "20130624",
"uploader": "Hurts", "uploader": "Hurts",
"title": "Somebody to Die For", "title": "Somebody to Die For",
@@ -34,6 +35,33 @@ class VevoIE(InfoExtractor):
"width": 1920, "width": 1920,
"height": 1080, "height": 1080,
} }
}, {
'note': 'v3 SMIL format',
'url': 'http://www.vevo.com/watch/cassadee-pope/i-wish-i-could-break-your-heart/USUV71302923',
'md5': '893ec0e0d4426a1d96c01de8f2bdff58',
'info_dict': {
'id': 'USUV71302923',
'ext': 'mp4',
'upload_date': '20140219',
'uploader': 'Cassadee Pope',
'title': 'I Wish I Could Break Your Heart',
'duration': 226.101,
'age_limit': 0,
}
}, {
'note': 'Age-limited video',
'url': 'https://www.vevo.com/watch/justin-timberlake/tunnel-vision-explicit/USRV81300282',
'info_dict': {
'id': 'USRV81300282',
'ext': 'mp4',
'age_limit': 18,
'title': 'Tunnel Vision (Explicit)',
'uploader': 'Justin Timberlake',
'upload_date': '20130704',
},
'params': {
'skip_download': 'true',
}
}] }]
_SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/' _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/'
@@ -105,9 +133,31 @@ class VevoIE(InfoExtractor):
video_info = self._download_json(json_url, video_id)['video'] video_info = self._download_json(json_url, video_id)['video']
formats = self._formats_from_json(video_info) formats = self._formats_from_json(video_info)
try:
is_explicit = video_info.get('isExplicit')
if is_explicit is True:
age_limit = 18
elif is_explicit is False:
age_limit = 0
else:
age_limit = None
# Download SMIL
smil_blocks = sorted((
f for f in video_info['videoVersions']
if f['sourceType'] == 13),
key=lambda f: f['version'])
smil_url = '%s/Video/V2/VFILE/%s/%sr.smil' % ( smil_url = '%s/Video/V2/VFILE/%s/%sr.smil' % (
self._SMIL_BASE_URL, video_id, video_id.lower()) self._SMIL_BASE_URL, video_id, video_id.lower())
if smil_blocks:
smil_url_m = self._search_regex(
r'url="([^"]+)"', smil_blocks[-1]['data'], 'SMIL URL',
fatal=False)
if smil_url_m is not None:
smil_url = smil_url_m
try:
smil_xml = self._download_webpage(smil_url, video_id, smil_xml = self._download_webpage(smil_url, video_id,
'Downloading SMIL info') 'Downloading SMIL info')
formats.extend(self._formats_from_smil(smil_xml)) formats.extend(self._formats_from_smil(smil_xml))
@@ -128,4 +178,5 @@ class VevoIE(InfoExtractor):
'upload_date': upload_date.strftime('%Y%m%d'), 'upload_date': upload_date.strftime('%Y%m%d'),
'uploader': video_info['mainArtists'][0]['artistName'], 'uploader': video_info['mainArtists'][0]['artistName'],
'duration': video_info['duration'], 'duration': video_info['duration'],
'age_limit': age_limit,
} }


@@ -0,0 +1,80 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import int_or_none
class VideoBamIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?videobam\.com/(?:videos/download/)?(?P<id>[a-zA-Z]+)'
_TESTS = [
{
'url': 'http://videobam.com/OiJQM',
'md5': 'db471f27763a531f10416a0c58b5a1e0',
'info_dict': {
'id': 'OiJQM',
'ext': 'mp4',
'title': 'Is Alcohol Worse Than Ecstasy?',
'description': 'md5:d25b96151515c91debc42bfbb3eb2683',
'uploader': 'frihetsvinge',
},
},
{
'url': 'http://videobam.com/pqLvq',
'md5': 'd9a565b5379a99126ef94e1d7f9a383e',
'note': 'HD video',
'info_dict': {
'id': 'pqLvq',
'ext': 'mp4',
}
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage('http://videobam.com/%s' % video_id, video_id, 'Downloading page')
formats = []
for preference, format_id in enumerate(['low', 'high']):
mobj = re.search(r"%s: '(?P<url>[^']+)'" % format_id, page)
if not mobj:
continue
formats.append({
'url': mobj.group('url'),
'ext': 'mp4',
'format_id': format_id,
'preference': preference,
})
if not formats:
player_config = json.loads(self._html_search_regex(r'var player_config = ({.+?});', page, 'player config'))
formats = [{
'url': item['url'],
'ext': 'mp4',
} for item in player_config['playlist'] if 'autoPlay' in item]
self._sort_formats(formats)
title = self._og_search_title(page, default='VideoBam', fatal=False)
description = self._og_search_description(page, default=None)
thumbnail = self._og_search_thumbnail(page)
uploader = self._html_search_regex(r'Upload by ([^<]+)</a>', page, 'uploader', fatal=False, default=None)
view_count = int_or_none(
self._html_search_regex(r'<strong>Views:</strong> (\d+) ', page, 'view count', fatal=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}


@@ -37,9 +37,10 @@ class VimeoIE(SubtitlesInfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://vimeo.com/56015672#at=0', 'url': 'http://vimeo.com/56015672#at=0',
'file': '56015672.mp4',
'md5': '8879b6cc097e987f02484baf890129e5', 'md5': '8879b6cc097e987f02484baf890129e5',
'info_dict': { 'info_dict': {
'id': '56015672',
'ext': 'mp4',
"upload_date": "20121220", "upload_date": "20121220",
"description": "This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550", "description": "This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
"uploader_id": "user7108434", "uploader_id": "user7108434",


@@ -1,8 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate
class VineIE(InfoExtractor): class VineIE(InfoExtractor):
@@ -13,31 +15,46 @@ class VineIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'b9KOOWX7HUx', 'id': 'b9KOOWX7HUx',
'ext': 'mp4', 'ext': 'mp4',
'uploader': 'Jack Dorsey',
'title': 'Chicken.', 'title': 'Chicken.',
'description': 'Chicken.',
'upload_date': '20130519',
'uploader': 'Jack Dorsey',
'uploader_id': '76',
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
webpage_url = 'https://vine.co/v/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
self.report_extraction(video_id) webpage = self._download_webpage('https://vine.co/v/' + video_id, video_id)
video_url = self._html_search_meta('twitter:player:stream', webpage, data = json.loads(self._html_search_regex(
'video URL') r'window\.POST_DATA = { %s: ({.+?}) }' % video_id, webpage, 'vine data'))
uploader = self._html_search_regex(r'<p class="username">(.*?)</p>', formats = [
webpage, 'uploader', fatal=False, flags=re.DOTALL) {
'url': data['videoLowURL'],
'ext': 'mp4',
'format_id': 'low',
},
{
'url': data['videoUrl'],
'ext': 'mp4',
'format_id': 'standard',
}
]
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'description': data['description'],
'uploader': uploader, 'thumbnail': data['thumbnailUrl'],
'upload_date': unified_strdate(data['created']),
'uploader': data['username'],
'uploader_id': data['userIdStr'],
'like_count': data['likes']['count'],
'comment_count': data['comments']['count'],
'repost_count': data['reposts']['count'],
'formats': formats,
} }


@@ -6,14 +6,15 @@ from .common import InfoExtractor
class WimpIE(InfoExtractor): class WimpIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?wimp\.com/([^/]+)/' _VALID_URL = r'http://(?:www\.)?wimp\.com/([^/]+)/'
_TEST = { _TEST = {
'url': 'http://www.wimp.com/deerfence/', 'url': 'http://www.wimp.com/maruexhausted/',
'file': 'deerfence.flv', 'md5': 'f1acced123ecb28d9bb79f2479f2b6a1',
'md5': '8b215e2e0168c6081a1cf84b2846a2b5',
'info_dict': { 'info_dict': {
"title": "Watch Till End: Herd of deer jump over a fence.", 'id': 'maruexhausted',
"description": "These deer look as fluid as running water when they jump over this fence as a herd. This video is one that needs to be watched until the very end for the true majesty to be witnessed, but once it comes, it's sure to take your breath away.", 'ext': 'flv',
'title': 'Maru is exhausted.',
'description': 'md5:57e099e857c0a4ea312542b684a869b8',
} }
} }


@@ -4,51 +4,51 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
compat_urllib_parse,
ExtractorError, ExtractorError,
unified_strdate,
str_to_int,
int_or_none,
parse_duration,
) )
class XHamsterIE(InfoExtractor): class XHamsterIE(InfoExtractor):
"""Information Extractor for xHamster""" """Information Extractor for xHamster"""
_VALID_URL = r'(?:http://)?(?:www\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?' _VALID_URL = r'http://(?:www\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
_TESTS = [{ _TESTS = [
{
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html', 'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'file': '1509445.mp4',
'md5': '8281348b8d3c53d39fffb377d24eac4e', 'md5': '8281348b8d3c53d39fffb377d24eac4e',
'info_dict': { 'info_dict': {
"upload_date": "20121014", 'id': '1509445',
"uploader_id": "Ruseful2011", 'ext': 'mp4',
"title": "FemaleAgent Shy beauty takes the bait", 'title': 'FemaleAgent Shy beauty takes the bait',
"age_limit": 18, 'upload_date': '20121014',
'uploader_id': 'Ruseful2011',
'duration': 893,
'age_limit': 18,
} }
}, },
{ {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd', 'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'file': '2221348.flv', 'md5': '4cbd8d56708ecb4fb4124c23e4acb81a',
'md5': 'e767b9475de189320f691f49c679c4c7',
'info_dict': { 'info_dict': {
"upload_date": "20130914", 'id': '2221348',
"uploader_id": "jojo747400", 'ext': 'mp4',
"title": "Britney Spears Sexy Booty", 'title': 'Britney Spears Sexy Booty',
"age_limit": 18, 'upload_date': '20130914',
'uploader_id': 'jojo747400',
'duration': 200,
'age_limit': 18,
} }
}] }
]
def _real_extract(self,url): def _real_extract(self,url):
def extract_video_url(webpage): def extract_video_url(webpage):
mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage) mp4 = re.search(r'<video\s+.*?file="([^"]+)".*?>', webpage)
if mobj is None:
raise ExtractorError('Unable to extract media URL')
if len(mobj.group('server')) == 0:
return compat_urllib_parse.unquote(mobj.group('file'))
else:
return mobj.group('server')+'/key='+mobj.group('file')
def extract_mp4_video_url(webpage):
mp4 = re.search(r'<a href=\"(.+?)\" class=\"mp4Play\"',webpage)
if mp4 is None: if mp4 is None:
return None raise ExtractorError('Unable to extract media URL')
else: else:
return mp4.group(1) return mp4.group(1)
@@ -62,50 +62,49 @@ class XHamsterIE(InfoExtractor):
mrss_url = 'http://xhamster.com/movies/%s/%s.html' % (video_id, seo) mrss_url = 'http://xhamster.com/movies/%s/%s.html' % (video_id, seo)
webpage = self._download_webpage(mrss_url, video_id) webpage = self._download_webpage(mrss_url, video_id)
video_title = self._html_search_regex( title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>', webpage, 'title')
r'<title>(?P<title>.+?) - xHamster\.com</title>', webpage, 'title')
# Only a few videos have a description # Only a few videos have a description
mobj = re.search(r'<span>Description: </span>([^<]+)', webpage) mobj = re.search(r'<span>Description: </span>([^<]+)', webpage)
video_description = mobj.group(1) if mobj else None description = mobj.group(1) if mobj else None
mobj = re.search(r'hint=\'(?P<upload_date_Y>[0-9]{4})-(?P<upload_date_m>[0-9]{2})-(?P<upload_date_d>[0-9]{2}) [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]{3,4}\'', webpage) upload_date = self._html_search_regex(r'hint=\'(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [A-Z]{3,4}\'',
if mobj: webpage, 'upload date', fatal=False)
video_upload_date = mobj.group('upload_date_Y')+mobj.group('upload_date_m')+mobj.group('upload_date_d') if upload_date:
else: upload_date = unified_strdate(upload_date)
video_upload_date = None
self._downloader.report_warning('Unable to extract upload date')
video_uploader_id = self._html_search_regex( uploader_id = self._html_search_regex(r'<a href=\'/user/[^>]+>(?P<uploader_id>[^<]+)',
r'<a href=\'/user/[^>]+>(?P<uploader_id>[^<]+)',
webpage, 'uploader id', default='anonymous') webpage, 'uploader id', default='anonymous')
video_thumbnail = self._search_regex( thumbnail = self._html_search_regex(r'<video\s+.*?poster="([^"]+)".*?>', webpage, 'thumbnail', fatal=False)
r'\'image\':\'(?P<thumbnail>[^\']+)\'',
webpage, 'thumbnail', fatal=False) duration = parse_duration(self._html_search_regex(r'<span>Runtime:</span> (\d+:\d+)</div>',
webpage, 'duration', fatal=False))
view_count = self._html_search_regex(r'<span>Views:</span> ([^<]+)</div>', webpage, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
mobj = re.search(r"hint='(?P<likecount>\d+) Likes / (?P<dislikecount>\d+) Dislikes'", webpage)
(like_count, dislike_count) = (mobj.group('likecount'), mobj.group('dislikecount')) if mobj else (None, None)
mobj = re.search(r'</label>Comments \((?P<commentcount>\d+)\)</div>', webpage)
comment_count = mobj.group('commentcount') if mobj else 0
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
hd = is_hd(webpage) hd = is_hd(webpage)
video_url = extract_video_url(webpage) video_url = extract_video_url(webpage)
formats = [{ formats = [{
'url': video_url, 'url': video_url,
'format_id': 'hd' if hd else 'sd', 'format_id': 'hd' if hd else 'sd',
'preference': 0, 'preference': 1,
}] }]
video_mp4_url = extract_mp4_video_url(webpage)
if video_mp4_url is not None:
formats.append({
'url': video_mp4_url,
'ext': 'mp4',
'format_id': 'mp4-hd' if hd else 'mp4-sd',
'preference': 1,
})
if not hd: if not hd:
webpage = self._download_webpage( mrss_url = self._search_regex(r'<link rel="canonical" href="([^"]+)', webpage, 'mrss_url')
mrss_url + '?hd', video_id, note='Downloading HD webpage') webpage = self._download_webpage(mrss_url + '?hd', video_id, note='Downloading HD webpage')
if is_hd(webpage): if is_hd(webpage):
video_url = extract_video_url(webpage) video_url = extract_video_url(webpage)
formats.append({ formats.append({
@@ -118,11 +117,16 @@ class XHamsterIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': video_title, 'title': title,
'formats': formats, 'description': description,
'description': video_description, 'upload_date': upload_date,
'upload_date': video_upload_date, 'uploader_id': uploader_id,
'uploader_id': video_uploader_id, 'thumbnail': thumbnail,
'thumbnail': video_thumbnail, 'duration': duration,
'view_count': view_count,
'like_count': int_or_none(like_count),
'dislike_count': int_or_none(dislike_count),
'comment_count': int_or_none(comment_count),
'age_limit': age_limit, 'age_limit': age_limit,
'formats': formats,
} }
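
Reviewer note on the xHamster hunks above: the new code reads the runtime from a `<span>Runtime:</span> MM:SS</div>` fragment and passes it to `parse_duration`. The sketch below shows the arithmetic behind the `'duration': 893` test value. `parse_mmss` is a hypothetical stand-in limited to `MM:SS` strings, not the real `parse_duration` from `..utils`, and the HTML fragment is invented for illustration.

```python
import re


def parse_mmss(text):
    """Hypothetical stand-in for parse_duration, limited to 'MM:SS' strings."""
    if text is None:
        return None
    m = re.match(r'^(\d+):(\d{2})$', text.strip())
    if not m:
        return None
    # minutes * 60 + seconds
    return int(m.group(1)) * 60 + int(m.group(2))


# Invented page fragment in the shape the extractor's regex expects.
html = '<div><span>Runtime:</span> 14:53</div>'
m = re.search(r'<span>Runtime:</span> (\d+:\d+)</div>', html)
duration = parse_mmss(m.group(1) if m else None)
print(duration)  # 893
```

Returning `None` for a missing or malformed runtime mirrors the `fatal=False` lookup in the diff, where a failed match must not abort extraction.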


@@ -29,7 +29,6 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
PagedList, PagedList,
RegexNotFoundError,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
orderedSet, orderedSet,
@@ -138,13 +137,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
(?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/| (?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/|
(?:www\.)?deturl\.com/www\.youtube\.com/| (?:www\.)?deturl\.com/www\.youtube\.com/|
(?:www\.)?pwnyoutube\.com/| (?:www\.)?pwnyoutube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/| tube\.majestyc\.net/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls (?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID: (?: # the various things that can precede the ID:
(?:(?:v|embed|e)/) # v/ or embed/ or e/ (?:(?:v|embed|e)/) # v/ or embed/ or e/
|(?: # or the v= param in all its forms |(?: # or the v= param in all its forms
(?:(?:watch|movie)(?:_popup)?(?:\.php)?)? # preceding watch(_popup|.php) or nothing (like /?v=xxxx) (?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)? # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
(?:\?|\#!?) # the params delimiter ? or # or #! (?:\?|\#!?) # the params delimiter ? or # or #!
(?:.*?&)? # any other preceding param (like /?s=tuff&v=xxxx) (?:.*?&)? # any other preceding param (like /?s=tuff&v=xxxx)
v= v=
@@ -199,9 +199,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'preference': -40}, '135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'preference': -40},
'136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'preference': -40}, '136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'preference': -40},
'137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40}, '137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40},
'138': {'ext': 'mp4', 'height': 1081, 'resolution': '>1080p', 'format_note': 'DASH video', 'preference': -40}, '138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'preference': -40},
'160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'preference': -40}, '160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40}, '264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'preference': -40},
# Dash mp4 audio # Dash mp4 audio
'139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50}, '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50},
@@ -296,6 +296,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
u"format": "141", u"format": "141",
}, },
}, },
# DASH manifest with encrypted signature
{
u'url': u'https://www.youtube.com/watch?v=IB3lcPjvWLA',
u'info_dict': {
u'id': u'IB3lcPjvWLA',
u'ext': u'm4a',
u'title': u'Afrojack - The Spark ft. Spree Wilson',
u'description': u'md5:3199ed45ee8836572865580804d7ac0f',
u'uploader': u'AfrojackVEVO',
u'uploader_id': u'AfrojackVEVO',
u'upload_date': u'20131011',
},
u"params": {
u'youtube_include_dash_manifest': True,
u'format': '141',
},
},
] ]
@@ -1271,8 +1288,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
mobj = re.search(r';ytplayer.config = ({.*?});', video_webpage) mobj = re.search(r';ytplayer.config = ({.*?});', video_webpage)
if not mobj: if not mobj:
raise ValueError('Could not find vevo ID') raise ValueError('Could not find vevo ID')
info = json.loads(mobj.group(1)) ytplayer_config = json.loads(mobj.group(1))
args = info['args'] args = ytplayer_config['args']
# Easy way to know if the 's' value is in url_encoded_fmt_stream_map # Easy way to know if the 's' value is in url_encoded_fmt_stream_map
# these signatures are encrypted # these signatures are encrypted
if 'url_encoded_fmt_stream_map' not in args: if 'url_encoded_fmt_stream_map' not in args:
@@ -1365,12 +1382,24 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
raise ExtractorError(u'no conn, hlsvp or url_encoded_fmt_stream_map information found in video info') raise ExtractorError(u'no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
# Look for the DASH manifest # Look for the DASH manifest
dash_manifest_url_lst = video_info.get('dashmpd') if (self._downloader.params.get('youtube_include_dash_manifest', False)):
if (dash_manifest_url_lst and dash_manifest_url_lst[0] and
self._downloader.params.get('youtube_include_dash_manifest', False)):
try: try:
# The DASH manifest used needs to be the one from the original video_webpage.
# The one found in get_video_info seems to be using different signatures.
# However, in the case of an age restriction there won't be any embedded dashmpd in the video_webpage.
# Luckily, it seems, this case uses some kind of default signature (len == 86), so the
# combination of get_video_info and the _static_decrypt_signature() decryption fallback will work here.
if age_gate:
dash_manifest_url = video_info.get('dashmpd')[0]
else:
dash_manifest_url = ytplayer_config['args']['dashmpd']
def decrypt_sig(mobj):
s = mobj.group(1)
dec_s = self._decrypt_signature(s, video_id, player_url, age_gate)
return '/signature/%s' % dec_s
dash_manifest_url = re.sub(r'/s/([\w\.]+)', decrypt_sig, dash_manifest_url)
dash_doc = self._download_xml( dash_doc = self._download_xml(
dash_manifest_url_lst[0], video_id, dash_manifest_url, video_id,
note=u'Downloading DASH manifest', note=u'Downloading DASH manifest',
errnote=u'Could not download DASH manifest') errnote=u'Could not download DASH manifest')
for r in dash_doc.findall(u'.//{urn:mpeg:DASH:schema:MPD:2011}Representation'): for r in dash_doc.findall(u'.//{urn:mpeg:DASH:schema:MPD:2011}Representation'):
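
Reviewer note on the DASH hunk above: the new code rewrites every `/s/<encrypted>` path segment of the manifest URL into `/signature/<decrypted>` by passing a callback to `re.sub`. A minimal sketch of that substitution follows. `fake_decrypt` is a placeholder that simply reverses the string (the real `_decrypt_signature` depends on the player code), and the URL is invented.

```python
import re


def fake_decrypt(sig):
    # Placeholder only: NOT the real signature algorithm.
    return sig[::-1]


def rewrite_manifest_url(url, decrypt):
    """Replace each '/s/<sig>' segment with '/signature/<decrypt(sig)>'."""
    def repl(mobj):
        return '/signature/%s' % decrypt(mobj.group(1))
    return re.sub(r'/s/([\w\.]+)', repl, url)


# Invented manifest URL carrying an encrypted signature segment.
url = 'https://example.invalid/api/manifest/dash/id/abc/s/12AB.34CD/key/yt5'
rewritten = rewrite_manifest_url(url, fake_decrypt)
print(rewritten)
```

A URL without an `/s/` segment passes through unchanged, so the same call covers manifests whose signatures are already in the clear.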
@@ -1442,9 +1471,9 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
| |
((?:PL|EC|UU|FL|RD)[0-9A-Za-z-_]{10,}) ((?:PL|EC|UU|FL|RD)[0-9A-Za-z-_]{10,})
)""" )"""
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&page=%s' _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
_MORE_PAGES_INDICATOR = r'data-link-type="next"' _MORE_PAGES_INDICATOR = r'data-link-type="next"'
_VIDEO_RE = r'href="/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)' _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)'
IE_NAME = u'youtube:playlist' IE_NAME = u'youtube:playlist'
def _real_initialize(self): def _real_initialize(self):
@@ -1459,11 +1488,15 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
# the id of the playlist is just 'RD' + video_id # the id of the playlist is just 'RD' + video_id
url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id) url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
webpage = self._download_webpage(url, playlist_id, u'Downloading Youtube mix') webpage = self._download_webpage(url, playlist_id, u'Downloading Youtube mix')
title_span = (get_element_by_attribute('class', 'title long-title', webpage) or search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
get_element_by_attribute('class', 'title ', webpage)) title_span = (search_title('playlist-title') or
search_title('title long-title') or search_title('title'))
title = clean_html(title_span) title = clean_html(title_span)
video_re = r'data-index="\d+".*?href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s' % re.escape(playlist_id) video_re = r'''(?x)data-video-username="(.*?)".*?
ids = orderedSet(re.findall(video_re, webpage)) href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id)
matches = orderedSet(re.findall(video_re, webpage, flags=re.DOTALL))
# Some of the videos may have been deleted, their username field is empty
ids = [video_id for (username, video_id) in matches if username]
url_results = self._ids_to_results(ids) url_results = self._ids_to_results(ids)
return self.playlist_result(url_results, playlist_id, title) return self.playlist_result(url_results, playlist_id, title)
@@ -1492,29 +1525,31 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
raise ExtractorError(u'For downloading YouTube.com top lists, use ' raise ExtractorError(u'For downloading YouTube.com top lists, use '
u'the "yttoplist" keyword, for example "youtube-dl \'yttoplist:music:Top Tracks\'"', expected=True) u'the "yttoplist" keyword, for example "youtube-dl \'yttoplist:music:Top Tracks\'"', expected=True)
url = self._TEMPLATE_URL % playlist_id
page = self._download_webpage(url, playlist_id)
more_widget_html = content_html = page
# Extract the video ids from the playlist pages # Extract the video ids from the playlist pages
ids = [] ids = []
for page_num in itertools.count(1): for page_num in itertools.count(1):
url = self._TEMPLATE_URL % (playlist_id, page_num) matches = re.finditer(self._VIDEO_RE, content_html)
page = self._download_webpage(url, playlist_id, u'Downloading page #%s' % page_num)
matches = re.finditer(self._VIDEO_RE, page)
# We remove the duplicates and the link with index 0 # We remove the duplicates and the link with index 0
# (it's not the first video of the playlist) # (it's not the first video of the playlist)
new_ids = orderedSet(m.group('id') for m in matches if m.group('index') != '0') new_ids = orderedSet(m.group('id') for m in matches if m.group('index') != '0')
ids.extend(new_ids) ids.extend(new_ids)
if re.search(self._MORE_PAGES_INDICATOR, page) is None: mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
if not mobj:
break break
try: more = self._download_json(
playlist_title = self._og_search_title(page) 'https://youtube.com/%s' % mobj.group('more'), playlist_id, 'Downloading page #%s' % page_num)
except RegexNotFoundError: content_html = more['content_html']
self.report_warning( more_widget_html = more['load_more_widget_html']
u'Playlist page is missing OpenGraph title, falling back ...',
playlist_id)
playlist_title = self._html_search_regex( playlist_title = self._html_search_regex(
r'<h1 class="pl-header-title">(.*?)</h1>', page, u'title') r'<h1 class="pl-header-title">\s*(.*?)\s*</h1>', page, u'title')
url_results = self._ids_to_results(ids) url_results = self._ids_to_results(ids)
return self.playlist_result(url_results, playlist_id, playlist_title) return self.playlist_result(url_results, playlist_id, playlist_title)
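
Reviewer note on the playlist hunk above: paging no longer uses a `&page=N` URL. The loop instead follows the `data-uix-load-more-href` link and reads `content_html` and `load_more_widget_html` from each JSON response. The sketch below reproduces that control flow against in-memory data. `collect_ids`, `fetch_json` and the sample pages are hypothetical, and duplicates are dropped across all pages here, whereas the diff applies `orderedSet` to each page separately.

```python
import itertools
import re

VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)'
MORE_RE = r'data-uix-load-more-href="/?(?P<more>[^"]+)"'


def collect_ids(first_page, fetch_json):
    """Follow 'load more' links until none is left; return ordered video ids."""
    content_html = more_widget_html = first_page
    ids = []
    for page_num in itertools.count(1):
        for m in re.finditer(VIDEO_RE, content_html):
            vid = m.group('id')
            # Skip the index-0 link (not the first video) and duplicates.
            if m.group('index') != '0' and vid not in ids:
                ids.append(vid)
        mobj = re.search(MORE_RE, more_widget_html)
        if not mobj:
            break
        more = fetch_json(mobj.group('more'))  # stands in for _download_json
        content_html = more['content_html']
        more_widget_html = more['load_more_widget_html']
    return ids


# Invented first page and one follow-up JSON response.
first = ('<a href="/watch?v=AAAAAAAAAAA&amp;list=PLx&amp;index=1">'
         '<button data-uix-load-more-href="/browse_ajax?p=2">')
pages = {
    'browse_ajax?p=2': {
        'content_html': '<a href="/watch?v=BBBBBBBBBBB&amp;list=PLx&amp;index=2">',
        'load_more_widget_html': '',
    },
}
ids = collect_ids(first, pages.__getitem__)
print(ids)  # ['AAAAAAAAAAA', 'BBBBBBBBBBB']
```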
@@ -1815,7 +1850,7 @@ class YoutubeTruncatedURLIE(InfoExtractor):
IE_NAME = 'youtube:truncated_url' IE_NAME = 'youtube:truncated_url'
IE_DESC = False # Do not list IE_DESC = False # Do not list
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?:https?://)?[^/]+/watch\?feature=[a-z_]+$| (?:https?://)?[^/]+/watch\?(?:feature=[a-z_]+)?$|
(?:https?://)?(?:www\.)?youtube\.com/attribution_link\?a=[^&]+$ (?:https?://)?(?:www\.)?youtube\.com/attribution_link\?a=[^&]+$
''' '''


@@ -1,4 +1,5 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals
import re import re
@@ -13,52 +14,42 @@ class ZDFIE(InfoExtractor):
_VALID_URL = r'^https?://www\.zdf\.de/ZDFmediathek(?P<hash>#)?/(.*beitrag/(?:video/)?)(?P<video_id>[0-9]+)(?:/[^/?]+)?(?:\?.*)?' _VALID_URL = r'^https?://www\.zdf\.de/ZDFmediathek(?P<hash>#)?/(.*beitrag/(?:video/)?)(?P<video_id>[0-9]+)(?:/[^/?]+)?(?:\?.*)?'
_TEST = { _TEST = {
u"url": u"http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt", 'url': 'http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt',
u"file": u"2037704.webm", 'info_dict': {
u"info_dict": { 'id': '2037704',
u"upload_date": u"20131127", 'ext': 'webm',
u"description": u"Union und SPD haben sich auf einen Koalitionsvertrag geeinigt. Aber was bedeutet das für die Bürger? Sehen Sie hierzu das ZDFspezial \"Ende des Machtpokers - Große Koalition für Deutschland\".", 'title': 'ZDFspezial - Ende des Machtpokers',
u"uploader": u"spezial", 'description': 'Union und SPD haben sich auf einen Koalitionsvertrag geeinigt. Aber was bedeutet das für die Bürger? Sehen Sie hierzu das ZDFspezial "Ende des Machtpokers - Große Koalition für Deutschland".',
u"title": u"ZDFspezial - Ende des Machtpokers" 'duration': 1022,
'uploader': 'spezial',
'uploader_id': '225948',
'upload_date': '20131127',
}, },
u"skip": u"Videos on ZDF.de are depublicised in short order", 'skip': 'Videos on ZDF.de are depublicised in short order',
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
xml_url = u'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id xml_url = 'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
doc = self._download_xml( doc = self._download_xml(
xml_url, video_id, xml_url, video_id,
note=u'Downloading video info', note='Downloading video info',
errnote=u'Failed to download video info') errnote='Failed to download video info')
title = doc.find('.//information/title').text title = doc.find('.//information/title').text
description = doc.find('.//information/detail').text description = doc.find('.//information/detail').text
duration = int(doc.find('.//details/lengthSec').text)
uploader_node = doc.find('.//details/originChannelTitle') uploader_node = doc.find('.//details/originChannelTitle')
uploader = None if uploader_node is None else uploader_node.text uploader = None if uploader_node is None else uploader_node.text
duration_str = doc.find('.//details/length').text uploader_id_node = doc.find('.//details/originChannelId')
duration_m = re.match(r'''(?x)^ uploader_id = None if uploader_id_node is None else uploader_id_node.text
(?P<hours>[0-9]{2})
:(?P<minutes>[0-9]{2})
:(?P<seconds>[0-9]{2})
(?:\.(?P<ms>[0-9]+)?)
''', duration_str)
duration = (
(
(int(duration_m.group('hours')) * 60 * 60) +
(int(duration_m.group('minutes')) * 60) +
int(duration_m.group('seconds'))
)
if duration_m
else None
)
upload_date = unified_strdate(doc.find('.//details/airtime').text) upload_date = unified_strdate(doc.find('.//details/airtime').text)
def xml_to_format(fnode): def xml_to_format(fnode):
video_url = fnode.find('url').text video_url = fnode.find('url').text
is_available = u'http://www.metafilegenerator' not in video_url is_available = 'http://www.metafilegenerator' not in video_url
format_id = fnode.attrib['basetype'] format_id = fnode.attrib['basetype']
format_m = re.match(r'''(?x) format_m = re.match(r'''(?x)
@@ -71,22 +62,28 @@ class ZDFIE(InfoExtractor):
quality = fnode.find('./quality').text quality = fnode.find('./quality').text
abr = int(fnode.find('./audioBitrate').text) // 1000 abr = int(fnode.find('./audioBitrate').text) // 1000
vbr = int(fnode.find('./videoBitrate').text) // 1000 vbr_node = fnode.find('./videoBitrate')
vbr = None if vbr_node is None else int(vbr_node.text) // 1000
format_note = u'' width_node = fnode.find('./width')
width = None if width_node is None else int_or_none(width_node.text)
height_node = fnode.find('./height')
height = None if height_node is None else int_or_none(height_node.text)
format_note = ''
if not format_note: if not format_note:
format_note = None format_note = None
return { return {
'format_id': format_id + u'-' + quality, 'format_id': format_id + '-' + quality,
'url': video_url, 'url': video_url,
'ext': ext, 'ext': ext,
'acodec': format_m.group('acodec'), 'acodec': format_m.group('acodec'),
'vcodec': format_m.group('vcodec'), 'vcodec': format_m.group('vcodec'),
'abr': abr, 'abr': abr,
'vbr': vbr, 'vbr': vbr,
'width': int_or_none(fnode.find('./width').text), 'width': width,
'height': int_or_none(fnode.find('./height').text), 'height': height,
'filesize': int_or_none(fnode.find('./filesize').text), 'filesize': int_or_none(fnode.find('./filesize').text),
'format_note': format_note, 'format_note': format_note,
'protocol': proto, 'protocol': proto,
@@ -103,9 +100,10 @@ class ZDFIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats,
'description': description, 'description': description,
'uploader': uploader,
'duration': duration, 'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date, 'upload_date': upload_date,
'formats': formats,
} }
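
Reviewer note on the ZDF hunks above: `videoBitrate`, `width` and `height` are now treated as optional child nodes, each guarded by `None if node is None else ...`. The repeated guard could be folded into one helper, sketched below. `find_int` is a hypothetical name that does not appear in the commit, and the XML fragment is invented.

```python
import xml.etree.ElementTree as ET


def find_int(node, path):
    """Hypothetical helper: integer text of an optional child, or None."""
    child = node.find(path)
    if child is None or child.text is None:
        return None
    return int(child.text)


# Invented format node with no <height> or <videoBitrate> child.
fnode = ET.fromstring(
    '<format><quality>high</quality>'
    '<audioBitrate>128000</audioBitrate><width>852</width></format>')

width = find_int(fnode, './width')
height = find_int(fnode, './height')
vbr_raw = find_int(fnode, './videoBitrate')
vbr = None if vbr_raw is None else vbr_raw // 1000
print(width, height, vbr)  # 852 None None
```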


@@ -1,6 +1,7 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
import contextlib
import ctypes import ctypes
import datetime import datetime
import email.utils import email.utils
@@ -174,6 +175,11 @@ try:
except NameError: except NameError:
compat_chr = chr compat_chr = chr
try:
from xml.etree.ElementTree import ParseError as compat_xml_parse_error
except ImportError: # Python 2.6
from xml.parsers.expat import ExpatError as compat_xml_parse_error
def compat_ord(c): def compat_ord(c):
if type(c) is int: return c if type(c) is int: return c
else: return ord(c) else: return ord(c)
@@ -766,6 +772,7 @@ def unified_strdate(date_str):
'%B %d %Y', '%B %d %Y',
'%b %d %Y', '%b %d %Y',
'%Y-%m-%d', '%Y-%m-%d',
'%d.%m.%Y',
'%d/%m/%Y', '%d/%m/%Y',
'%Y/%m/%d %H:%M:%S', '%Y/%m/%d %H:%M:%S',
'%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M:%S',
@@ -774,6 +781,7 @@ def unified_strdate(date_str):
'%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%S.%fZ',
'%Y-%m-%dT%H:%M:%S.%f0Z', '%Y-%m-%dT%H:%M:%S.%f0Z',
'%Y-%m-%dT%H:%M:%S', '%Y-%m-%dT%H:%M:%S',
'%Y-%m-%dT%H:%M:%S.%f',
'%Y-%m-%dT%H:%M', '%Y-%m-%dT%H:%M',
] ]
for expression in format_expressions: for expression in format_expressions:
@@ -1239,3 +1247,19 @@ except TypeError:
else: else:
struct_pack = struct.pack struct_pack = struct.pack
struct_unpack = struct.unpack struct_unpack = struct.unpack
def read_batch_urls(batch_fd):
def fixup(url):
if not isinstance(url, compat_str):
url = url.decode('utf-8', 'replace')
BOM_UTF8 = u'\xef\xbb\xbf'
if url.startswith(BOM_UTF8):
url = url[len(BOM_UTF8):]
url = url.strip()
if url.startswith(('#', ';', ']')):
return False
return url
with contextlib.closing(batch_fd) as fd:
return [url for url in map(fixup, fd) if url]
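
Reviewer note on `read_batch_urls` above: the function decodes each line, strips a leading BOM and surrounding whitespace, and drops blank lines and lines starting with `#`, `;` or `]`. The Python 3 sketch below exercises that behaviour on an in-memory file. One difference is an assumption of this sketch rather than part of the commit: it also strips the decoded BOM `'\ufeff'`, which is what a UTF-8 BOM becomes once the bytes have been decoded.

```python
import contextlib
import io


def read_batch_urls(batch_fd):
    """Return the non-empty, non-comment URLs from a batch file object."""
    def fixup(url):
        if not isinstance(url, str):
            url = url.decode('utf-8', 'replace')
        # Assumption: accept both the decoded BOM and its mis-decoded byte form.
        for bom in ('\ufeff', '\xef\xbb\xbf'):
            if url.startswith(bom):
                url = url[len(bom):]
        url = url.strip()
        if url.startswith(('#', ';', ']')):
            return False  # comment line, filtered out below
        return url

    with contextlib.closing(batch_fd) as fd:
        return [url for url in map(fixup, fd) if url]


# Invented batch file: BOM, a comment, a blank line, trailing whitespace.
sample = io.BytesIO(
    b'\xef\xbb\xbfhttp://example.com/a\n'
    b'# comment\n'
    b'\n'
    b'; other\n'
    b'http://example.com/b \n')
urls = read_batch_urls(sample)
print(urls)  # ['http://example.com/a', 'http://example.com/b']
```

`contextlib.closing` ensures the file object is closed even when it is not itself a context manager.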


@@ -1,2 +1,2 @@
__version__ = '2014.02.17' __version__ = '2014.02.27.1'