Compare commits


518 Commits

Author SHA1 Message Date
Philipp Hagemeister
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
Jaime Marquínez Ferrándiz
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
Jaime Marquínez Ferrándiz
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
Jaime Marquínez Ferrándiz
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
Jaime Marquínez Ferrándiz
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
Philipp Hagemeister
0a89b2852e release 2013.10.23.1 2013-10-23 15:12:33 +02:00
Jaime Marquínez Ferrándiz
55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
Philipp Hagemeister
365bcf6d97 Merge remote-tracking branch 'origin/master' 2013-10-23 11:40:46 +02:00
Philipp Hagemeister
71907db3ba [vimeo] Fix normal videos (Fixes #1642)
Vimeo Pro Videos are still broken
2013-10-23 11:38:53 +02:00
Philipp Hagemeister
6803655ced Merge pull request #1622 from rbrito/fix-extension
extractor: youtube: Set extension of AAC audio formats to m4a.
2013-10-22 15:16:26 -07:00
Philipp Hagemeister
df1c39ec5c release 2013.10.23 2013-10-23 00:07:27 +02:00
Philipp Hagemeister
80f55a9511 release 2013.10.22 2013-10-22 22:35:13 +02:00
Philipp Hagemeister
7853cc5ae1 Merge remote-tracking branch 'origin/master'
Conflicts:
	youtube_dl/YoutubeDL.py
2013-10-22 22:30:06 +02:00
Philipp Hagemeister
586a91b67f Expand tilde in template (Fixes #1639) 2013-10-22 22:28:26 +02:00
Jaime Marquínez Ferrándiz
b028e96144 [arte.tv:creative] Update the title of the test 2013-10-22 21:06:06 +02:00
Jaime Marquínez Ferrándiz
ce68b5907c [nhl:videocenter] Fix playlist title extraction 2013-10-22 21:01:16 +02:00
Jaime Marquínez Ferrándiz
fe7e0c9825 Style fixes in YoutubeDL.py
Fixed some of the problems reported by pep8
2013-10-22 14:49:34 +02:00
Jaime Marquínez Ferrándiz
12893efe01 Respect the download parameter in YoutubeDL.process_video_result if the extractor handles the format selection 2013-10-22 00:01:59 +02:00
Joshua Elsasser
a6387bfd3c [vimeo] Implement the new format selection system (closes PR #996)
Rebased and deleted some parts to use the new system instead of copying the one from YoutubeIE
2013-10-21 23:16:11 +02:00
Jaime Marquínez Ferrándiz
f6a54188c2 [youtube] Use 'node is None' when checking if the video has automatic captions
It had stopped working and reported a FutureWarning
2013-10-21 16:28:55 +02:00
Jaime Marquínez Ferrándiz
cbbd9a9c69 Fix the duration field for the VideoDetective and InternetVideoArchive tests
Also remove the use of the old format system and the comment
2013-10-21 15:07:33 +02:00
Jaime Marquínez Ferrándiz
685a9cd2f1 [googleplus] Fix upload_date extraction 2013-10-21 15:00:21 +02:00
Jaime Marquínez Ferrándiz
182a107877 [arte] Set the format_note and the format_id fields (closes #1628) 2013-10-21 14:42:30 +02:00
Jaime Marquínez Ferrándiz
8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
Jaime Marquínez Ferrándiz
3fd39e37f2 YoutubeDL: remove method that came from FileDownloader 2013-10-21 13:52:24 +02:00
Jaime Marquínez Ferrándiz
49e86983e7 Allow using the extension for format selection
The best format with the extension is downloaded.
2013-10-21 13:31:55 +02:00
Jaime Marquínez Ferrándiz
a9c58ad945 Accept requested formats to be in the format 35/best (closes #1552)
The format selection code is now an independent function.
2013-10-21 13:19:58 +02:00
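For illustration, a minimal sketch of the selection logic the two commits above describe (names are illustrative, not youtube-dl's exact code): specifiers are tried left to right, and each may be a format_id, an extension, or the keyword 'best'.

    def select_format(req_format, formats):
        # 'formats' is ordered from worst to best, so 'best' is the last entry.
        for spec in req_format.split('/'):
            if spec == 'best':
                return formats[-1]
            matches = [f for f in formats
                       if f.get('format_id') == spec or f.get('ext') == spec]
            if matches:
                return matches[-1]  # best available format with that id/extension
        return None

    select_format('35/best', [{'format_id': '18', 'ext': 'mp4'},
                              {'format_id': '35', 'ext': 'flv'}])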
Philipp Hagemeister
f8b45beacc Merge remote-tracking branch 'rbrito/set-age'
Conflicts:
	youtube_dl/extractor/xhamster.py
2013-10-19 21:16:14 +02:00
Philipp Hagemeister
9d92015d43 [xhamster] Add support for age_limit (Instead of #1627) 2013-10-19 21:09:48 +02:00
Rogério Brito
50a6150ed9 extractor: Set age limit on some adult-related extractors.
Set the age limit on more videos from adult-related sites.

Note that, for redtube, I explicitly left the variable containing the age
limit, since the comment justifying the age limit is a good thing to have.

That being said, I included the age limit field on the test, to better
reflect what the information extractor does (even if it may not break the
automated tests).

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-19 14:19:25 -03:00
Philipp Hagemeister
284acd57d6 Add an author email 2013-10-19 11:14:20 +02:00
Rogério Brito
8ed6b34477 extractor: Set age limit on some adult-related extractors.
This is similar in spirit to what was done in commit 8e590a117f.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 19:32:37 -03:00
Rogério Brito
f6f1fc9286 extractor: youtube: Fix extension of dash formats.
While we are at it, separate the audio formats from the video formats.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 18:53:00 -03:00
Philipp Hagemeister
8e590a117f [xnxx] Add age_limit 2013-10-18 23:35:17 +02:00
Philipp Hagemeister
d5594202aa Simplify release process 2013-10-18 23:34:55 +02:00
Philipp Hagemeister
b186d949cf release 2013.10.18.2 2013-10-18 23:22:54 +02:00
Philipp Hagemeister
3d2986063c [bash-completion] Do not use dash in function name (Fixes #1623) 2013-10-18 23:13:46 +02:00
Philipp Hagemeister
41fd7c7e60 Add new option --abort-on-error 2013-10-18 23:09:32 +02:00
Philipp Hagemeister
fdefe96bf2 Document %(format)s (#1612) 2013-10-18 23:09:08 +02:00
Rogério Brito
16f36a6fc9 extractor: youtube: Set extension of AAC audio formats to m4a.
This, in particular, eases downloading both audio and videos in DASH formats
before muxing them, which alleviates the problem that I exposed on issue

Furthermore, one may argue that this is, indeed, the case for correctness's
sake.

Signed-off-by: Rogério Brito <rbrito@ime.usp.br>
2013-10-18 17:50:55 -03:00
Philipp Hagemeister
cce722b79c Add metavar to --cache-dir 2013-10-18 11:50:48 +02:00
Philipp Hagemeister
82697fb2ab release 2013.10.18.1 2013-10-18 11:45:30 +02:00
Philipp Hagemeister
53c1d3ef49 Check for embedded YouTube player (Fixes #1616) 2013-10-18 11:44:57 +02:00
Philipp Hagemeister
8e55e9abfc release 2013.10.18 2013-10-18 11:17:21 +02:00
Philipp Hagemeister
7c58ef3275 [tudou] Fix title regex (Fixes #1614) 2013-10-18 11:16:20 +02:00
Philipp Hagemeister
416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
Philipp Hagemeister
f4d96df0f1 Extend #980 with --max-quality support 2013-10-18 00:46:35 +02:00
Philipp Hagemeister
5d254f776a Fix test 2013-10-18 00:27:51 +02:00
Philipp Hagemeister
1c1218fefc Merge remote-tracking branch 'jaimeMF/format_selection' 2013-10-18 00:17:03 +02:00
Jaime Marquínez Ferrándiz
d21ab29200 Add an extractor for techtalks.tv (closes #1606) 2013-10-17 08:20:58 +02:00
Philipp Hagemeister
54ed626cf8 release 2013.10.17 2013-10-17 02:20:26 +02:00
Philipp Hagemeister
a733eb6c53 [youtube] Do not crash if caption info is missing altogether (Fixes #1610) 2013-10-17 02:19:19 +02:00
Philipp Hagemeister
591454798d [brightcove] Raise error if playlist is empty (#1608) 2013-10-17 01:02:17 +02:00
Philipp Hagemeister
38604f1a4f Merge remote-tracking branch 'origin/master' 2013-10-17 00:55:06 +02:00
Philipp Hagemeister
2d0efe70a6 [brightcove] Fix more broken XML (#1608) 2013-10-17 00:46:11 +02:00
Jaime Marquínez Ferrándiz
bfd14b1b2f Add an extractor for rutube.ru (closes #1136)
It downloads via an m3u8 manifest and requires ffmpeg.
2013-10-16 16:57:40 +02:00
Jaime Marquínez Ferrándiz
76965512da Fix the indentation of the Makefile
It uses tabs, not spaces.
2013-10-15 23:15:15 +02:00
Jaime Marquínez Ferrándiz
996d1c3242 Don't include the test/testdata directory in the youtube-dl.tar.gz
The last releases included big files that increased the size of the compressed file.
2013-10-15 23:08:52 +02:00
Philipp Hagemeister
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
Philipp Hagemeister
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
Philipp Hagemeister
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
Jaime Marquínez Ferrándiz
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
Jaime Marquínez Ferrándiz
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
Jaime Marquínez Ferrándiz
8381a92120 [websurg] Skip the test
It needs login information.
2013-10-15 08:12:30 +02:00
Philipp Hagemeister
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
Philipp Hagemeister
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
Philipp Hagemeister
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
Philipp Hagemeister
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
Philipp Hagemeister
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
Philipp Hagemeister
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
Philipp Hagemeister
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
Philipp Hagemeister
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
Philipp Hagemeister
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
Philipp Hagemeister
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
Jaime Marquínez Ferrándiz
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
Philipp Hagemeister
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
Andras Elso
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
Filippo Valsorda
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
Filippo Valsorda
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added annotations download support - closes #1321
2013-10-13 22:03:22 -07:00
Jai Grimshaw
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
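A hedged sketch of fetching the annotations via the API url quoted above (the endpoint has since been retired by YouTube; the video id is a placeholder):

    import urllib.request

    video_id = 'VIDEOID'  # placeholder
    url = ('https://www.youtube.com/annotations_invideo'
           '?features=1&legacy=1&video_id=' + video_id)
    xml_data = urllib.request.urlopen(url).read().decode('utf-8')
    with open(video_id + '.annotations.xml', 'w', encoding='utf-8') as f:
        f.write(xml_data)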
Philipp Hagemeister
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
Philipp Hagemeister
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
Jaime Marquínez Ferrándiz
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
Jaime Marquínez Ferrándiz
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
Jaime Marquínez Ferrándiz
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
Jaime Marquínez Ferrándiz
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
Jaime Marquínez Ferrándiz
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
Filippo Valsorda
d3f46b9aa5 Add support for single-test tox runs
Use a syntax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
Filippo Valsorda
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
Jaime Marquínez Ferrándiz
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
Jaime Marquínez Ferrándiz
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
Jaime Marquínez Ferrándiz
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
Joey Adams
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used a non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
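The failure mode is easy to reproduce (illustrative patterns, not the extractor's exact regex):

    import re

    html = '<object class="foo"> ... class="BrightcoveExperience"'
    # The non-greedy '.+?' still crosses the first tag's '>' to reach the match:
    print(bool(re.search(r'<object.+?class="BrightcoveExperience"', html)))    # True
    # Forbidding '>' before the class name rejects the spurious match ...
    print(bool(re.search(r'<object[^>]+class="BrightcoveExperience"', html)))  # False
    # ... while a genuine Brightcove object still matches:
    good = '<object id="x" class="BrightcoveExperience">'
    print(bool(re.search(r'<object[^>]+class="BrightcoveExperience"', good)))  # True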
Jaime Marquínez Ferrándiz
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
Jaime Marquínez Ferrándiz
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
Jaime Marquínez Ferrándiz
8016c92297 Fix the default values of format_id and format 2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
e028d0d1e3 Implement the prefer_free_formats in YoutubeDL 2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
79819f58f2 Default 'format' field to {width}x{height}
If width is None, use {height}p and if height is None, '???'
2013-10-11 16:34:49 +02:00
Jaime Marquínez Ferrándiz
6ff000b888 Do not handle format selection for IEs that already handle it 2013-10-11 16:34:48 +02:00
Jaime Marquínez Ferrándiz
99e206d508 Implement the max quality option in YoutubeDL 2013-10-11 16:34:48 +02:00
Jaime Marquínez Ferrándiz
dd82ffea0c Implement format selection in YoutubeDL
Now the IEs can set a formats field in the info_dict, with the formats ordered from worst to best quality. It's a list of dicts with the following fields:
* Mandatory: url and ext
* Optional: format and format_id

The format_id is used for choosing which formats have to be downloaded.

Now a video result is processed by the method process_video_result.
2013-10-11 16:34:48 +02:00
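For example, an extractor can now return an info_dict shaped like this (values illustrative, field names as documented above):

    info_dict = {
        'id': '1234',
        'title': 'Some video',
        'formats': [  # ordered from worst to best quality
            {'url': 'http://example.com/video_low.flv', 'ext': 'flv',    # mandatory
             'format_id': 'low', 'format': 'low - 320x240'},             # optional
            {'url': 'http://example.com/video_hd.mp4', 'ext': 'mp4',
             'format_id': 'hd', 'format': 'hd - 1280x720'},
        ],
    }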
Jaime Marquínez Ferrándiz
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
Jaime Marquínez Ferrándiz
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
Jaime Marquínez Ferrándiz
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
Jaime Marquínez Ferrándiz
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
Jaime Marquínez Ferrándiz
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
Jaime Marquínez Ferrándiz
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
Jaime Marquínez Ferrándiz
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
Jaime Marquínez Ferrándiz
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
Jaime Marquínez Ferrándiz
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
Jaime Marquínez Ferrándiz
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
Jaime Marquínez Ferrándiz
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
Philipp Hagemeister
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
Philipp Hagemeister
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
Jaime Marquínez Ferrándiz
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny typo
2013-10-09 07:44:06 -07:00
Jaime Marquínez Ferrándiz
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny typo 2013-10-09 21:56:09 +08:00
Jaime Marquínez Ferrándiz
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an SMIL manifest; extract the video urls directly from the json and use the latest version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
Jaime Marquínez Ferrándiz
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
Philipp Hagemeister
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
Philipp Hagemeister
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
Philipp Hagemeister
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
Philipp Hagemeister
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
Philipp Hagemeister
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
rzhxeo
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
Jaime Marquínez Ferrándiz
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
rzhxeo
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
Philipp Hagemeister
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
Philipp Hagemeister
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
Philipp Hagemeister
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
Philipp Hagemeister
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
Philipp Hagemeister
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
Philipp Hagemeister
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
Philipp Hagemeister
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
Philipp Hagemeister
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
Philipp Hagemeister
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict which videos are downloaded based on the intended audience, by specifying their age with --age-limit YEARS.
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
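A minimal sketch of the resulting check, assuming the filtering compares the user's age against the video's age_limit field (names illustrative):

    def age_suitable(user_age, info_dict):
        # Skip videos whose age_limit exceeds the age given with --age-limit.
        if user_age is None:
            return True
        return info_dict.get('age_limit', 0) <= user_age

    age_suitable(13, {'age_limit': 18})  # False: the video is skipped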
Philipp Hagemeister
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
Philipp Hagemeister
c1c9a79c49 Add basic --download-archive option
Often, users want to download only videos they haven't seen before, even if the video files have been deleted or moved in the meantime.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If an entry is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
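A hedged sketch of the archive mechanics described above, assuming one "extractor id" line per downloaded video (helper names hypothetical):

    def in_download_archive(archive_file, info_dict):
        entry = info_dict['extractor'] + ' ' + info_dict['id']
        try:
            with open(archive_file) as f:
                return any(line.strip() == entry for line in f)
        except IOError:  # the archive does not exist yet
            return False

    def record_download_archive(archive_file, info_dict):
        with open(archive_file, 'a') as f:
            f.write(info_dict['extractor'] + ' ' + info_dict['id'] + '\n')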
Philipp Hagemeister
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
Filippo Valsorda
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
Filippo Valsorda
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
Filippo Valsorda
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
Philipp Hagemeister
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
Philipp Hagemeister
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
rzhxeo
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
rzhxeo
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
rzhxeo
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
rzhxeo
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
Jaime Marquínez Ferrándiz
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed; make the video id extraction more robust.
2013-10-04 14:07:29 +02:00
Philipp Hagemeister
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
Philipp Hagemeister
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
Philipp Hagemeister
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
Philipp Hagemeister
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
Philipp Hagemeister
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
Philipp Hagemeister
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
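In other words, the signature length becomes part of the cache key (sketch; the real key format may differ):

    def player_cache_id(player_id, signature):
        # Two signatures of different lengths from the same player must not
        # collide in the cache, so the length is included in the key.
        return '%s-%d' % (player_id, len(signature))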
Philipp Hagemeister
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
Philipp Hagemeister
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
Philipp Hagemeister
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
Philipp Hagemeister
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
Philipp Hagemeister
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
Philipp Hagemeister
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
Philipp Hagemeister
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
Philipp Hagemeister
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
Philipp Hagemeister
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
Jaime Marquínez Ferrándiz
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
rzhxeo
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
Pierre Rudloff
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
Pierre Rudloff
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
Pierre Rudloff
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
Philipp Hagemeister
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
Philipp Hagemeister
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
Philipp Hagemeister
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
Philipp Hagemeister
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
Philipp Hagemeister
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
Jaime Marquínez Ferrándiz
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
Philipp Hagemeister
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
Philipp Hagemeister
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
Philipp Hagemeister
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
Philipp Hagemeister
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
Philipp Hagemeister
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
Filippo Valsorda
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
Filippo Valsorda
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
Jaime Marquínez Ferrándiz
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
Philipp Hagemeister
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
Philipp Hagemeister
44d466559e Properly handle stream map not being present 2013-10-01 14:44:09 +02:00
Philipp Hagemeister
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
Philipp Hagemeister
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
rzhxeo
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
rzhxeo
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
rzhxeo
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
Mark Oteiza
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
Filippo Valsorda
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
Filippo Valsorda
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
Jaime Marquínez Ferrándiz
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
Jaime Marquínez Ferrándiz
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
Jaime Marquínez Ferrándiz
843530568f [appletrailers] Rework extraction (fixes #1387)
The extraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
Philipp Hagemeister
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
Philipp Hagemeister
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
Philipp Hagemeister
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
Jaime Marquínez Ferrándiz
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
Jaime Marquínez Ferrándiz
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
Philipp Hagemeister
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
Philipp Hagemeister
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
Jaime Marquínez Ferrándiz
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available; other Olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
Jaime Marquínez Ferrándiz
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
Jaime Marquínez Ferrándiz
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
Jaime Marquínez Ferrándiz
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
Philipp Hagemeister
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
Philipp Hagemeister
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
Jaime Marquínez Ferrándiz
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
Philipp Hagemeister
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
Philipp Hagemeister
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
Philipp Hagemeister
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
rzhxeo
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
rzhxeo
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
Jaime Marquínez Ferrándiz
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
Jaime Marquínez Ferrándiz
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
Philipp Hagemeister
b98d6a1e19 release 2013.09.24.2 2013-09-24 21:55:34 +02:00
Philipp Hagemeister
29c7a63df8 Remove debugging code 2013-09-24 21:55:25 +02:00
Philipp Hagemeister
8b25323ae2 release 2013.09.24.1 2013-09-24 21:40:47 +02:00
Philipp Hagemeister
f426de8460 Merge remote-tracking branch 'origin/master' 2013-09-24 21:40:30 +02:00
Philipp Hagemeister
695dc094ab Merge branch 'automatic-signatures' 2013-09-24 21:40:08 +02:00
Jaime Marquínez Ferrándiz
e80d861064 Revert "[southparkstudios] Fix mgid extraction"
This reverts commit 0fd49457f5.

It seems that the redesign was temporary.
2013-09-24 21:39:38 +02:00
Philipp Hagemeister
2cdeb20135 release 2013.09.24 2013-09-24 21:28:06 +02:00
Philipp Hagemeister
7f74773254 Add option --no-cache-dir 2013-09-24 21:26:10 +02:00
Philipp Hagemeister
f2c327fd39 Fix 86 signature (#1494) 2013-09-24 21:20:42 +02:00
Philipp Hagemeister
e35e4ddc9a Fix output of --youtube-print-sig-code when counting down to 0 2013-09-24 21:18:03 +02:00
Philipp Hagemeister
c3c88a2664 Allow opts.cachedir == None to disable cache 2013-09-24 21:04:43 +02:00
Jaime Marquínez Ferrándiz
bb0eee71e7 [youtube] Update one of the test's description 2013-09-24 21:04:13 +02:00
Jaime Marquínez Ferrándiz
6f56389b88 [youtube] update algos for length 86 and 84 (fixes #1494) 2013-09-24 21:02:00 +02:00
Jaime Marquínez Ferrándiz
5b333c1ce6 [francetv] Add an extractor for Generation Quoi (closes #1475) 2013-09-23 21:41:54 +02:00
Jaime Marquínez Ferrándiz
a825f33030 [francetv] Add an extractor for France2 2013-09-23 21:28:33 +02:00
Philipp Hagemeister
92f618f2e2 Merge remote-tracking branch 'origin/master' 2013-09-23 11:24:49 +02:00
Philipp Hagemeister
81ec7c7901 [facebook] Allow untitled videos (Fixes #1484) 2013-09-23 11:24:33 +02:00
Jaime Marquínez Ferrándiz
dd5d2eb03c If the file is already downloaded include the size in the progress hook 2013-09-22 23:39:30 +02:00
Jaime Marquínez Ferrándiz
4ae720042c Include the eta and the speed in the progress hooks
Useful when listening to the progress hook, for example in a GUI.
2013-09-22 23:31:39 +02:00
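A listener (e.g. a GUI) can then read the new fields straight from the status dictionary passed to the hook (a sketch, assuming the usual 'downloading'/'finished' states):

    def progress_hook(status):
        if status.get('status') == 'downloading':
            print('ETA: %s s, speed: %s B/s'
                  % (status.get('eta'), status.get('speed')))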
Philipp Hagemeister
c705320f48 Correct test strings 2013-09-22 12:18:16 +02:00
Philipp Hagemeister
d2d8f89531 Do not warn if fallback is without alternatives (because we did not get the flash player URL) 2013-09-22 12:18:10 +02:00
Philipp Hagemeister
bdde940e90 [youtube] Improve flash player URL handling 2013-09-22 12:17:42 +02:00
Philipp Hagemeister
45f4a76dbc Work around nosetests nosiness 2013-09-22 11:45:29 +02:00
Philipp Hagemeister
13dc64ce74 [youtube] Remove _decrypt_signature_age_gate 2013-09-22 11:17:21 +02:00
Philipp Hagemeister
c35f9e72ce Move cachedir doc 2013-09-22 11:09:25 +02:00
Philipp Hagemeister
f8061589e6 [youtube] Actually pass in cachedir option 2013-09-22 10:51:33 +02:00
Philipp Hagemeister
0ca96d48c7 [youtube] Improve source code quality 2013-09-22 10:37:23 +02:00
Philipp Hagemeister
4ba146f35d Update static signatures 2013-09-22 10:31:25 +02:00
Philipp Hagemeister
edf3e38ebd [youtube] Improve cache and add an option to print the extracted signatures 2013-09-22 10:30:02 +02:00
Philipp Hagemeister
c4417ddb61 [youtube] Add filesystem signature cache 2013-09-22 00:35:03 +02:00
tewe
4a2080e407 [youku] better error handling
Blocked videos used to cause a death by TypeError; now we report what the
server says
2013-09-21 20:50:31 +02:00
Philipp Hagemeister
2f2ffea9ca Clarify a couple of calls 2013-09-21 15:34:29 +02:00
Philipp Hagemeister
ba552f542f Use reader instead of indexing 2013-09-21 15:32:37 +02:00
Philipp Hagemeister
8379969834 Prepare signature function caching 2013-09-21 15:19:48 +02:00
Philipp Hagemeister
95dbd2f990 Change test target (Verified with node.js) 2013-09-21 15:10:38 +02:00
Philipp Hagemeister
a7177865b1 Implement more opcodes 2013-09-21 14:48:12 +02:00
Philipp Hagemeister
e0df6211cc Restore accidentally deleted commits
That's what happens if you let Windows machines write :(
2013-09-21 14:40:35 +02:00
Jaime Marquínez Ferrándiz
b00ca882a4 [livestream] Fix events extraction (fixes #1467) 2013-09-21 13:50:52 +02:00
Jaime Marquínez Ferrándiz
39baacc49f [dailymotion] Add an extractor for users (closes #1476) 2013-09-21 12:45:53 +02:00
Jaime Marquínez Ferrándiz
3a1d48d6de [dailymotion] Raise ExtractorError if the dailymotion response reports an error 2013-09-21 12:15:54 +02:00
Philipp Hagemeister
34308b30d6 Warn if no locale is set (#1474) 2013-09-21 11:48:07 +02:00
Philipp Hagemeister
bc1506f8c0 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-21 11:10:30 +02:00
Philipp Hagemeister
b61067fa4f Abort if extractaudio is given without a variable extension (#1470) 2013-09-21 11:10:22 +02:00
Jaime Marquínez Ferrándiz
69b227a9bc [southparkstudios] add support for http://www.southparkstudios.com/full-episodes/* urls (closes #1469) 2013-09-21 10:58:43 +02:00
Jaime Marquínez Ferrándiz
0fd49457f5 [southparkstudios] Fix mgid extraction 2013-09-21 10:51:25 +02:00
Philipp Hagemeister
58f289d013 release 2013.09.20.1 2013-09-20 22:59:14 +02:00
Jaime Marquínez Ferrándiz
3d60bb96e1 Add an extractor for ebaumsworld.com (closes #1462) 2013-09-20 16:55:50 +02:00
Jaime Marquínez Ferrándiz
38d025b3f0 [youtube] add algo for length 91 2013-09-20 14:43:16 +02:00
Jaime Marquínez Ferrándiz
c40c6aaaaa Catch socket.error before IOError
Since python 2.6 it's a child class.
2013-09-20 13:26:03 +02:00
Jaime Marquínez Ferrándiz
1a810f0d4e [funnyordie] Fix video url extraction 2013-09-20 13:05:34 +02:00
Philipp Hagemeister
63037593c0 release 2013.09.20 2013-09-20 10:24:48 +02:00
Jaime Marquínez Ferrándiz
7a878d47fa Merge pull request #1464 from patrickslin/patch-7
Unable to decrypt signature length 93 (fixes #1461)
2013-09-20 08:25:10 +02:00
patrickslin
bc4b900898 Unable to decrypt signature length 93 (fixes #1461) 2013-09-19 21:49:06 -07:00
Jaime Marquínez Ferrándiz
c5e743f66f [fktv] support videos split into any number of parts and some style changes 2013-09-18 23:32:37 +02:00
Jaime Marquínez Ferrándiz
6c36d8d6fb Merge pull request #1438 from rzhxeo/fktv
Add support for http://fernsehkritik.tv
2013-09-18 23:05:56 +02:00
Jaime Marquínez Ferrándiz
71c82637e7 [youtube] apply the fix for lists whose video count is a multiple of _MAX_RESULTS to user extraction
Copied from the playlist extractor.
2013-09-18 23:00:32 +02:00
Philipp Hagemeister
2dad310e2c Credit @Ruirize for newgrounds 2013-09-18 22:30:22 +02:00
Philipp Hagemeister
d0ae9e3a8d [newgrounds] simplify 2013-09-18 22:14:43 +02:00
Ruirize
a19413c311 Changed file hash. 2013-09-18 17:17:12 +01:00
Ruirize
1ef80b55dd Fixes test fail
Was unaware of --id being passed to test.
2013-09-18 16:23:38 +01:00
Ruirize
eb03f4dad3 Added Newgrounds support 2013-09-18 15:54:45 +01:00
Philipp Hagemeister
830dd1944a Clarify -i help (#1453) 2013-09-18 13:23:04 +02:00
Pierre Rudloff
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
rzhxeo
1237c9a3a5 XHamsterIE: Fix support for new HD video url format and add test (closes PR #1443) 2013-09-17 23:08:01 +02:00
Pierre Rudloff
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
Jaime Marquínez Ferrándiz
5d13df79a5 [francetv] Remove Pluzz test
Videos expire in 7 days
2013-09-17 22:49:43 +02:00
Pierre Rudloff
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
Jaime Marquínez Ferrándiz
6523223a4c [hotnewhiphop] Fix test case title 2013-09-17 21:10:57 +02:00
Jaime Marquínez Ferrándiz
4a67aafb7e [youtube] Don't search the flash player version for videos with age gate activated 2013-09-17 20:59:55 +02:00
Philipp Hagemeister
f3f34c5b0f release 2013.09.17 2013-09-17 17:00:20 +02:00
Philipp Hagemeister
6ae8ee3f54 Update 85 signature (Fixes #1449)
This is the first signature algorithm to have been parsed automatically, although that only works for HTML5 players for now, and is not yet integrated into master.
2013-09-17 16:59:13 +02:00
Jaime Marquínez Ferrándiz
e8f8e80097 Add an extractor for vice.com (closes #1051) 2013-09-16 20:58:36 +02:00
Jaime Marquínez Ferrándiz
4dc0ff3ecf [ooyala] prefer ipad url
It has better quality with m3u8 manifests
2013-09-16 20:38:54 +02:00
Jaime Marquínez Ferrándiz
4b6462fc1e Add an extractor for Bloomberg (closes #1436) 2013-09-16 20:38:48 +02:00
Jaime Marquínez Ferrándiz
c4ece78564 [ooyala] add support for more type of video urls, like m3u8 manifests. 2013-09-16 19:34:10 +02:00
rzhxeo
0761d02b0b Add FKTV extractor 2013-09-16 14:46:19 +02:00
rzhxeo
71c107fc57 Add FKTV extractor
Support for Fernsehkritik-TV (incl. Postecke)
2013-09-16 14:45:14 +02:00
Philipp Hagemeister
7459e3a290 Always correct encoding when writing to sys.stderr (Fixes #1435) 2013-09-16 06:55:41 +02:00
Philipp Hagemeister
f9e66fb993 release 2013.09.16 2013-09-16 04:12:57 +02:00
Philipp Hagemeister
6c603ccce3 [devscripts/release] temporary workarounds 2013-09-16 04:12:43 +02:00
Philipp Hagemeister
ef66b0c6ef Merge remote-tracking branch 'origin/master' 2013-09-16 03:32:53 +02:00
Philipp Hagemeister
22b50ecb2f Starts of a Windows service 2013-09-16 03:32:45 +02:00
Jaime Marquínez Ferrándiz
5a6fecc3de Add an extractor for southparkstudios.com (closes #1434)
It uses the MTV system
2013-09-15 23:30:58 +02:00
Jaime Marquínez Ferrándiz
cdbccafed9 Merge pull request #1422 from rzhxeo/xhamster
XHamsterIE: Add support for new URL format (download in hd by default)
2013-09-15 12:18:39 +02:00
Jaime Marquínez Ferrándiz
e69ae5b9e7 [youtube] support youtube.googleapis.com/v/* urls (fixes #1425) 2013-09-15 12:14:59 +02:00
Jaime Marquínez Ferrándiz
92790f4e54 [soundcloud] Add an extractor for users (closes #1426) 2013-09-14 21:41:49 +02:00
Jaime Marquínez Ferrándiz
471a5ee908 Set the ext field for each format 2013-09-14 14:45:04 +02:00
Jaime Marquínez Ferrándiz
19e1d35989 [mixcloud] Rewrite extractor (fixes #278) 2013-09-14 14:26:42 +02:00
Jaime Marquínez Ferrándiz
0b7f31184d Now --all-sub is a modifier to --write-sub and --write-auto-sub (closes #1412)
To keep backwards compatibility, --all-sub sets --write-sub if --write-auto-sub is not given
2013-09-14 11:14:40 +02:00
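The backwards-compatibility rule boils down to one line of option post-processing (sketch, using argparse.Namespace to stand in for the parsed options):

    from argparse import Namespace

    def resolve_subtitle_opts(opts):
        # --all-sub still implies --write-sub unless --write-auto-sub was given.
        if opts.allsubtitles and not opts.writeautomaticsub:
            opts.writesubtitles = True
        return opts

    opts = resolve_subtitle_opts(Namespace(
        allsubtitles=True, writeautomaticsub=False, writesubtitles=False))
    assert opts.writesubtitles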
Jaime Marquínez Ferrándiz
fad84d50fe [googleplus] Fix upload date extraction 2013-09-14 11:10:01 +02:00
rzhxeo
9a1c32dc54 XHamsterIE: Add support for new URL format 2013-09-14 05:42:00 +02:00
Jaime Marquínez Ferrándiz
a921f40799 [ustream] Simplify channel extraction
The ChannelParser has been moved to a new function in utils, get_meta_content.
Instead of the SocialStreamParser, it now uses a regex.
2013-09-13 22:05:29 +02:00
Jaime Marquínez Ferrándiz
74ac9bdd82 Merge pull request #1413 from tewe/master
Add Ustream channel support
2013-09-13 21:34:31 +02:00
Jaime Marquínez Ferrándiz
94518f2087 Merge pull request #1409 from JohnyMoSwag/master (closes #1404)
added kickstarter IE
2013-09-13 19:52:56 +02:00
Jaime Marquínez Ferrándiz
535f59bbcf Merge pull request #1350 from Jaiz909/description-keyerror-fix
Fixed issue #1277 KeyError when no description.
2013-09-13 18:20:42 +02:00
Philipp Hagemeister
71cedb3c0c [buildserver] Service installation and uninstallation 2013-09-13 02:25:12 +02:00
Jaime Marquínez Ferrándiz
dd01d6558a [gamespot] Update test video title 2013-09-12 22:18:39 +02:00
Jaime Marquínez Ferrándiz
ce85f022d2 [youtube] update algo for length 82 (fixes #1416) 2013-09-12 22:04:09 +02:00
Jaime Marquínez Ferrándiz
ad94a6fe44 [canalplust] accept urls that don't include the video id (fixes #1415), extract more info and update test 2013-09-12 21:56:36 +02:00
Philipp Hagemeister
353ba14060 [buildserver] Rely on repository license 2013-09-12 16:34:24 +02:00
Philipp Hagemeister
83de794223 Add original buildserver from @fraca7 2013-09-12 16:30:43 +02:00
tewe
bfd5c93af9 Add Ustream channel support 2013-09-12 12:30:14 +02:00
Jaime Marquínez Ferrándiz
c247d87ef3 [funnyordie] fix video url extraction 2013-09-12 11:31:27 +02:00
Philipp Hagemeister
07ac9e2cc2 release 2013.09.12 2013-09-12 11:26:44 +02:00
Jaime Marquínez Ferrándiz
6bc520c207 Check for both automatic captions and subtitles with options --write-sub and --write-auto-sub (fixes #1224) 2013-09-12 11:15:25 +02:00
Johny Mo Swag
f1d20fa39f added kickstarter IE 2013-09-11 14:50:38 -07:00
Jaime Marquínez Ferrándiz
e3dc22ca3a [youtube] Fix detection of videos with automatic captions 2013-09-11 19:24:56 +02:00
Jaime Marquínez Ferrándiz
d665f8d3cb [subtitles] Also list the available automatic captions languages with '--list-sub' 2013-09-11 19:17:30 +02:00
Jaime Marquínez Ferrándiz
055e6f3657 [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages. 2013-09-11 19:08:43 +02:00
Jaime Marquínez Ferrándiz
ac4f319ba1 Credit @iemejia 2013-09-11 17:58:51 +02:00
Jaime Marquínez Ferrándiz
542cca0e8c Merge branch 'subtitles_rework' (closes PR #1326) 2013-09-11 17:41:24 +02:00
Jaime Marquínez Ferrándiz
6a2449df3b [howcast] Do not download from http://www.howcast.com/videos/{video_id}
It takes too long to follow the redirection.
2013-09-11 17:36:23 +02:00
Jaime Marquínez Ferrándiz
7fad1c6328 [subtitles] Use self._download_webpage for extracting the subtitles
It raises ExtractorError for the same exceptions we have to catch.
2013-09-11 16:24:47 +02:00
Jaime Marquínez Ferrándiz
d82134c339 [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption
Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 16:05:49 +02:00
Jaime Marquínez Ferrándiz
54d39d8b2f [subtitles] rename SubitlesIE to SubtitlesInfoExtractor
Otherwise it can be automatically detected as an IE ready for use.
2013-09-11 15:51:04 +02:00
Jaime Marquínez Ferrándiz
de7f3446e0 [youtube] move subtitles methods from the base extractor to YoutubeIE 2013-09-11 15:48:23 +02:00
Ismael Mejia
f8e52269c1 [subtitles] made inheritance hierarchy flat as requested 2013-09-11 15:21:09 +02:00
Ismael Mejia
cf1dd0c59e Merge branch 'master' into subtitles_rework 2013-09-11 14:26:48 +02:00
Jaime Marquínez Ferrándiz
22c8b52545 On the supported sites page, sort the extractors case-insensitively 2013-09-11 12:04:27 +02:00
Philipp Hagemeister
1f7dc42cd0 release 2013.11.09 2013-09-11 11:30:10 +02:00
Jaime Marquínez Ferrándiz
aa8f2641da [youtube] update algo for length 85 (fixes #1408 and fixes #1406) 2013-09-11 11:24:58 +02:00
Jaime Marquínez Ferrándiz
648d25d43d [francetv] Add an extractor for francetvinfo.fr (closes #1317)
It uses the same system as Pluzz; create a base class for both extractors.
2013-09-10 15:50:34 +02:00
Philipp Hagemeister
df3e61003a Merge pull request #1402 from Rudloff/canalc2
Wrong property name
2013-09-10 03:19:37 -07:00
Pierre Rudloff
6b361ad5ee Wrong property name 2013-09-10 12:13:22 +02:00
Pierre Rudloff
5d8afe69f7 Add an extractor for pluzz.francetv.fr (closes PR #1399) 2013-09-10 12:00:00 +02:00
Philipp Hagemeister
a1ab553858 release 2013.09.10 2013-09-10 11:25:11 +02:00
Jaime Marquínez Ferrándiz
07463ea162 Add an extractor for Slideshare (closes #1400) 2013-09-10 11:19:58 +02:00
Jaime Marquínez Ferrándiz
6d2d21f713 [sohu] add support for my.tv.sohu.com urls (fixes #1398) 2013-09-09 19:56:16 +02:00
Jaime Marquínez Ferrándiz
061b2889a9 Fix the minutes part in FileDownloader.format_seconds (fixed #1397)
For the minutes it printed the result of (seconds // 60) instead of ((seconds // 60) % 60)
2013-09-09 10:38:54 +02:00
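The essence of the fix (sketch): the minutes component needs a second divmod, not just the integer division.

    def format_seconds(seconds):
        (mins, secs) = divmod(int(seconds), 60)
        (hours, mins) = divmod(mins, 60)  # the buggy version skipped this step
        if hours:
            return '%02d:%02d:%02d' % (hours, mins, secs)
        return '%02d:%02d' % (mins, secs)

    format_seconds(3725)  # '01:02:05', not '01:62:05'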
Jaime Marquínez Ferrándiz
8963d9c266 [youtube] Modify the regex to match ids of length 11 (fixes #1396)
In urls like http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930 you can't simply split the query string, but ids always have that length.
2013-09-09 10:33:12 +02:00
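Matching exactly eleven id characters sidesteps the query-string splitting (illustrative pattern):

    import re

    url = 'http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930'
    # YouTube video ids are always 11 characters from [0-9A-Za-z_-]:
    video_id = re.search(r'v=([0-9A-Za-z_-]{11})', url).group(1)
    print(video_id)  # BaW_jenozKc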
Jaime Marquínez Ferrándiz
890f62e868 Revert "[youtube] Fix detection of tags from HLS videos."
They have undone the change

This reverts commit 0638ad9999.
2013-09-08 18:50:07 +02:00
Philipp Hagemeister
8f362589a5 release 2013.09.07 2013-09-07 22:29:15 +02:00
Philipp Hagemeister
a27a2470cd Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-07 22:28:54 +02:00
Ismael Mejia
72836fcee4 Merge branch 'master' into subtitles_rework 2013-09-06 23:24:41 +02:00
Jaime Marquínez Ferrándiz
a7130543fa [generic] If the url doesn't specify the protocol, try to extract after prepending 'http://' 2013-09-06 18:39:35 +02:00
Jaime Marquínez Ferrándiz
a490fda746 [dailymotion] accept embed urls (fixes #1386) 2013-09-06 18:36:07 +02:00
Jaime Marquínez Ferrándiz
7e77275293 Add an extractor for Metacritic 2013-09-06 18:08:07 +02:00
Ismael Mejia
d6e203b3dc [subtitles] fixed multiple subtitles language separated by comma after merge
As mentioned in the pull request, I forgot to include these changes.
aa6a10c44a
2013-09-06 16:30:13 +02:00
Jaime Marquínez Ferrándiz
e3ea479087 [youtube] Fix some issues with the detection of playlist/channel urls (reported in #1374)
They were being caught by YoutubeUserIE; now it only extracts a url if the rest of the extractors aren't suitable.
Now the url tests check that the urls can only be extracted with a specific extractor.
2013-09-06 16:24:24 +02:00
Jaime Marquínez Ferrándiz
faab1d3836 [youtube] Fix detection of feeds urls (fixes #1294)
Urls like https://www.youtube.com/feed/watch_later were being detected as users (before the last changes to YoutubeUserIE, as videos)
2013-09-06 14:45:49 +02:00
Philipp Hagemeister
8851a574a3 Fix add-versions 2013-09-06 11:07:34 +02:00
Philipp Hagemeister
59282080c8 release 2013.09.06.1 2013-09-06 10:53:35 +02:00
Philipp Hagemeister
98f3da4040 Merge remote-tracking branch 'origin/master' 2013-09-06 10:53:24 +02:00
Philipp Hagemeister
1d213233cd Do not re-download files for hashsum generation (Fixes #1383) 2013-09-06 10:51:53 +02:00
Jaime Marquínez Ferrándiz
fd9cf73836 [youtube] Users: download from the api in json to simplify extraction (fixes #1358)
There could be duplicate videos or other videos if the description has links.
2013-09-06 10:43:02 +02:00
Jaime Marquínez Ferrándiz
0638ad9999 [youtube] Fix detection of tags from HLS videos. 2013-09-06 10:25:31 +02:00
Philipp Hagemeister
1eb527692a release 2013.09.06 2013-09-06 10:13:33 +02:00
Jaime Marquínez Ferrándiz
09bb17e108 Merge pull request #1378 from patrickslin/patch-6
Vevo sig changed again, please update for us! Thanks very much! (fixes #...
2013-09-06 09:53:23 +02:00
patrickslin
1cf911bc82 Vevo sig changed again, please update for us! Thanks very much! (fixes #1375) 2013-09-05 17:38:03 -07:00
Jaime Marquínez Ferrándiz
f4b052321b [youtube] Urls like youtube.com/NASA are now interpreted as users (fixes #1069)
Video urls like http://youtube.com/BaW_jenozKc are not valid, but http://youtu.be/BaW_jenozKc is correct.
2013-09-05 22:39:15 +02:00
Philipp Hagemeister
a636203ea5 release 2013.09.05 2013-09-05 22:30:50 +02:00
Jaime Marquínez Ferrándiz
c215217e39 [youtube] Playlists: extract the videos id from ['media$group']['yt$videoid'] (fixes #1374)
'media$player' is not defined for private videos.
2013-09-05 21:40:04 +02:00
Jaime Marquínez Ferrándiz
08e291b54d [generic] Recognize html5 video in the format '<video src=".+?"' and only unquote the url when extracting the id (fixes #1372) 2013-09-05 18:02:17 +02:00
Jaime Marquínez Ferrándiz
6b95b065be Add extractor for tvcast.naver.com (closes #1331) 2013-09-05 10:53:40 +02:00
Jaime Marquínez Ferrándiz
9363169b67 [daum] Get the video page from a canonical url to extract the full id (fixes #1373) and extract description. 2013-09-05 10:08:17 +02:00
Jaime Marquínez Ferrándiz
085bea4513 Credit @Huarong for tv.sohu.com 2013-09-04 22:09:22 +02:00
Jaime Marquínez Ferrándiz
150f20828b Add extractor for daum.net (closes #1330) 2013-09-04 22:06:50 +02:00
Philipp Hagemeister
08523ee20a release 2013.09.04 2013-09-04 14:33:32 +02:00
Philipp Hagemeister
5d5171d26a Merge pull request #1341 from xanadu/master
add support for "-f mp4" for YouTube
2013-09-03 18:52:12 -07:00
Allan Zhou
96fb5605b2 AHLS -> Apple HTTP Live Streaming 2013-09-03 18:49:35 -07:00
Philipp Hagemeister
7011de0bc2 Merge pull request #1363 from Rudloff/defense
defense.gouv.fr
2013-09-03 18:23:08 -07:00
Allan Zhou
c3dd69eab4 Merge remote-tracking branch 'upstream/master' 2013-09-03 12:22:29 -07:00
Pierre Rudloff
025171c476 Suggested by @phihag 2013-09-03 12:03:19 +02:00
Jaime Marquínez Ferrándiz
c8dbccde30 [orf] Remove the test video, they seem to expire in one week 2013-09-03 11:51:01 +02:00
Jaime Marquínez Ferrándiz
4ff7a0f1f6 [dailymotion] improve the regex for extracting the video info 2013-09-03 11:33:59 +02:00
Jaime Marquínez Ferrándiz
9c2ade40de [vimeo] Handle Assertions Error when trying to get the description
In some pages the html tags are not closed; python 2.6 cannot handle it.
2013-09-03 11:11:36 +02:00
Jaime Marquínez Ferrándiz
aa32314d09 [vimeo] add support for videos that embed the download url in the player page (fixes #1364) 2013-09-03 10:48:56 +02:00
Pierre Rudloff
52afe99665 Extractor for defense.gouv.fr 2013-09-03 01:51:17 +02:00
Pierre Rudloff
b0446d6a33 Merge remote-tracking branch 'upstream/master' 2013-09-03 01:27:49 +02:00
Jaime Marquínez Ferrándiz
8e4e89f1c2 Add an extractor for VeeHD (closes #1359) 2013-09-02 11:54:09 +02:00
Jaime Marquínez Ferrándiz
6c758d79de [metacafe] Add more cases for detecting the uploader detection (reported in #1343) 2013-08-31 22:35:39 +02:00
Jaime Marquínez Ferrándiz
691008087b Add an automatic page generator for the supported sites (related #156)
They are listed in the "supportedsites.html" page.
2013-08-31 15:18:52 +02:00
Allan Zhou
85f03346eb Merge remote-tracking branch 'upstream/master' 2013-08-30 17:51:59 -07:00
Allan Zhou
bdc6b3fc64 add support for "-f mp4" for YouTube 2013-08-30 17:51:50 -07:00
Pierre Rudloff
847f582290 Merge remote-tracking branch 'upstream/master' 2013-08-31 00:37:29 +02:00
Philipp Hagemeister
10f5c016ec release 2013.08.30 2013-08-30 21:02:07 +02:00
Jaime Marquínez Ferrándiz
2e756879f1 [youtube] update algo for length 86 2013-08-30 20:49:51 +02:00
Jaime Marquínez Ferrándiz
c7a7750d3b [youtube] Fix typo in the _VALID_URL for YoutubeFavouritesIE, it was intended to also match :ytfavourites 2013-08-30 20:13:05 +02:00
Jaime Marquínez Ferrándiz
9193c1eede Add youtube keywords to the bash completion script 2013-08-30 20:11:53 +02:00
Jai Grimshaw
b3f0e53048 Fixed issue #1277 KeyError when no description.
Allows continuing with a warning when an extractor cannot retrieve a description.
2013-08-31 01:53:01 +10:00
Philipp Hagemeister
3243d0f7b6 release 2013.08.29 2013-08-29 23:29:34 +02:00
Jaime Marquínez Ferrándiz
23b00bc0e4 [youtube] update algo for length 84
Only appears sometimes, nearly identical to length 86.
2013-08-29 22:44:29 +02:00
Jaime Marquínez Ferrándiz
52e1eea18b [youtube] update algo for length 86 (fixes #1349) 2013-08-29 22:33:58 +02:00
Jaime Marquínez Ferrándiz
ee80d66727 [ign] update 1up extractor to work with the updated IGNIE 2013-08-29 21:51:09 +02:00
Jaime Marquínez Ferrándiz
f1fb2d12b3 [ign] extract videos from articles pages 2013-08-29 21:39:36 +02:00
Jaime Marquínez Ferrándiz
deb2c73212 Merge pull request #1347 from whydoubt/fix_orf_at
Fix orf.at extractor by adding file coding mark
2013-08-29 11:05:38 -07:00
Jeff Smith
8928491074 Fix orf.at extractor by adding file coding mark 2013-08-29 12:51:38 -05:00
Jaime Marquínez Ferrándiz
545434670b Add an extractor for orf.at (closes #1346)
Make find_xpath_attr also accept numbers in the value
2013-08-29 19:16:07 +02:00
Jaime Marquínez Ferrándiz
54fda45bac Merge pull request #1342 from whydoubt/fix_mit_26
Fix MIT extractor for Python 2.6
2013-08-29 13:42:08 +02:00
Jaime Marquínez Ferrándiz
c7bf7366bc Update descriptions checksum for some test for Unistra and Youtube 2013-08-29 13:41:59 +02:00
Jaime Marquínez Ferrándiz
b7052e5087 Also print the field that fails if it is a md5 checksum 2013-08-29 12:15:45 +02:00
Jaime Marquínez Ferrándiz
0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
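A pattern that accepts either quote style around the charset does the trick (illustrative, not the exact expression used):

    import re

    for page in ('<meta charset="utf-8"/>', "<meta charset='utf-8'/>"):
        m = re.search(r'''<meta\s+charset=['"]?([\w-]+)''', page)
        print(m.group(1))  # utf-8 in both cases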
Jeff Smith
b5ba7b9dcf Fix MIT extractor for Python 2.6
The HTML for the MIT page does not parse cleanly for Python 2.6 due
to script tags within an actual script element.  The offending piece
is inside a comment block, so removing all such comment blocks
fixes the parsing.
2013-08-28 14:24:42 -05:00
Allan Zhou
483e0ddd4d Merge remote-tracking branch 'upstream/master' 2013-08-28 10:19:28 -07:00
Philipp Hagemeister
2891932bf0 release 2013.08.28.1 2013-08-28 19:00:17 +02:00
Allan Zhou
591078babf Merge remote-tracking branch 'upstream/master' 2013-08-28 09:57:28 -07:00
Philipp Hagemeister
9868c781a1 Merge remote-tracking branch 'origin/master' 2013-08-28 18:22:33 +02:00
Philipp Hagemeister
c257baff85 Merge remote-tracking branch 'rzhxeo/youporn-hd'
Conflicts:
	youtube_dl/utils.py
2013-08-28 18:22:28 +02:00
rzhxeo
878e83c5a4 YoupornIE: Clean up extraction of hd video 2013-08-28 16:04:48 +02:00
rzhxeo
0012690aae Let aes_decrypt_text return bytes instead of unicode 2013-08-28 16:03:35 +02:00
rzhxeo
6e74bc41ca Fix division bug in aes.py 2013-08-28 16:01:43 +02:00
rzhxeo
cba892fa1f Add intlist_to_bytes to utils.py 2013-08-28 15:59:07 +02:00
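The helper turns the integer lists used by the AES routines back into a byte string; roughly (sketch — the Python 2-era original joined chr(x) values):

    def intlist_to_bytes(xs):
        # bytes() accepts a list of ints directly on Python 3.
        return bytes(xs)

    intlist_to_bytes([104, 105])  # b'hi'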
rzhxeo
550bfd4cbd Merge pull request #1 from phihag/youporn-hd-pr
Allow changes to run under Python 3
2013-08-28 06:23:33 -07:00
Jaime Marquínez Ferrándiz
920ef0779b Hide the password and username in verbose mode (closes #1089) 2013-08-28 15:14:02 +02:00
Philipp Hagemeister
48ea9cea77 Allow changes to run under Python 3 2013-08-28 14:34:49 +02:00
Philipp Hagemeister
ccf4b799df Merge remote-tracking branch 'origin/master' 2013-08-28 14:02:40 +02:00
Philipp Hagemeister
f143d86ad2 [sohu] Handle encoding, and fix tests 2013-08-28 14:00:05 +02:00
Jaime Marquínez Ferrándiz
8ae97d76ee PostProcessingError holds the message in the 'msg' property, not in 'message' (fixes #1323)
Causes DeprecationWarning: http://www.python.org/dev/peps/pep-0352/
2013-08-28 13:37:31 +02:00
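That is, callers must read e.msg, since BaseException.message was deprecated by PEP 352. A sketch of the shape:

    class PostProcessingError(Exception):
        def __init__(self, msg):
            super(PostProcessingError, self).__init__(msg)
            self.msg = msg  # read this, not the deprecated .message

    try:
        raise PostProcessingError('ffmpeg not found')
    except PostProcessingError as e:
        print(e.msg)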
Philipp Hagemeister
f8b362739e Merge remote-tracking branch 'Huarong/master' 2013-08-28 13:10:59 +02:00
Philipp Hagemeister
6d69d03bac Merge remote-tracking branch 'origin/reuse_ies' 2013-08-28 13:05:21 +02:00
Philipp Hagemeister
204da0d3e3 Merge remote-tracking branch 'origin/master' 2013-08-28 12:57:44 +02:00
Philipp Hagemeister
c496ca96e7 Fix platform name in Python 2 with --verbose (Closes #1228) 2013-08-28 12:57:10 +02:00
Jaime Marquínez Ferrándiz
67b22dd036 Add extractors for video.mit.edu and techtv.mit.edu (closes #1327)
video.mit.edu just embeds the videos from techtv.mit.edu
2013-08-28 12:55:42 +02:00
Philipp Hagemeister
ce6a696e4d Remove unused imports 2013-08-28 12:47:38 +02:00
Philipp Hagemeister
a5caba1eb0 [generic] simply use urljoin 2013-08-28 12:47:27 +02:00
Pierre Rudloff
cd9c100963 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:20:12 +02:00
Philipp Hagemeister
edde6c56ac Print playpath with --get-url (Fixes #1334) 2013-08-28 12:14:45 +02:00
Pierre Rudloff
b7f89fe692 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:10:34 +02:00
Philipp Hagemeister
ae3531adf9 [generic] Fix URL concatenation
When the url is something like http://example.org/foo/bar?x=y and the added part is file/video.mp4, we want http://example.org/foo/file/video.mp4
Fixes #1268.
2013-08-28 12:08:17 +02:00
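urljoin gives exactly the documented behaviour:

    from urllib.parse import urljoin  # youtube-dl used the Python 2 equivalent

    print(urljoin('http://example.org/foo/bar?x=y', 'file/video.mp4'))
    # http://example.org/foo/file/video.mp4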
Philipp Hagemeister
8cf5ee7831 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-28 11:57:18 +02:00
Philipp Hagemeister
aa3e950764 Tolerate junk at the end of gzip-compressed content (#1268) 2013-08-28 11:57:13 +02:00
Pierre Rudloff
1301a0dd42 Merge remote-tracking branch 'upstream/master' 2013-08-28 11:02:12 +02:00
Jaime Marquínez Ferrándiz
af8bd6a82d Show the time taken to download in the same format as the ETA 2013-08-28 10:56:11 +02:00
Jaime Marquínez Ferrándiz
6d38616e67 Merge pull request #1181 from h3xx/master
Add some verbosity when reporting finished downloads

Remove the mixed use of tabs and spaces for indentation.
2013-08-28 10:54:07 +02:00
Philipp Hagemeister
4f5f18acb9 [addanime] add file 2013-08-28 10:28:16 +02:00
Jaime Marquínez Ferrándiz
3e223834d9 [youtube] update algo for length 88, thanks to @Ramhack (fixes #1328) 2013-08-28 10:26:44 +02:00
Jaime Marquínez Ferrándiz
a1bb0f8773 [cnn] remove debug print call. 2013-08-28 10:20:37 +02:00
Jaime Marquínez Ferrándiz
0e283428f7 HTTPError is in urllib.error in Python 3, not in http.error 2013-08-28 10:18:39 +02:00
Philipp Hagemeister
2eabb80254 [addanime] improve 2013-08-28 04:25:38 +02:00
Philipp Hagemeister
44586389e4 [appletrailers] Add support 2013-08-28 02:18:44 +02:00
Ismael Mejia
06a401c845 Merge branch 'master' into subtitles_rework 2013-08-28 00:33:12 +02:00
Philipp Hagemeister
273f603efb [cnn] Allow more URLs 2013-08-28 00:14:19 +02:00
Philipp Hagemeister
1619e22f40 release 2013.08.28 2013-08-27 23:31:36 +02:00
Philipp Hagemeister
88a79ce6a6 Delete default user agent (Fixes #1309) 2013-08-27 23:31:24 +02:00
Philipp Hagemeister
acebc9cd6b Revert "Install our own HTTPS handler as well (#1309)"
This reverts commit 36399e8576 and fixes #1322.
2013-08-27 23:28:20 +02:00
Jaime Marquínez Ferrándiz
443c12a703 Merge pull request #1324 from whydoubt/fix_gplus
Initial slash in Google+ photos link was removed
2013-08-27 13:36:39 -07:00
Jeff Smith
7f3c4f4f65 Initial slash in Google+ photos link was removed 2013-08-27 14:38:50 -05:00
Jaime Marquínez Ferrándiz
0bc56fa66a Add an extractor for NBC news (closes #1320) 2013-08-27 12:38:57 +02:00
Jaime Marquínez Ferrándiz
1a582dd49d Add an extractor for CNN (closes #1318) 2013-08-27 11:56:48 +02:00
Pierre Rudloff
c5b921b597 Merge remote-tracking branch 'upstream/master' 2013-08-27 10:47:47 +02:00
Philipp Hagemeister
e86ea47c02 [canalc2] Small improvements 2013-08-27 10:35:20 +02:00
Philipp Hagemeister
aa5a63a5b5 Merge remote-tracking branch 'Rudloff/canalc2' 2013-08-27 10:31:46 +02:00
Jaime Marquínez Ferrándiz
2a7b4da9b2 [hark] get the song info in JSON and extract more information. 2013-08-27 10:25:38 +02:00
Jaime Marquínez Ferrándiz
069d098f84 [canalplus] Accept player.canalplus.fr urls 2013-08-27 10:21:57 +02:00
Philipp Hagemeister
b3889f7023 release 2013.08.27 2013-08-27 02:30:47 +02:00
Philipp Hagemeister
65883c8dbd Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-27 02:00:23 +02:00
Philipp Hagemeister
341ca8d74c [trilulilu] Add support for trilulilu.ro
Fun fact: The ads (not yet supported) are loaded from youtube ;)
2013-08-27 01:59:00 +02:00
Allan Zhou
99859d436c Merge remote-tracking branch 'upstream/master' 2013-08-26 15:16:13 -07:00
Jaime Marquínez Ferrándiz
1b01e2b085 Merge pull request #1315 from yasoob/master
fixed tests for c56 and dailymotion
2013-08-26 13:38:48 -07:00
M.Yasoob Ullah Khalid ☺
976fc7d137 fixed tests for c56 and dailymotion 2013-08-27 01:00:17 +05:00
Philipp Hagemeister
c3b7b29c23 Merge remote-tracking branch 'origin/master' 2013-08-26 21:29:44 +02:00
Philipp Hagemeister
627a91a9a8 [generic] small typo 2013-08-26 21:29:31 +02:00
Philipp Hagemeister
6dc6302599 Merge pull request #1231 from yasoob/master
Added an IE for hark.com
2013-08-26 12:29:04 -07:00
Pierre Rudloff
7a20e2e1f8 Merge remote-tracking branch 'upstream/master' 2013-08-26 03:16:42 +02:00
Philipp Hagemeister
90648143c3 Merge pull request #1310 from rzhxeo/rtlnow
Add support for http://superrtlnow.de
2013-08-25 15:45:22 -07:00
Pierre Rudloff
5c6658d4dd Merge remote-tracking branch 'upstream/master' 2013-08-24 23:01:39 +02:00
Philipp Hagemeister
9585f890f8 [generic] add support for relative URLs (Fixes #1308) 2013-08-24 22:56:37 +02:00
Philipp Hagemeister
0838239e8e [generic] Support double slash URLs (Fixes #1309) 2013-08-24 22:52:45 +02:00
Philipp Hagemeister
36399e8576 Install our own HTTPS handler as well (#1309) 2013-08-24 22:49:22 +02:00
Philipp Hagemeister
9460db832c [ro220] Add support for 220.ro 2013-08-24 21:10:03 +02:00
rzhxeo
d68730a56e Add SUPER RTL NOW to RTLnow extractor 2013-08-24 13:22:28 +02:00
Jaime Marquínez Ferrándiz
f2aeefe29c [youtube] update algo for length 84 2013-08-24 10:48:12 +02:00
Allan Zhou
39c6f507df Merge remote-tracking branch 'upstream/master' 2013-08-23 15:33:36 -07:00
Philipp Hagemeister
d2d1eb5b0a Switch to domain yt-dl.org 2013-08-23 23:57:23 +02:00
Allan Zhou
614d9c19c1 Merge remote-tracking branch 'upstream/master' 2013-08-22 17:02:41 -07:00
Ismael Mejia
bd2dee6c67 Merge branch 'master' into subtitles_rework 2013-08-23 01:47:10 +02:00
Ismael Mejia
18b4e04f1c Merge branch 'master' into subtitles_rework 2013-08-22 23:29:36 +02:00
Allan Zhou
b4e60dac23 Merge remote-tracking branch 'upstream/master' 2013-08-22 10:43:51 -07:00
Pierre Rudloff
ff2424595a lxml is not part of the standard library. 2013-08-22 14:47:51 +02:00
Pierre Rudloff
adeb9c73d6 Merge remote-tracking branch 'upstream/master' 2013-08-22 14:04:30 +02:00
Pierre Rudloff
cd0abcc0bb Extractor for canalc2.tv 2013-08-22 13:54:23 +02:00
Pierre Rudloff
05a2926c5c Merge remote-tracking branch 'upstream/master' 2013-08-22 12:55:58 +02:00
Pierre Rudloff
7070b83687 Merge remote-tracking branch 'upstream/master' 2013-08-22 12:54:17 +02:00
Pierre Rudloff
8d212e604a Merge remote-tracking branch 'upstream/master'
Conflicts:
	youtube_dl/extractor/jeuxvideo.py
2013-08-22 12:52:05 +02:00
Allan Zhou
5af7e056a7 Merge remote-tracking branch 'upstream/master' 2013-08-21 10:53:42 -07:00
Allan Zhou
a3f62b8255 Merge remote-tracking branch 'upstream/master' 2013-08-21 00:07:03 -07:00
Pierre Rudloff
943f7f7a39 Download videos from jeuxvideo.com 2013-08-18 16:11:47 +02:00
kkalpakloglou
5a27ecdd2e Update AddAnime.py 2013-08-16 23:54:09 +03:00
Yasoob
e3a88568b0 Added an IE for hark.com 2013-08-11 22:23:05 +05:00
rzhxeo
97b3656c2e YoupornIE: Add support for hd videos and update Test 2013-08-09 18:37:33 +02:00
rzhxeo
f3bcebb1d2 add an aes implementation 2013-08-09 18:36:01 +02:00
Ismael Mejia
d80a064eff [subtitles] Added tests to check correct behavior when no subtitles are
available
2013-08-08 22:22:33 +02:00
Ismael Mejia
d55de6eec2 [subtitles] Now skips subtitles that have already been downloaded.
Just a check that the file exists; I also removed a method that wasn't
being used because it was a copy-paste from FileDownloader.
2013-08-08 18:30:04 +02:00
Ismael Mejia
69df680b97 [subtitles] Improved docs + new class for servers that don't support
auto-caption
2013-08-08 11:20:56 +02:00
Ismael Mejia
447591e1ae [test] Cleaned subtitles tests 2013-08-08 11:03:52 +02:00
Ismael Mejia
33eb0ce4c4 [subtitles] removed only-sub option (--skip-download achieves the same
functionality)
2013-08-08 10:06:24 +02:00
Ismael Mejia
505c28aac9 Separated subtitle options into their own group 2013-08-08 09:53:25 +02:00
Ismael Mejia
8377574c9c [internal] Improved subtitle architecture + (update in
youtube/dailymotion)

The subtitle architecture was refined: you only need to implement one
method that returns a dictionary of the available subtitles (lang, url) to
support all the subtitle options on a website. I updated the subtitle
downloaders for youtube/dailymotion to show how it works.
2013-08-08 08:54:10 +02:00
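Under that architecture a site extractor only needs something like the following (a hypothetical sketch; the base-class name matches the rework, but the site, URLs and method body are illustrative):

    from youtube_dl.extractor.subtitles import SubtitlesInfoExtractor

    class ExampleSiteIE(SubtitlesInfoExtractor):
        def _get_available_subtitles(self, video_id):
            # The single site-specific piece: language code -> subtitle URL.
            # The shared base class builds --write-sub, --all-subs and
            # --list-subs support on top of this dictionary.
            return {
                'en': 'http://example.com/subs/%s.en.srt' % video_id,
                'fr': 'http://example.com/subs/%s.fr.srt' % video_id,
            }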
Ismael Mejia
372297e713 Undo the previous commit (it was a mistake) 2013-08-07 21:24:42 +02:00
Ismael Mejia
953e32b2c1 [dailymotion] Added support for subtitles + new InfoExtractor for
generic subtitle download.

The idea is that all subtitle downloaders must descend from SubtitlesIE
and implement only three basic methods to achieve the complete subtitle
download functionality. This will make it possible to reduce the code in YoutubeIE
once it is rewritten.
2013-08-07 18:59:11 +02:00
Ismael Mejia
5898e28272 Fixed small type issue 2013-08-07 18:48:24 +02:00
Ismael Mejia
67dfbc0cb9 Added exceptions for the subtitle and video types in .gitignore 2013-08-07 18:42:40 +02:00
huohuarong
d5b00ee6e0 improve sohu extractor 2013-08-06 10:26:57 +08:00
tsantala
461cead4f7 changes 2013-08-06 04:34:24 +03:00
huohuarong
b5a6d40818 fix parse title bug 2013-08-05 22:51:54 +08:00
Dan Church
968b5e0112 Add some verbosity when reporting finished downloads
For example:

    [download] Resuming download at byte 1868140
    [download] Destination: Entry #1-Bn59FJ4HrmU.flv
    [download] 100% of 3.27MiB in 4s

This format is meant to somewhat mirror the behavior of wget(1) when reporting finished downloads:

    100%[==================>] 54,836,682   788KB/s   in 74s

    2013-08-04 12:32:05 (728 KB/s) - 'google-chrome-stable_current_x86_64.rpm' saved [54836682/54836682]
2013-08-04 12:45:24 -05:00
huohuarong
4ec929dc9b use ..utils/clean_html() 2013-08-03 10:29:58 +08:00
huohuarong
6624a2b07d add an extractor for tv.sohu.com 2013-08-02 17:58:46 +08:00
huohuarong
6d3a7d03e1 fix bug: kankan extractor does not support http://vod.kankan.com/v/70/70309.shtml 2013-08-02 15:26:11 +08:00
alphapapa
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often associated with bits rather than bytes.
2013-07-13 16:42:16 -05:00
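A quick worked example of why the unit matters (illustrative arithmetic only):

    # -r 50K caps the transfer at 50 * 1024 = 51,200 bytes per second,
    # i.e. about 410 kilobits per second -- far less than a "50 Mbps"
    # connection as advertised by an ISP would suggest.
    rate_bytes = 50 * 1024
    rate_kbits = rate_bytes * 8 / 1000.0  # 409.6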
Jaime Marquínez Ferrándiz
56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes #998)
When an IE is added to the list, it's also added to a dictionary. When an IE is requested, the dictionary is checked first; a new instance is created only if none exists yet.

That way _real_initialize is only called once for each IE, saving time if it needs to log in, for example.
2013-07-08 15:14:27 +02:00
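The caching pattern amounts to roughly this (a schematic, self-contained sketch; names follow the spirit of the commit, not its exact code):

    class FakeIE(object):
        @classmethod
        def ie_key(cls):
            return cls.__name__

        def _real_initialize(self):
            print('expensive login runs once')

    class Downloader(object):
        def __init__(self):
            self._ies_instances = {}  # IE key -> cached instance

        def get_info_extractor(self, ie_class):
            key = ie_class.ie_key()
            if key not in self._ies_instances:  # instantiate on first request only
                ie = ie_class()
                ie._real_initialize()
                self._ies_instances[key] = ie
            return self._ies_instances[key]

    dl = Downloader()
    assert dl.get_info_extractor(FakeIE) is dl.get_info_extractor(FakeIE)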
120 changed files with 6806 additions and 1316 deletions

.gitignore

@@ -17,4 +17,12 @@ youtube-dl.tar.gz
.coverage
cover/
updates_key.pem
*.egg-info
*.srt
*.sbv
*.vtt
*.flv
*.mp4
*.part
test/testdata
.tox

Makefile

@@ -13,13 +13,13 @@ PYTHON=/usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion
@@ -71,6 +71,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \
--exclude '__pycache' \
--exclude '.git' \
--exclude 'testdata' \
-- \
bin devscripts test youtube_dl \
CHANGELOG LICENSE README.md README.txt \

README.md

@@ -19,7 +19,10 @@ which means you can modify it, redistribute it or use it however you like.
-U, --update update this program to latest version. Make sure
that you have sufficient permissions (run with
sudo if needed)
-i, --ignore-errors continue on download errors
-i, --ignore-errors continue on download errors, for example to
skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the
playlist or the command line) if an error occurs
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access
@@ -29,6 +32,11 @@ which means you can modify it, redistribute it or use it however you like.
--extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation.
--cache-dir DIR Location in the filesystem where youtube-dl can
store downloaded information permanently. By
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@@ -45,11 +53,16 @@ which means you can modify it, redistribute it or use it however you like.
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options:
-r, --rate-limit LIMIT maximum download rate (e.g. 50k or 44.6m)
-r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
@@ -65,15 +78,17 @@ which means you can modify it, redistribute it or use it however you like.
%(uploader_id)s for the uploader nickname if
different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename
extension, %(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the video id
, %(playlist)s for the playlist the video is in,
%(playlist_index)s for the position in the
playlist and %% for a literal percent. Use - to
output to stdout. Can also be used to download to
a different directory, for example with -o '/my/d
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
extension, %(format)s for the format description
(like "22 - 1280x720" or "HD")%(upload_date)s for
the upload date (YYYYMMDD), %(extractor)s for the
provider (youtube, metacafe, etc), %(id)s for the
video id , %(playlist)s for the playlist the
video is in, %(playlist_index)s for the position
in the playlist and %% for a literal percent. Use
- to output to stdout. Can also be used to
download to a different directory, for example
with -o '/my/downloads/%(uploader)s/%(title)s-%(i
d)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or
--autonumber option is given
@@ -90,6 +105,7 @@ which means you can modify it, redistribute it or use it however you like.
file modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options:
@@ -113,7 +129,8 @@ which means you can modify it, redistribute it or use it however you like.
## Video Format Options:
-f, --format FORMAT video format code, specify the order of
preference using slashes: "-f 22/17/18"
preference using slashes: "-f 22/17/18". "-f mp4"
and "-f flv" are also supported
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific one
is requested
@@ -122,10 +139,8 @@ which means you can modify it, redistribute it or use it however you like.
only)
## Subtitle Options:
--write-sub write subtitle file (currently youtube only)
--write-auto-sub write automatic subtitle file (currently youtube
only)
--only-sub [deprecated] alias of --skip-download
--write-sub write subtitle file
--write-auto-sub write automatic subtitle file (youtube only)
--all-subs downloads all the available subtitles of the
video
--list-subs lists all available subtitles for the video
@@ -157,6 +172,7 @@ which means you can modify it, redistribute it or use it however you like.
processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4
videos)
--add-metadata add metadata to the files
# CONFIGURATION

devscripts/bash-completion.in

@@ -1,14 +1,18 @@
__youtube-dl()
__youtube_dl()
{
local cur prev opts
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
opts="{{flags}}"
keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater"
if [[ ${cur} == * ]] ; then
if [[ ${cur} =~ : ]]; then
COMPREPLY=( $(compgen -W "${keywords}" -- ${cur}) )
return 0
elif [[ ${cur} == * ]] ; then
COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
return 0
fi
}
complete -F __youtube-dl youtube-dl
complete -F __youtube_dl youtube-dl

devscripts/buildserver.py (new file, 405 lines)

@@ -0,0 +1,405 @@
#!/usr/bin/python3
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import argparse
import ctypes
import functools
import shutil  # used by DownloadBuilder below; missing from the original imports
import subprocess  # used by GITBuilder/YoutubeDLBuilder; missing from the original imports
import sys
import tempfile  # used by GITInfoBuilder; missing from the original imports
import threading
import traceback
import os.path
import urllib.parse as urlparse  # the code below uses the py2-style name urlparse
import winreg as _winreg  # the code below uses the py2-style name _winreg
class BuildHTTPServer(ThreadingMixIn, HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
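# SERVICE_STATUS is used by win_service_set_status() below but its definition
# was missing from this file; the following mirrors the documented Win32
# SERVICE_STATUS layout (an assumed reconstruction, not part of the original diff).
class SERVICE_STATUS(ctypes.Structure):
    _fields_ = [
        ('dwServiceType', ctypes.c_uint32),
        ('dwCurrentState', ctypes.c_uint32),
        ('dwControlsAccepted', ctypes.c_uint32),
        ('dwWin32ExitCode', ctypes.c_uint32),
        ('dwServiceSpecificExitCode', ctypes.c_uint32),
        ('dwCheckPoint', ctypes.c_uint32),
        ('dwWaitHint', ctypes.c_uint32),
    ]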
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
#args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(win_service_handler, stop_event))  # callback first, then its bound argument
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='localhost:8142',
help='Bind to host:port (default %(default)s)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
#==============================================================================
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
pythonVersion = kwargs.pop('python', '2.7')
try:
key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
try:
self.pythonPath, _ = _winreg.QueryValueEx(key, '')
finally:
_winreg.CloseKey(key)
except Exception:
raise BuildError('No such Python version: %s' % pythonVersion)
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class YoutubeDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile']
def __init__(self, **kwargs):
if self.repoName != 'youtube-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(YoutubeDLBuilder, self).__init__(**kwargs)
def build(self):
try:
subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(YoutubeDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
actionDict = { 'build': Builder, 'download': Builder } # They're the same, no more caching.
def do_GET(self):
path = urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = str(e).encode('UTF-8')  # str, not unicode: this file targets Python 3
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
except HTTPError as e:
self.send_response(e.code, str(e))
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
#==============================================================================
if __name__ == '__main__':
main()

devscripts/gh-pages/add-version.py

@@ -3,31 +3,40 @@
import json
import sys
import hashlib
import urllib.request
import os.path
if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]
with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
versions_info = json.load(open('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']
new_version = {}
filenames = {'bin': 'youtube-dl', 'exe': 'youtube-dl.exe', 'tar': 'youtube-dl-%s.tar.gz' % version}
filenames = {
'bin': 'youtube-dl',
'exe': 'youtube-dl.exe',
'tar': 'youtube-dl-%s.tar.gz' % version}
build_dir = os.path.join('..', '..', 'build', version)
for key, filename in filenames.items():
print('Downloading and checksumming %s...' %filename)
url = 'http://youtube-dl.org/downloads/%s/%s' % (version, filename)
data = urllib.request.urlopen(url).read()
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
fn = os.path.join(build_dir, filename)
with open(fn, 'rb') as f:
data = f.read()
if not data:
raise ValueError('File %s is empty!' % fn)
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
versions_info['versions'][version] = new_version
versions_info['latest'] = version
json.dump(versions_info, open('update/versions.json', 'w'), indent=4, sort_keys=True)
with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)

devscripts/gh-pages/update-feed.py

@@ -22,7 +22,7 @@ entry_template=textwrap.dedent("""
<atom:link href="http://rg3.github.io/youtube-dl" />
<atom:content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Downloads available at <a href="http://youtube-dl.org/downloads/@VERSION@/">http://youtube-dl.org/downloads/@VERSION@/</a>
Downloads available at <a href="https://yt-dl.org/downloads/@VERSION@/">https://yt-dl.org/downloads/@VERSION@/</a>
</div>
</atom:content>
<atom:author>
@@ -54,4 +54,3 @@ atom_template = atom_template.replace('@ENTRIES@', entries_str)
with open('update/releases.atom','w',encoding='utf-8') as atom_file:
atom_file.write(atom_template)

devscripts/gh-pages/update-sites.py (new file, 34 lines)

@@ -0,0 +1,34 @@
#!/usr/bin/env python3
import sys
import os
import textwrap
# We must be able to import youtube_dl
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import youtube_dl
def main():
with open('supportedsites.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC)
if ie.working() == False:
ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html))
template = template.replace('@SITES@', textwrap.indent('\n'.join(ie_htmls), '\t'))
with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
sitesf.write(template)
if __name__ == '__main__':
main()

devscripts/release.sh

@@ -55,8 +55,8 @@ git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make youtube-dl youtube-dl.tar.gz
wget "http://jeromelaheurte.net:8142/download/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe || \
wget "http://jeromelaheurte.net:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
read -p "VM running? (y/n) " -n 1
wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
mkdir -p "build/$version"
mv youtube-dl youtube-dl.exe "build/$version"
mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@@ -67,7 +67,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to youtube-dl.org..."
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
@@ -85,12 +85,9 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git show HEAD
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)

devscripts/youtube_genalgo.py (deleted, 109 lines)

@@ -1,109 +0,0 @@
#!/usr/bin/env python
# Generate youtube signature algorithm from test cases
import sys
tests = [
# 92 - vflQw-fB4 2013/07/17
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~\"",
"mrtyuioplkjhgfdsazxcvbnq1234567890QWERTY}IOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]\"|:;"),
# 90
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`",
"mrtyuioplkjhgfdsazxcvbne1234567890QWER[YUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={`]}|"),
# 89
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'",
"/?;:|}<[{=+-_)(*&^%$#@!MqBVCXZASDFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuyt"),
# 88
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<",
"J:|}][{=+-_)(*&;%$#@>MNBVCXZASDFGH^KLPOIUYTREWQ0987654321mnbvcxzasdfghrklpoiuytej"),
# 87
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<",
"uioplkjhgfdsazxcvbnm1t34567890QWE2TYUIOPLKJHGFDSAZXCVeNM!@#$^&*()_-+={[]}|:;?/>.<"),
# 86 - vflh9ybst 2013/08/23
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"yuioplkjhgfdsazxcvbnm1234567890QWERrYUIOPLKqHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<"),
# 85
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<",
".>/?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWQ0q876543r1mnbvcx9asdfghjklpoiuyt2"),
# 84
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
"<.>?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWQ09876543q1mnbvcxzasdfghjklpoiuew2"),
# 83
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<",
".>/?;}[{=+_)(*&^%<#!MNBVCXZASPFGHJKLwOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytreq"),
# 82 - vflZK4ZYR 2013/08/23
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<",
"wertyuioplkjhgfdsaqxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&z(-+={[};?/>.<"),
# 81 - vflLC8JvQ 2013/07/25
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.",
"C>/?;}[{=+-(*&^%$#@!MNBVYXZASDFGHKLPOIU.TREWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
# 80 - vflZK4ZYR 2013/08/23 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>",
"wertyuioplkjhgfdsaqxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&z(-+={[};?/>"),
# 79 - vflLC8JvQ 2013/07/25 (sporadic)
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/",
"Z?;}[{=+-(*&^%$#@!MNBVCXRASDFGHKLPOIUYT/EWQ0q87659321mnbvcxzasdfghjkl4oiuytrewp"),
]
tests_age_gate = [
# 86 - vflqinMWD
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"),
]
def find_matching(wrong, right):
idxs = [wrong.index(c) for c in right]
return compress(idxs)
return ('s[%d]' % i for i in idxs)
def compress(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = ':%d' % (end+step)
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
step = None
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield 's[%d]' % prev
if step is None:
yield 's[%d]' % i
else:
yield _genslice(start, i, step)
def _assert_compress(inp, exp):
res = list(compress(inp))
if res != exp:
print('Got %r, expected %r' % (res, exp))
assert res == exp
_assert_compress([0,2,4,6], ['s[0]', 's[2]', 's[4]', 's[6]'])
_assert_compress([0,1,2,4,6,7], ['s[:3]', 's[4]', 's[6:8]'])
_assert_compress([8,0,1,2,4,7,6,9], ['s[8]', 's[:3]', 's[4]', 's[7:5:-1]', 's[9]'])
def gen(wrong, right, indent):
code = ' + '.join(find_matching(wrong, right))
return 'if len(s) == %d:\n%s return %s\n' % (len(wrong), indent, code)
def genall(tests):
indent = ' ' * 8
return indent + (indent + 'el').join(gen(wrong, right, indent) for wrong,right in tests)
def main():
print(genall(tests))
print(u' Age gate:')
print(genall(tests_age_gate))
if __name__ == '__main__':
main()

setup.py

@@ -63,6 +63,7 @@ setup(
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
packages=['youtube_dl', 'youtube_dl.extractor'],

test/__init__.py (new, empty file)

test/helper.py

@@ -1,38 +1,63 @@
import errno
import io
import hashlib
import json
import os.path
import re
import types
import youtube_dl.extractor
from youtube_dl import YoutubeDL, YoutubeDLHandler
from youtube_dl.utils import (
compat_cookiejar,
compat_urllib_request,
)
from youtube_dl import YoutubeDL
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
def global_setup():
youtube_dl._setup_opener(timeout=10)
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
class FakeYDL(YoutubeDL):
def __init__(self):
self.result = []
def __init__(self, override=None):
# Different instances of the downloader can't share the same dictionary
# some tests set the "sublang" parameter, which would break the md5 checks.
self.params = dict(parameters)
def to_screen(self, s):
params = get_params(override=override)
super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
raise Exception(s)
def download(self, x):
self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
@@ -42,3 +67,6 @@ def get_testcases():
for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')]
yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()

test/parameters.json

@@ -38,7 +38,6 @@
"writedescription": false,
"writeinfojson": true,
"writesubtitles": false,
"onlysubtitles": false,
"allsubtitles": false,
"listssubtitles": false
}

test/test_YoutubeDL.py (new file, 133 lines)

@@ -0,0 +1,133 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
class YDL(FakeYDL):
def __init__(self, *args, **kwargs):
super(YDL, self).__init__(*args, **kwargs)
self.downloaded_info_dicts = []
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
def to_screen(self, msg):
self.msgs.append(msg)
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 460},
{u'ext': u'mp4', u'height': 460},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'webm')
# Different resolution => download best quality (mp4)
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'mp4', u'height': 1080},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'mp4')
# No prefer_free_formats => keep original formats order
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{u'ext': u'webm', u'height': 720},
{u'ext': u'flv', u'height': 720},
]
info_dict[u'formats'] = formats
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'ext'], u'flv')
def test_format_limit(self):
formats = [
{u'format_id': u'meh'},
{u'format_id': u'good'},
{u'format_id': u'great'},
{u'format_id': u'excellent'},
]
info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'}
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
ydl = YDL({'format_limit': 'good'})
assert ydl.params['format_limit'] == 'good'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'good')
ydl = YDL({'format_limit': 'great', 'format': 'all'})
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts[0][u'format_id'], u'meh')
self.assertEqual(ydl.downloaded_info_dicts[1][u'format_id'], u'good')
self.assertEqual(ydl.downloaded_info_dicts[2][u'format_id'], u'great')
self.assertTrue('3' in ydl.msgs[0])
ydl = YDL()
ydl.params['format_limit'] = 'excellent'
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded[u'format_id'], u'excellent')
def test_format_selection(self):
formats = [
{u'format_id': u'35', u'ext': u'mp4'},
{u'format_id': u'45', u'ext': u'webm'},
{u'format_id': u'47', u'ext': u'webm'},
{u'format_id': u'2', u'ext': u'flv'},
]
info_dict = {u'formats': formats, u'extractor': u'test'}
ydl = YDL({'format': u'20/47'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'20/71/worst'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
ydl = YDL()
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'2')
ydl = YDL({'format': u'webm/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'47')
ydl = YDL({'format': u'3gp/40/mp4'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], u'35')
if __name__ == '__main__':
unittest.main()

test/test_age_restriction.py (new file, 55 lines)

@@ -0,0 +1,55 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup, try_rm
global_setup()
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = filename + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()

test/test_all_urls.py

@@ -1,34 +1,66 @@
#!/usr/bin/env python
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.extractor import YoutubeIE, YoutubePlaylistIE, YoutubeChannelIE, JustinTVIE, gen_extractors
from helper import get_testcases
class TestAllURLsMatching(unittest.TestCase):
def setUp(self):
self.ies = gen_extractors()
def matching_ies(self, url):
return [ie.IE_NAME for ie in self.ies if ie.suitable(url) and ie.IE_NAME != 'generic']
def assertMatch(self, url, ie_list):
self.assertEqual(self.matching_ies(url), ie_list)
def test_youtube_playlist_matching(self):
self.assertTrue(YoutubePlaylistIE.suitable(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'UUBABnxM4Ar9ten8Mdjj1j0Q')) #585
self.assertTrue(YoutubePlaylistIE.suitable(u'PL63F0C78739B09958'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertFalse(YoutubePlaylistIE.suitable(u'PLtS2H6bU1M'))
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist(u'PL63F0C78739B09958')
assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
def test_youtube_matching(self):
self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
def test_youtube_channel_matching(self):
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM/videos'))
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watch_later'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions'])
self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended'])
self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites'])
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
@@ -47,10 +79,13 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
def test_youtube_extract(self):
self.assertEqual(YoutubeIE()._extract_id('http://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch_popup?v=BaW_jenozKc'), 'BaW_jenozKc')
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_no_duplicates(self):
ies = gen_extractors()
@@ -63,15 +98,12 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
def test_keywords(self):
ies = gen_extractors()
matching_ies = lambda url: [ie.IE_NAME for ie in ies
if ie.suitable(url) and ie.IE_NAME != 'generic']
self.assertEqual(matching_ies(':ytsubs'), ['youtube:subscriptions'])
self.assertEqual(matching_ies(':ytsubscriptions'), ['youtube:subscriptions'])
self.assertEqual(matching_ies(':thedailyshow'), ['ComedyCentral'])
self.assertEqual(matching_ies(':tds'), ['ComedyCentral'])
self.assertEqual(matching_ies(':colbertreport'), ['ComedyCentral'])
self.assertEqual(matching_ies(':cr'), ['ComedyCentral'])
self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':thedailyshow', ['ComedyCentral'])
self.assertMatch(':tds', ['ComedyCentral'])
self.assertMatch(':colbertreport', ['ComedyCentral'])
self.assertMatch(':cr', ['ComedyCentral'])
if __name__ == '__main__':

test/test_dailymotion_subtitles.py (new file, 70 lines)

@@ -0,0 +1,70 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import DailymotionIE
class TestDailymotionSubtitles(unittest.TestCase):
def setUp(self):
self.DL = FakeYDL()
self.url = 'http://www.dailymotion.com/video/xczg00'
def getInfoDict(self):
IE = DailymotionIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

test/test_download.py

@@ -1,43 +1,31 @@
#!/usr/bin/env python
import errno
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, get_testcases, global_setup, try_rm, md5
global_setup()
import hashlib
import io
import os
import json
import unittest
import sys
import socket
import binascii
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
from youtube_dl.utils import (
compat_str,
compat_urllib_error,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
RETRIES = 3
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
@@ -54,17 +42,12 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
from helper import get_testcases
defs = get_testcases()
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
class TestDownload(unittest.TestCase):
maxDiff = None
def setUp(self):
self.parameters = parameters
self.defs = defs
### Dynamically generate tests
@@ -84,8 +67,7 @@ def generator(test_case):
print_skipping(test_case['skip'])
return
params = self.parameters.copy()
params.update(test_case.get('params', {}))
params = get_params(test_case.get('params', {}))
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
@@ -97,9 +79,9 @@ def generator(test_case):
test_cases = test_case.get('playlist', [test_case])
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
try:
for retry in range(1, RETRIES + 1):
try:
@@ -127,12 +109,11 @@ def generator(test_case):
info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertEqual(expected, 'md5:' + md5(info_dict.get(info_field)))
got = 'md5:' + md5(info_dict.get(info_field))
else:
got = info_dict.get(info_field)
self.assertEqual(
expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
@@ -146,9 +127,9 @@ def generator(test_case):
self.assertTrue(key in info_dict.keys() and info_dict[key])
finally:
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
return test_template

test/test_playlists.py

@@ -1,17 +1,27 @@
#!/usr/bin/env python
# encoding: utf-8
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import DailymotionPlaylistIE, VimeoChannelIE
from youtube_dl.utils import *
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
UstreamChannelIE,
SoundcloudUserIE,
LivestreamIE,
NHLVideocenterIE,
)
from helper import FakeYDL
class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info):
@@ -26,6 +36,14 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], u'SPORT')
self.assertTrue(len(result['entries']) > 20)
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
def test_vimeo_channel(self):
dl = FakeYDL()
ie = VimeoChannelIE(dl)
@@ -34,5 +52,38 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], u'Vimeo Tributes')
self.assertTrue(len(result['entries']) > 24)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'5124905')
self.assertTrue(len(result['entries']) >= 11)
def test_soundcloud_user(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
if __name__ == '__main__':
unittest.main()

test/test_utils.py

@@ -1,23 +1,30 @@
#!/usr/bin/env python
# Various small unit tests
import sys
import unittest
import xml.etree.ElementTree
# coding: utf-8
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import timeconvert
from youtube_dl.utils import sanitize_filename
from youtube_dl.utils import unescapeHTML
from youtube_dl.utils import orderedSet
from youtube_dl.utils import DateRange
from youtube_dl.utils import unified_strdate
from youtube_dl.utils import find_xpath_attr
from youtube_dl.utils import (
timeconvert,
sanitize_filename,
unescapeHTML,
orderedSet,
DateRange,
unified_strdate,
find_xpath_attr,
get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
)
if sys.version_info < (3, 0):
_compat_str = lambda b: b.decode('unicode-escape')
@@ -127,5 +134,42 @@ class TestUtil(unittest.TestCase):
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
def test_meta_parser(self):
testhtml = u'''
<head>
<meta name="description" content="foo &amp; bar">
<meta content='Plato' name='author'/>
</head>
'''
get_meta = lambda name: get_meta_content(name, testhtml)
self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
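
For reference, xpath_with_ns maps prefixed path components onto ElementTree's {uri}tag notation. A minimal sketch consistent with the test above (the real utils implementation may differ in detail):

    def xpath_with_ns(path, ns_map):
        # Rewrite 'media:song/media:author' into
        # '{http://example.com/}song/{http://example.com/}author';
        # unprefixed components such as 'url' pass through unchanged.
        out = []
        for component in path.split('/'):
            if ':' not in component:
                out.append(component)
            else:
                prefix, tag = component.split(':', 1)
                out.append('{%s}%s' % (ns_map[prefix], tag))
        return '/'.join(out)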
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
if __name__ == '__main__':
unittest.main()
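
The smuggle_url/unsmuggle_url pair tested above threads extra data through a plain URL for internal hand-off between extractors. Judging by the assertions, the payload survives a round trip and unsmuggling an ordinary URL yields (url, None); a sketch under those assumptions (the marker name and encoding are guesses, and the real utils implementation may differ):

    import json

    try:
        from urllib.parse import quote, unquote  # Python 3
    except ImportError:
        from urllib import quote, unquote  # Python 2

    MARKER = '__youtubedl_smuggle'  # assumed marker name

    def smuggle_url(url, data):
        # Append the JSON-encoded payload to the URL as a fragment.
        return url + '#' + MARKER + '=' + quote(json.dumps(data))

    def unsmuggle_url(smug_url, default=None):
        if ('#' + MARKER) not in smug_url:
            return smug_url, default
        url, _, payload = smug_url.rpartition('#')
        return url, json.loads(unquote(payload[len(MARKER) + 1:]))

Because rpartition splits at the last '#', an existing fragment on the original URL (like the '#a' in the test) is preserved intact.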

test/test_write_annotations.py (new file)

@@ -0,0 +1,80 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup, try_rm
global_setup()
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected)  # assertIn was only added in Python 2.7
# Remove the first occurrence; there can be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()
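
The assertions above pin down the shape of the annotations document: a 'document' root, an 'annotations' child, and unsorted 'annotation' elements whose type='text' entries carry a TEXT child. A minimal file satisfying them would look roughly like this (illustrative only, reduced to one annotation; real files carry more attributes):

    <document>
      <annotations>
        <annotation type="text">
          <TEXT>Speech bubble</TEXT>
        </annotation>
      </annotations>
    </document>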

test/test_write_info_json.py

@@ -1,37 +1,34 @@
#!/usr/bin/env python
# coding: utf-8
import json
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup
global_setup()
import io
import json
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
params = json.load(pf)
params['writeinfojson'] = True
params['skip_download'] = True
params['writedescription'] = True
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
@@ -42,6 +39,7 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files

test/test_youtube_lists.py

@@ -1,20 +1,26 @@
#!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeUserIE, YoutubePlaylistIE, YoutubeIE, YoutubeChannelIE, YoutubeShowIE
from youtube_dl.utils import *
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
from helper import FakeYDL
class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self,info):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
@@ -27,6 +33,14 @@ class TestYoutubeLists(unittest.TestCase):
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
@@ -92,7 +106,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 4)
self.assertTrue(len(result) >= 3)
if __name__ == '__main__':
unittest.main()

test/test_youtube_signature.py (new file)

@@ -0,0 +1,84 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup
global_setup()
import io
import re
import string
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/swfbin/watch_as3-vflg5GhxU.swf',
u'swf',
82,
u':/.-,+*)=\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBAzyxw>utsrqponmlkjihgfedcba987654321'
),
]
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
test_id = m.group(1)
def test_func(self):
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ie = YoutubeIE()
if stype == 'js':
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
for test_spec in _TESTS:
make_tfunc(*test_spec)
if __name__ == '__main__':
unittest.main()
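
make_tfunc above is the classic pre-pytest recipe for parameterized unittest cases: build one closure per test spec, give it a stable __name__, and setattr it onto the TestCase class so the runner discovers it. Stripped to its essentials, the pattern is:

    import unittest

    class TestDoubling(unittest.TestCase):
        pass

    def make_tfunc(name, value, expected):
        def test_func(self):
            self.assertEqual(value * 2, expected)
        test_func.__name__ = str('test_' + name)  # str() keeps Python 2 happy
        setattr(TestDoubling, test_func.__name__, test_func)

    for spec in [('one', 1, 2), ('two', 2, 4)]:
        make_tfunc(*spec)

    if __name__ == '__main__':
        unittest.main()

Defining test_func inside make_tfunc (rather than directly in the loop body) is what makes each closure capture its own spec instead of the loop's final values.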

test/test_youtube_subtitles.py

@@ -1,102 +1,93 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def setUp(self):
DL = FakeYDL()
DL.params['allsubtitles'] = False
DL.params['writesubtitles'] = False
DL.params['subtitlesformat'] = 'srt'
DL.params['listsubtitles'] = False
def test_youtube_no_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = False
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.DL = FakeYDL()
self.url = 'QRS8MkLhQmM'
def getInfoDict(self):
IE = YoutubeIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles']['en']
self.assertEqual(md5(sub), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_it(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitleslangs'] = ['it']
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles']['it']
self.assertEqual(md5(sub), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_onlysubtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['onlysubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles']['en']
self.assertEqual(md5(sub), '4cd9278a35ba2305f47354ee13472260')
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
DL = FakeYDL()
DL.params['allsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'sbv'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles']['en']
self.assertEqual(md5(sub), '13aeaa0c245a8bed9a451cb643e3ad8b')
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'vtt'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles']['en']
self.assertEqual(md5(sub), '356cdc577fde0c6783b9b822e7206ff7')
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
DL = FakeYDL()
DL.params['listsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
DL = FakeYDL()
DL.params['writeautomaticsub'] = True
DL.params['subtitleslangs'] = ['it']
IE = YoutubeIE(DL)
info_dict = IE.extract('8YoUxe5ncPo')
sub = info_dict[0]['subtitles']['it']
self.assertTrue(sub is not None)
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
DL.params['subtitleslangs'] = langs
IE = YoutubeIE(DL)
subtitles = IE.extract('QRS8MkLhQmM')[0]['subtitles']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)

tox.ini (new file)

@@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo
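
With this file in place, running tox builds isolated py26/py27/py33 virtualenvs, installs nose and coverage into each, and runs nosetests against the test/ directory; tox -e py27 restricts the run to a single interpreter, and arguments after -- (for example, tox -- test.test_utils) replace the default target via {posargs}.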

youtube_dl/FileDownloader.py

@@ -63,36 +63,57 @@ class FileDownloader(object):
converted = float(bytes) / float(1024 ** exponent)
return '%.2f%s' % (converted, suffix)
@staticmethod
def format_seconds(seconds):
(mins, secs) = divmod(seconds, 60)
(hours, mins) = divmod(mins, 60)
if hours > 99:
return '--:--:--'
if hours == 0:
return '%02d:%02d' % (mins, secs)
else:
return '%02d:%02d:%02d' % (hours, mins, secs)
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
return None
return float(byte_counter) / float(data_len) * 100.0
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
if total is None:
return '--:--'
return None
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return '--:--'
return None
rate = float(current) / dif
eta = int((float(total) - float(current)) / rate)
(eta_mins, eta_secs) = divmod(eta, 60)
(eta_hours, eta_mins) = divmod(eta_mins, 60)
if eta_hours > 99:
return '--:--:--'
if eta_hours == 0:
return '%02d:%02d' % (eta_mins, eta_secs)
else:
return '%02d:%02d:%02d' % (eta_hours, eta_mins, eta_secs)
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
return None
return float(bytes) / dif
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
return '%10s' % ('%s/s' % FileDownloader.format_bytes(speed))
@staticmethod
def best_block_size(elapsed_time, bytes):
@@ -201,11 +222,14 @@ class FileDownloader(object):
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
eta_str = self.format_eta(eta)
percent_str = self.format_percent(percent)
speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str))
@@ -234,16 +258,19 @@ class FileDownloader(object):
"""Report it was impossible to resume download."""
self.to_screen(u'[download] Unable to resume')
def report_finish(self):
def report_finish(self, data_len_str, tot_time):
"""Report download finished."""
if self.params.get('noprogress', False):
self.to_screen(u'[download] Download completed')
else:
self.to_screen(u'')
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
self.to_screen(u'\r%s[download] 100%% of %s in %s' %
(clear_line, data_len_str, self.format_seconds(tot_time)))
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first
try:
@@ -265,6 +292,8 @@ class FileDownloader(object):
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', tc_url]
if test:
basic_args += ['--stop', '1']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if self.params.get('verbose', False):
try:
@@ -274,7 +303,7 @@ class FileDownloader(object):
shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args)
while retval == 2 or retval == 1:
while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
time.sleep(5.0) # This seems to be needed
@@ -287,7 +316,7 @@ class FileDownloader(object):
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0
break
if retval == 0:
if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
@@ -372,6 +401,7 @@ class FileDownloader(object):
self._hook_progress({
'filename': filename,
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
})
return True
@@ -518,13 +548,14 @@ class FileDownloader(object):
block_size = self.best_block_size(after - before, len(data_block))
# Progress message
speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
eta = None
else:
percent_str = self.calc_percent(byte_counter, data_len)
eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent_str, data_len_str, speed_str, eta_str)
percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({
'downloaded_bytes': byte_counter,
@@ -532,6 +563,8 @@ class FileDownloader(object):
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
# Apply rate limit
@@ -542,7 +575,7 @@ class FileDownloader(object):
self.report_error(u'Did not get any data blocks')
return False
stream.close()
self.report_finish()
self.report_finish(data_len_str, (time.time() - start))
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
@@ -574,6 +607,8 @@ class FileDownloader(object):
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.
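
The thrust of this FileDownloader refactor is separating computation from presentation: calc_percent, calc_eta and calc_speed now return raw numbers (or None when unknown), the matching format_* helpers render them for the console, and the raw values flow into progress hooks via the new 'eta' and 'speed' keys. A standalone sketch of the split (with a simplified rendering in place of format_bytes):

    import time

    def calc_speed(start, now, byte_counter):
        # Computation: raw bytes/second, or None if it cannot be computed yet.
        dif = now - start
        if byte_counter == 0 or dif < 0.001:  # one millisecond
            return None
        return float(byte_counter) / dif

    def format_speed(speed):
        # Presentation: fixed-width display string for the console.
        if speed is None:
            return '%10s' % '---b/s'
        return '%10s' % ('%.2fKiB/s' % (speed / 1024.0))

    start = time.time()
    time.sleep(0.01)  # stand-in for actually downloading something
    speed = calc_speed(start, time.time(), 1 << 20)
    print(format_speed(speed))  # human-readable, as report_progress now does
    hook_payload = {'speed': speed, 'eta': None}  # raw values for progress hooks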

youtube_dl/PostProcessor.py

@@ -3,7 +3,14 @@ import subprocess
import sys
import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object):
@@ -82,6 +89,8 @@ class FFmpegPostProcessor(PostProcessor):
+ opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate()
if p.returncode != 0:
@@ -137,7 +146,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
FFmpegPostProcessor.run_ffmpeg(self, path, out_path, opts)
except FFmpegPostProcessorError as err:
raise AudioConversionError(err.message)
raise AudioConversionError(err.msg)
def run(self, information):
path = information['filepath']
@@ -177,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:
if int(self._preferredquality) < 10:
# The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@@ -207,7 +217,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
except:
etype,e,tb = sys.exc_info()
if isinstance(e, AudioConversionError):
msg = u'audio conversion failed: ' + e.message
msg = u'audio conversion failed: ' + e.msg
else:
msg = u'error running ' + (self._exes['avconv'] and 'avconv' or 'ffmpeg')
raise PostProcessingError(msg)
@@ -444,8 +454,11 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information
sub_langs = [key for key in information['subtitles']]
if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
@@ -464,3 +477,35 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s="%s"' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info
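
Stripped of the PostProcessor machinery, the new FFmpegMetadataPP amounts to an ffmpeg stream copy with -metadata key=value pairs, followed by a file swap. Roughly (a hypothetical standalone sketch; the real class routes through run_ffmpeg and the compat helpers):

    import os
    import subprocess

    def add_metadata(path, metadata, ext='mp4'):
        # Stream-copy the input, attach metadata tags, then swap the files.
        temp = path + '.temp'
        cmd = ['ffmpeg', '-y', '-i', path, '-c', 'copy']
        for name, value in metadata.items():
            cmd += ['-metadata', '%s=%s' % (name, value)]
        cmd += ['-f', ext, temp]
        subprocess.check_call(cmd)
        os.remove(path)
        os.rename(temp, path)

    add_metadata('video.mp4', {'title': 'Some title', 'artist': 'Some uploader'})

Passing an argument list to subprocess means no shell is involved, so the values need no extra quoting; note that the '"%s"' wrapping in the code above embeds literal quote characters into the tag values.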

youtube_dl/YoutubeDL.py

@@ -3,6 +3,7 @@
from __future__ import absolute_import
import errno
import io
import os
import re
@@ -70,17 +71,27 @@ class YoutubeDL(object):
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
subtitlesformat: Subtitle format [srt/sbv/vtt] (default=srt)
subtitleslangs: List of languages of the subtitles to download
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem.
None to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Videos unsuitable for the given age are skipped.
download_archive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
@@ -97,11 +108,23 @@ class YoutubeDL(object):
def __init__(self, params):
"""Create a FileDownloader object with the given options."""
self._ies = []
self._ies_instances = {}
self._pps = []
self._progress_hooks = []
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
if (sys.version_info >= (3,) and sys.platform != 'win32' and
sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params['restrictfilenames']):
# On Python 3, the Unicode filesystem API will throw errors (#1474)
self.report_warning(
u'Assuming --restrict-filenames since file system encoding '
u'cannot encode all characters. '
u'Set the LC_ALL environment variable to fix this.')
params['restrictfilenames'] = True
self.params = params
self.fd = FileDownloader(self, self.params)
@@ -111,8 +134,21 @@ class YoutubeDL(object):
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
Get an instance of an IE with name ie_key; it will first try to get one
from the _ies list, and if there is no instance it will create a new one
and add it to the extractor list.
"""
ie = self._ies_instances.get(ie_key)
if ie is None:
ie = get_info_extractor(ie_key)()
self.add_info_extractor(ie)
return ie
def add_default_info_extractors(self):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
@@ -127,14 +163,10 @@ class YoutubeDL(object):
def to_screen(self, message, skip_eol=False):
"""Print message to stdout if not in quiet mode."""
assert type(message) == type(u'')
if not self.params.get('quiet', False):
terminator = [u'\n', u''][skip_eol]
output = message + terminator
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding(), 'ignore')
self._screen_file.write(output)
self._screen_file.flush()
write_string(output, self._screen_file)
def to_stderr(self, message):
"""Print message to stderr."""
@@ -184,10 +216,10 @@ class YoutubeDL(object):
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header=u'\033[0;33mWARNING:\033[0m'
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header=u'WARNING:'
warning_message=u'%s %s' % (_msg_header,message)
_msg_header = u'WARNING:'
warning_message = u'%s %s' % (_msg_header, message)
self.to_stderr(warning_message)
def report_error(self, message, tb=None):
@@ -202,19 +234,6 @@ class YoutubeDL(object):
error_message = u'%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
if rate_limit is None or byte_counter == 0:
return
now = time.time()
elapsed = now - start_time
if elapsed <= 0.0:
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep((byte_counter - rate_limit * (now - start_time)) / rate_limit)
def report_writedescription(self, descfn):
""" Report that the description file is being written """
self.to_screen(u'[info] Writing video description to: ' + descfn)
@@ -227,6 +246,10 @@ class YoutubeDL(object):
""" Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_writeannotations(self, annofn):
""" Report that the annotations file has been written. """
self.to_screen(u'[info] Writing video annotations to: ' + annofn)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
try:
@@ -252,13 +275,15 @@ class YoutubeDL(object):
if template_dict['playlist_index'] is not None:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k,v: sanitize_filename(
sanitize = lambda k, v: sanitize_filename(
u'NA' if v is None else compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k==u'id'))
template_dict = dict((k, sanitize(k, v)) for k,v in template_dict.items())
is_id=(k == u'id'))
template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items())
filename = self.params['outtmpl'] % template_dict
tmpl = os.path.expanduser(self.params['outtmpl'])
filename = tmpl % template_dict
return filename
except KeyError as err:
self.report_error(u'Erroneous output template')
@@ -284,19 +309,24 @@ class YoutubeDL(object):
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return (u'%(title)s has already been recorded in archive'
% info_dict)
return None
def extract_info(self, url, download=True, ie_key=None, extra_info={}):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
if ie_key:
ie = get_info_extractor(ie_key)()
ie.set_downloader(self)
ies = [ie]
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
@@ -336,7 +366,7 @@ class YoutubeDL(object):
raise
else:
self.report_error(u'no suitable InfoExtractor: %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
Take the result of the ie (may be modified) and resolve all unresolved
@@ -349,13 +379,7 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video') # If not given we suppose it's a video, support the default old system
if result_type == 'video':
ie_result.update(extra_info)
if 'playlist' not in ie_result:
# It isn't part of a playlist
ie_result['playlist'] = None
ie_result['playlist_index'] = None
if download:
self.process_info(ie_result)
return ie_result
return self.process_video_result(ie_result)
elif result_type == 'url':
# We have to add extra_info to the results because it may be
# contained in a playlist
@@ -366,7 +390,7 @@ class YoutubeDL(object):
elif result_type == 'playlist':
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
playlist_results = []
@@ -384,12 +408,12 @@ class YoutubeDL(object):
self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
for i,entry in enumerate(entries,1):
self.to_screen(u'[download] Downloading video #%s of %s' %(i, n_entries))
for i, entry in enumerate(entries, 1):
self.to_screen(u'[download] Downloading video #%s of %s' % (i, n_entries))
extra = {
'playlist': playlist,
'playlist_index': i + playliststart,
}
'playlist': playlist,
'playlist_index': i + playliststart,
}
if not 'extractor' in entry:
# We set the extractor, if it's an url it will be set then to
# the new extractor, but if it's already a video we must make
@@ -413,6 +437,103 @@ class YoutubeDL(object):
else:
raise Exception('Invalid result type: %s' % result_type)
def select_format(self, format_spec, available_formats):
if format_spec == 'best' or format_spec is None:
return available_formats[-1]
elif format_spec == 'worst':
return available_formats[0]
else:
extensions = [u'mp4', u'flv', u'webm', u'3gp']
if format_spec in extensions:
filter_f = lambda f: f['ext'] == format_spec
else:
filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, available_formats))
if matches:
return matches[-1]
return None
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
info_dict['playlist_index'] = None
# These extractors handle format selection themselves
if info_dict['extractor'] in [u'youtube', u'Youku', u'YouPorn', u'mixcloud']:
if download:
self.process_info(info_dict)
return info_dict
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
# There's only one format available
formats = [info_dict]
else:
formats = info_dict['formats']
# We check that all the formats have the format and format_id fields
for (i, format) in enumerate(formats):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
format['format'] = u'{id} - {res}{note}'.format(
id=format['format_id'],
res=self.format_resolution(format),
note=u' ({})'.format(format['format_note']) if format.get('format_note') is not None else '',
)
if self.params.get('listformats', None):
self.list_formats(info_dict)
return
format_limit = self.params.get('format_limit', None)
if format_limit:
formats = list(takewhile_inclusive(
lambda f: f['format_id'] != format_limit, formats
))
if self.params.get('prefer_free_formats'):
def _free_formats_key(f):
try:
ext_ord = [u'flv', u'mp4', u'webm'].index(f['ext'])
except ValueError:
ext_ord = -1
# We only compare the extension if they have the same height and width
return (f.get('height'), f.get('width'), ext_ord)
formats = sorted(formats, key=_free_formats_key)
req_format = self.params.get('format', 'best')
if req_format is None:
req_format = 'best'
formats_to_download = []
# The -1 is for supporting YoutubeIE
if req_format in ('-1', 'all'):
formats_to_download = formats
else:
# We can accept formats requested in the format: 34/5/best; we pick
# the first that is available, starting from left
req_formats = req_format.split('/')
for rf in req_formats:
selected_format = self.select_format(rf, formats)
if selected_format is not None:
formats_to_download = [selected_format]
break
if not formats_to_download:
raise ExtractorError(u'requested format not available')
if download:
if len(formats_to_download) > 1:
self.to_screen(u'[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
for format in formats_to_download:
new_info = dict(info_dict)
new_info.update(format)
self.process_info(new_info)
# We update the info dict with the best quality format (backwards compatibility)
info_dict.update(formats_to_download[-1])
return info_dict
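
In the new selection system the formats list arrives ordered from worst to best, select_format resolves a single spec ('best', 'worst', a known extension, or a format_id) to one entry, and the loop above walks a slash-separated preference string until a spec matches. A self-contained usage sketch with hypothetical formats (the selection logic is inlined from the method above for illustration):

    def select_format(format_spec, available_formats):
        # Inlined copy of YoutubeDL.select_format's logic.
        if format_spec == 'best' or format_spec is None:
            return available_formats[-1]
        elif format_spec == 'worst':
            return available_formats[0]
        if format_spec in ('mp4', 'flv', 'webm', '3gp'):
            filter_f = lambda f: f['ext'] == format_spec
        else:
            filter_f = lambda f: f['format_id'] == format_spec
        matches = [f for f in available_formats if filter_f(f)]
        return matches[-1] if matches else None

    formats = [  # worst -> best, as process_video_result expects
        {'format_id': '5', 'ext': 'flv', 'height': 240},
        {'format_id': '18', 'ext': 'mp4', 'height': 360},
        {'format_id': '22', 'ext': 'mp4', 'height': 720},
    ]

    # "-f 37/22/18": the first spec with a match wins; '37' matches nothing,
    # so this resolves to format_id '22'.
    for spec in '37/22/18'.split('/'):
        selected = select_format(spec, formats)
        if selected is not None:
            break
    assert selected['format_id'] == '22'

    # "-f mp4" filters by extension and keeps the best (last) match.
    assert select_format('mp4', formats)['format_id'] == '22'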
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@@ -448,7 +569,8 @@ class YoutubeDL(object):
if self.params.get('forceid', False):
compat_print(info_dict['id'])
if self.params.get('forceurl', False):
compat_print(info_dict['url'])
# For RTMP URLs, also include the playpath
compat_print(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict:
compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict:
@@ -479,15 +601,28 @@ class YoutubeDL(object):
self.report_writedescription(descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(info_dict['description'])
except (KeyError, TypeError):
self.report_warning(u'There\'s no description to write.')
except (OSError, IOError):
self.report_error(u'Cannot write description file ' + descfn)
return
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub'),
self.params.get('allsubtitles', False)])
if self.params.get('writeannotations', False):
try:
annofn = filename + u'.annotations.xml'
self.report_writeannotations(annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning(u'There are no annotations to write.')
except (OSError, IOError):
self.report_error(u'Cannot write annotations file: ' + annofn)
return
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
# Subtitle download errors are already managed as troubles in the relevant IE;
# that way it will silently go on when used with an unsupporting IE
subtitles = info_dict['subtitles']
@@ -509,7 +644,7 @@ class YoutubeDL(object):
infofn = filename + u'.info.json'
self.report_writeinfojson(infofn)
try:
json_info_dict = dict((k, v) for k,v in info_dict.items() if not k in ['urlhandle'])
json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
write_json_file(json_info_dict, encodeFilename(infofn))
except (OSError, IOError):
self.report_error(u'Cannot write metadata to JSON file ' + infofn)
@@ -521,11 +656,15 @@ class YoutubeDL(object):
thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
(info_dict['extractor'], info_dict['id']))
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
try:
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning(u'Unable to download thumbnail "%s": %s' %
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@@ -533,11 +672,11 @@ class YoutubeDL(object):
else:
try:
success = self.fd._do_download(filename, info_dict)
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error(u'unable to download video data: %s' % str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (ContentTooShortError, ) as err:
self.report_error(u'content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
@@ -549,6 +688,8 @@ class YoutubeDL(object):
self.report_error(u'postprocessing: %s' % str(err))
return
self.record_download_archive(info_dict)
def download(self, url_list):
"""Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template():
@@ -573,7 +714,7 @@ class YoutubeDL(object):
keep_video = None
for pp in self._pps:
try:
keep_video_wish,new_info = pp.run(info)
keep_video_wish, new_info = pp.run(info)
if keep_video_wish is not None:
if keep_video_wish:
keep_video = keep_video_wish
@@ -588,3 +729,55 @@ class YoutubeDL(object):
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.report_warning(u'Unable to remove downloaded video file')
def in_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return False
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
try:
with locked_file(fn, 'r', encoding='utf-8') as archive_file:
for line in archive_file:
if line.strip() == vid_id:
return True
except IOError as ioe:
if ioe.errno != errno.ENOENT:
raise
return False
def record_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
with locked_file(fn, 'a', encoding='utf-8') as archive_file:
archive_file.write(vid_id + u'\n')
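
The archive format is deliberately trivial: one 'extractor id' pair per line, compared by exact string match, so the file can be shared across extractors and edited by hand. After downloading the test video used elsewhere in these diffs, the file named by --download-archive would contain a line like:

    youtube BaW_jenozKc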
@staticmethod
def format_resolution(format):
if format.get('height') is not None:
if format.get('width') is not None:
res = u'%sx%s' % (format['width'], format['height'])
else:
res = u'%sp' % format['height']
else:
res = '???'
return res
def list_formats(self, info_dict):
formats_s = []
for format in info_dict.get('formats', [info_dict]):
formats_s.append(u'%-15s: %-5s %-15s[%s]' % (
format['format_id'],
format['ext'],
format.get('format_note') or '-',
self.format_resolution(format),
)
)
if len(formats_s) != 1:
formats_s[0] += ' (worst)'
formats_s[-1] += ' (best)'
formats_s = "\n".join(formats_s)
self.to_screen(u'[info] Available formats for %s:\n'
u'format code extension note resolution\n%s' % (
info_dict['id'], formats_s))
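
Applied to the hypothetical formats list sketched above, list_formats would print approximately the following (column widths per the '%-15s: %-5s %-15s[%s]' template; height-only entries render as '240p' and so on):

    [info] Available formats for BaW_jenozKc:
    format code extension note resolution
    5              : flv   -              [240p] (worst)
    18             : mp4   -              [360p]
    22             : mp4   -              [720p] (best)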

youtube_dl/__init__.py

@@ -28,11 +28,16 @@ __authors__ = (
'Axel Noack',
'Albert Kim',
'Pierre Rudloff',
'Huarong Huo',
'Ismael Mejía',
'Steffan \'Ruirize\' James',
'Andras Elso',
)
__license__ = 'Public Domain'
import codecs
import collections
import getpass
import optparse
import os
@@ -42,16 +47,43 @@ import shlex
import socket
import subprocess
import sys
import warnings
import traceback
import platform
from .utils import *
from .utils import (
compat_cookiejar,
compat_print,
compat_str,
compat_urllib_request,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
make_HTTPS_handler,
MaxDownloadsReached,
platform_name,
preferredencoding,
SameFileError,
std_headers,
write_string,
YoutubeDLHandler,
)
from .update import update_self
from .version import __version__
from .FileDownloader import *
from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .YoutubeDL import YoutubeDL
from .PostProcessor import *
from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes):
@@ -99,6 +131,16 @@ def parseOpts(overrideArguments=None):
pass
return None
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try:
i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>'
except ValueError:
pass
return opts
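
_hide_login_info masks the value that follows each credential flag before the config and argv lists are echoed below, so passwords never land in verbose logs. For example (hypothetical arguments):

    _hide_login_info(['-u', 'alice', '-p', 'hunter2', '-f', 'best'])
    # -> ['-u', '<PRIVATE>', '-p', '<PRIVATE>', '-f', 'best']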
max_width = 80
max_help_position = 80
@@ -136,7 +178,10 @@ def parseOpts(overrideArguments=None):
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to skip unavailable videos in a playlist', default=False)
general.add_option('--abort-on-error',
action='store_false', dest='ignoreerrors',
help='Abort downloading of further videos (in the playlist or the command line) if an error occurs')
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
@@ -153,6 +198,12 @@ def parseOpts(overrideArguments=None):
help='Output descriptions of all supported extractors', default=False)
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
selection.add_option('--playlist-start',
@@ -167,6 +218,13 @@ def parseOpts(overrideArguments=None):
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username',
@@ -180,8 +238,8 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT',
help='video format code, specify the order of preference using slashes: "-f 22/17/18"')
action='store', dest='format', metavar='FORMAT', default='best',
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@@ -193,13 +251,10 @@ def parseOpts(overrideArguments=None):
subtitles.add_option('--write-sub', '--write-srt',
action='store_true', dest='writesubtitles',
help='write subtitle file (currently youtube only)', default=False)
help='write subtitle file', default=False)
subtitles.add_option('--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub',
help='write automatic subtitle file (currently youtube only)', default=False)
subtitles.add_option('--only-sub',
action='store_true', dest='skip_download',
help='[deprecated] alias of --skip-download', default=False)
help='write automatic subtitle file (youtube only)', default=False)
subtitles.add_option('--all-subs',
action='store_true', dest='allsubtitles',
help='downloads all the available subtitles of the video', default=False)
@@ -210,16 +265,16 @@ def parseOpts(overrideArguments=None):
action='store', dest='subtitlesformat', metavar='FORMAT',
help='subtitle format (default=srt) ([sbv/vtt] youtube only)', default='srt')
subtitles.add_option('--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslang', metavar='LANGS', type='str',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate (e.g. 50k or 44.6m)')
dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024")
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
@@ -261,6 +316,10 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems (very verbose)')
verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP)
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
@@ -276,7 +335,9 @@ def parseOpts(overrideArguments=None):
help=('output filename template. Use %(title)s to get the title, '
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), '
'%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD"), '
'%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id , %(playlist)s for the playlist the video is in, '
'%(playlist_index)s for the position in the playlist and %% for a literal percent. '
@@ -310,6 +371,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotations.xml file', default=False)
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
@@ -329,6 +393,8 @@ def parseOpts(overrideArguments=None):
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general)
@@ -344,22 +410,26 @@ def parseOpts(overrideArguments=None):
if overrideArguments is not None:
opts, args = parser.parse_args(overrideArguments)
if opts.verbose:
sys.stderr.write(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
write_string(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
else:
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:]
argv = systemConf + userConf + commandLineConf
opts, args = parser.parse_args(argv)
if opts.verbose:
sys.stderr.write(u'[debug] System config: ' + repr(systemConf) + '\n')
sys.stderr.write(u'[debug] User config: ' + repr(userConf) + '\n')
sys.stderr.write(u'[debug] Command-line args: ' + repr(commandLineConf) + '\n')
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')
write_string(u'[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n')
write_string(u'[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n')
return parser, opts, args
@@ -382,7 +452,7 @@ def _real_main(argv=None):
except (IOError, OSError) as err:
if opts.verbose:
traceback.print_exc()
sys.stderr.write(u'ERROR: unable to open cookie file\n')
write_string(u'ERROR: unable to open cookie file\n')
sys.exit(101)
# Set user agent
if opts.user_agent is not None:
@@ -409,29 +479,13 @@ def _real_main(argv=None):
batchurls = [x.strip() for x in batchurls]
batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
if opts.verbose:
sys.stderr.write(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
write_string(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
except IOError:
sys.exit(u'ERROR: batch file could not be read')
all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls]
# General configuration
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
opener = _setup_opener(jar=jar, opts=opts)
extractors = gen_extractors()
@@ -448,6 +502,8 @@ def _real_main(argv=None):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all')
@@ -519,6 +575,11 @@ def _real_main(argv=None):
else:
date = DateRange(opts.dateafter, opts.datebefore)
# --all-sub automatically sets --write-sub if --write-auto-sub is not given
# this was the old behaviour if only --all-sub was given.
if opts.allsubtitles and not opts.writeautomaticsub:
opts.writesubtitles = True
if sys.version_info < (3,):
# In Python 2, sys.argv is a bytestring (also note http://bugs.python.org/issue2128 for Windows systems)
if opts.outtmpl is not None:
@@ -531,6 +592,10 @@ def _real_main(argv=None):
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
if '%(ext)s' not in outtmpl and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "%%(ext)s" instead of %r' %
determine_ext(outtmpl, u''))
# YoutubeDL
ydl = YoutubeDL({
@@ -565,11 +630,13 @@ def _real_main(argv=None):
'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle,
'nopart': opts.nopart,
'updatetime': opts.updatetime,
'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles,
@@ -577,7 +644,7 @@ def _real_main(argv=None):
'allsubtitles': opts.allsubtitles,
'listsubtitles': opts.listsubtitles,
'subtitlesformat': opts.subtitlesformat,
'subtitleslangs': opts.subtitleslang,
'subtitleslangs': opts.subtitleslangs,
'matchtitle': decodeOption(opts.matchtitle),
'rejecttitle': decodeOption(opts.rejecttitle),
'max_downloads': opts.max_downloads,
@@ -589,10 +656,14 @@ def _real_main(argv=None):
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize,
'daterange': date,
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
})
if opts.verbose:
sys.stderr.write(u'[debug] youtube-dl version ' + __version__ + u'\n')
write_string(u'[debug] youtube-dl version ' + __version__ + u'\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
@@ -601,18 +672,26 @@ def _real_main(argv=None):
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
sys.stderr.write(u'[debug] Git HEAD: ' + out + u'\n')
write_string(u'[debug] Git HEAD: ' + out + u'\n')
except:
try:
sys.exc_clear()
except:
pass
sys.stderr.write(u'[debug] Python version %s - %s' %(platform.python_version(), platform.platform()) + u'\n')
sys.stderr.write(u'[debug] Proxy map: ' + str(proxy_handler.proxies) + u'\n')
write_string(u'[debug] Python version %s - %s' %(platform.python_version(), platform_name()) + u'\n')
proxy_map = {}
for handler in opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string(u'[debug] Proxy map: ' + compat_str(proxy_map) + u'\n')
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
@@ -622,7 +701,7 @@ def _real_main(argv=None):
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, sys.argv[0])
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
if len(all_urls) < 1:
@@ -641,11 +720,42 @@ def _real_main(argv=None):
if opts.cookiefile is not None:
try:
jar.save()
except (IOError, OSError) as err:
except (IOError, OSError):
sys.exit(u'ERROR: unable to save cookie jar')
sys.exit(retcode)
def _setup_opener(jar=None, opts=None, timeout=300):
if opts is None:
FakeOptions = collections.namedtuple(
'FakeOptions', ['proxy', 'no_check_certificate'])
opts = FakeOptions(proxy=None, no_check_certificate=False)
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)
return opener
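# Illustrative sketch (not part of this patch): since opts falls back to a
# FakeOptions namedtuple with proxy=None and no_check_certificate=False,
# _setup_opener() can be called with no arguments to get an opener that honors
# the system proxy settings and the 300-second default timeout, e.g.:
#
#     opener = _setup_opener()
#     # install_opener() has already run, so module-level
#     # compat_urllib_request helpers share the same proxy/cookie handling.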
def main(argv=None):
try:
_real_main(argv)

youtube_dl/aes.py Normal file
View File

@@ -0,0 +1,202 @@
__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
import base64
from math import ceil
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
def aes_ctr_decrypt(data, key, counter):
"""
Decrypt with aes in counter mode
@param {int[]} data cipher
@param {int[]} key 16/24/32-Byte cipher key
@param {instance} counter Instance whose next_value function (@returns {int[]} 16-Byte block)
returns the next counter block
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
decrypted_data=[]
for i in range(block_count):
counter_block = counter.next_value()
block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
block += [0]*(BLOCK_SIZE_BYTES - len(block))
cipher_counter_block = aes_encrypt(counter_block, expanded_key)
decrypted_data += xor(block, cipher_counter_block)
decrypted_data = decrypted_data[:len(data)]
return decrypted_data
def key_expansion(data):
"""
Generate key schedule
@param {int[]} data 16/24/32-Byte cipher key
@returns {int[]} 176/208/240-Byte expanded key
"""
data = data[:] # copy
rcon_iteration = 1
key_size_bytes = len(data)
expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES
while len(data) < expanded_key_size_bytes:
temp = data[-4:]
temp = key_schedule_core(temp, rcon_iteration)
rcon_iteration += 1
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
if key_size_bytes == 32:
temp = data[-4:]
temp = sub_bytes(temp)
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
data = data[:expanded_key_size_bytes]
return data
def aes_encrypt(data, expanded_key):
"""
Encrypt one block with aes
@param {int[]} data 16-Byte state
@param {int[]} expanded_key 176/208/240-Byte expanded key
@returns {int[]} 16-Byte cipher
"""
rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
for i in range(1, rounds+1):
data = sub_bytes(data)
data = shift_rows(data)
if i != rounds:
data = mix_columns(data)
data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
return data
def aes_decrypt_text(data, password, key_size_bytes):
"""
Decrypt text
- The first 8 bytes of decoded 'data' are the 8 high bytes of the counter
- The cipher key is retrieved by encrypting the first 16 bytes of 'password'
with the first 'key_size_bytes' bytes from 'password' (padded with 0's if necessary)
- Mode of operation is 'counter'
@param {str} data Base64 encoded string
@param {str,unicode} password Password (will be encoded with utf-8)
@param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit or 32 for 256-Bit
@returns {str} Decrypted data
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(base64.b64decode(data))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0]*(key_size_bytes - len(password))
key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * (key_size_bytes // BLOCK_SIZE_BYTES)
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
__value = nonce + [0]*(BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):
temp = self.__value
self.__value = inc(self.__value)
return temp
decrypted_data = aes_ctr_decrypt(cipher, key, Counter())
plaintext = intlist_to_bytes(decrypted_data)
return plaintext
RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
MIX_COLUMN_MATRIX = ((2,3,1,1),
(1,2,3,1),
(1,1,2,3),
(3,1,1,2))
def sub_bytes(data):
return [SBOX[x] for x in data]
def rotate(data):
return data[1:] + [data[0]]
def key_schedule_core(data, rcon_iteration):
data = rotate(data)
data = sub_bytes(data)
data[0] = data[0] ^ RCON[rcon_iteration]
return data
def xor(data1, data2):
return [x^y for x, y in zip(data1, data2)]
def mix_column(data):
data_mixed = []
for row in range(4):
mixed = 0
for column in range(4):
addend = data[column]
if MIX_COLUMN_MATRIX[row][column] in (2,3):
addend <<= 1
if addend > 0xff:
addend &= 0xff
addend ^= 0x1b
if MIX_COLUMN_MATRIX[row][column] == 3:
addend ^= data[column]
mixed ^= addend & 0xff
data_mixed.append(mixed)
return data_mixed
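# Note on the arithmetic above: the left shift plus conditional XOR with 0x1b
# is multiplication by 2 in GF(2^8) reduced by the AES polynomial
# x^8 + x^4 + x^3 + x + 1 (0x11b); the extra '^= data[column]' turns it into
# multiplication by 3. For example, 2 * 0x80 = 0x100 -> (0x100 & 0xff) ^ 0x1b = 0x1b.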
def mix_columns(data):
data_mixed = []
for i in range(4):
column = data[i*4 : (i+1)*4]
data_mixed += mix_column(column)
return data_mixed
def shift_rows(data):
data_shifted = []
for column in range(4):
for row in range(4):
data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
return data_shifted
def inc(data):
data = data[:] # copy
for i in range(len(data)-1,-1,-1):
if data[i] == 255:
data[i] = 0
else:
data[i] = data[i] + 1
break
return data
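# Usage sketch (not part of the module; the key, data and counter are invented,
# and it assumes the youtube_dl package is importable). CTR mode XORs the data
# with an encrypted counter stream, so applying aes_ctr_decrypt twice with the
# same counter sequence round-trips:
#
#     from youtube_dl.aes import aes_ctr_decrypt
#
#     class ZeroCounter(object):
#         # toy counter: always yields the same block; real callers use inc()
#         def next_value(self):
#             return [0] * 16
#
#     key = list(range(16))    # hypothetical 128-bit key as an int list
#     plaintext = [0x41] * 16  # sixteen 'A' bytes
#     ciphertext = aes_ctr_decrypt(plaintext, key, ZeroCounter())
#     assert aes_ctr_decrypt(ciphertext, key, ZeroCounter()) == plaintext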

youtube_dl/extractor/__init__.py
View File

@@ -1,27 +1,57 @@
from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import ArteTvIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE
from .bandcamp import BandcampIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .cnn import CNNIE
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
from .cspan import CSpanIE
from .dailymotion import DailymotionIE, DailymotionPlaylistIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .escapist import EscapistIE
from .exfm import ExfmIE
from .facebook import FacebookIE
from .faz import FazIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
France2IE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE
from .gamespot import GameSpotIE
@@ -29,6 +59,7 @@ from .gametrailers import GametrailersIE
from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .hypem import HypemIE
@@ -36,54 +67,78 @@ from .ign import IGNIE, OneUPIE
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .kankan import KankanIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE
from .mtv import MTVIE
from .muzu import MuzuTVIE
from .myspass import MySpassIE
from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rutube import RutubeIE
from .sina import SinaIE
from .slashdot import SlashdotIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE
from .slideshare import SlideshareIE
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import SouthParkStudiosIE
from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tf1 import TF1IE
from .thisav import ThisAVIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .tudou import TudouIE
from .tumblr import TumblrIE
from .tutv import TutvIE
from .ustream import UstreamIE
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vevo import VevoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE
from .c56 import C56IE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE
@@ -103,6 +158,7 @@ from .youtube import (
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
)
@@ -116,12 +172,14 @@ _ALL_CLASSES = [
]
_ALL_CLASSES.append(GenericIE)
def gen_extractors():
""" Return a list of an instance of every supported extractor.
The order does matter; the first extractor matched is the one handling the URL.
"""
return [klass() for klass in _ALL_CLASSES]
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
return globals()[ie_name+'IE']
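# Usage sketch (assuming the package is importable; not part of the patch):
#
#     from youtube_dl.extractor import gen_extractors, get_info_extractor
#     ies = gen_extractors()                     # ordered IE instances
#     YoutubeIE = get_info_extractor('Youtube')  # resolve a class by IE name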

youtube_dl/extractor/addanime.py Normal file
View File

@@ -0,0 +1,75 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
ExtractorError,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.flv',
u'md5': u'1036a0e0cd307b95bd8a8c3a5c8cfaf1',
u'info_dict': {
u"description": u"One Piece 606",
u"title": u"One Piece 606"
}
}
def _real_extract(self, url):
try:
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError):
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, u'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, u'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError(u'Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + u'://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r"var normal_video_file = '(.*?)';",
webpage, u'video file URL')
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'description': video_description
}

youtube_dl/extractor/appletrailers.py Normal file
View File

@@ -0,0 +1,138 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
class AppleTrailersIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TEST = {
u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
u"playlist": [
{
u"file": u"manofsteel-trailer4.mov",
u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": {
u"duration": 111,
u"title": u"Trailer 4",
u"upload_date": u"20130523",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer3.mov",
u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": {
u"duration": 182,
u"title": u"Trailer 3",
u"upload_date": u"20130417",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer.mov",
u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": {
u"duration": 148,
u"title": u"Trailer",
u"upload_date": u"20121212",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-teaser.mov",
u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": {
u"duration": 93,
u"title": u"Teaser",
u"upload_date": u"20120721",
u"uploader_id": u"wb",
},
}
]
}
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' in the onClick attributes are not escaped, so pages such as
# http://trailers.apple.com/trailers/wb/gravity/ can't be parsed with
# xml.etree.ElementTree.fromstring
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>'
doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
runtime = trailer_info['runtime']
m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
duration = None
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = []
for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
})
formats = sorted(formats, key=lambda f: (f['height'], f['width']))
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'duration': duration,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader_id': uploader_id,
'user_agent': 'QuickTime compatible (youtube-dl)',
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = formats[-1]['ext']
playlist.append(info)
return {
'_type': 'playlist',
'id': movie,
'entries': playlist,
}

youtube_dl/extractor/archiveorg.py
View File

@@ -46,6 +46,8 @@ class ArchiveOrgIE(InfoExtractor):
for fn,fdata in data['files'].items()
if 'Video' in fdata['format']]
formats.sort(key=lambda fdata: fdata['file_size'])
for f in formats:
f['ext'] = determine_ext(f['url'])
info = {
'_type': 'video',
@@ -61,7 +63,6 @@ class ArchiveOrgIE(InfoExtractor):
info['thumbnail'] = thumbnail
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = determine_ext(formats[-1]['url'])
info.update(formats[-1])
return info

youtube_dl/extractor/arte.py
View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,15 +8,15 @@ from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
determine_ext,
get_element_by_id,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
"""
There are two sources of video in arte.tv: videos.arte.tv and
www.arte.tv/guide, the extraction process is different for each one.
The videos expire in 7 days, so we can't add tests.
"""
_EMISSION_URL = r'(?:http://)?www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
@@ -24,7 +25,7 @@ class ArteTvIE(InfoExtractor):
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._EMISSION_URL, cls._VIDEOS_URL, cls._LIVEWEB_URL))
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
@@ -55,14 +56,6 @@ class ArteTvIE(InfoExtractor):
# video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._EMISSION_URL, url)
if mobj is not None:
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return self._extract_emission(url, video_id, lang)
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
@@ -80,49 +73,6 @@ class ArteTvIE(InfoExtractor):
# self.extractLiveStream(url)
# return
def _extract_emission(self, url, video_id, lang):
"""Extract from www.arte.tv/guide"""
webpage = self._download_webpage(url, video_id)
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info['VDA'].split(' ')[0]),
'thumbnail': player_info['programImage'],
'ext': 'flv',
}
formats = player_info['VSR'].values()
def _match_lang(f):
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f['height']))
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f['versionCode']) is None)
# Pick the best quality
format_info = formats[-1]
if format_info['mediaType'] == u'rtmp':
info_dict['url'] = format_info['streamer']
info_dict['play_path'] = 'mp4:' + format_info['url']
else:
info_dict['url'] = format_info['url']
return info_dict
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
@@ -172,3 +122,123 @@ class ArteTvIE(InfoExtractor):
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
formats = player_info['VSR'].values()
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# Some formats use the m3u8 protocol
formats = filter(lambda f: f.get('videoFormat') != 'M3U8', formats)
# We order the formats by quality
formats = list(formats) # in python3 filter returns an iterator
if re.match(r'[A-Z]Q', formats[0]['quality']) is not None:
sort_key = lambda f: ['HQ', 'MQ', 'EQ', 'SQ'].index(f['quality'])
else:
sort_key = lambda f: int(f.get('height',-1))
formats = sorted(formats, key=sort_key)
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f.get('versionCode', '')) is None)
# Pick the best quality
def _format(format_info):
quality = format_info['quality']
m_quality = re.match(r'\w*? - (\d*)p', quality)
if m_quality is not None:
quality = m_quality.group(1)
if format_info.get('versionCode') is not None:
format_id = u'%s-%s' % (quality, format_info['versionCode'])
else:
format_id = quality
info = {
'format_id': format_id,
'format_note': format_info.get('versionLibelle'),
'width': format_info.get('width'),
'height': format_info.get('height'),
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)

youtube_dl/extractor/bliptv.py
View File

@@ -115,7 +115,7 @@ class BlipTVIE(InfoExtractor):
ext = umobj.group(1)
info = {
'id': data['item_id'],
'id': compat_str(data['item_id']),
'url': video_url,
'uploader': data['display_name'],
'upload_date': upload_date,

youtube_dl/extractor/bloomberg.py Normal file
View File

@@ -0,0 +1,27 @@
import re
from .common import InfoExtractor
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?).html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
u'file': u'12bzhqZTqQHmmlA8I-i0NpzJgcG5NNYX.mp4',
u'info_dict': {
u'title': u'Shah\'s Presentation on Foreign-Exchange Strategies',
u'description': u'md5:abc86e5236f9f0e4866c59ad36736686',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
ooyala_url = self._og_search_video_url(webpage)
return self.url_result(ooyala_url, ie='Ooyala')

youtube_dl/extractor/brightcove.py
View File

@@ -1,3 +1,5 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,15 +9,39 @@ from ..utils import (
compat_urllib_parse,
find_xpath_attr,
compat_urlparse,
ExtractorError,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
# There is a test for Brightcove in GenericIE, that way we test both the download
# and the detection of videos, and we don't have to find a URL that is always valid
_TESTS = [
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
]
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -23,6 +49,13 @@ class BrightcoveIE(InfoExtractor):
Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace(u'<--', u'<!--')
object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'],
@@ -65,22 +98,37 @@ class BrightcoveIE(InfoExtractor):
playlist_info = self._download_webpage(self._PLAYLIST_URL_TEMPLATE % player_key,
player_key, u'Downloading playlist information')
playlist_info = json.loads(playlist_info)['videoList']
json_data = json.loads(playlist_info)
if 'videoList' not in json_data:
raise ExtractorError(u'Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id=playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
renditions = video_info['renditions']
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info = {
'id': video_info['id'],
'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
return {'id': video_info['id'],
'title': video_info['displayName'],
'url': best_format['defaultURL'],
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info.update({
'url': best_format['defaultURL'],
'ext': 'mp4',
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
})
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'ext': 'flv',
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info

youtube_dl/extractor/c56.py
View File

@@ -12,8 +12,8 @@ class C56IE(InfoExtractor):
_TEST ={
u'url': u'http://www.56.com/u39/v_OTM0NDA3MTY.html',
u'file': u'93440716.mp4',
u'md5': u'9dc07b5c8e978112a6441f9e75d2b59e',
u'file': u'93440716.flv',
u'md5': u'e59995ac63d0457783ea05f93f12a866',
u'info_dict': {
u'title': u'网事知多少 第32期车怒',
},

youtube_dl/extractor/canalc2.py Normal file
View File

@@ -0,0 +1,35 @@
# coding: utf-8
import re
from .common import InfoExtractor
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?idVideo=(\d+)&voir=oui'
_TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
u'file': u'12163.mp4',
u'md5': u'060158428b650f896c542dfbb3d6487f',
u'info_dict': {
u'title': u'Terrasses du Numérique'
}
}
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, video_id)
file_name = self._search_regex(
r"so\.addVariable\('file','(.*?)'\);",
webpage, 'file name')
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title')
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

youtube_dl/extractor/canalplus.py
View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
@@ -5,24 +6,29 @@ from .common import InfoExtractor
from ..utils import unified_strdate
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://www\.canalplus\.fr/.*?\?vid=(?P<id>\d+)'
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = u'canalplus.fr'
_TEST = {
u'url': u'http://www.canalplus.fr/c-divertissement/pid3351-c-le-petit-journal.html?vid=889861',
u'file': u'889861.flv',
u'md5': u'590a888158b5f0d6832f84001fbf3e99',
u'url': u'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
u'file': u'922470.flv',
u'info_dict': {
u'title': u'Le Petit Journal 20/06/13 - La guerre des drone',
u'upload_date': u'20130620',
u'title': u'Zapping - 26/08/13',
u'description': u'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
u'upload_date': u'20130826',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'videoId = "(\d+)";', webpage, u'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
info_page = self._download_webpage(info_url,video_id,
u'Downloading video info')
@@ -43,4 +49,6 @@ class CanalplusIE(InfoExtractor):
'ext': 'flv',
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
}

View File

@@ -0,0 +1,91 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
},
u'params': {
# rtmp download
u'skip_download': True,
},
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/(?:embed|player)\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if not video_description:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
base_url = self._html_search_regex(r'\'streamer\': \'(?P<base_url>rtmp://.*?)/(?:vod|Cinemassacre)\'',
playerdata, u'base_url')
base_url += '/Cinemassacre/'
# Important: The file names in playerdata are not used by the player and even wrong for some videos
sd_file = 'Cinemassacre-%s_high.mp4' % video_id
hd_file = 'Cinemassacre-%s.mp4' % video_id
video_thumbnail = 'http://image.screenwavemedia.com/Cinemassacre/Cinemassacre-%s_thumb_640x360.jpg' % video_id
formats = [
{
'url': base_url + sd_file,
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': base_url + hd_file,
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
info = {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

youtube_dl/extractor/cnn.py Normal file
View File

@@ -0,0 +1,58 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(edition\.)?cnn\.com/video/(data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))'''
_TESTS = [{
u'url': u'http://edition.cnn.com/video/?/video/sports/2013/06/09/nadal-1-on-1.cnn',
u'file': u'sports_2013_06_09_nadal-1-on-1.cnn.mp4',
u'md5': u'3e6121ea48df7e2259fe73a0628605c4',
u'info_dict': {
u'title': u'Nadal wins 8th French Open title',
u'description': u'World Sport\'s Amanda Davies chats with 2013 French Open champion Rafael Nadal.',
},
},
{
u"url": u"http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
u"file": u"us_2013_08_21_sot-student-gives-epic-speech.georgia-institute-of-technology.mp4",
u"md5": u"b5cc60c60a3477d185af8f19a2a26f4e",
u"info_dict": {
u"title": "Student's epic speech stuns new freshmen",
u"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\""
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = u'http://cnn.com/video/data/3.0/%s/index.xml' % path
info_xml = self._download_webpage(info_url, page_title)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
formats = []
for f in info.findall('files/file'):
mf = re.match(r'(\d+)x(\d+)(?:_(.*)k)?',f.attrib['bitrate'])
if mf is not None:
formats.append((int(mf.group(1)), int(mf.group(2)), int(mf.group(3) or 0), f.text))
formats = sorted(formats)
(_,_,_, video_path) = formats[-1]
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % video_path
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
return {'id': info.attrib['id'],
'title': info.find('headline').text,
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'description': info.find('description').text,
}

View File

@@ -51,12 +51,12 @@ class ComedyCentralIE(InfoExtractor):
'400': 'mp4',
}
_video_dimensions = {
'3500': '1280x720',
'2200': '960x540',
'1700': '768x432',
'1200': '640x360',
'750': '512x288',
'400': '384x216',
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
@classmethod
@@ -64,11 +64,13 @@ class ComedyCentralIE(InfoExtractor):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def _print_formats(self, formats):
print('Available formats:')
for x in formats:
print('%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'mp4'), self._video_dimensions.get(x, '???')))
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
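# For illustration (input URL invented), _transform_rtmp_url would map
#   rtmpe://viacomccstrm.example.net/ondemand/gsp.comedystor/com/dailyshow/clip.mp4
# to the llnwd.net HTTP base above followed by
#   gsp.comedystor/com/dailyshow/clip.mp4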
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@@ -155,40 +157,31 @@ class ComedyCentralIE(InfoExtractor):
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue
if self._downloader.params.get('listformats', None):
self._print_formats([i[0] for i in turls])
return
# For now, just pick the highest bitrate
format,rtmp_video_url = turls[-1]
# Get the format arg from the arg stream
req_format = self._downloader.params.get('format', None)
# Select format if we can find one
for f,v in turls:
if f == req_format:
format, rtmp_video_url = f, v
break
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
video_url = base + m.group('finalid')
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = {
'id': shortMediaId,
'url': video_url,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'ext': 'mp4',
'format': format,
'thumbnail': None,
'description': compat_str(officialTitle),
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info)
return results

youtube_dl/extractor/common.py
View File

@@ -14,6 +14,7 @@ from ..utils import (
clean_html,
compiled_regex_type,
ExtractorError,
RegexNotFoundError,
unescapeHTML,
)
@@ -35,6 +36,8 @@ class InfoExtractor(object):
title: Video title, unescaped.
ext: Video filename extension.
Instead of url and ext, formats can also be specified.
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
@@ -52,8 +55,23 @@ class InfoExtractor(object):
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
as returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from the format_id, width, height
and format_note fields if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* format_note Additional info about the format
("3D" or "DASH video")
* width Width of the video, if known
* height Height of the video, if known
The fields should all be Unicode strings.
Unless mentioned otherwise, the fields should be Unicode strings.
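For example (values invented), a minimal formats list honoring this
contract, ordered from worst to best quality:
formats = [
{'url': 'http://example.com/v-360p.mp4', 'ext': 'mp4',
'format_id': '18', 'width': 640, 'height': 360},
{'url': 'http://example.com/v-720p.mp4', 'ext': 'mp4',
'format_id': '22', 'format_note': 'HD', 'width': 1280, 'height': 720},
]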
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
@@ -114,6 +132,11 @@ class InfoExtractor(object):
"""Real extraction process. Redefine in subclasses."""
pass
@classmethod
def ie_key(cls):
"""A string for getting the InfoExtractor with get_info_extractor"""
return cls.__name__[:-2]
@property
def IE_NAME(self):
return type(self).__name__[:-2]
@@ -129,7 +152,7 @@ class InfoExtractor(object):
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if errnote is None:
errnote = u'Unable to download webpage'
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2])
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
""" Returns a tuple (page content as string, URL handle) """
@@ -140,12 +163,17 @@ class InfoExtractor(object):
urlh = self._request_webpage(url_or_request, video_id, note, errnote)
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
if m:
encoding = m.group(1)
else:
encoding = 'utf-8'
webpage_bytes = urlh.read()
m = re.search(br'<meta[^>]+charset=[\'"]?([^\'")]+)[ /\'">]',
webpage_bytes[:1024])
if m:
encoding = m.group(1).decode('ascii')
else:
encoding = 'utf-8'
if self._downloader.params.get('dump_intermediate_pages', False):
try:
url = url_or_request.get_full_url()
@@ -204,7 +232,7 @@ class InfoExtractor(object):
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
In case of failure return a default value or raise a WARNING or a
ExtractorError, depending on fatal, specifying the field name.
RegexNotFoundError, depending on fatal, specifying the field name.
"""
if isinstance(pattern, (str, compat_str, compiled_regex_type)):
mobj = re.search(pattern, string, flags)
@@ -224,7 +252,7 @@ class InfoExtractor(object):
elif default is not None:
return default
elif fatal:
raise ExtractorError(u'Unable to extract %s' % _name)
raise RegexNotFoundError(u'Unable to extract %s' % _name)
else:
self._downloader.report_warning(u'unable to extract %s; '
u'please report this issue on http://yt-dl.org/bug' % _name)
@@ -295,6 +323,15 @@ class InfoExtractor(object):
self._og_regex('video')],
html, name, **kargs)
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
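# Example (page content invented): a page containing
#   <meta name="rating" content="RTA-5042-1996-1400-1577-RTA">
# matches the pattern above, so _rta_search returns an age_limit of 18;
# any other page yields 0.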
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.
@@ -332,7 +369,7 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by sublclasses")
raise NotImplementedError("This method must be implemented by subclasses")
@property
def SEARCH_KEY(self):

youtube_dl/extractor/dailymotion.py
View File

@@ -3,28 +3,65 @@ import json
import itertools
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
ExtractorError,
)
class DailymotionIE(InfoExtractor):
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
return request
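# Sketch (URL is only an example): every page the Dailymotion extractors fetch
# through this helper carries the family_filter=off cookie, so age-gated
# videos are not hidden from the downloaded pages:
#
#     req = DailymotionBaseInfoExtractor._build_request(
#         'http://www.dailymotion.com/video/x33vw9')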
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^/]+)'
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion'
_TEST = {
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
}
_FORMATS = [
(u'stream_h264_ld_url', u'ld'),
(u'stream_h264_url', u'standard'),
(u'stream_h264_hq_url', u'hq'),
(u'stream_h264_hd_url', u'hd'),
(u'stream_h264_hd1080_url', u'hd1080'),
]
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
]
def _real_extract(self, url):
# Extract id and simplified title from URL
@@ -32,16 +69,24 @@ class DailymotionIE(InfoExtractor):
video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
@@ -55,56 +100,114 @@ class DailymotionIE(InfoExtractor):
embed_url = 'http://www.dailymotion.com/embed/video/%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),', embed_page, 'video info')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
# TODO: support choosing qualities
for key in ['stream_h264_hd1080_url','stream_h264_hd_url',
'stream_h264_hq_url','stream_h264_url',
'stream_h264_ld_url']:
if info.get(key):#key in info and info[key]:
max_quality = key
self.to_screen(u'Using %s' % key)
break
else:
formats = []
for (key, format_id) in self._FORMATS:
video_url = info.get(key)
if video_url is not None:
m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if m_size is not None:
width, height = m_size.group(1), m_size.group(2)
else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL')
video_url = info[max_quality]
# subtitles
video_subtitles = self.extract_subtitles(video_id)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id)
return
return [{
'id': video_id,
'url': video_url,
'formats': formats,
'uploader': video_uploader,
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'ext': video_extension,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url']
}]
def _get_available_subtitles(self, video_id):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], l['url']) for l in info['list'])
return sub_lang_list
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}
class DailymotionPlaylistIE(InfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
def _extract_entries(self, id):
video_ids = []
for pagenum in itertools.count(1):
webpage = self._download_webpage('https://www.dailymotion.com/playlist/%s/%s' % (playlist_id, pagenum),
playlist_id, u'Downloading page %s' % pagenum)
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)" data-ext-id', playlist_el))
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
entries = [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in video_ids]
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': entries,
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}
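
The width/height handling in DailymotionIE above relies on the resolution being embedded in the stream URL itself. A minimal sketch of that parsing, assuming a hypothetical URL of the expected shape (the extractor keeps the raw strings; ints are used here only for clarity):

import re

def parse_h264_size(video_url):
    # Dailymotion stream URLs embed the resolution as "H264-<width>x<height>"
    m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
    if m_size is None:
        return None, None
    return int(m_size.group(1)), int(m_size.group(2))

# Hypothetical URL, for illustration only:
print(parse_h264_size('http://proxy.example.com/video/H264-1280x720/x33vw9.mp4'))
# (1280, 720)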


@@ -0,0 +1,74 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
_TEST = {
u'url': u'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
u'file': u'52554690.mp4',
u'info_dict': {
u'title': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'description': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'upload_date': u'20130831',
u'duration': 3868,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(r'<link rel="video_src" href=".+?vid=(.+?)"',
webpage, u'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
info_xml = self._download_webpage(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
u'Downloading video info')
urls_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
self.to_screen(u'%s: Getting video urls' % video_id)
formats = []
for format_el in urls.findall('result/output_list/output_list'):
profile = format_el.attrib['profile']
format_query = compat_urllib_parse.urlencode({
'vid': full_id,
'profile': profile,
})
url_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note=False)
url_doc = xml.etree.ElementTree.fromstring(url_xml.encode('utf-8'))
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format_id': profile,
})
info = {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': info.find('CONTENTS').text,
'duration': int(info.find('DURATION').text),
'upload_date': info.find('REGDTTM').text[:8],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info


@@ -0,0 +1,39 @@
import re
import json
from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
_IE_NAME = 'defense.gouv.fr'
_VALID_URL = (r'http://.*?\.defense\.gouv\.fr/layout/set/'
r'ligthboxvideo/base-de-medias/webtv/(.*)')
_TEST = {
u'url': (u'http://www.defense.gouv.fr/layout/set/ligthboxvideo/'
u'base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1'),
u'file': u'11213.mp4',
u'md5': u'75bba6124da7e63d2d60b5244ec9430c',
"info_dict": {
"title": "attaque-chimique-syrienne-du-21-aout-2013-1"
}
}
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r"flashvars.pvg_id=\"(\d+)\";",
webpage, 'ID')
json_url = ('http://static.videos.gouv.fr/brightcovehub/export/json/'
+ video_id)
info = self._download_webpage(json_url, title,
'Downloading JSON config')
video_url = json.loads(info)['renditions'][0]['url']
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}


@@ -54,6 +54,7 @@ class DreiSatIE(InfoExtractor):
'width': int(fe.find('./width').text),
'height': int(fe.find('./height').text),
'url': fe.find('./url').text,
'ext': determine_ext(fe.find('./url').text),
'filesize': int(fe.find('./filesize').text),
'video_bitrate': int(fe.find('./videoBitrate').text),
'3sat_qualityname': fe.find('./quality').text,
@@ -79,7 +80,6 @@ class DreiSatIE(InfoExtractor):
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = determine_ext(formats[-1]['url'])
info.update(formats[-1])
return info


@@ -0,0 +1,37 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config_xml = self._download_webpage(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
video_url = config.find('file').text
return {
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,
}


@@ -106,8 +106,8 @@ class FacebookIE(InfoExtractor):
video_duration = int(video_data['video_duration'])
thumbnail = video_data['thumbnail_src']
video_title = self._html_search_regex('<h2 class="uiHeaderTitle">([^<]+)</h2>',
webpage, u'title')
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, u'title')
info = {
'id': video_id,


@@ -0,0 +1,60 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
clean_html,
get_element_by_attribute,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr_html = get_element_by_attribute('class', 'Content Copy', webpage)
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': clean_html(descr_html),
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info


@@ -0,0 +1,79 @@
import re
import random
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
u'file': u'00011.flv',
u'info_dict': {
u'title': u'Folge 1 vom 10. April 2007',
u'description': u'md5:fb4818139c7cfe6907d4b83412a6864f',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
u'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
video_title = 'Fernsehkritik %d.%d' % (episode, i)
videos.append({
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
u'md5': u'262f0adbac80317412f7e57b4808e5c4',
u'info_dict': {
u"title": u"Postecke 120"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_id = '%04d' % episode
video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
video_title = 'Postecke %d' % episode
return {
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': video_title,
}
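
The episode URLs built above follow a simple naming scheme: a mirror is picked at random from dl2..dl4, and part 1 of an episode carries no suffix while later parts get "-<n>". A short illustration:

import random

episode = 120
server = random.randint(2, 4)  # pick one of the dl2..dl4 mirrors
for i in (1, 2, 3):
    suffix = '' if i == 1 else '-%d' % i
    print('http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv'
          % (server, episode, suffix))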


@@ -9,7 +9,7 @@ from ..utils import (
class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4',


@@ -0,0 +1,129 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
def _extract_video(self, video_id):
xml_desc = self._download_webpage(
'http://www.francetvinfo.fr/appftv/webservices/video/'
'getInfosOeuvre.php?id-diffusion='
+ video_id, video_id, 'Downloading XML config')
info = xml.etree.ElementTree.fromstring(xml_desc.encode('utf-8'))
manifest_url = info.find('videos/video/url').text
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
thumbnail_path = info.find('image').text
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
# Can't use tests, videos expire in 7 days
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r'data-diffusion="(\d+)"', webpage, 'ID')
return self._extract_video(video_id)
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
u'file': u'84981923.mp4',
u'info_dict': {
u'title': u'Soir 3',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
return self._extract_video(video_id)
class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr'
_VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
},
u'params': {
# It uses Dailymotion
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % name)
info_json = self._download_webpage(info_url, name)
info = json.loads(info_json)
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'],
ie='Dailymotion')
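
FranceTVBaseInfoExtractor above turns the HDS manifest URL into an HLS one by plain string substitution; the /z/-to-/i/ swap follows the usual Akamai HDS/HLS path convention. A minimal sketch with a hypothetical manifest URL:

manifest_url = 'http://medias.example.francetv.fr/z/streaming/84981923/manifest.f4m'
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
# -> 'http://medias.example.francetv.fr/i/streaming/84981923/index_2_av.m3u8'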


@@ -21,7 +21,8 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'type: "video/mp4", src: "(.*?)"',
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
info = {


@@ -1,55 +1,59 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
unified_strdate,
compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
)
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"6410818.mp4",
u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": {
u"title": u"Arma III - Community Guide: SITREP I",
u"upload_date": u"20130627",
u"title": u"Arma 3 - Community Guide: SITREP I",
u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id')
page_id = video_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id)
video_id = self._html_search_regex([r'"og:video" content=".*?\?id=(\d+)"',
r'http://www\.gamespot\.com/videoembed/(\d+)'],
webpage, 'video id')
data = compat_urllib_parse.urlencode({'id': video_id, 'newplayer': '1'})
info_url = 'http://www.gamespot.com/pages/video_player/xml.php?' + data
info_xml = self._download_webpage(info_url, video_id)
doc = xml.etree.ElementTree.fromstring(info_xml)
clip_el = doc.find('./playList/clip')
data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
data_video = json.loads(unescapeHTML(data_video_json))
http_urls = [{'url': node.find('filePath').text,
'rate': int(node.find('rate').text)}
for node in clip_el.find('./httpURI')]
best_quality = sorted(http_urls, key=lambda f: f['rate'])[-1]
video_url = best_quality['url']
title = clip_el.find('./title').text
ext = video_url.rpartition('.')[2]
thumbnail_url = clip_el.find('./screenGrabURI').text
view_count = int(clip_el.find('./views').text)
upload_date = unified_strdate(clip_el.find('./postDate').text)
# Transform the manifest url into a link to the mp4 files,
# which are used on mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
return [{
'id' : video_id,
'url' : video_url,
'ext' : ext,
'title' : title,
'thumbnail' : thumbnail_url,
'upload_date' : upload_date,
'view_count' : view_count,
}]
info = {
'id': data_video['guid'],
'title': compat_urllib_parse.unquote(data_video['title']),
'formats': formats,
'description': get_meta_content('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
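
The f4m-to-MP4 rewrite in GameSpotIE above works purely on the manifest path: the comma-separated bitrate list is captured, the path is turned into a '%s' template, and the template is expanded once per quality. A sketch on a hypothetical path (string concatenation stands in for compat_urlparse.urljoin):

import re

f4m_path = '/vid/2300/6410818/,480,800,1500,.mp4.csmil/manifest.f4m'  # hypothetical
QUALITIES_RE = r'((,\d+)+,?)'
qualities = re.search(QUALITIES_RE, f4m_path).group(1).strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
urls = ['http://video.gamespotcdn.com/' + http_template % q for q in qualities]
# ['http://video.gamespotcdn.com/2300/6410818/480.mp4', ...]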


@@ -8,11 +8,15 @@ from ..utils import (
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
ExtractorError,
smuggle_url,
unescapeHTML,
)
from .brightcove import BrightcoveIE
class GenericIE(InfoExtractor):
IE_DESC = u'Generic downloader that works on some sites'
_VALID_URL = r'.*'
@@ -23,21 +27,21 @@ class GenericIE(InfoExtractor):
u'file': u'13601338388002.mp4',
u'md5': u'85b90ccc9d73b4acd9138d3af4c27f89',
u'info_dict': {
u"uploader": u"www.hodiho.fr",
u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep"
}
},
# embedded vimeo video
{
u'url': u'http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'file': u'22444065.mp4',
u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter",
}
},
}
]
def report_download_webpage(self, video_id):
@@ -107,6 +111,11 @@ class GenericIE(InfoExtractor):
return new_url
def _real_extract(self, url):
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
try:
new_url = self._test_redirect(url)
if new_url:
@@ -121,16 +130,31 @@ class GenericIE(InfoExtractor):
except ValueError:
# since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id)
# Look for BrigthCove:
m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
# Look for BrightCove:
m_brightcove = re.search(r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
if m_brightcove is not None:
self.to_screen(u'Brightcove video detected.')
bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())
return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Look for embedded YouTube player
mobj = re.search(
r'<iframe[^>]+?src="(https?://(?:www\.)?youtube.com/embed/.+?)"', webpage)
if mobj:
surl = unescapeHTML(mobj.group(1))
return self.url_result(surl, 'Youtube')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@@ -151,17 +175,18 @@ class GenericIE(InfoExtractor):
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
# HTML5 video
mobj = re.search(r'<video[^<]*>.*?<source .*?src="([^"]+)"', webpage, flags=re.DOTALL)
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = compat_urllib_parse.unquote(mobj.group(1))
video_id = os.path.basename(video_url)
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
# here's a fun little line of code for you:
video_extension = os.path.splitext(video_id)[1][1:]
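
The new protocol check in GenericIE above simply retries scheme-less input over plain HTTP. A self-contained sketch, using Python 3's urllib.parse where the extractor goes through the compat_urlparse shim:

from urllib.parse import urlparse

def ensure_scheme(url):
    # A bare "example.com/video" has no scheme; assume http and retry.
    if not urlparse(url).scheme:
        return 'http://' + url
    return url

assert ensure_scheme('example.com/video/1') == 'http://example.com/video/1'
assert ensure_scheme('https://example.com/video/1') == 'https://example.com/video/1'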


@@ -40,8 +40,10 @@ class GooglePlusIE(InfoExtractor):
self.report_extraction(video_id)
# Extract update date
upload_date = self._html_search_regex('title="Timestamp">(.*?)</a>',
webpage, u'upload date', fatal=False)
upload_date = self._html_search_regex(
r'''(?x)<a.+?class="o-U-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False, flags=re.VERBOSE)
if upload_date:
# Convert timestring to a format suitable for filename
upload_date = datetime.datetime.strptime(upload_date, "%Y-%m-%d")
@@ -57,8 +59,8 @@ class GooglePlusIE(InfoExtractor):
webpage, 'title', default=u'NA')
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com'
video_page = self._search_regex(r'<a href="((?:%s)?/photos/.*?)"' % re.escape(DOMAIN),
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, u'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page


@@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(.+?)-.+'
_TEST = {
u'url': u'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
u'file': u'mmbzyhkgny.mp3',
u'md5': u'6783a58491b47b92c7c1af5a77d4cbee',
u'info_dict': {
u'title': u"Obama: 'Beyond The Afghan Theater, We Only Target Al Qaeda' on May 23, 2013",
u'description': u'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
u'duration': 11,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
json_url = "http://www.hark.com/clips/%s.json" %(video_id)
info_json = self._download_webpage(json_url, video_id)
info = json.loads(info_json)
final_url = info['url']
return {'id': video_id,
'url' : final_url,
'title': info['name'],
'ext': determine_ext(final_url),
'description': info['description'],
'thumbnail': info['image_original'],
'duration': info['duration'],
}


@@ -7,11 +7,11 @@ from .common import InfoExtractor
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_TEST = {
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html'",
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html",
u'file': u'1435540.mp3',
u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
u'info_dict': {
u"title": u"Freddie Gibbs Songs - Lay It Down"
u"title": u"Freddie Gibbs - Lay It Down"
}
}


@@ -19,8 +19,7 @@ class HowcastIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.howcast.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)


@@ -13,7 +13,7 @@ class IGNIE(InfoExtractor):
Some videos of it.ign.com are also supported
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?:videos|show_videos)(/.+)?/(?P<name_or_id>.+)'
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
@@ -21,15 +21,39 @@ class IGNIE(InfoExtractor):
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_TEST = {
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
}
_TESTS = [
{
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
},
},
]
def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"',
@@ -41,7 +65,18 @@ class IGNIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name_or_id = mobj.group('name_or_id')
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
return [self.url_result(u, ie='IGN') for u in multiple_urls]
video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE,
@@ -68,7 +103,7 @@ class IGNIE(InfoExtractor):
class OneUPIE(IGNIE):
"""Extractor for 1up.com, it uses the ign videos system."""
_VALID_URL = r'https?://gamevideos.1up.com/video/id/(?P<name_or_id>.+)'
_VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
IE_NAME = '1up.com'
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
@@ -83,6 +118,9 @@ class OneUPIE(IGNIE):
}
}
# Override IGN tests
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id')

View File

@@ -0,0 +1,84 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
xpath_with_ns,
determine_ext,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_TEST = {
u'url': u'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
u'file': u'452693.mp4',
u'info_dict': {
u'title': u'SKYFALL',
u'description': u'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
u'duration': 153,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k,v[0]) for (k,v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration_xml = self._download_webpage(url, video_id,
u'Downloading flash configuration')
flashconfiguration = xml.etree.ElementTree.fromstring(flashconfiguration_xml.encode('utf-8'))
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info_xml = self._download_webpage(file_url, video_id,
u'Downloading video info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(p,
{'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'})
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'width': int(attr['width']),
'bitrate': int(attr['bitrate']),
})
formats = sorted(formats, key=lambda f: f['bitrate'])
return {
'id': video_id,
'title': item.find('title').text,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
}
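
The _clean_query helper above keeps only the identifying parameters and pins playerid/videokbrate to values that yield direct HTTP URLs instead of m3u8 manifests. Roughly, on a hypothetical query string (Python 3's urllib.parse standing in for the compat_* shims):

from urllib.parse import parse_qs, urlencode

NEEDED_ARGS = ['publishedid', 'customerid']
query = 'customerid=69249&publishedid=452693&playerid=999&autoplay=1'  # hypothetical
query_dic = parse_qs(query)
cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
cleaned_dic['playerid'] = '247'        # other player ids return m3u8 urls
cleaned_dic['videokbrate'] = '100000'  # ask for the best bitrate
print(urlencode(cleaned_dic))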


@@ -6,13 +6,14 @@ import xml.etree.ElementTree
from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4',
u'md5': u'e0fdb0cd3ce98713ef9c1e1e025779d0',
u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
@@ -23,25 +24,29 @@ class JeuxVideoIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
m_download = re.search(r'<param name="flashvars" value="config=(.*?)" />', webpage)
xml_link = m_download.group(1)
xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
webpage, u'config URL')
id = re.search(r'http://www.jeuxvideo.com/config/\w+/0011/(.*?)/\d+_player\.xml', xml_link).group(1)
video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_config = self._download_webpage(xml_link, title,
'Downloading XML config')
xml_config = self._download_webpage(
xml_link, title, u'Downloading XML config')
config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8'))
info = re.search(r'<format\.json>(.*?)</format\.json>',
xml_config, re.MULTILINE|re.DOTALL).group(1)
info = json.loads(info)['versions'][0]
info_json = self._search_regex(
r'(?sm)<format\.json>(.*?)</format\.json>',
xml_config, u'JSON information')
info = json.loads(info_json)['versions'][0]
video_url = 'http://video720.jeuxvideo.com/' + info['file']
return {'id': id,
'title' : config.find('titre_video').text,
'ext' : 'mp4',
'url' : video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}
return {
'id': video_id,
'title': config.find('titre_video').text,
'ext': 'mp4',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}


@@ -21,8 +21,10 @@ class KankanIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(r'G_TITLE=[\'"](.+?)[\'"]', webpage, u'video title')
gcid = self._search_regex(r'lurl:[\'"]http://.+?/.+?/(.+?)/', webpage, u'gcid')
title = self._search_regex(r'(?:G_TITLE=|G_MOVIE_TITLE = )[\'"](.+?)[\'"]', webpage, u'video title')
surls = re.search(r'surls:\[\'.+?\'\]|lurl:\'.+?\.flv\'', webpage).group(0)
gcids = re.findall(r"http://.+?/.+?/(.+?)/", surls)
gcid = gcids[-1]
video_info_page = self._download_webpage('http://p2s.cl.kankan.com/getCdnresource_flv?gcid=%s' % gcid,
video_id, u'Downloading video url info')


@@ -0,0 +1,37 @@
import re
from .common import InfoExtractor
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>\d*)/.*'
_TEST = {
u"url": u"https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location",
u"file": u"1404461844.mp4",
u"md5": u"c81addca81327ffa66c642b5d8b08cab",
u"info_dict": {
u"title": u"Intersection: The Story of Josh Grant by Kyle Cowling",
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage_src = self._download_webpage(url, video_id)
video_url = self._search_regex(r'data-video="(.*?)">',
webpage_src, u'video URL')
if 'mp4' in video_url:
ext = 'mp4'
else:
ext = 'flv'
video_title = self._html_search_regex(r"<title>(.*?)</title>",
webpage_src, u'title').rpartition(u'\u2014 Kickstarter')[0].strip()
results = [{
'id': video_id,
'url': video_url,
'title': video_title,
'ext': ext,
}]
return results


@@ -2,7 +2,12 @@ import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_parse_urlparse, compat_urlparse
from ..utils import (
compat_urllib_parse_urlparse,
compat_urlparse,
get_meta_content,
ExtractorError,
)
class LivestreamIE(InfoExtractor):
@@ -35,8 +40,11 @@ class LivestreamIE(InfoExtractor):
if video_id is None:
# This is an event page:
api_url = self._search_regex(r'event_design_eventId: \'(.+?)\'',
webpage, 'api url')
player = get_meta_content('twitter:player', webpage)
if player is None:
raise ExtractorError('Couldn\'t extract event api url')
api_url = player.replace('/player', '')
api_url = re.sub(r'^(https?://)(new\.)', r'\1api.\2', api_url)
info = json.loads(self._download_webpage(api_url, event_name,
u'Downloading event info'))
videos = [self._extract_video_info(video_data['data'])
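
The event-page fallback above derives the API endpoint from the twitter:player meta value: the /player suffix is dropped and "api." is spliced into the host. On a hypothetical player URL:

import re

player = 'http://new.livestream.com/accounts/12/events/3456/player'  # hypothetical
api_url = player.replace('/player', '')
api_url = re.sub(r'^(https?://)(new\.)', r'\1api.\2', api_url)
# -> 'http://api.new.livestream.com/accounts/12/events/3456'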


@@ -122,7 +122,7 @@ class MetacafeIE(InfoExtractor):
video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, u'title')
description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("channel","([^"]+)"\);',
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, u'uploader nickname', fatal=False)
return {


@@ -0,0 +1,55 @@
import re
import xml.etree.ElementTree
import operator
from .common import InfoExtractor
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
u'file': u'3698222.mp4',
u'info_dict': {
u'title': u'inFamous: Second Son - inSide Sucker Punch: Smoke & Mirrors',
u'description': u'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
u'duration': 221,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
# The XML is not well-formed: it contains raw '&' characters
info_xml = self._download_webpage('http://www.metacritic.com/video_data?video=' + video_id,
video_id, u'Downloading info xml').replace('&', '&amp;')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
clip = next(c for c in info.findall('playList/clip') if c.find('id').text == video_id)
formats = []
for videoFile in clip.findall('httpURI/videoFile'):
rate_str = videoFile.find('rate').text
video_url = videoFile.find('filePath').text
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': rate_str,
'rate': int(rate_str),
})
formats.sort(key=operator.itemgetter('rate'))
description = self._html_search_regex(r'<b>Description:</b>(.*?)</p>',
webpage, u'description', flags=re.DOTALL)
info = {
'id': video_id,
'title': clip.find('title').text,
'formats': formats,
'description': description,
'duration': int(clip.find('duration').text),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
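
The replace('&', '&amp;') above matters because ElementTree refuses XML containing bare ampersands; note it assumes the feed never ships already-escaped entities, which would be double-escaped. A minimal reproduction:

import xml.etree.ElementTree as ET

broken = '<clip><title>Smoke & Mirrors</title></clip>'
try:
    ET.fromstring(broken)
except ET.ParseError:
    # Escaping the raw ampersand makes the document parseable again.
    info = ET.fromstring(broken.replace('&', '&amp;'))
    assert info.find('title').text == 'Smoke & Mirrors'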


@@ -0,0 +1,74 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_id,
)
class TechTVMITIE(InfoExtractor):
IE_NAME = u'techtv.mit.edu'
_VALID_URL = r'https?://techtv\.mit\.edu/(videos|embeds)/(?P<id>\d+)'
_TEST = {
u'url': u'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set',
u'file': u'25418.mp4',
u'md5': u'1f8cb3e170d41fd74add04d3c9330e5f',
u'info_dict': {
u'title': u'MIT DNA Learning Center Set',
u'description': u'md5:82313335e8a8a3f243351ba55bc1b474',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
raw_page = self._download_webpage(
'http://techtv.mit.edu/videos/%s' % video_id, video_id)
clean_page = re.compile(u'<!--.*?-->', re.S).sub(u'', raw_page)
base_url = self._search_regex(r'ipadUrl: \'(.+?cloudfront.net/)',
raw_page, u'base url')
formats_json = self._search_regex(r'bitrates: (\[.+?\])', raw_page,
u'video formats')
formats = json.loads(formats_json)
formats = sorted(formats, key=lambda f: f['bitrate'])
title = get_element_by_id('edit-title', clean_page)
description = clean_html(get_element_by_id('edit-description', clean_page))
thumbnail = self._search_regex(r'playlist:.*?url: \'(.+?)\'',
raw_page, u'thumbnail', flags=re.DOTALL)
return {'id': video_id,
'title': title,
'url': base_url + formats[-1]['url'].replace('mp4:', ''),
'ext': 'mp4',
'description': description,
'thumbnail': thumbnail,
}
class MITIE(TechTVMITIE):
IE_NAME = u'video.mit.edu'
_VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)'
_TEST = {
u'url': u'http://video.mit.edu/watch/the-government-is-profiling-you-13222/',
u'file': u'21783.mp4',
u'md5': u'7db01d5ccc1895fc5010e9c9e13648da',
u'info_dict': {
u'title': u'The Government is Profiling You',
u'description': u'md5:ad5795fe1e1623b73620dbfd47df9afd',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
self.to_screen('%s: Extracting %s url' % (page_title, TechTVMITIE.IE_NAME))
embed_url = self._search_regex(r'<iframe .*?src="(.+?)"', webpage,
u'embed url')
return self.url_result(embed_url, ie='TechTVMIT')


@@ -5,34 +5,27 @@ import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_request,
ExtractorError,
unified_strdate,
)
class MixcloudIE(InfoExtractor):
_WORKING = False # New API, but it seems good http://www.mixcloud.com/developers/documentation/
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([\w\d-]+)/([\w\d-]+)'
IE_NAME = u'mixcloud'
def report_download_json(self, file_id):
"""Report JSON download."""
self.to_screen(u'Downloading json')
def get_urls(self, jsonData, fmt, bitrate='best'):
"""Get urls from 'audio_formats' section in json"""
try:
bitrate_list = jsonData[fmt]
if bitrate is None or bitrate == 'best' or bitrate not in bitrate_list:
bitrate = max(bitrate_list) # select highest
url_list = jsonData[fmt][bitrate]
except TypeError: # we have no bitrate info.
url_list = jsonData[fmt]
return url_list
_TEST = {
u'url': u'http://www.mixcloud.com/dholbach/cryptkeeper/',
u'file': u'dholbach-cryptkeeper.mp3',
u'info_dict': {
u'title': u'Cryptkeeper',
u'description': u'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
u'uploader': u'Daniel Holbach',
u'uploader_id': u'dholbach',
u'upload_date': u'20111115',
},
}
def check_urls(self, url_list):
"""Returns 1st active url from list"""
@@ -45,71 +38,32 @@ class MixcloudIE(InfoExtractor):
return None
def _print_formats(self, formats):
print('Available formats:')
for fmt in formats.keys():
for b in formats[fmt]:
try:
ext = formats[fmt][b][0]
print('%s\t%s\t[%s]' % (fmt, b, ext.split('.')[-1]))
except TypeError: # we have no bitrate info
ext = formats[fmt][0]
print('%s\t%s\t[%s]' % (fmt, '??', ext.split('.')[-1]))
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
# extract uploader & filename from url
uploader = mobj.group(1).decode('utf-8')
file_id = uploader + "-" + mobj.group(2).decode('utf-8')
# construct API request
file_url = 'http://www.mixcloud.com/api/1/cloudcast/' + '/'.join(url.split('/')[-3:-1]) + '.json'
# retrieve .json file with links to files
request = compat_urllib_request.Request(file_url)
try:
self.report_download_json(file_url)
jsonData = compat_urllib_request.urlopen(request).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to retrieve file: %s' % compat_str(err))
uploader = mobj.group(1)
cloudcast_name = mobj.group(2)
track_id = '-'.join((uploader, cloudcast_name))
api_url = 'http://api.mixcloud.com/%s/%s/' % (uploader, cloudcast_name)
webpage = self._download_webpage(url, track_id)
json_data = self._download_webpage(api_url, track_id,
u'Downloading cloudcast info')
info = json.loads(json_data)
# parse JSON
json_data = json.loads(jsonData)
player_url = json_data['player_swf_url']
formats = dict(json_data['audio_formats'])
preview_url = self._search_regex(r'data-preview-url="(.+?)"', webpage, u'preview url')
song_url = preview_url.replace('/previews/', '/cloudcasts/originals/')
template_url = re.sub(r'(stream\d*)', 'stream%d', song_url)
final_song_url = self.check_urls(template_url % i for i in range(30))
req_format = self._downloader.params.get('format', None)
if self._downloader.params.get('listformats', None):
self._print_formats(formats)
return
if req_format is None or req_format == 'best':
for format_param in formats.keys():
url_list = self.get_urls(formats, format_param)
# check urls
file_url = self.check_urls(url_list)
if file_url is not None:
break # got it!
else:
if req_format not in formats:
raise ExtractorError(u'Format is not available')
url_list = self.get_urls(formats, req_format)
file_url = self.check_urls(url_list)
format_param = req_format
return [{
'id': file_id.decode('utf-8'),
'url': file_url.decode('utf-8'),
'uploader': uploader.decode('utf-8'),
'upload_date': None,
'title': json_data['name'],
'ext': file_url.split('.')[-1].decode('utf-8'),
'format': (format_param is None and u'NA' or format_param.decode('utf-8')),
'thumbnail': json_data['thumbnail_url'],
'description': json_data['description'],
'player_url': player_url.decode('utf-8'),
}]
return {
'id': track_id,
'title': info['name'],
'url': final_song_url,
'ext': 'mp3',
'description': info['description'],
'thumbnail': info['pictures'].get('extra_large'),
'uploader': info['user']['name'],
'uploader_id': info['user']['username'],
'upload_date': unified_strdate(info['created_time']),
'view_count': info['play_count'],
}
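
The rewritten MixcloudIE above probes mirrors rather than using the old audio_formats JSON: the preview URL is mapped onto the "originals" path, the stream host number is templated, and stream0..stream29 are tried until one answers. A rough sketch under those assumptions (Python 3 urllib standing in for the compat layer; the URL is hypothetical):

import re
from urllib.request import urlopen
from urllib.error import URLError

def check_urls(candidates):
    # Return the first URL that responds; mirrors that error out are skipped.
    for url in candidates:
        try:
            urlopen(url, timeout=5)
            return url
        except (URLError, OSError):
            continue
    return None

preview_url = 'http://stream7.example.com/previews/7/0/cryptkeeper.mp3'  # hypothetical
song_url = preview_url.replace('/previews/', '/cloudcasts/originals/')
template_url = re.sub(r'(stream\d*)', 'stream%d', song_url)
final_song_url = check_urls(template_url % i for i in range(30))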


@@ -54,23 +54,26 @@ class MTVIE(InfoExtractor):
def _get_thumbnail_url(self, uri, itemdoc):
return 'http://mtv.mtvnimages.com/uri/' + uri
def _extract_video_url(self, metadataXml):
def _extract_video_formats(self, metadataXml):
if '/error_country_block.swf' in metadataXml:
raise ExtractorError(u'This video is not available from your country.', expected=True)
mdoc = xml.etree.ElementTree.fromstring(metadataXml.encode('utf-8'))
renditions = mdoc.findall('.//rendition')
# For now, always pick the highest quality.
rendition = renditions[-1]
try:
_,_,ext = rendition.attrib['type'].partition('/')
format = ext + '-' + rendition.attrib['width'] + 'x' + rendition.attrib['height'] + '_' + rendition.attrib['bitrate']
rtmp_video_url = rendition.find('./src').text
except KeyError:
raise ExtractorError('Invalid rendition field.')
video_url = self._transform_rtmp_url(rtmp_video_url)
return {'ext': ext, 'url': video_url, 'format': format}
formats = []
for rendition in mdoc.findall('.//rendition'):
try:
_, _, ext = rendition.attrib['type'].partition('/')
rtmp_video_url = rendition.find('./src').text
formats.append({'ext': ext,
'url': self._transform_rtmp_url(rtmp_video_url),
'format_id': rendition.get('bitrate'),
'width': int(rendition.get('width')),
'height': int(rendition.get('height')),
})
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
return formats
def _get_video_info(self, itemdoc):
uri = itemdoc.find('guid').text
@@ -81,19 +84,25 @@ class MTVIE(InfoExtractor):
mediagen_url += '&acceptMethods=fms'
mediagen_page = self._download_webpage(mediagen_url, video_id,
u'Downloading video urls')
video_info = self._extract_video_url(mediagen_page)
description_node = itemdoc.find('description')
if description_node is not None:
description = description_node.text
description = description_node.text.strip()
else:
description = None
video_info.update({'title': itemdoc.find('title').text,
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
})
return video_info
info = {
'title': itemdoc.find('title').text,
'formats': self._extract_video_formats(mediagen_page),
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info
def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri)


@@ -0,0 +1,73 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
ExtractorError,
)
class NaverIE(InfoExtractor):
_VALID_URL = r'https?://tvcast\.naver\.com/v/(?P<id>\d+)'
_TEST = {
u'url': u'http://tvcast.naver.com/v/81652',
u'file': u'81652.mp4',
u'info_dict': {
u'title': u'[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
u'description': u'합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
u'upload_date': u'20130903',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
webpage)
if m_id is None:
raise ExtractorError(u'couldn\'t extract vid and key')
vid = m_id.group(1)
key = m_id.group(2)
query = compat_urllib_parse.urlencode({'vid': vid, 'inKey': key,})
query_urls = compat_urllib_parse.urlencode({
'masterVid': vid,
'protocol': 'p2p',
'inKey': key,
})
info_xml = self._download_webpage(
'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
video_id, u'Downloading video info')
urls_xml = self._download_webpage(
'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text
if domain.startswith('rtmp'):
continue
formats.append({
'url': domain + format_el.find('uri').text,
'ext': 'mp4',
'width': int(format_el.find('width').text),
'height': int(format_el.find('height').text),
})
info = {
'id': video_id,
'title': info.find('Subject').text,
'formats': formats,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': info.find('WriteDate').text.replace('.', ''),
'view_count': int(info.find('PlayCount').text),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info


@@ -0,0 +1,33 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import find_xpath_attr, compat_str
class NBCNewsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcnews\.com/video/.+?/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.nbcnews.com/video/nbc-news/52753292',
u'file': u'52753292.flv',
u'md5': u'47abaac93c6eaf9ad37ee6c4463a5179',
u'info_dict': {
u'title': u'Crew emerges after four-month Mars food study',
u'description': u'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_xml = self._download_webpage('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8')).find('video')
return {'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': compat_str(info.find('caption').text),
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}


@@ -0,0 +1,38 @@
import json
import re
from .common import InfoExtractor
from ..utils import determine_ext
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?newgrounds\.com/audio/listen/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.newgrounds.com/audio/listen/549479',
u'file': u'549479.mp3',
u'md5': u'fe6033d297591288fa1c1f780386f07a',
u'info_dict': {
u"title": u"B7 - BusMode",
u"uploader": u"Burn7",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r',"name":"([^"]+)",', webpage, u'music title')
uploader = self._html_search_regex(r',"artist":"([^"]+)",', webpage, u'music uploader')
music_url_json_string = self._html_search_regex(r'({"url":"[^"]+"),', webpage, u'music url') + '}'
music_url_json = json.loads(music_url_json_string)
music_url = music_url_json['url']
return {
'id': music_id,
'title': title,
'url': music_url,
'uploader': uploader,
'ext': determine_ext(music_url),
}

youtube_dl/extractor/nhl.py (new file)

@@ -0,0 +1,120 @@
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
determine_ext,
unified_strdate,
)
class NHLBaseInfoExtractor(InfoExtractor):
@staticmethod
def _fix_json(json_string):
return json_string.replace('\\\'', '\'')
def _extract_video(self, info):
video_id = info['id']
self.report_extraction(video_id)
initial_video_url = info['publishPoint']
data = compat_urllib_parse.urlencode({
'type': 'fvod',
'path': initial_video_url.replace('.mp4', '_sd.mp4'),
})
path_url = 'http://video.nhl.com/videocenter/servlets/encryptvideopath?' + data
path_response = self._download_webpage(path_url, video_id,
u'Downloading final video url')
path_doc = xml.etree.ElementTree.fromstring(path_response)
video_url = path_doc.find('path').text
join = compat_urlparse.urljoin
return {
'id': video_id,
'title': info['name'],
'url': video_url,
'ext': determine_ext(video_url),
'description': info['description'],
'duration': int(info['duration']),
'thumbnail': join(join(video_url, '/u/'), info['bigImage']),
'upload_date': unified_strdate(info['releaseDate'].split('.')[0]),
}
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console\?.*?(?<=[?&])id=(?P<id>\d+)'
_TEST = {
u'url': u'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
u'file': u'453614.mp4',
u'info_dict': {
u'title': u'Quick clip: Weise 4-3 goal vs Flames',
u'description': u'Dale Weise scores his first of the season to put the Canucks up 4-3.',
u'duration': 18,
u'upload_date': u'20131006',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = 'http://video.nhl.com/videocenter/servlets/playlist?ids=%s&format=json' % video_id
info_json = self._download_webpage(json_url, video_id,
u'Downloading info json')
info_json = self._fix_json(info_json)
info = json.loads(info_json)[0]
return self._extract_video(info)
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com:videocenter'
IE_DESC = u'Download the first 12 videos from a videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?.*?catid=(?P<catid>[^&]+))?'
@classmethod
def suitable(cls, url):
if NHLIE.suitable(url):
return False
return super(NHLVideocenterIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
team = mobj.group('team')
webpage = self._download_webpage(url, team)
cat_id = self._search_regex(
[r'var defaultCatId = "(.+?)";',
r'{statusIndex:0,index:0,.*?id:(.*?),'],
webpage, u'category id')
playlist_title = self._html_search_regex(
r'tab0"[^>]*?>(.*?)</td>',
webpage, u'playlist title', flags=re.DOTALL).lower().capitalize()
data = compat_urllib_parse.urlencode({
'cid': cat_id,
# This is the default value
'count': 12,
'ptrs': 3,
'format': 'json',
})
path = '/videocenter/servlets/browse?' + data
request_url = compat_urlparse.urljoin(url, path)
response = self._download_webpage(request_url, playlist_title)
response = self._fix_json(response)
if not response.strip():
self._downloader.report_warning(u'Got an empty response, retrying '
u'with the "newvideos" parameter')
response = self._download_webpage(request_url + '&newvideos=true',
playlist_title)
response = self._fix_json(response)
videos = json.loads(response)
return {
'_type': 'playlist',
'title': playlist_title,
'id': cat_id,
'entries': [self._extract_video(i) for i in videos],
}
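For orientation, a minimal sketch of driving the new nhl.com extractor through the public YoutubeDL API; the options dict and the download=False call are assumptions about the surrounding library, not part of this diff:

    import youtube_dl

    ydl = youtube_dl.YoutubeDL({'quiet': True})
    info = ydl.extract_info(
        'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
        download=False)
    print(info['title'])  # per the test above: u'Quick clip: Weise 4-3 goal vs Flames'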

youtube_dl/extractor/nowvideo.py

@@ -0,0 +1,46 @@
import re
from .common import InfoExtractor
from ..utils import compat_urlparse
class NowVideoIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?nowvideo\.ch/video/(?P<id>\w+)'
_TEST = {
u'url': u'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
u'file': u'0mw0yow7b6dxa.flv',
u'md5': u'f8fbbc8add72bd95b7850c6a02fc8817',
u'info_dict': {
u"title": u"youtubedl test video _BaW_jenozKc.mp4"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id
embed_url = 'http://embed.nowvideo.ch/embed.php?v=' + video_id
webpage = self._download_webpage(webpage_url, video_id)
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h4>(.*)</h4>',
webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";',
embed_page, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id,
u'Downloading API page')
video_url = compat_urlparse.parse_qs(api_response)[u'url'][0]
return [{
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
}]

youtube_dl/extractor/ooyala.py

@@ -18,11 +18,15 @@ class OoyalaIE(InfoExtractor):
},
}
@staticmethod
def _url_for_embed_code(embed_code):
return 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
def _extract_result(self, info, more_info):
return {'id': info['embedCode'],
'ext': 'mp4',
'title': unescapeHTML(info['title']),
'url': info.get('ipad_url') or info['url'],
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
@@ -35,7 +39,9 @@ class OoyalaIE(InfoExtractor):
mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
player, u'mobile player url')
mobile_player = self._download_webpage(mobile_url, embedCode)
videos_info = self._search_regex(
r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
mobile_player, u'info').replace('\\"', '"')
videos_more_info = self._search_regex(r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, u'more info').replace('\\"', '"')
videos_info = json.loads(videos_info)
videos_more_info = json.loads(videos_more_info)
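The new _url_for_embed_code helper gives other extractors a stable hand-off point for Ooyala embed codes; a short sketch (the embed code value is illustrative):

    from youtube_dl.extractor.ooyala import OoyalaIE

    embed_code = 'exampleEmbedCode'  # hypothetical value
    print(OoyalaIE._url_for_embed_code(embed_code))
    # http://player.ooyala.com/player.js?embedCode=exampleEmbedCode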

youtube_dl/extractor/orf.py

@@ -0,0 +1,54 @@
# coding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
ExtractorError,
find_xpath_attr,
)
class ORFIE(InfoExtractor):
_VALID_URL = r'https?://tvthek.orf.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
flash_xml = self._search_regex('ORF.flashXML = \'(.+?)\'', webpage, u'flash xml')
flash_xml = compat_urlparse.parse_qs('xml='+flash_xml)['xml'][0]
flash_config = xml.etree.ElementTree.fromstring(flash_xml.encode('utf-8'))
playlist_json = self._search_regex(r'playlist\': \'(\[.*?\])\'', webpage, u'playlist').replace(r'\"','"')
playlist = json.loads(playlist_json)
videos = []
ns = '{http://tempuri.org/XMLSchema.xsd}'
xpath = '%(ns)sPlaylist/%(ns)sItems/%(ns)sItem' % {'ns': ns}
webpage_description = self._og_search_description(webpage)
for (i, (item, info)) in enumerate(zip(flash_config.findall(xpath), playlist), 1):
# Get best quality url
rtmp_url = None
for q in ['Q6A', 'Q4A', 'Q1A']:
video_url = find_xpath_attr(item, '%sVideoUrl' % ns, 'quality', q)
if video_url is not None:
rtmp_url = video_url.text
break
if rtmp_url is None:
raise ExtractorError(u'Couldn\'t get video url: %s' % info['id'])
description = self._html_search_regex(
r'id="playlist_entry_%s".*?<p>(.*?)</p>' % i, webpage,
u'description', default=webpage_description, flags=re.DOTALL)
videos.append({
'_type': 'video',
'id': info['id'],
'title': info['title'],
'url': rtmp_url,
'ext': 'flv',
'description': description,
})
return videos

youtube_dl/extractor/pornotube.py

@@ -38,6 +38,7 @@ class PornotubeIE(InfoExtractor):
VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by'
upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, u'upload date', fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
age_limit = self._rta_search(webpage)
info = {'id': video_id,
'url': video_url,
@@ -45,6 +46,7 @@ class PornotubeIE(InfoExtractor):
'upload_date': upload_date,
'title': video_title,
'ext': 'flv',
'format': 'flv',
'age_limit': age_limit}
return [info]

youtube_dl/extractor/redtube.py

@@ -10,28 +10,35 @@ class RedTubeIE(InfoExtractor):
u'file': u'66418.mp4',
u'md5': u'7b8c22b5e7098a3e1c09709df1126d2d',
u'info_dict': {
u"title": u"Sucked on a toilet"
u"title": u"Sucked on a toilet",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_extension = 'mp4'
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
video_url = self._html_search_regex(
r'<source src="(.+?)" type="video/mp4">', webpage, u'video URL')
video_title = self._html_search_regex(
r'<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
webpage, u'title')
# No self-labeling, but they describe themselves as
# "Home of Videos Porno"
age_limit = 18
return {
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
'age_limit': age_limit,
}

youtube_dl/extractor/ro220.py

@@ -0,0 +1,42 @@
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
compat_parse_qs,
)
class Ro220IE(InfoExtractor):
IE_NAME = '220.ro'
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?220\.ro/(?P<category>[^/]+)/(?P<shorttitle>[^/]+)/(?P<video_id>[^/]+)'
_TEST = {
u"url": u"http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/",
u'file': u'LYV6doKo7f.mp4',
u'md5': u'03af18b73a07b4088753930db7a34add',
u'info_dict': {
u"title": u"Luati-le Banii sez 4 ep 1",
u"description": u"Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
flashVars_str = self._search_regex(
r'<param name="flashVars" value="([^"]+)"',
webpage, u'flashVars')
flashVars = compat_parse_qs(flashVars_str)
info = {
'_type': 'video',
'id': video_id,
'ext': 'mp4',
'url': flashVars['videoURL'][0],
'title': flashVars['title'][0],
'description': clean_html(flashVars['desc'][0]),
'thumbnail': flashVars['preview'][0],
}
return info

youtube_dl/extractor/rottentomatoes.py

@@ -0,0 +1,16 @@
from .videodetective import VideoDetectiveIE
# It just uses the same method as videodetective.com,
# the internetvideoarchive.com is extracted from the og:video property
class RottenTomatoesIE(VideoDetectiveIE):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
u'file': '613340.mp4',
u'info_dict': {
u'title': u'TOY STORY 3',
u'description': u'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
},
}

youtube_dl/extractor/rtlnow.py

@@ -8,8 +8,8 @@ from ..utils import (
)
class RTLnowIE(InfoExtractor):
"""Information Extractor for RTLnow, RTL2now and VOXnow"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl(?:(?P<is_rtl2>2)|-)now\.rtl(?(is_rtl2)2|)\.de/|(?:www\.)?voxnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
@@ -48,8 +48,48 @@ class RTLnowIE(InfoExtractor):
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
u'file': u'99205.flv',
u'info_dict': {
u'upload_date': u'20080928',
u'title': u'Medicopter 117 - Angst!',
u'description': u'Angst!',
u'thumbnail': u'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.rtlnitronow.de/recht-ordnung/lebensmittelkontrolle-erlangenordnungsamt-berlin.php?film_id=127367&player=1&season=1',
u'file': u'127367.flv',
u'info_dict': {
u'upload_date': u'20130926',
u'title': u'Recht & Ordnung - Lebensmittelkontrolle Erlangen/Ordnungsamt...',
u'description': u'Lebensmittelkontrolle Erlangen/Ordnungsamt Berlin',
u'thumbnail': u'http://autoimg.static-fra.de/nitronow/344787/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -66,20 +106,23 @@ class RTLnowIE(InfoExtractor):
msg = clean_html(note_m.group(1))
raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj:
video_description = mobj.group(u'description')
if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y')
elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y')
else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
else:
video_description = None
video_upload_date = None

youtube_dl/extractor/rutube.py

@@ -0,0 +1,58 @@
# encoding: utf-8
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_str,
ExtractorError,
)
class RutubeIE(InfoExtractor):
_VALID_URL = r'https?://rutube\.ru/video/(?P<long_id>\w+)'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
u'file': u'3eac3b4561676c17df9132a9a1e62e3e.mp4',
u'info_dict': {
u'title': u'Раненный кенгуру забежал в аптеку',
u'uploader': u'NTDRussian',
u'uploader_id': u'29790',
},
u'params': {
# It requires ffmpeg (m3u8 download)
u'skip_download': True,
},
}
def _get_api_response(self, short_id, subpath):
api_url = 'http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id)
response_json = self._download_webpage(api_url, short_id,
u'Downloading %s json' % subpath)
return json.loads(response_json)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
long_id = mobj.group('long_id')
webpage = self._download_webpage(url, long_id)
og_video = self._og_search_video_url(webpage)
short_id = compat_urlparse.urlparse(og_video).path[1:]
options = self._get_api_response(short_id, 'options')
trackinfo = self._get_api_response(short_id, 'trackinfo')
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
if m3u8_url is None:
raise ExtractorError(u'Couldn\'t find m3u8 manifest url')
return {
'id': trackinfo['id'],
'title': trackinfo['title'],
'url': m3u8_url,
'ext': 'mp4',
'thumbnail': options['thumbnail_url'],
'uploader': author.get('name'),
'uploader_id': compat_str(author['id']) if author else None,
}
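_get_api_response queries two JSON endpoints per video, 'options' and 'trackinfo'; a standalone sketch of the URLs it builds (the short id is a placeholder; the real one comes from the og:video URL path):

    short_id = 'abcdef0123456789'  # placeholder value
    for subpath in ('options', 'trackinfo'):
        print('http://rutube.ru/api/play/%s/%s/?format=json' % (subpath, short_id))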

youtube_dl/extractor/slideshare.py

@@ -0,0 +1,47 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
ExtractorError,
)
class SlideshareIE(InfoExtractor):
_VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
_TEST = {
u'url': u'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
u'file': u'25665706.mp4',
u'info_dict': {
u'title': u'Managing Scale and Complexity',
u'description': u'This was a keynote presentation at the NoSQL Now! 2013 Conference & Expo (http://www.nosqlnow.com). This presentation was given by Adrian Cockcroft from Netflix',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
slideshare_obj = self._search_regex(
r'var slideshare_object = ({.*?}); var user_info =',
webpage, u'slideshare object')
info = json.loads(slideshare_obj)
if info['slideshow']['type'] != u'video':
raise ExtractorError(u'Webpage type is "%s": only video extraction is supported for Slideshare' % info['slideshow']['type'], expected=True)
doc = info['doc']
bucket = info['jsplayer']['video_bucket']
ext = info['jsplayer']['video_extension']
video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
return {
'_type': 'video',
'id': info['slideshow']['id'],
'title': info['slideshow']['title'],
'ext': ext,
'url': video_url,
'thumbnail': info['slideshow']['pin_image_url'],
'description': self._og_search_description(webpage),
}

youtube_dl/extractor/sohu.py

@@ -0,0 +1,94 @@
# encoding: utf-8
import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class SohuIE(InfoExtractor):
_VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'
_TEST = {
u'url': u'http://tv.sohu.com/20130724/n382479172.shtml#super',
u'file': u'382479172.mp4',
u'md5': u'bde8d9a6ffd82c63a1eefaef4eeefec7',
u'info_dict': {
u'title': u'MVFar East Movement《The Illest》',
},
}
def _real_extract(self, url):
def _fetch_data(vid_id, mytv=False):
if mytv:
base_data_url = 'http://my.tv.sohu.com/play/videonew.do?vid='
else:
base_data_url = u'http://hot.vrs.sohu.com/vrs_flash.action?vid='
data_url = base_data_url + str(vid_id)
data_json = self._download_webpage(
data_url, video_id,
note=u'Downloading JSON data for ' + str(vid_id))
return json.loads(data_json)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
mytv = mobj.group('mytv') is not None
webpage = self._download_webpage(url, video_id)
raw_title = self._html_search_regex(r'(?s)<title>(.+?)</title>',
webpage, u'video title')
title = raw_title.partition('-')[0].strip()
vid = self._html_search_regex(r'var vid ?= ?["\'](\d+)["\']', webpage,
u'video path')
data = _fetch_data(vid, mytv)
QUALITIES = ('ori', 'super', 'high', 'nor')
vid_ids = [data['data'][q + 'Vid']
for q in QUALITIES
if data['data'][q + 'Vid'] != 0]
if not vid_ids:
raise ExtractorError(u'No formats available for this video')
# For now, we just pick the highest available quality
vid_id = vid_ids[-1]
format_data = data if vid == vid_id else _fetch_data(vid_id, mytv)
part_count = format_data['data']['totalBlocks']
allot = format_data['allot']
prot = format_data['prot']
clipsURL = format_data['data']['clipsURL']
su = format_data['data']['su']
playlist = []
for i in range(part_count):
part_url = ('http://%s/?prot=%s&file=%s&new=%s' %
(allot, prot, clipsURL[i], su[i]))
part_str = self._download_webpage(
part_url, video_id,
note=u'Downloading part %d of %d' % (i+1, part_count))
part_info = part_str.split('|')
video_url = '%s%s?key=%s' % (part_info[0], su[i], part_info[3])
video_info = {
'id': '%s_part%02d' % (video_id, i + 1),
'title': title,
'url': video_url,
'ext': 'mp4',
}
playlist.append(video_info)
if len(playlist) == 1:
info = playlist[0]
info['id'] = video_id
else:
info = {
'_type': 'playlist',
'entries': playlist,
'id': video_id,
}
return info

youtube_dl/extractor/soundcloud.py

@@ -1,10 +1,12 @@
import json
import re
import itertools
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urlparse,
compat_urllib_parse,
ExtractorError,
unified_strdate,
@@ -53,10 +55,11 @@ class SoundcloudIE(InfoExtractor):
def _resolv_url(cls, url):
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
def _extract_info_dict(self, info, full_title=None, quiet=False):
video_id = info['id']
name = full_title or video_id
if not quiet:
self.report_extraction(name)
thumbnail = info['artwork_url']
if thumbnail is not None:
@@ -198,3 +201,41 @@ class SoundcloudSetIE(SoundcloudIE):
'id': info['id'],
'title': info['title'],
}
class SoundcloudUserIE(SoundcloudIE):
_VALID_URL = r'https?://(www\.)?soundcloud\.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
IE_NAME = u'soundcloud:user'
# it's in tests/test_playlists.py
_TEST = None
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group('user')
url = 'http://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url)
user_json = self._download_webpage(resolv_url, uploader,
u'Downloading user info')
user = json.loads(user_json)
tracks = []
for i in itertools.count():
data = compat_urllib_parse.urlencode({'offset': i*50,
'client_id': self._CLIENT_ID,
})
tracks_url = 'http://api.soundcloud.com/users/%s/tracks.json?' % user['id'] + data
response = self._download_webpage(tracks_url, uploader,
u'Downloading tracks page %s' % (i+1))
new_tracks = json.loads(response)
tracks.extend(self._extract_info_dict(track, quiet=True) for track in new_tracks)
if len(new_tracks) < 50:
break
return {
'_type': 'playlist',
'id': compat_str(user['id']),
'title': user['username'],
'entries': tracks,
}
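A hedged usage sketch for the new soundcloud:user extractor; the account URL is illustrative, and the real test lives in tests/test_playlists.py as noted above:

    import youtube_dl

    ydl = youtube_dl.YoutubeDL({'quiet': True})
    playlist = ydl.extract_info('http://soundcloud.com/some-artist',  # hypothetical account
                                download=False)
    print('%s: %d tracks' % (playlist['title'], len(playlist['entries'])))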

youtube_dl/extractor/southparkstudios.py

@@ -0,0 +1,38 @@
import re
from .mtv import MTVIE, _media_xml_tag
class SouthParkStudiosIE(MTVIE):
IE_NAME = u'southparkstudios.com'
_VALID_URL = r'https?://www\.southparkstudios\.com/(clips|full-episodes)/(?P<id>.+?)(\?|#|$)'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
_TEST = {
u'url': u'http://www.southparkstudios.com/clips/104437/bat-daded#tab=featured',
u'file': u'a7bff6c2-ed00-11e0-aca6-0026b9414f30.mp4',
u'info_dict': {
u'title': u'Bat Daded',
u'description': u'Randy disqualifies South Park by getting into a fight with Bat Dad.',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
thumb_node = itemdoc.find(search_path)
if thumb_node is None:
return None
else:
return thumb_node.attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
mgid = self._search_regex(r'swfobject.embedSWF\(".*?(mgid:.*?)"',
webpage, u'mgid')
return self._get_videos_info(mgid)
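The file doubles as a template for MRSS-based MTV properties: subclass MTVIE, point _FEED_URL at the site's feed, and resolve the page's mgid. A sketch with hypothetical names:

    from youtube_dl.extractor.mtv import MTVIE

    class ExampleComedyIE(MTVIE):  # hypothetical site
        _VALID_URL = r'https?://www\.example\.com/clips/(?P<id>.+?)(\?|#|$)'
        _FEED_URL = 'http://www.example.com/feeds/video-player/mrss'  # assumed feed location
        _TESTS = []  # drop the inherited MTV tests, as above
        # _real_extract would locate the page's mgid and call self._get_videos_info(mgid)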

youtube_dl/extractor/subtitles.py

@@ -0,0 +1,91 @@
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
)
class SubtitlesInfoExtractor(InfoExtractor):
@property
def _have_to_download_any_subtitles(self):
return any([self._downloader.params.get('writesubtitles', False),
self._downloader.params.get('writeautomaticsub')])
def _list_available_subtitles(self, video_id, webpage=None):
""" outputs the available subtitles for the video """
sub_lang_list = self._get_available_subtitles(video_id)
auto_captions_list = self._get_available_automatic_caption(video_id, webpage)
sub_lang = ",".join(list(sub_lang_list.keys()))
self.to_screen(u'%s: Available subtitles for video: %s' %
(video_id, sub_lang))
auto_lang = ",".join(auto_captions_list.keys())
self.to_screen(u'%s: Available automatic captions for video: %s' %
(video_id, auto_lang))
def extract_subtitles(self, video_id, video_webpage=None):
"""
returns {sub_lang: sub} ,{} if subtitles not found or None if the
subtitles aren't requested.
"""
if not self._have_to_download_any_subtitles:
return None
available_subs_list = {}
if self._downloader.params.get('writeautomaticsub', False):
available_subs_list.update(self._get_available_automatic_caption(video_id, video_webpage))
if self._downloader.params.get('writesubtitles', False):
available_subs_list.update(self._get_available_subtitles(video_id))
if not available_subs_list: # error, it didn't get the available subtitles
return {}
if self._downloader.params.get('allsubtitles', False):
sub_lang_list = available_subs_list
else:
if self._downloader.params.get('subtitleslangs', False):
requested_langs = self._downloader.params.get('subtitleslangs')
elif 'en' in available_subs_list:
requested_langs = ['en']
else:
requested_langs = [list(available_subs_list.keys())[0]]
sub_lang_list = {}
for sub_lang in requested_langs:
if sub_lang not in available_subs_list:
self._downloader.report_warning(u'no closed captions found in the specified language "%s"' % sub_lang)
continue
sub_lang_list[sub_lang] = available_subs_list[sub_lang]
subtitles = {}
for sub_lang, url in sub_lang_list.items():
subtitle = self._request_subtitle_url(sub_lang, url)
if subtitle:
subtitles[sub_lang] = subtitle
return subtitles
def _request_subtitle_url(self, sub_lang, url):
""" makes the http request for the subtitle """
try:
sub = self._download_webpage(url, None, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles for %s: %s' % (sub_lang, compat_str(err)))
return
if not sub:
self._downloader.report_warning(u'Did not fetch video subtitles')
return
return sub
def _get_available_subtitles(self, video_id):
"""
returns {sub_lang: url} or {} if not available
Must be redefined by the subclasses
"""
pass
def _get_available_automatic_caption(self, video_id, webpage):
"""
returns {sub_lang: url} or {} if not available
Must be redefined by the subclasses that support automatic captions,
otherwise it will return {}
"""
self._downloader.report_warning(u'Automatic Captions not supported by this server')
return {}
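To make the subclass contract concrete, a minimal sketch; the extractor name and subtitle endpoint are invented for illustration:

    from youtube_dl.extractor.subtitles import SubtitlesInfoExtractor

    class ExampleSubtitlesIE(SubtitlesInfoExtractor):  # hypothetical extractor
        def _get_available_subtitles(self, video_id):
            # {sub_lang: url}, as the base class expects
            return {'en': 'http://example.com/subs/%s.en.srt' % video_id}

extract_subtitles(video_id) then returns {sub_lang: subtitle data}, {} when nothing could be fetched, or None when no subtitle option was requested.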

youtube_dl/extractor/sztvhu.py

@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
import re
from .common import InfoExtractor
from ..utils import determine_ext
class SztvHuIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:(?:www\.)?sztv\.hu|www\.tvszombathely\.hu)/(?:[^/]+)/.+-(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://sztv.hu/hirek/cserkeszek-nepszerusitettek-a-kornyezettudatos-eletmodot-a-savaria-teren-20130909',
u'file': u'20130909.mp4',
u'md5': u'a6df607b11fb07d0e9f2ad94613375cb',
u'info_dict': {
u"title": u"Cserkészek népszerűsítették a környezettudatos életmódot a Savaria téren",
u"description": u'A zöld nap játékos ismeretterjesztő programjait a Magyar Cserkész Szövetség szervezte, akik az ország nyolc városában adják át tudásukat az érdeklődőknek. A PET...',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_file = self._search_regex(
r'file: "...:(.*?)",', webpage, 'video file')
title = self._html_search_regex(
r'<meta name="title" content="([^"]*?) - [^-]*? - [^-]*?"',
webpage, 'video title')
description = self._html_search_regex(
r'<meta name="description" content="([^"]*)"/>',
webpage, 'video description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
video_url = 'http://media.sztv.hu/vod/' + video_file
return {
'id': video_id,
'url': video_url,
'title': title,
'ext': determine_ext(video_url),
'description': description,
'thumbnail': thumbnail,
}

youtube_dl/extractor/techtalks.py

@@ -0,0 +1,65 @@
import re
from .common import InfoExtractor
from ..utils import (
get_element_by_attribute,
clean_html,
)
class TechTalksIE(InfoExtractor):
_VALID_URL = r'https?://techtalks\.tv/talks/[^/]*/(?P<id>\d+)/'
_TEST = {
u'url': u'http://techtalks.tv/talks/learning-topic-models-going-beyond-svd/57758/',
u'playlist': [
{
u'file': u'57758.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
{
u'file': u'57758-slides.flv',
u'info_dict': {
u'title': u'Learning Topic Models --- Going beyond SVD',
},
},
],
u'params': {
# rtmp download
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
talk_id = mobj.group('id')
webpage = self._download_webpage(url, talk_id)
rtmp_url = self._search_regex(r'netConnectionUrl: \'(.*?)\'', webpage,
u'rtmp url')
play_path = self._search_regex(r'href=\'(.*?)\' [^>]*id="flowplayer_presenter"',
webpage, u'presenter play path')
title = clean_html(get_element_by_attribute('class', 'title', webpage))
video_info = {
'id': talk_id,
'title': title,
'url': rtmp_url,
'play_path': play_path,
'ext': 'flv',
}
m_slides = re.search(r'<a class="slides" href=\'(.*?)\'', webpage)
if m_slides is None:
return video_info
else:
return [
video_info,
# The slides video
{
'id': talk_id + '-slides',
'title': title,
'url': rtmp_url,
'play_path': m_slides.group(1),
'ext': 'flv',
},
]

youtube_dl/extractor/ted.py

@@ -77,12 +77,20 @@ class TEDIE(InfoExtractor):
thumbnail = self._search_regex(r'</span>[\s.]*</div>[\s.]*<img src="(.*?)"',
webpage, 'thumbnail')
formats = [{
'ext': 'mp4',
'url': stream['file'],
'format': stream['id']
} for stream in info['htmlStreams']]
info = {
'id': info['id'],
'title': title,
'thumbnail': thumbnail,
'description': desc,
'formats': formats,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info

youtube_dl/extractor/trilulilu.py

@@ -0,0 +1,73 @@
import json
import re
import xml.etree.ElementTree
from .common import InfoExtractor
class TriluliluIE(InfoExtractor):
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?trilulilu\.ro/video-(?P<category>[^/]+)/(?P<video_id>[^/]+)'
_TEST = {
u"url": u"http://www.trilulilu.ro/video-animatie/big-buck-bunny-1",
u'file': u"big-buck-bunny-1.mp4",
u'info_dict': {
u"title": u"Big Buck Bunny",
u"description": u":) pentru copilul din noi",
},
# Server ignores Range headers (--test)
u"params": {
u"skip_download": True
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)
log_str = self._search_regex(
r'block_flash_vars[ ]=[ ]({[^}]+})', webpage, u'log info')
log = json.loads(log_str)
format_url = (u'http://fs%(server)s.trilulilu.ro/%(hash)s/'
u'video-formats2' % log)
format_str = self._download_webpage(
format_url, video_id,
note=u'Downloading formats',
errnote=u'Error while downloading formats')
format_doc = xml.etree.ElementTree.fromstring(format_str)
video_url_template = (
u'http://fs%(server)s.trilulilu.ro/stream.php?type=video'
u'&source=site&hash=%(hash)s&username=%(userid)s&'
u'key=ministhebest&format=%%s&sig=&exp=' %
log)
formats = [
{
'format': fnode.text,
'url': video_url_template % fnode.text,
'ext': fnode.text.partition('-')[0]
}
for fnode in format_doc.findall('./formats/format')
]
info = {
'_type': 'video',
'id': video_id,
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info
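Note the %%s in the template above: the first %-substitution fills in server, hash and userid and leaves a literal %s behind, which each format name fills in later. A quick illustration (all values invented):

    log = {'server': u'12', 'hash': u'abc123', 'userid': u'someuser'}  # illustrative
    template = (u'http://fs%(server)s.trilulilu.ro/stream.php?type=video'
                u'&source=site&hash=%(hash)s&username=%(userid)s&'
                u'key=ministhebest&format=%%s&sig=&exp=' % log)
    print(template % u'mp4-360')  # the surviving %s takes the format name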

youtube_dl/extractor/tudou.py

@@ -7,15 +7,25 @@ from .common import InfoExtractor
class TudouIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs|albumplay)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?'
_TESTS = [{
u'url': u'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
u'file': u'159448201.f4v',
u'md5': u'140a49ed444bd22f93330985d8475fcb',
u'info_dict': {
u"title": u"卡马乔国足开大脚长传冲吊集锦"
}
},
{
u'url': u'http://www.tudou.com/albumplay/TenTw_JgiPM/PzsAs5usU9A.html',
u'file': u'todo.mp4',
u'md5': u'todo.mp4',
u'info_dict': {
u'title': u'todo.mp4',
},
u'add_ie': [u'Youku'],
u'skip': u'Only works from China'
}]
def _url_for_id(self, id, quality=None):
info_url = "http://v2.tudou.com/f?id="+str(id)
@@ -29,14 +39,19 @@ class TudouIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(2)
webpage = self._download_webpage(url, video_id)
title = re.search(",kw:\"(.+)\"",webpage)
if title is None:
title = re.search(",kw: \'(.+)\'",webpage)
title = title.group(1)
thumbnail_url = re.search(",pic: \'(.+?)\'",webpage)
if thumbnail_url is None:
thumbnail_url = re.search(",pic:\"(.+?)\"",webpage)
thumbnail_url = thumbnail_url.group(1)
m = re.search(r'vcode:\s*[\'"](.+?)[\'"]', webpage)
if m and m.group(1):
return {
'_type': 'url',
'url': u'youku:' + m.group(1),
'ie_key': 'Youku'
}
title = self._search_regex(
r",kw:\s*['\"](.+?)[\"']", webpage, u'title')
thumbnail_url = self._search_regex(
r",pic:\s*[\"'](.+?)[\"']", webpage, u'thumbnail URL', fatal=False)
segs_json = self._search_regex(r'segs: \'(.*)\'', webpage, 'segments')
segments = json.loads(segs_json)

youtube_dl/extractor/unistra.py

@@ -11,7 +11,7 @@ class UnistraIE(InfoExtractor):
u'md5': u'736f605cfdc96724d55bb543ab3ced24',
u'info_dict': {
u'title': u'M!ss Yella',
u'description': u'md5:104892c71bd48e55d70b902736b81bbf',
},
}

youtube_dl/extractor/ustream.py

@@ -1,6 +1,11 @@
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
get_meta_content,
)
class UstreamIE(InfoExtractor):
@@ -43,3 +48,25 @@ class UstreamIE(InfoExtractor):
'thumbnail': thumbnail,
}
return info
class UstreamChannelIE(InfoExtractor):
_VALID_URL = r'https?://www\.ustream\.tv/channel/(?P<slug>.+)'
IE_NAME = u'ustream:channel'
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
slug = m.group('slug')
webpage = self._download_webpage(url, slug)
channel_id = get_meta_content('ustream:channel_id', webpage)
BASE = 'http://www.ustream.tv'
next_url = '/ajax/socialstream/videos/%s/1.json' % channel_id
video_ids = []
while next_url:
reply = json.loads(self._download_webpage(compat_urlparse.urljoin(BASE, next_url), channel_id))
video_ids.extend(re.findall(r'data-content-id="(\d.*)"', reply['data']))
next_url = reply['nextUrl']
urls = ['http://www.ustream.tv/recorded/' + vid for vid in video_ids]
url_entries = [self.url_result(eurl, 'Ustream') for eurl in urls]
return self.playlist_result(url_entries, channel_id)

Some files were not shown because too many files have changed in this diff.