Compare commits

..

1035 Commits

Author SHA1 Message Date
bbd7706898 release 2016.10.16 2016-10-16 03:23:05 +07:00
112740e79f [ChangeLog] Actualize 2016-10-16 03:09:27 +07:00
c0b1e88895 [huajiao] Improve feed regex 2016-10-16 03:02:41 +07:00
7cdfbbf9b8 [extractors] Change import for theoperaplatform extractor 2016-10-16 02:55:38 +07:00
ac943d48d3 [Beatport] Update extractor name and tests 2016-10-16 02:33:43 +07:00
73498a8921 [ruutu] Add support for supla.fi 2016-10-16 02:31:56 +07:00
9187ee4f19 [README.md] Improve grammar 2016-10-16 02:26:06 +07:00
2273e2c530 [postprocessor/ffmpeg] Return correct filepath and ext in updated information in FFmpegExtractAudioPP
Return correct audio's filepath and ext instead of the video's when extracting audio and audio file already exists.
2016-10-16 02:12:03 +07:00
4b492e3579 [theoperaplatform] Rename, fix _VALID_URL and fix test 2016-10-16 00:24:06 +07:00
9c4258bcec [theoperaplatform] Add extractor 2016-10-16 00:21:15 +07:00
ea8aefd1d7 [lynda] Fix height for prioritized streams 2016-10-16 00:08:46 +07:00
6edfc40a0e [lynda] Add fallback extraction scenario 2016-10-16 00:07:40 +07:00
68d9561ca1 [lynda] Switch to https (closes #10916) 2016-10-15 23:56:09 +07:00
cfc0e7c82b Credit @pyx for the Huajiao extractor (#10917) 2016-10-15 16:48:35 +08:00
4102e64051 Merge branch 'pyx-huajiao' 2016-10-15 14:56:00 +08:00
f605242bfc [ChangeLog] Update for #10917 2016-10-15 14:55:36 +08:00
d32fa0f12c [huajiao] Coding style 2016-10-15 14:53:53 +08:00
a347a0d088 Merge branch 'huajiao' of https://github.com/pyx/youtube-dl into pyx-huajiao 2016-10-15 14:53:05 +08:00
77c5b98dcd [crunchyroll] Skip an invalid _TEST 2016-10-15 14:36:07 +08:00
88ebefc054 [cmt] Fix mgid extraction (closes #10813)
The example in #10813 requires TV provider authentication in Firefox,
while youtube-dl can download it directly with an US proxy.

I'm not sure whether the mgid fix is cmt-specific or it applies to all
mtv-based sites. I keep it in cmt.py until similar patterns are found in
other websites.
2016-10-15 14:27:15 +08:00
2e638d7bca Made optional fields optional 2016-10-14 14:12:06 -04:00
a26b174c61 [safari:course] Add support for techbus.safaribooksonline.com 2016-10-15 00:29:33 +07:00
73c801d660 [orf:tvthek] Fix extraction and modernize (closes #10898) 2016-10-14 23:43:09 +07:00
dff5107b68 README.md: fix alrady typo 2016-10-14 23:21:08 +07:00
8c3e448e80 [clipfish] Update _TEST; the old one is gone 2016-10-15 00:12:21 +08:00
2ecbd2ad6f [chirbit:profile] Fix extraction 2016-10-15 00:01:46 +08:00
62a0b86e4f [carambatv] Fix extraction
The video requested in #9815 now has videomore embeds.
2016-10-14 23:43:18 +08:00
146969e05b [videomore] Support <iframe> embed videos
Seen in CarambaTVPage
2016-10-14 23:42:11 +08:00
e2004ccaf7 [canalplus] Fix video_id and update _TESTS
Some tests are gone, and some redirect to different videos
2016-10-14 20:26:12 +08:00
a5f8473145 [cbsinteractive] Fix extraction for cnet.com 2016-10-14 18:20:01 +08:00
b7f59a3bf6 [huajiao] Add new extractor 2016-10-13 21:51:26 -04:00
580d411931 [parliamentliveuk] Recognize lower case URLs
Closes #10912

Seems parliamentliveuk matches URLs case-insentive. For example this URL
also works:
http://parliamentlive.tv/EvEnt/Index/3F24936f-130f-40bf-9a5d-b3d6479da6a4
2016-10-14 00:44:28 +08:00
5c4bfd4da5 release 2016.10.12 2016-10-12 21:30:05 +07:00
7104ae799c [ChangeLog] Actualize 2016-10-12 21:25:04 +07:00
bcd6276520 [downloader/common] Remove debug output 2016-10-12 21:22:33 +07:00
591e384552 [streamable] Remove debug output 2016-10-12 21:22:12 +07:00
9feb1c9731 [dailymotion] Fix extraction and update _TESTS
Closes #10901

Seems all videos use player V5 syntax now
2016-10-12 21:45:49 +08:00
a093cfc78b [vimeo:review] Fix extraction (#10900)
Now Vimeo Review videos uses React. Thanks @davekaro for analyzing the
problem!
2016-10-12 01:48:06 +08:00
6f20b65e72 [test/test_http] Update tests
After switching to HTML5 extraction helpers in generic.py, the result
info_dict is always a playlist.
2016-10-12 01:41:41 +08:00
cea364f70c [extractor/common] Support HTML media elements without child nodes 2016-10-12 01:40:28 +08:00
55642487f0 [nhl] Skip invalid m3u8 formats (closes #10713) 2016-10-11 20:50:52 +08:00
3d643f4cec [hbo] Add HBOEpisodeIE (#10892) 2016-10-11 17:46:52 +08:00
c452e69d3d [footyroom] Fix extraction and update _TESTS (closes #10810) 2016-10-11 17:46:13 +08:00
555787d717 [streamable] Add helper for extracting embedded videos 2016-10-11 17:44:35 +08:00
f165ca70eb [abc.net.au:iview] Fix for non-series videos (closes #10895) 2016-10-11 12:53:27 +08:00
27b8d2ee95 [hbo] Add display_id and another test (#10892) 2016-10-11 12:41:44 +08:00
71cdcb2331 [hbo] Support episode pages (closes #10892) 2016-10-11 12:30:35 +08:00
176006a120 [allocine] Fix for /video/ videos (closes #10860) 2016-10-09 19:42:42 +08:00
65f4c1de3d [allocine] Fix extraction (closes #10860)
I change the URL of the third test case, because now the original URL
does not contain a video anymore, and there's no easy to get the real
URL from the /film/ one.
2016-10-09 18:58:15 +08:00
b0082629a9 [nextmedia] Support action news (動新聞) on Apple Daily 2016-10-09 18:42:15 +08:00
8204c73352 [Makefile] Fix for GNU make < 4 (closes #9387)
Shell assignment operator in BSD make != is ported to GNU make in
version 4.0, so 3.x doesn't work. I choose to drop BSD make support as
installing GNU make on *BSD systems is easier than installing newer GNU
make.
2016-10-09 18:24:45 +08:00
2b51dac1f9 [slutload] Fix test and simplify 2016-10-09 01:17:38 +07:00
f68901e50a [reverbnation] Eliminate code duplication in thumbnails extraction 2016-10-09 01:02:35 +07:00
3adb9d119e [reverbnation] Modernize 2016-10-09 01:00:38 +07:00
1dd58e14d8 [lego] improve info extraction and bypass geo restriction(closes #10872) 2016-10-08 08:33:18 +01:00
dd4291f729 release 2016.10.07 2016-10-07 22:25:30 +07:00
888f8d6ba4 [ChangeLog] Actualize 2016-10-07 22:23:16 +07:00
f475e88121 [vimeo] PEP 8
[ci skip]
2016-10-07 22:15:26 +07:00
3c6b3bf221 [iprima] detect geo restriction 2016-10-07 15:53:16 +01:00
38588ab977 [facebook] Fix for new handleServerJS syntax (closes #10846)
According to the dump file in #10846, handleServerJS() now accepts
an optional second argument. It's a string from available dump files.
2016-10-07 20:04:49 +08:00
85bcdd081c [extractors] Add MmsIE 2016-10-07 19:31:26 +08:00
9dcd6fd3aa [generic,commonprotocols] Move mms suuport from GenericIE
And use _generic_* helpers in those extractors
2016-10-07 19:24:22 +08:00
98763ee354 [extractor/common] Add id and title helpers for generic IEs 2016-10-07 19:20:53 +08:00
3d83a1ae92 [generic] Support direct MMS links (closes #10838) 2016-10-07 17:50:45 +08:00
c0a7b9b348 Revert "[Makefilea] Fix for GNU make < 4"
This reverts commit 831a34caa2.

The reverted commit breaks lazy extractors.
2016-10-07 16:03:34 +08:00
831a34caa2 [Makefilea] Fix for GNU make < 4
Closes #9387

The shell assignment operator != was introduced in GNU make 4.0, or
specifically the commit in [1]. This fix removes such usages and
fallback to a more portable syntax. Tested with:

* GNU make 3.82 on CentOS 7.2
* bmake 20150910 on CentOS 7.2, source RPM from Fedora 24 [2]
* GNU make 4.2.1 on Arch Linux (Arch official package)
* bmake 20160926 on Arch Linux (Arch official package)
* GNU make 3.82 on Arch Linux (Compiled from source)
* Apple bsdmake-24 on macOS Sierra, binary package from Homebrew

Thanks @bdeyal for the feedback of the first tests

[1] http://git.savannah.gnu.org/cgit/make.git/commit/?id=b34438bee83ee906a23b881f257e684a0993b9b1
[2] http://koji.fedoraproject.org/koji/buildinfo?buildID=716769
2016-10-07 03:28:41 +08:00
09b9c45e24 [generic] Add support for multiple vimeo embeds (Closes #10862) 2016-10-06 23:22:52 +07:00
33898fb19c [nzz] Add new extractor(#4407) 2016-10-06 10:45:57 +01:00
017eb82934 [npo] detect geo restriction 2016-10-05 18:27:02 +01:00
b1d798887e [npo] Add support for 2doc.nl (Closes #10842) 2016-10-05 23:43:08 +07:00
0a33bb2cb2 Rename "Steffan 'Ruirize' James" to "Steffan Donal"
Legal name change!
2016-10-05 03:32:14 +07:00
185744f92f [lego] Add new extractor(closes #10369) 2016-10-04 10:30:57 +01:00
7232e54813 [tonline] Add new extractor(#10376) 2016-10-04 08:00:25 +01:00
6eb5503b12 [techtalks] Relax _VALID_URL 2016-10-04 02:54:36 +07:00
539c881bfc [techtalks] Allow URL-s with name part omitted. 2016-10-04 02:52:33 +07:00
c1b2a0858c [youtube:live] Extend _VALID_URL (Closes #10839) 2016-10-04 02:10:23 +07:00
215ff6e0f3 [theweatherchannel] Add new extractor(closes #7188) 2016-10-03 18:20:34 +01:00
dcdb292fdd Unify coding cookie 2016-10-03 23:44:29 +07:00
c1084ddb0c [thisoldhouse] Add new extractor(closes #10837) 2016-10-03 15:27:09 +01:00
ee5de4e38e [nhl] Add support for wch2016.com (Closes #10833) 2016-10-03 00:54:02 +07:00
25291b979a Merge pull request #10829 from TRox1972/pornoxo_improve
[pornoxo] Use JWPlatform to improve metadata extraction
2016-10-02 20:19:34 +08:00
567a5996ca [pornoxo] Use JWPlatform to improve metadata extraction 2016-10-02 13:07:02 +02:00
6c152ce20f release 2016.10.02 2016-10-02 15:58:00 +07:00
26406d33c7 [ChangeLog] Actualize 2016-10-02 15:56:33 +07:00
703b3afa93 [amcnetworks] Skip a restricted _TEST 2016-10-02 14:25:06 +08:00
99ed78c79e [jwplatform] Support DASH streams 2016-10-02 14:07:49 +08:00
fd15264172 [jwplatform] Support old-style jwplayer playlists 2016-10-02 13:47:06 +08:00
bd26441205 [utils] Fix xattr error handling 2016-10-02 03:03:41 +08:00
b19e275d99 [__init__] Fix lost xattr if --embed-thumbnail used
Reported at
https://github.com/rg3/youtube-dl/issues/9054#issuecomment-250451823
2016-10-02 02:12:14 +08:00
f6ba581f89 [byutv:event] Add extractor 2016-10-02 00:50:07 +07:00
6d2549fb4f [byutv] Fix id and display id 2016-10-02 00:44:54 +07:00
4da4516973 [byutv] Rely on _match_id and _parse_json 2016-10-02 00:41:18 +07:00
e1e97c2446 [periscope:user] Fix extraction (Closes #10820) 2016-10-01 22:50:47 +07:00
53a7e3d287 [utils] Support xattr as well as pyxattr
Closes #9054

There are two xattr packages in Python, pyxattr [1] and xattr [2]. They
have different APIs.

In old days pyxattr supports Linux only and xattr supports Linux, Mac,
FreeBSD and Solaris, and pyxattr supports Linux only. Recently pyxattr
adds support for Mac OS X. [3]

An old version of [2] is shipped with Mac OS X. However, some Linux
distributions have pyxattr only, for example PLD-Linux [4] and old Arch
Linux. [5] As a result, supporting both is the way to go.

[1] https://github.com/iustin/pyxattr
[2] https://github.com/xattr/xattr
[3] https://github.com/iustin/pyxattr/pull/9
[4] https://github.com/rg3/youtube-dl/issues/5498
[5] https://git.archlinux.org/svntogit/community.git/commit/?id=427c4c76401e386d865ccddea4fbfdc74df80492
    https://git.archlinux.org/svntogit/community.git/commit/?id=59b40da7b69622a6761d364a8b07909e9cccaa56
    python-xattr is added on 2016/06/29 while pyxattr is there for more
    than 6 years
2016-10-01 20:13:04 +08:00
d54739a2e6 [downloader/http] xattr values should be bytes 2016-10-01 19:58:13 +08:00
63e0fd5bcc Merge pull request #10818 from TRox1972/criterion_match_id
[criterion] Rely on _match_id, improve regex and add thumbnail to test
2016-10-01 19:49:18 +08:00
9c51a24642 [criterion] Rely on _match_id, improve regex and add thumbnail to test 2016-10-01 13:46:48 +02:00
9bd7bd0b80 [twitch] Skip a 404 test 2016-10-01 16:38:47 +08:00
4a76b73c6c Merge pull request #10817 from TRox1972/clubic_match_id
[clubic] Rely on _match_id and _parse_json
2016-10-01 16:20:12 +08:00
e295618f9e [dctp] Fix extraction (closes #10734) 2016-10-01 15:22:48 +08:00
d7753d1948 [downloader/http] Use write_xattr function for --xattr-set-filesize 2016-10-01 14:47:20 +08:00
eaf9b22f94 [clubic] Rely on _match_id and _parse_json 2016-09-30 20:03:25 +02:00
a1001f47fc [instagram] PEP 8 2016-10-01 00:16:08 +07:00
1609782258 [Instagram] Extract video dimensions 2016-10-01 00:13:34 +07:00
de6babf922 [tvland] Extend _VALID_URL (Closes #10812) 2016-09-30 22:30:34 +07:00
b0582fc806 [vgtv] Add support for tv.aftonbladet.se (Closes #10800) 2016-09-30 00:15:09 +07:00
af33dd8ee7 [aftonbladet] Remove extractor 2016-09-30 00:13:03 +07:00
70d7b323b6 [vk] Improve view count extraction 2016-09-29 23:52:29 +07:00
a7ee8a00f4 [vk] Extract timestamp (Closes #10760) 2016-09-29 23:52:29 +07:00
c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
3aa3953d28 [vk] Fix date and view count extraction. 2016-09-29 23:52:29 +07:00
efa97bdcf1 Move write_xattr to utils.py
There are some other places that use xattr functions. It's better to
move it to a common place so that others can use it.
2016-09-30 00:28:32 +08:00
475f8a4580 [vk] Add support for running live streams (Closes #10799) 2016-09-29 23:21:39 +07:00
93aa0b6318 [vk] Add support for finished live streams (#10799) 2016-09-29 23:04:10 +07:00
0ce26ef228 Merge pull request #10788 from TRox1972/instagram_comments
[Instagram] Extract comments
2016-09-29 21:54:39 +08:00
0d72ff9c51 [leeco] Recognize more Le Sports URLs (#10794) 2016-09-29 21:39:35 +08:00
a56e74e271 [Instagram] Extract comments 2016-09-28 19:32:40 +02:00
f533490bb7 [ketnet] Extract mzsource formats (#10770) 2016-09-28 22:58:25 +07:00
8bfda726c2 [limelight:media] improve http formats extraction 2016-09-28 16:34:27 +01:00
8f0cf20ab9 release 2016.09.27 2016-09-27 23:09:46 +07:00
c8f45f763c [ChangeLog] Remove duplicate 2016-09-27 23:03:00 +07:00
dd2cffeeec [ChangeLog] Actualize 2016-09-27 22:43:35 +07:00
cdfcc4ce95 [mtv] Improve _VALID_URL 2016-09-27 22:27:10 +07:00
e384552590 [vk] Add support for dailymotion embeds
Fixes #10661
2016-09-27 21:58:14 +07:00
1a2fbe322e [periscope] Treat timed_out state as finished stream 2016-09-27 21:55:51 +07:00
f9dd86a112 [npo] Clarify IE_NAMEs (Closes #10775) 2016-09-27 21:37:33 +07:00
2342733f85 fix tests related to 1978540a5122c53012e17a78841f3da0df77fd34(closes #10774) 2016-09-27 15:31:25 +01:00
93933c9819 [awaan:video] fix test(closes #10773) 2016-09-27 15:31:25 +01:00
d75d9e343e [einthusan] Fix extraction (closes #10714) 2016-09-27 14:38:41 +08:00
72c3d02d29 [promptfile] Improve and modernize 2016-09-26 23:39:54 +07:00
d3dbb46330 [promptfile] Fix extraction (Closes #10634) 2016-09-26 23:20:58 +07:00
fffb9cff94 [kaltura] Speed up embed regexes (#10764) 2016-09-26 22:15:58 +07:00
d3c97bad61 Ignore and cleanup 3gp files 2016-09-26 14:14:37 +08:00
2d5b4af007 [extractors] Add import for anderetijden extractor 2016-09-25 23:30:57 +07:00
f1ee462c82 [PULL_REQUEST_TEMPLATE.md] Fix typo 2016-09-25 22:38:36 +07:00
5742c18bc1 [npo] Add support for anderetijden.nl (Closes #10754) 2016-09-25 22:26:14 +07:00
ddb19772d5 [vpro] Fix playlist title extraction and update tests 2016-09-25 22:26:06 +07:00
a3d8b38168 [npo] Generalize playlist extractors 2016-09-25 22:26:00 +07:00
e590b7ff9e [PULL_REQUEST_TEMPLATE.md] Add checkable Improvement options PR's purpose 2016-09-25 18:09:46 +07:00
f3625cc4ca [PULL_REQUEST_TEMPLATE.md] Add Unlicense notice 2016-09-25 18:08:35 +07:00
2d3d29976b [youtube] Change test URLs from http to https 2016-09-25 17:45:24 +07:00
493353c7fd [prosiebensat1] Add support for advopedia 2016-09-25 06:25:57 +07:00
0a078550b9 [prosiebensat1] Improve _VALID_URL 2016-09-25 06:19:17 +07:00
f92bb612c6 [mwave] Relax _VALID_URLs (Closes #10735, closes #10748) 2016-09-25 06:14:32 +07:00
ddde91952f [prosiebensat1] Fix playlist support (Closes #10745) 2016-09-25 05:36:18 +07:00
63c583eb2c [prosiebensat1] Add support for sat1gold (#10745) 2016-09-25 04:43:10 +07:00
7fd57de6fb [cbsnews:livevideo] fix extraction and extract m3u8 formats 2016-09-24 22:01:33 +01:00
e71a450956 [common] add hdcore sign to akamai f4m formats 2016-09-24 21:55:53 +01:00
27e99078d3 [brightcove:new] add support for live streams 2016-09-24 15:39:48 +01:00
6f126d903f [download/hls] Delegate downloading to ffmpeg for live streams 2016-09-24 15:39:47 +01:00
7518a61d41 [soundcloud] Fix typo in playlist base class name 2016-09-24 19:29:49 +07:00
8e45e1cc4d [soundcloud] Generalize playlist entries extraction (#10733) 2016-09-24 19:18:01 +07:00
f0bc5a8609 [twitter] Support Periscope embeds (closes #10737)
Also update _TESTS
2016-09-24 20:00:29 +08:00
a54ffb8aa7 [mtv] add common IE_NAME prefix for MTVIE and MTVVideoIE 2016-09-24 10:50:14 +01:00
8add4bfecb [mtv] add support for new website urls(closes #8169)(closes #9808) 2016-09-24 10:42:20 +01:00
0711995bca [openload] Support subtitles (closes #10625) 2016-09-24 14:27:08 +08:00
5968d7d2fe [extractor/common] Improved support for HTML5 subtitles
Ref: #10625

In a strict sense, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and I guess there will be more
examples, so I add it to common.py.

Also allow extracting information for subtitles-only <video> or <audio>
tags, which is the case of openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
2016-09-24 14:20:42 +08:00
e6332059ac release 2016.09.24 2016-09-24 02:16:47 +07:00
8eec691e8a [ChangeLog] Actualize 2016-09-24 02:12:49 +07:00
24628cf7db [soundcloud:playlist] Provide video id for playlist entries (Closes #10733) 2016-09-24 02:01:01 +07:00
71ad00c09f [prosiebensat1] Add support for kabeleinsdoku (Closes #10732) 2016-09-23 21:08:16 +07:00
45cae3b021 [cbs] extract info from thunder videoPlayerService(closes #10728) 2016-09-22 19:28:22 +01:00
4ddcb5999d [openload] Fix extraction (closes #10408, closes #10727)
Thanks to @daniel100097 for providing a working version
2016-09-23 01:47:51 +08:00
628406db96 [Makefile] Cleanup files from fragment-based downloaders 2016-09-23 01:13:56 +08:00
e3d6bdc8fc [ustream] Support HLS streams (closes #10698) 2016-09-23 01:11:13 +08:00
0a439c5c4c [udemy] Stringify video id 2016-09-22 21:48:53 +07:00
1978540a51 [ooyala] extract all hls formats 2016-09-21 21:49:52 +01:00
12f211d0cb [videomore] Fix embed regex 2016-09-21 22:51:36 +07:00
3a5a18705f [adobepass] add support MSO that depend on watchTVeverywhere(closes #10709) 2016-09-21 15:57:27 +01:00
1ae0ae5db0 [cartoonnetwork] add support Adobe Pass auth 2016-09-20 18:52:00 +01:00
f62a77b99a [soundcloud] Modernize 2016-09-20 21:56:57 +07:00
4bfd294e2f [soundcloud] Extract license metadata 2016-09-20 21:56:57 +07:00
e33a7253b2 [fox] add support for Adobe Pass auth(closes #8584) 2016-09-20 15:52:23 +01:00
c38f06818d add support for Adobe Pass auth in tbs,tnt and trutv extractors(fixes #10642)(closes #10222)(closes #10519) 2016-09-20 11:55:30 +01:00
cb57386873 release 2016.09.19 2016-09-19 02:58:32 +07:00
59fd8f931d [ChangeLog] Actualize 2016-09-19 02:57:14 +07:00
70b4cf9b1b [crunchyroll] Check if already logged in (Closes #10700) 2016-09-19 02:50:06 +07:00
cc764a6da8 [twitch:stream] Remove fallback to profile extraction when stream is offline
Main page does not contain profile videos anymore
2016-09-18 19:10:18 +07:00
d8dbf8707d [thisav] Improve title extraction (closes #10682)
I didn't add a test case as the one in #10682 looks like a copyrighted
product.
2016-09-18 18:35:38 +08:00
a1da888d0c [vyborymos] Improve station info extraction 2016-09-18 17:30:55 +07:00
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (manifest may contain unfragmented data in this case url will be used for direct media URL and manifest_url for manifest itself correspondingly)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
f5e008d134 release 2016.09.15 2016-09-15 23:46:11 +07:00
e6bf3621e7 [ChangeLog] Actualize 2016-09-15 23:31:16 +07:00
490b755769 Improve some id regexes 2016-09-15 23:12:58 +07:00
1dec2c8a0e [adobepass] Change mvpd cache section name
In order to better emphasize it's relation to Adobe Pass
2016-09-15 22:47:45 +07:00
dcce092e0a [extractor/common] Simplify _get_netrc_login_info and carry long lines 2016-09-15 22:35:12 +07:00
32443dd346 [extractor/common] Update _get_login_info's comment 2016-09-15 22:34:29 +07:00
2133565cec [extractor/common] Simplify _get_login_info 2016-09-15 22:26:37 +07:00
1da50aa34e [YoutubeDL] Improve Adobe Pass options' wording 2016-09-15 22:24:55 +07:00
d2522b86ac [options] Actually print Adobe Pass options sections in --help 2016-09-15 22:18:31 +07:00
537f753399 [options] Improve Adobe Pass wording 2016-09-15 22:17:17 +07:00
c849836854 [utils] Improve _hidden_inputs 2016-09-15 21:54:48 +07:00
eb5b1fc021 [crunchyroll] Fix authentication (Closes #10655) 2016-09-15 21:53:35 +07:00
95be29e1c6 [twitch] Fix api calls (Closes #10654, closes #10660) 2016-09-15 20:58:02 +07:00
c035dba19e [bellmedia] add support for more sites 2016-09-15 08:12:12 +01:00
87148bb711 [adobepass] rename --ap-mso-list option to --ap-list-mso 2016-09-14 20:21:09 +01:00
797c636bcb [ap] improve adobe pass names and parse error handling 2016-09-14 18:58:47 +01:00
0002962f3f [franceinter] Improve extraction (Closes #10538) 2016-09-14 23:59:38 +07:00
3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
a942d6cb48 [utils,franceinter] Add french months' names and fix extraction
Update of the "FranceInter" radio extractor : webpages HTML structure
had changed, the extractor didn't work. So I updated this extractor to
get the mp3 URL and all details.
2016-09-14 23:59:38 +07:00
961516bfd1 [kwuo:song] Improve error detection (closes #10650) 2016-09-15 00:56:15 +08:00
6db354a9f4 [kuwo] Update _TESTS 2016-09-15 00:53:04 +08:00
353f340e11 [go] fix typo 2016-09-14 17:22:42 +01:00
014b7e6b25 [go] add support for free full episodes(#10439) 2016-09-14 17:08:25 +01:00
925194022c Improve some _VALID_URLs 2016-09-14 22:47:21 +07:00
b690ea15eb [viafree] Fix test 2016-09-14 22:45:23 +07:00
5712c0f426 [adobepass] remove unnecessary option 2016-09-14 16:37:21 +01:00
86d68f906e [bilibili] Fix extraction for videos without backup_url (#10647) 2016-09-14 22:11:49 +08:00
4875ff6847 [bilibili] Remove copyrighted test cases
I can't find any English or Chinese material that claims BiliBili has
bought legal redistribution permissions for copyrighted products from
copyrighted holders.

References for removed test cases:
"刀语": https://en.wikipedia.org/wiki/Katanagatari, by White Fox
"哆啦A梦": https://en.wikipedia.org/wiki/Doraemon, by Shin-Ei Animation
"岳父岳母真难当": https://en.wikipedia.org/wiki/Serial_(Bad)_Weddings, by Les films du 24
"混沌武士": https://en.wikipedia.org/wiki/Samurai_Champloo, by Manglobe

I shouldn't have added them to _TESTS
2016-09-14 22:09:43 +08:00
1b6712ab23 [adobepass] add specific options for adobe pass authentication
- add --ap-username and --ap-password option to specify
TV provider username and password in the cmd line
- add --ap-retries option to limit the number of retries
- add --list-ap-msi-ids to list the supported TV Providers
2016-09-13 22:16:01 +01:00
8414c2da31 [adobepass] PEP 8 2016-09-13 23:22:16 +07:00
45396dd2ed [nhk] Fix extraction (Closes #10633) 2016-09-13 23:20:25 +07:00
7a7309219c [adobepass] add an option to specify mso_id and support for ROGERS TV Provider(closes #10606) 2016-09-12 23:39:35 +01:00
fcba157e80 [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 23:29:43 +07:00
a6ccc3e518 [safari] Improve ids regexes (#10617) 2016-09-12 23:05:52 +07:00
1d16035bb4 [kaltura] Improve audio detection 2016-09-12 22:43:45 +07:00
e8bcd982cc [kaltura] Skip chun format 2016-09-12 22:33:00 +07:00
a5ff05df1a [extractor/generic] Add vimeo embed that requires Referer passed 2016-09-12 21:49:31 +07:00
d002e91986 [vimeo:ondemand] Pass Referer along with embed URL (#10624) 2016-09-12 21:48:45 +07:00
546edb2efa [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 21:01:31 +07:00
be45730226 [nbc] Add new extractor for NBC Olympics (#10295, #10361) 2016-09-12 02:55:15 +08:00
ee7e672eb0 [tube8] Remove proxy settings from test 2016-09-11 23:46:50 +07:00
0307d6fba6 release 2016.09.11.1 2016-09-11 23:33:20 +07:00
fc150cba1d [devscripts/release.sh] Add missing fi 2016-09-11 23:32:01 +07:00
d667ab7fad [ChangeLog] Actualize 2016-09-11 23:30:18 +07:00
eb87d4545a [devscripts/release.sh] Add ChangeLog reminder prompt 2016-09-11 23:29:25 +07:00
1c81476cbb release 2016.09.11 2016-09-11 23:20:09 +07:00
bc9186c882 [tvplay] Remove unused import 2016-09-11 22:51:12 +07:00
6599c72527 [tube8] Extract categories and tags (Closes #10579) 2016-09-11 22:50:36 +07:00
6bb05b32a9 [pornhub] Extract categories and tags (closes #10499) 2016-09-11 19:22:51 +08:00
fea74acad8 [foxnews] Revert to old extractor names 2016-09-11 18:54:24 +08:00
f01115c933 [openload] Temporary fix (#10408) 2016-09-11 18:36:59 +08:00
2cdbc06a1f [foxnews] Support Fox News Articles (closes #10598) 2016-09-11 18:32:45 +08:00
2cb93afcd8 [viafree] Improve video id extraction (Closes #10615) 2016-09-11 14:59:14 +07:00
bfcda07a27 [abc:iview] Skip the test. They are removed soon 2016-09-11 04:06:00 +08:00
001a5fd3d7 [iwara] Fix extraction after relaunch
Closes #10462, closes #3215
2016-09-11 03:02:00 +08:00
1e35999c1e [tfo] Add new extractor 2016-09-10 19:43:31 +01:00
2512b17493 [lrt] Fix audio extraction (Closes #10566) 2016-09-11 01:27:20 +07:00
56c0ead4d3 [9now] Improve video data extraction (Closes #10561) 2016-09-11 00:42:13 +07:00
7324243750 [9now] Fix extraction 2016-09-11 00:16:29 +07:00
84a18e9b90 [polskieradio:category] Improve extraction 2016-09-10 22:01:49 +07:00
b29f842e0e [canalplus] Add support for c8.fr (Closes #10577) 2016-09-10 20:46:45 +07:00
f009fcac0d Merge branch 'master' of github.com:rg3/youtube-dl 2016-09-10 19:21:03 +07:00
6c3affcb18 [newgrounds] Fix uploader extraction
Closes #10584

Also change test URLs to HTTPS, as proposed by
@stepshal in #10593.

Closes #10593
2016-09-10 20:09:09 +08:00
1e19ff2984 Merge branch 'polskie-radio-programme' of https://github.com/JakubAdamWieczorek/youtube-dl 2016-09-10 00:42:36 +07:00
c6129feb7f [ketnet] Add extractor (Closes #10343) 2016-09-09 23:20:45 +07:00
bb5ebd4453 [canvas] Add support for een.be (Closes #10605) 2016-09-09 22:16:21 +07:00
cb9cbd84ed [extractors] add import for TeleQuebecIE 2016-09-08 22:55:27 +01:00
4d5726b0d7 [telequebec] Add new extractor(closes #1999) 2016-09-08 22:53:44 +01:00
4614ad7b59 [parliamentliveuk] fix extraction(closes #9137) 2016-09-08 20:46:12 +01:00
b717837190 release 2016.09.08 2016-09-08 23:46:14 +07:00
2abad67e52 [ChangeLog] Actualize 2016-09-08 23:32:16 +07:00
ad0e2b3359 [abcotvs] Add support for ABC Owned Television Stations 2016-09-08 23:15:58 +07:00
37720844f6 [jwplatform] Extract height from label 2016-09-08 22:53:20 +07:00
6cfcb8ac36 [tvnoe] Do not capture unused groups in _VALID_URL 2016-09-08 22:53:20 +07:00
7a979da8cb [yahoo] Look for Brightcove Legacy Studio embeds(closes #9345) 2016-09-08 16:44:22 +01:00
2fdc7b0e04 [viafree] PEP 8 2016-09-08 22:40:02 +07:00
010d034fca [videomore] Fix extraction (Closes #10592) 2016-09-08 22:38:49 +07:00
02e552886f Merge pull request #10596 from stepshal/r_prefix
Add missing r prefix for _VALID_URLs
2016-09-08 18:31:09 +08:00
25042f7372 Add missing r prefix for _VALID_URLs 2016-09-08 17:04:57 +07:00
3f612f0767 Fix _VALID_URLs further (#10594) 2016-09-08 17:39:29 +08:00
17bf6e71cc Merge pull request #10594 from stepshal/https_support
Add support for https for rest of the exctractors.
2016-09-08 17:28:46 +08:00
881f35479d Credit @xyb for miaopai extractor (#10556) 2016-09-08 17:22:43 +08:00
89f257d6e5 Add support for https for rest of the exctractors. 2016-09-08 13:52:22 +07:00
e78a5428b6 [foxgay] Fix extraction (closes #10480) 2016-09-08 02:01:09 +08:00
6656a82481 [rmcdecouverte] Add new extractor(closes #9709) 2016-09-07 17:33:22 +01:00
d7e794928d [tlc] fix query string parsing 2016-09-07 17:33:22 +01:00
9c27188988 Merge branch 'xyb-miaopai' 2016-09-08 00:31:06 +08:00
b84d311d53 [ChangeLog] Update for #10556 2016-09-08 00:29:55 +08:00
f87feb4b68 [miaopai] Coding style (#10556) 2016-09-08 00:28:33 +08:00
2841bdcebb Merge branch 'miaopai' of https://github.com/xyb/youtube-dl into xyb-miaopai 2016-09-08 00:08:02 +08:00
84b91dd4e3 [gamestar] Fix metadata extraction (closes #10479) 2016-09-07 23:07:50 +08:00
92c9c2a88b [moevideo] Skip another removed test (#10474) 2016-09-07 22:21:59 +08:00
9d54b02bae [puls4] fix extraction(closes #10583) 2016-09-07 14:43:20 +01:00
846d8b76a0 [cctv] Add new extractor(closes #8153) 2016-09-07 10:11:09 +01:00
aa3f9fe695 Explain why and why not to specify --hls-prefer-native
This has been asked at http://stackoverflow.com/questions/39357037/what-does-youtube-dl-option-hls-prefer-native-do-any-downside-adding-to-youtu
2016-09-07 10:38:59 +02:00
8258f4457c [lci] Add new extractor(closes #10573) 2016-09-06 20:47:42 +01:00
948cd5b72d [wat] extract dash formats 2016-09-06 20:44:45 +01:00
8d3737cda7 [polskieradio] Add support for downloading whole programmes.
This extends the Polskie Radio (the Polish national radio) extractor to
enable the user to download all the broadcasts of a single programme.
2016-09-06 21:34:44 +02:00
155bc674c4 [viafree] Improve video id detection (Closes #10569) 2016-09-07 00:41:31 +07:00
c33c962adf [trutv] Add new extractor(#10519) 2016-09-06 15:56:17 +01:00
bdcc046d12 [turner] use android secure hls host and catch token extraction errors 2016-09-06 15:53:03 +01:00
a493f10208 using _parse_html5_media_entries to parse video tag 2016-09-05 23:08:33 +08:00
f3eeaacb4e [nick] Add test for #10559 2016-09-05 21:42:41 +07:00
b4d6a85d60 [nick] Add support for nickelodeon.nl (Closes #10559) 2016-09-05 21:33:14 +07:00
0b36a96212 [abcotvs] extend _VALID_URL and add support for clips.abcotvs.com(closes #9551) 2016-09-05 13:41:21 +01:00
bc22a79694 Credit @mcepl for #10524 2016-09-05 16:44:06 +08:00
340e31ca74 Merge branch 'PeterDing-bilibili' 2016-09-05 13:55:07 +08:00
973dee491f [ChangeLog] Update for #10190 2016-09-05 13:54:35 +08:00
1f85029d82 [bilibili] Simplify 2016-09-05 13:53:58 +08:00
95be19d436 [miaopai] Add new extractor 2016-09-05 13:53:09 +08:00
95843da529 Merge branch 'bilibili' of https://github.com/PeterDing/youtube-dl into PeterDing-bilibili 2016-09-05 13:47:24 +08:00
abf2c79f95 Merge branch 'mcepl-tvnoe' 2016-09-05 13:39:51 +08:00
b49ad71ce1 [ChangeLog] Update for #10524 2016-09-05 13:38:55 +08:00
9127e1533d [tvnoe] PEP8 and coding style 2016-09-05 13:37:36 +08:00
78e762d23c Add new extractor for TV Noe (Czech Christian TV).
Fixes #10520
2016-09-04 19:06:40 +02:00
4809490108 release 2016.09.04.1 2016-09-04 20:58:28 +07:00
8112bfeaba [ChangeLog] Actualize 2016-09-04 20:57:18 +07:00
d9606d9b6c release 2016.09.04 2016-09-04 20:51:48 +07:00
433af6ad30 [theplatform] fix player regex(closes #10546) 2016-09-04 14:24:41 +01:00
feaa5ad787 [youtube:playlist] Extend _VALID_URL 2016-09-04 20:12:34 +07:00
100bd86a68 [rottentomatoes] delegate extraction to InternetVideoArchiveIE 2016-09-04 11:45:29 +01:00
0def758782 [internetvideoarchive] extract all formats 2016-09-04 11:45:29 +01:00
919cf1a62f [downloader/dash] Abort if the first segment fails
Closes #10497, Closes #10542
2016-09-04 17:32:29 +08:00
b29cd56591 [pornovoisines] Fix extraction (closes #10469) 2016-09-04 17:01:39 +08:00
622638512b [rottentomatoes] Fix extraction
Closes #10467
2016-09-04 16:25:59 +08:00
37c7490ac6 [espn] Extend _VALID_URL (Closes #10549) 2016-09-04 04:59:46 +07:00
091624f9da [vimple] Extend _VALID_URL (Closes #10547) 2016-09-04 03:39:13 +07:00
7e5dc339de [youtube:watchlater] Fix extraction (Closes #10544) 2016-09-04 00:29:01 +07:00
4a69fa04e0 [downloader/dash] Abort download immediately after giving up on some fragment 2016-09-03 17:51:48 +07:00
2e99cd30c3 [downloader/dash:hls] Report exact fragment error on retry 2016-09-03 17:51:48 +07:00
25afc2a783 [downloader/dash:hls] Respect --fragment-retries and --skip-unavailable-fragments (Closes #10165, closes #10448) 2016-09-03 17:51:48 +07:00
9603b66012 Introduce --skip-unavailable-fragments 2016-09-03 17:51:48 +07:00
45aab4d30b [youjizz] Fix extraction. The site has moved to HTML5
Closes #10437
2016-09-03 18:37:36 +08:00
ed2bfe93aa [fc2:embed] Add ie_key 2016-09-03 18:22:00 +08:00
cdc783510b [foxnews:insider] Add new extractor
Closes #10445
2016-09-03 18:16:19 +08:00
cf0efe9636 [fc2:embed] New extractor for Flash player URLs
Closes #10512
2016-09-03 17:25:03 +08:00
dedb177029 Fix parsing of HTML5 media elements
This fixes an error in _parse_html5_media_entries in case
an audio or video tag directly uses a src attribute insted
of <source> elements in it's body.
2016-09-03 16:09:35 +07:00
86c3bbbced release 2016.09.03 2016-09-03 01:46:41 +07:00
4b3a607658 [ChangeLog] Actualize 2016-09-03 01:45:17 +07:00
3a7d35b982 Credit @C4K3 for #10536 2016-09-03 01:42:33 +07:00
6496ccb413 [youtube] Add support for rental videos' previews (Closes #10532) 2016-09-03 01:17:15 +07:00
3fcce30289 [drtv] Update tests 2016-09-02 23:53:17 +07:00
c2b2c7e138 [utils] Add quicktime to mimetype2ext 2016-09-02 23:50:42 +07:00
dacb3a864a [youtube:playlist] Fallback to video extraction for video/playlist URLs when playlist is broken (Closes #10537) 2016-09-02 23:43:20 +07:00
6066d03db0 [drtv] Modernize and make more robust 2016-09-02 23:02:15 +07:00
6562d34a8c [utils] Improve mimetype2ext 2016-09-02 22:57:48 +07:00
5e9e3d0f6b [drtv] Add support for dr.dk/nyheder
It's the same video player, the only difference is that the video player
is loaded differently, and certain metadata (title and description) is
not available under dr.dk/mu, so make it by default get that from some
of the html meta tags.

Skip the dr.dk/tv test

dr.dk/tv videos are only available for between 7 and 90 days due to
Danish law, and in certain cases may be readded. Skip this test as it is
no longer available.
2016-09-02 22:20:36 +07:00
349fc5c705 [facebook:plugins:video] Add extractor (Closes #10530) 2016-09-02 21:13:50 +07:00
2c3e0af93e [go] Add new extractor 2016-09-02 09:53:04 +01:00
6150502e47 [adobepass] check for authz_token expiration(#10527) 2016-09-01 22:29:20 +01:00
b207d5ebd4 [curiositystream] don't cache auth token 2016-09-01 19:46:58 +01:00
4191779dcd [nytimes] improve extraction 2016-09-01 19:08:29 +01:00
f97ec8bcb9 [glide] Remove unused import 2016-09-01 23:46:58 +07:00
8276d3b87a [thestar] Fix extraction (Closes #10465) 2016-09-01 23:46:15 +07:00
af95ee94b4 [glide] Fix extraction (Closes #10478) 2016-09-01 23:38:49 +07:00
8fb6af6bba [exfm] Remove extractor (Closes #10482) 2016-09-01 23:32:28 +07:00
f6af0f888b [youporn] Fix categories and tags extraction (Closes #10521) 2016-09-01 23:15:01 +07:00
e816c9d158 [extractor/common] Simplify _extract_m3u8_formats 2016-09-01 22:18:16 +07:00
9250181f37 [extractor/common] Restore NAME usage from EXT-X-MEDIA tag for formats codes in _extract_m3u8_formats (Closes #10522) 2016-09-01 21:37:25 +07:00
f096ec2625 [curiositystream] Add new extractor 2016-09-01 13:37:09 +01:00
4c8ab6fd71 [thvideo] Remove extractor. Website down.
Closes #10464

According to a screenshot in http://tieba.baidu.com/p/4691302183,
thvideo.tv is shut down "temporarily". I see no clues that it will be up
again, so I remove it here.
2016-09-01 17:04:41 +08:00
05d4612947 [movingimage] Adapt to the new domain name and fix extraction
Closes #10466
2016-09-01 16:58:16 +08:00
746a695b36 [myvidster] Update _TESTS (closes #10473) 2016-09-01 16:42:35 +08:00
165c54e97d [southpark.cc.com:español] Skip geo-restricted _TESTS
Breaks https://travis-ci.org/rg3/youtube-dl/jobs/156728175
2016-09-01 16:28:03 +08:00
2896dd73bc [cbs] extract once formats(closes #10515) 2016-09-01 08:00:13 +01:00
f8fd510eb4 [limelight] skip ism manifests and reduce requests 2016-08-31 18:32:15 +01:00
7a3e849f6e [porncom] Extract categories and tags (Closes #10510) 2016-08-31 22:23:55 +07:00
196c6ba067 [facebook] Extract timestamp (Closes #10508) 2016-08-31 22:12:37 +07:00
165620e320 [yahoo] extract more and better formats 2016-08-30 21:49:28 +01:00
4fd350611c release 2016.08.31 2016-08-31 02:39:39 +07:00
263fef43de [ChangeLog] Actualize 2016-08-31 02:37:40 +07:00
a249ab83cb [pyvideo] Remove debugging code 2016-08-31 01:56:58 +07:00
f7043ef39c [soundcloud] Fix _VALID_URL clashes with sets (Closes #10505) 2016-08-31 01:56:15 +07:00
64fc49aba0 [bandcamp:album] Fix title extraction (Closes #10455) 2016-08-31 00:29:49 +07:00
245023a861 [pyvideo] Fix extraction (Closes #10468) 2016-08-30 23:51:18 +07:00
3c77a54d5d [turner] keep video id intact 2016-08-30 10:46:48 +01:00
da30a20a4d [turner,cnn] move a check for wrong timestamp to CNNIE 2016-08-29 19:26:53 +01:00
1fe48afea5 [cnn] update _TEST for CNNBlogsIE and CNNArticleIE(closes #10489) 2016-08-29 18:24:16 +01:00
42e05be867 [ctv] add support for (tsn,bnn,thecomedynetwork).ca websites(#10016) 2016-08-29 18:24:16 +01:00
fe45b0e060 [9c9media] fix multiple stacks extraction and extract more metadata(#10016) 2016-08-29 18:24:16 +01:00
a06e1498aa [kusi] Update test 2016-08-29 22:54:33 +07:00
5a80e7b43a [turner] Skip invalid subtitles' URLs 2016-08-29 22:44:15 +07:00
3fb2a23029 [adultswim] Extract video info from onlineOriginals (Closes #10492) 2016-08-29 22:40:35 +07:00
7be15d4097 [bilibili] Support episodes
[extractor/bilibili] add md5 for testing

[extractor/bilibili] remove unnecessary headers

[extractor/bilibili] correct _TESTS; find thumbnail for episode

[extractor/bilibili] [Fix] restore removed tests
2016-08-29 23:31:08 +08:00
cd10b3ea63 [turner] Extract all formats 2016-08-29 22:13:49 +07:00
547993dcd0 [turner] Fix subtitles extraction 2016-08-29 21:52:41 +07:00
6c9b71bc08 [downloader/external] Recommend --hls-prefer-native for SOCKS users
Related: #10490
2016-08-29 19:05:38 +08:00
93b8404599 [generic,vodplatform] improve embed regex 2016-08-29 07:57:20 +01:00
9ba1e1dcc0 [played] Remove extractor (Closes #10470) 2016-08-29 08:26:07 +07:00
b8079a40bc [turner] fix secure m3u8 formats downloading 2016-08-28 17:51:53 +01:00
5bc8a73af6 [cartoonnetwork] make extraction work for more videos in the website
some videos require `networkName=CN2` to be present in the feed url
2016-08-28 17:08:26 +01:00
b3eaeded12 [tbs] Add new extractor(#10222) 2016-08-28 16:51:09 +01:00
ec65b391cb [cartoonnetwork] Add new extractor(#10110) 2016-08-28 16:51:09 +01:00
2982514072 [turner,nba,cnn,adultswim] add base extractor to parse cvp feeds 2016-08-28 16:51:09 +01:00
98908bcf7c [openload] Update algorithm again (#10408) 2016-08-28 22:49:46 +08:00
04b32c8f96 [bilibili] Fix extraction (closes #10375)
Thanks @gdkchan for the algorithm
2016-08-28 22:06:31 +08:00
40eec6b15c [openload] Fix extraction (closes #10408)
Thanks to @yokrysty again!
2016-08-28 20:27:52 +08:00
39efc6e3e0 [generic] Update some _TESTS 2016-08-28 15:46:11 +08:00
1198fe14a1 release 2016.08.28 2016-08-28 07:24:08 +07:00
71e90766b5 [README.md] Fix typo in download archive FAQ entry 2016-08-28 07:09:03 +07:00
d7aae610f6 [ChangeLog] Actualize 2016-08-28 07:00:15 +07:00
92c27a0dbf [periscope:user] Fix extraction (Closes #10453) 2016-08-28 02:35:49 +07:00
d181cff685 Merge branch 'steven7851-patch-2' 2016-08-27 01:17:12 +08:00
3b4b82d4ce [douyutv] Simplify 2016-08-27 01:16:39 +08:00
545ef4f531 Merge branch 'patch-2' of https://github.com/steven7851/youtube-dl into steven7851-patch-2 2016-08-26 22:29:46 +08:00
906b87cf5f [crackle] Revert to template-based thumbnail extraction
To reduce to number of HTTP requests
2016-08-26 19:58:47 +08:00
b281aad2dc [douyutv] Use new api
use lapi for flv info, and html5 api for room info
#10153 #10318
2016-08-26 07:32:54 +08:00
6b18a24e6e [tnaflix] Fix extraction (Closes #10434) 2016-08-26 05:57:52 +07:00
c9de980106 Credit @Xender for nhk:vod (#10424) 2016-08-26 04:49:52 +07:00
f9b373afda [nhk:vod] Improve extraction (Closes #10424) 2016-08-26 04:48:40 +07:00
298a120ab7 [nhk] Add extractor for VoD. 2016-08-26 04:15:51 +07:00
e3faecde30 [trutube] Remove extractor (Closes #10438) 2016-08-26 03:43:13 +07:00
a0f071a50d [usanetwork] Add new extractor 2016-08-25 19:41:31 +01:00
20bad91d76 [downloader/external] Clarify that ffmpeg doesn't support SOCKS
Ref: #10304
2016-08-25 22:38:06 +08:00
b54a2da433 [crackle] Fix extraction and update _TESTS (closes #10333) 2016-08-25 22:22:31 +08:00
dc2c37f316 [spankbang] Fix description and uploader (closes #10339) 2016-08-25 20:47:35 +08:00
c1f62dd338 [README] Clean up grammar in --download-archive paragraph 2016-08-25 14:45:01 +02:00
5a3efcd27c [README.md] Add FAQ entry for download archive 2016-08-25 18:57:31 +07:00
4c8f9c2577 [README.md] Add comments in sample configuration for clarity 2016-08-25 18:27:15 +07:00
f26a298247 [README.md] Use en-US URL in cookies FAQ entry 2016-08-25 18:19:41 +07:00
ea01cdbf61 [README.md] Clarify how to export cookies from browser for cookies FAQ entry 2016-08-25 18:17:45 +07:00
6a76b53355 [README.md] Quote URL in streaming to player FAQ entry 2016-08-25 18:05:01 +07:00
d37708fc86 [YoutubeDL] check only for None Value in thumbnails sorting 2016-08-25 11:53:47 +01:00
5c13c28566 raise unexpected error when no stream found 2016-08-25 09:55:23 +01:00
f70e9229e6 [discoverygo] detect when video needs authentication(closes #10425) 2016-08-25 09:11:23 +01:00
30afe4aeb2 [cbc] Add support for watch.cbc.ca 2016-08-25 08:49:44 +01:00
75fa990dc6 [YoutubeDL] add fallback value for thumbnails values in thumbnails sorting 2016-08-25 08:49:44 +01:00
f39ffc5877 [common] extract formats from #EXT-X-MEDIA tags 2016-08-25 08:49:44 +01:00
07ea9c9b05 [downloader/hls] fill IV with zeros for IVs shorter than 16-octet 2016-08-25 08:49:44 +01:00
073ac1225f [utils] add ac-3 to the list of audio codecs in parse_codecs 2016-08-25 08:49:44 +01:00
0c6422cdd6 [README.md] Add FAQ entry for streaming to player 2016-08-25 07:34:55 +07:00
08773689f3 [kickstarter] Silent the warning for og:description
Closes #10415
2016-08-25 01:29:32 +08:00
0c75abbb7b [mtvservices:embedded] Use another endpoint to get feed URL
Closes #10363

In the original mtvservices:embedded test case, config.xml is still used
to get the feed URL. Some other examples, including test_Generic_40
(http://www.vulture.com/2016/06/new-key-peele-sketches-released.html),
and the video mentioned in #10363, use another endpoint to get the feed
URL. The 'index.html' approach works for the original test case, too. So
I didn't keep the old approach.
2016-08-24 23:58:22 +08:00
97653f81b2 [bilibili] Mark as broken
Bilibili now uses emscripten, which is very difficult for reverse
engineering. I don't expect it to be fixed in near future, so I mark
it as broken.

Ref: #10375
2016-08-24 21:28:00 +08:00
d38b27dd9b release 2016.08.24.1 2016-08-24 10:11:04 +07:00
6d94cbd2f4 [ChangeLog] Actualize 2016-08-24 10:07:06 +07:00
30317f4887 [pluralsight] Modernize and make more robust 2016-08-24 08:52:12 +07:00
8c3e35dd44 [pluralsight] Add support for subtitles (Closes #9681) 2016-08-24 08:41:52 +07:00
c86f51ee38 release 2016.08.24 2016-08-24 01:38:46 +07:00
6e52bbb413 [ChangeLog] Actualize 2016-08-24 01:36:27 +07:00
05bddcc512 [youtube] Fix authentication (2) (Closes #10392) 2016-08-24 01:29:50 +07:00
1212e9972f [youtube] Fix authentication (#10392) 2016-08-24 00:25:21 +07:00
ccb6570e9e [syfy,bravotv] restrict drupal settings regex 2016-08-23 17:31:35 +01:00
18b6216150 [openload] Fix extraction (closes #10408)
Thanks @yokrysty for the algorithm
2016-08-23 21:55:58 +08:00
fb009b7f53 [bravotv] correct clip info extraction and add support for adobe pass auth(closes #10407) 2016-08-23 10:29:52 +01:00
3083e4dc07 [eagleplatform] Improve detection of embedded videos (Closes #10409) 2016-08-23 07:22:14 +07:00
7367bdef23 [awaan] fix extraction, modernize, rename the extractors and add test for live stream 2016-08-22 23:10:06 +01:00
ad31642584 [nrk,abc:iview] use _extract_akamai_formats 2016-08-22 07:54:08 +01:00
c7c43a93ba [common] add helper method to extract akamai m3u8 and f4m formats 2016-08-22 07:49:34 +01:00
96229e5f95 [mtvservices:embedded] Update config URL
All starts from #10363. The test case in mtvservices:embedded uses
config.xml, while the video from #10363 and the test case in generic.py
is broken. Both uses index.html for fetching the feed URL.
2016-08-22 13:56:09 +08:00
55d119e2a1 [abc:iview] Add new extractor(closes #6148) 2016-08-22 00:07:17 +01:00
6d2679ee26 release 2016.08.22 2016-08-22 04:17:34 +07:00
afbab5688e [ChangeLog] Actualize 2016-08-22 04:15:46 +07:00
3d897cc791 [ivi] Fix episode number extraction 2016-08-22 03:34:27 +07:00
cf143c4d97 [ivi] Add support for 720p and 1080p 2016-08-22 03:31:33 +07:00
ad120ae1c5 [extractor/common] Change the default m3u8 protocol in HTML5
Helper functions should have consistent default values
2016-08-22 02:26:07 +08:00
d0fa172e5f [firsttv] keep a test videos with multiple formats 2016-08-21 19:13:43 +01:00
f97f9f71e5 Merge branch 'TRox1972-charlierose' 2016-08-22 02:11:43 +08:00
526656726b [charlierose] Simplify and improve 2016-08-22 02:06:47 +08:00
9b8c554ea7 [firsttv] fix extraction(closes #9249) 2016-08-21 17:56:25 +01:00
d13bfc07b7 Merge branch 'charlierose' of https://github.com/TRox1972/youtube-dl into TRox1972-charlierose 2016-08-22 00:48:35 +08:00
efe470e261 [twitch] Renew authentication 2016-08-21 22:45:50 +07:00
e3f6b56909 [twitch] Refactor API calls 2016-08-21 22:09:29 +07:00
b1e676fde8 [twitch] Modernize 2016-08-21 21:28:02 +07:00
92d4cfa358 [kaltura] Fallback ext calculation on caption's format 2016-08-21 21:01:01 +07:00
3d47ee0a9e [zingmp3] fix extraction and add support for video clips(closes #10041) 2016-08-21 14:09:48 +01:00
d164a0d41b [README.md] Add a format selection example using comma
Ref: #10399
2016-08-21 20:00:48 +08:00
db29af6d36 [charlierose] Add new extractor 2016-08-21 11:29:48 +02:00
2c6acdfd2d [kaltura] Add test for #10279 2016-08-21 08:37:01 +07:00
fddaa76a59 [kaltura] Assume ttml to be default subtitles' extension 2016-08-21 08:28:36 +07:00
a809446750 [kaltura] Add subtitles support when entry_id is unknown beforehand (Closes #10279) 2016-08-21 08:28:36 +07:00
d8f30a7e66 [kaltura] Remove unused code 2016-08-21 08:28:36 +07:00
5b1d85754e [YoutubeDL] Autocalculate ext when ext is None 2016-08-21 08:28:36 +07:00
e25586e471 [cultureunplugged] fix extraction(closes #10330) 2016-08-20 20:02:49 +01:00
292a2301bf [cnn] add support for money.cnn.com videos(closes #2797) 2016-08-20 19:00:25 +01:00
dabe15701b [cbs, cbsnews] fix extraction(fixes #10393) 2016-08-20 13:25:32 +01:00
4245f55880 [dotsub] Replace test (Closes #10386) 2016-08-20 06:18:20 +07:00
5b9d187cc6 [imdb] Improve title extraction and make thumbnail non-fatal 2016-08-20 04:50:39 +07:00
39e1c4f08c [litv] Support 'promo' URLs (closes #10385) 2016-08-20 00:52:37 +08:00
19f35402c5 [snotr] Fix extraction (closes #10338) 2016-08-20 00:18:22 +08:00
70852b47ca [utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
2016-08-20 00:17:26 +08:00
a9a3b4a081 [miomio] Adapt to the new API and update _TESTS
The test case is from #9680
2016-08-20 00:08:23 +08:00
ecc90093f9 [vuclip] Adapt to the new API and update _TEST 2016-08-19 23:56:09 +08:00
520251c093 [extractor/common] Recognize m3u8 manifests in HTML5 multimedia tags 2016-08-19 23:53:47 +08:00
55af45fcab [radiobremen] Update _TEST (closes #10337) 2016-08-19 23:12:30 +08:00
b82232036a [n-tv.de] Fix extraction (closes #10331) 2016-08-19 20:39:28 +08:00
e4659b4547 [utils] Correct octal/hexadecimal number detection in js_to_json 2016-08-19 20:37:17 +08:00
9e5751b9fe [globo:article] Relax _VALID_URL and video id regex (Closes #10379) 2016-08-19 01:13:45 +07:00
bd1bcd3ea0 release 2016.08.19 2016-08-19 00:15:12 +07:00
93a63b36f1 [ChangeLog] Actualize 2016-08-19 00:13:24 +07:00
8b2dc4c328 [options] Remove output template description from --help
Same reasons as for --format
2016-08-18 23:59:13 +07:00
850837b67a [porncom] Add extractor (Closes #2251, closes #10251) 2016-08-18 23:52:41 +07:00
13585d7682 [utils] Recognize lowercase units in parse_filesize 2016-08-18 23:32:00 +07:00
fd3ec986a4 [generic] Fix dbtv test (Closes #10364) 2016-08-18 21:35:41 +07:00
b0d578ff7b [dbtv] Relax embed regex 2016-08-18 21:30:55 +07:00
b0c8f2e9c8 [DBTV:generic] Add support for embeds 2016-08-18 21:29:27 +07:00
51815886a9 [vk:wallpost] Fix audio extraction 2016-08-18 06:14:05 +07:00
08a42f9c74 [vk] Fix authentication on python3 2016-08-18 05:22:23 +07:00
e15ad9ef09 [keezmovies] PEP 8 2016-08-18 04:39:31 +07:00
4e9fee1015 [hgtvcom:show] Add extractor (Closes #10365) 2016-08-18 04:37:14 +07:00
7273e5849b [discoverygo] extend _VALID_URL to support other networks 2016-08-17 11:03:09 +01:00
b505e98784 [extremetube] Revert display_id 2016-08-17 07:02:13 +07:00
92cd9fd565 [keezmovies] Make display_id optional 2016-08-17 07:01:32 +07:00
b3d7dce429 release 2016.08.17 2016-08-17 06:21:21 +07:00
a44694ab4e [ChangeLog] Actualize 2016-08-17 06:19:22 +07:00
ab19b46b88 [extremetube] Modernize 2016-08-17 06:02:12 +07:00
8804f10e6b [tube8] Modernize 2016-08-17 05:46:45 +07:00
6be17c0870 [mofosex] Extract all formats and modernize (Closes #10335) 2016-08-17 05:45:49 +07:00
8652770bd2 [keezmovies] Improve and modernize 2016-08-17 05:44:46 +07:00
2a1321a272 [vbox7:generic] Add support for vbox7 embeds 2016-08-17 01:02:59 +07:00
9c0fa60bf3 [vbox7] Add support for embed URLs 2016-08-17 00:42:02 +07:00
502d87c546 [mtg] Improve view count extraction 2016-08-17 00:32:28 +07:00
b35b0d73d8 [viafree] Add extractor (Closes #10358) 2016-08-17 00:21:30 +07:00
6e7e4a6edf [mtg] Add support for viafree URLs (#10358) 2016-08-17 00:19:43 +07:00
53fef319f1 [fxnetworks] extend _VALID_URL to support simpsonsworld.com 2016-08-16 16:22:34 +01:00
2cabee2a7d [amcnetworks] fix typo 2016-08-16 16:22:34 +01:00
11f502fac1 [theplatform] extract subtitles with multiple formats from the metadata 2016-08-16 16:22:34 +01:00
98affc1a48 [xvideos] Fix test 2016-08-16 21:20:15 +07:00
70a2829fee [xvideos] Fix HLS extraction (Closes #10356) 2016-08-16 21:17:52 +07:00
837e56c8ee [amcnetworks] extract episode metadata 2016-08-16 14:49:32 +01:00
b5ddee8c77 [amcnetworks] Add new extractor 2016-08-16 13:44:01 +01:00
fb64adcbd3 [adobepass] PEP 8 2016-08-16 04:45:21 +07:00
4f640f2890 [bbc:playlist] Fix tests 2016-08-16 04:43:10 +07:00
254e64a20a [bbc:playlist] Add support for pagination (Closes #10349) 2016-08-16 04:36:23 +07:00
818ac213eb [adobepass] add IE suffix to the extractor and remove duplicate constant 2016-08-15 21:36:34 +01:00
cbef4d5c9f [fxnetworks] add test and check geo restriction 2016-08-15 17:10:45 +01:00
bf90c46790 [fxnetworks] Add new extractor(closes #9462) 2016-08-15 16:34:32 +01:00
69eb4d699f [cbsnews] Remove invalid tests. CBS Live videos gets deleted soon. 2016-08-15 20:29:22 +08:00
6d8ec8c3b7 [ChangeLog] Update for CBSLocal and related changes 2016-08-15 13:39:43 +08:00
760845ce99 [cbslocal] Adapt to SendtoNewsIE 2016-08-15 13:37:37 +08:00
5c2d087221 [sendtonews] Fix extraction 2016-08-15 13:31:08 +08:00
b6c4e36728 [jwplatform] Parse video_id from JWPlayer data
And remove a mysterious comma from 115c65793a
2016-08-15 13:29:01 +08:00
1a57b8c18c [zippcast] Remove extractor (Closes #10332)
ZippCast is shut down
2016-08-15 08:25:24 +07:00
24eb13b1c6 [uplynk,viceland] update tests and change uplynk extractors names 2016-08-14 22:45:43 +01:00
525e0316c0 [adobepass] fix check for pendingLogout errors 2016-08-14 21:25:43 +01:00
7e60ce9cf7 [adobepass] clear cache in case of pendingLogout errors 2016-08-14 21:24:33 +01:00
e811bcf8f8 [viceland] raise ExtractorError for errors other than HTTP 400 2016-08-14 20:13:35 +01:00
6103f59095 [viceland] remove outdated comment 2016-08-14 19:08:35 +01:00
9fa5789279 [viceland] fix info extraction(closes #8799) 2016-08-14 19:04:23 +01:00
d2ac04674d [viceland] Add new extractor(#8799) 2016-08-14 18:04:50 +01:00
1fd6e30988 [adobepass] create separate class for adobe pass authentication 2016-08-14 18:04:50 +01:00
884cdb6cd9 [life:embed] Improve extraction 2016-08-14 20:49:11 +07:00
9771b1f901 [theplatform] use _get_netrc_login_info and fix session expiration check(#10345) 2016-08-14 11:55:28 +01:00
2118fdd1a9 [common] add separate method for getting netrc ligin info 2016-08-14 11:55:28 +01:00
320d597c21 [vgtv] Detect geo restricted videos (#10348) 2016-08-14 16:25:14 +07:00
aaf44a2f47 [uplynk] Add new extractor 2016-08-13 22:53:41 +01:00
fafabc0712 Update ChangeLog for #10342
[skip ci]
2016-08-14 02:33:15 +08:00
409760a932 Merge pull request #10342 from muphil/patch-1
[xiami] bug fix for extractor xiami.py
2016-08-14 02:30:50 +08:00
phi
097eba019d bug fix for extractor xiami.py
Before applying this patch, when downloading resources from xiami.com, it crashes with these:
Traceback (most recent call last):
  File "/home/phi/.local/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 433, in main
    _real_main(argv)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 423, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 1786, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 691, in extract_info
    ie_result = ie.extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 347, in extract
    return self._real_extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 116, in _real_extract
    return self._extract_tracks(self._match_id(url))[0]
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 43, in _extract_tracks
    '%s/%s%s' % (self._API_BASE_URL, item_id, '/type/%s' % typ if typ else ''), item_id)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 562, in _download_json
    json_string, video_id, transform_source=transform_source, fatal=fatal)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 568, in _parse_json
    return json.loads(json_string)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'NoneType'

This patch solves exactly this problem.
2016-08-14 02:18:59 +08:00
73a85620ee release 2016.08.13 2016-08-13 23:17:11 +07:00
a560f28c98 [ChangeLog] Actualize 2016-08-13 23:01:35 +07:00
5ec5461e1a [pbs] Clarify comment on http formats 2016-08-13 22:50:18 +07:00
542130a5d9 [pbs] Fix description extraction and update tests 2016-08-13 21:59:29 +07:00
82997dad57 [franceculture] Fix extraction (Closes #10324) 2016-08-13 21:00:34 +07:00
647a7bf5e8 [pornotube] Fix extraction (Closes #10322) 2016-08-13 20:49:16 +07:00
77afa008dd [4tube] Fix metadata extraction (Closes #10321) 2016-08-13 19:55:09 +07:00
db535435b3 [bigflix] Remove an invalid test
There's no video anymore
2016-08-13 18:02:11 +08:00
c2a453b461 [imgur] Fix width and height extraction (Closes #10325) 2016-08-13 16:46:07 +07:00
cd29eaab95 [vbox7] Remove unused imports 2016-08-13 16:45:34 +07:00
52aa7e7476 [test_verbose_output] Fix tests under Python 3 2016-08-13 17:36:14 +08:00
e97c55ee6a [expotv] Improve extraction and update test 2016-08-13 16:29:05 +07:00
acfccacad5 [downloader/external:curl] Clarify why CurlFD should not capture stderr 2016-08-13 10:26:02 +01:00
5f2c2b7936 [test_utils] add test for option with not str value 2016-08-13 09:54:12 +01:00
cb55908e51 [vbox7] Fix extraction (Closes #10309) 2016-08-13 15:47:20 +07:00
e581224843 [tapely] Remove extractor. It's shut down
Closes #10323
2016-08-13 16:32:07 +08:00
f50365e91c [pbs] add test for videos with undocumented http formats and remove unused import 2016-08-13 09:10:09 +01:00
c366f8d30a [24video] Add support for me and xxx TLDs 2016-08-13 14:47:51 +07:00
6a26c5f9d5 [muenchentv] Fix extraction (Closes #10313) 2016-08-13 14:28:44 +07:00
bd6fb007de [24video] Fix comment count extraction 2016-08-13 14:22:47 +07:00
b69b2ff736 [sunporno] Add support for embed URLs 2016-08-13 14:13:49 +07:00
794e5dcd7e [sunporno] Fix metadata extraction (Closes #10316) 2016-08-13 14:09:35 +07:00
f0d3669437 [hgtv] Add new extractor(closes #3999) 2016-08-12 18:05:49 +01:00
98e698f1ff [external/curl] respect more downloader options and display progress 2016-08-12 12:30:02 +01:00
3cddb8d6a7 [pbs] check all http formats and remove unnecessary request
- some of the quality that not reported in the documentation
are available(4500k, 6500k)
- the videoInfo request doesn't work for a long time
2016-08-12 08:38:06 +01:00
990d533ee4 [crunchyroll] Add support for HLS (Closes #10301) 2016-08-12 00:56:16 +07:00
b0081562d2 release 2016.08.12 2016-08-12 00:22:22 +07:00
fff37cfd4f [ChangeLog] Actualize 2016-08-12 00:18:28 +07:00
a3be69b7f0 [viu] Remove from extractors 2016-08-12 00:14:51 +07:00
0fd1b1624c [goldenmoustache] Remove extractor (Closes #10298)
Now uses dailymotion
2016-08-11 23:52:17 +07:00
367976d49f [drtuber] Improve title extraction 2016-08-11 23:47:52 +07:00
0aef0771f8 [drtuber] Make dislike count optional (Closes #10297) 2016-08-11 23:47:27 +07:00
0c070681c5 [chirbit] Fix extraction (Closes #10296) 2016-08-11 23:37:56 +07:00
30b25d382d [francetvinfo] Relax _VALID_URL 2016-08-11 21:42:55 +07:00
e5f878c205 [ChangeLog] Add change log for #10269
[skip ci]
2016-08-11 19:13:41 +08:00
e2e84aed7e Merge branch 'lkho-pr/#10268' 2016-08-11 19:09:18 +08:00
b1927f4e8a [YoutubeDL] Disable newline conversion when writing subtitles
By default io.open() convert all '\n' occurrences to '\r\n' when writing
files. If the content already contains '\r\n', it will be converted to
'\r\r\n', breaking some video players.
2016-08-11 19:04:23 +08:00
3b9323d96e Merge branch 'pr/#10268' of https://github.com/lkho/youtube-dl into lkho-pr/#10268 2016-08-11 19:03:08 +08:00
7f832413d6 Preserve line endings for downloaded subtitle files 2016-08-10 23:40:50 +08:00
7f2ed47595 [rtlnl] Relax _VALID_URL (Closes #10282) 2016-08-10 21:07:43 +07:00
c3fa77bdef [formula1] Relax _VALID_URL (Closes #10283) 2016-08-10 21:00:40 +07:00
57ce8a6d08 [wat] improve extraction(#10281)
add alternative method to extract http formats
works even if the video is geo-restricted or removed
from public access(most of the cases)
2016-08-10 14:20:28 +01:00
69d8eeeec5 [ctsnews] Fix extraction 2016-08-10 11:38:38 +08:00
81c13222c6 [utils] Recognize more formats in unified_timestamp
Used in CtsNews
2016-08-10 11:37:23 +08:00
b1ce2ba197 release 2016.08.10 2016-08-10 00:20:44 +07:00
5c8411e968 [ChangeLog] Actualize 2016-08-10 00:18:28 +07:00
cc9c8ce5df [devscripts/prepare_manpage] Fix description strings starting with dash (Closes #10273) 2016-08-09 22:24:58 +07:00
20ef4123b9 [uol] remove unused import 2016-08-09 15:13:15 +01:00
4e62d26aa2 [uol] Add new extractor(#4263) 2016-08-09 15:09:08 +01:00
b657816684 Credit @singh-pratyush96 for #10223 2016-08-09 04:04:45 +07:00
9778b3e7ee Credit @zvonicek for #10242 and #10253 2016-08-09 04:03:52 +07:00
25dd58ca6a [metadatafromtitle] Remove unused exception class 2016-08-09 04:01:05 +07:00
5e42f8a0ad Make --metadata-from-title non fatal
Output a warning if the metadata can't be parsed from the title (and don't write any metadata) instead of raising a critical error.
2016-08-09 03:56:22 +07:00
1ad6b891b2 Add more checks for --min/max-sleep-interval arguments and use more idiomatic naming 2016-08-09 03:47:56 +07:00
7aa589a5e1 Fix --min/max-sleep-interval wording 2016-08-09 03:46:52 +07:00
065bc35489 Add --max-sleep-interval (Closes #9930) 2016-08-09 03:32:42 +07:00
3a380766d1 [rbmaradio] Improve, simplify and extract all formats (Closes #10242) 2016-08-09 02:46:29 +07:00
affaea0688 [rbmaradio] Fixed extractor 2016-08-09 02:18:33 +07:00
77426a087b [sonyliv] Improve (Closes #10258) 2016-08-09 02:16:28 +07:00
8991844ea2 [sonyliv] Add new extractor 2016-08-09 02:09:13 +07:00
082395d0a0 [extractor/generic] Add proper default to _search_json_ld call 2016-08-08 22:48:33 +07:00
e8ed7354e6 [flipagram] Add proper default to _search_json_ld call 2016-08-08 22:46:19 +07:00
1e7f602e2a [condenast] Make _search_json_ld call non fatal 2016-08-08 22:45:49 +07:00
522f6c066d [bbc] Add proper default to _search_json_ld call 2016-08-08 22:44:36 +07:00
321b5e082a [extractor/common] Respect default in _search_json_ld 2016-08-08 22:36:18 +07:00
3711fa1eb2 Revert "[flipagram] Make _search_json_ld non fatal"
This reverts commit d34995a9e3.
2016-08-08 21:49:45 +07:00
395c74615c Revert "[extractor/generic] Make _search_json_ld non fatal"
This reverts commit 958849275f.
2016-08-08 21:49:27 +07:00
3dc240e8c6 [sohu] Update _TESTS (closes #10260) 2016-08-08 18:48:21 +08:00
a41a6c5094 [chaturbate] Skip the invalid test 2016-08-08 13:06:02 +08:00
d71207121d [biqle] Skip an invalid test 2016-08-08 12:59:55 +08:00
b1c6f21c74 [aparat] Fix extraction 2016-08-08 12:59:07 +08:00
412abb8760 [bilibili] Update _TESTS 2016-08-08 12:57:17 +08:00
f17d5f6d14 [features.aol.com] Fix _TESTS 2016-08-08 12:52:36 +08:00
6bb801cfaf [cwtv] extract http formats 2016-08-07 22:58:12 +01:00
de02d1f4e9 [rozhlas] Fix regexes and improve extraction (Closes #10253) 2016-08-08 04:58:02 +07:00
e1f93a0a76 [rozhlas] Add new extractor 2016-08-08 04:41:45 +07:00
d21a661bb4 [README.md] Update Options Link
The link references a bad anchor. The updated link now references the correct anchor.
2016-08-08 03:46:42 +07:00
b2bd968f4b [kuwo:singer] Fix extraction 2016-08-07 22:59:34 +08:00
4a01befb34 release 2016.08.07 2016-08-07 21:12:41 +07:00
845dfcdc40 [ChangeLog] Actualize 2016-08-07 21:10:48 +07:00
d92cb46305 [discoverygo] Add extractor (Closes #10245) 2016-08-07 20:57:05 +07:00
a8795327ca [utils] Add support TV Parental Guidelines ratings in parse_age_limit 2016-08-07 20:45:18 +07:00
d34995a9e3 [flipagram] Make _search_json_ld non fatal 2016-08-07 19:06:55 +07:00
958849275f [extractor/generic] Make _search_json_ld non fatal 2016-08-07 19:04:22 +07:00
998f094452 [bbc] Remove proxy from test 2016-08-07 18:13:05 +07:00
aaa42cf0cf [bbc] PEP 8 2016-08-07 18:05:13 +07:00
9fb64c04cd [bbc] Add support for morph embeds (Closes #10239) 2016-08-07 18:01:50 +07:00
f9622868e7 [bbc] preserve format_id backward compatibility 2016-08-07 11:14:15 +01:00
37768f9242 [common] correctly lower the preference of m3u8 master manifest format 2016-08-07 10:59:09 +01:00
a1aadd09a4 [tnaflixnetworkbase] Improve title extraction 2016-08-07 16:00:09 +07:00
b47a75017b [tnaflix] Fix metadata extraction (Closes #10249) 2016-08-07 16:00:03 +07:00
e37b54b140 [fox] fix theplatform release url query 2016-08-06 20:53:39 +01:00
c1decda58c [openload] Fix extraction (closes #9706) 2016-08-07 02:44:15 +08:00
d3f8e038fe [utils] Add decode_png for openload (#9706) 2016-08-07 02:42:58 +08:00
ad152e2d95 [bbc] fix test 2016-08-06 19:36:12 +01:00
b0af12154e [bbc] reduce requests and improve format_id 2016-08-06 19:24:59 +01:00
d16b3c6677 [common] extract partOfTVSeries info in json-ld 2016-08-06 18:58:38 +01:00
c57244cdb1 [common] lower the preference of m3u8 master manifest format 2016-08-06 18:55:05 +01:00
a7e5f27412 [bbc] improve extraction
- extract f4m and dash formats
- improve format sorting and listing
- improve extraction of articles with `otherSettings.playlist`
2016-08-06 18:48:09 +01:00
089a40955c [pokemon] improve _VALID_URL 2016-08-06 12:08:14 +01:00
d73ebac100 [pokemon] Add new extractor(closes #10093) 2016-08-06 11:18:14 +01:00
e563c0d73b [condenast] fallback to loader.js if video.js fail 2016-08-05 21:01:16 +01:00
491c42e690 release 2016.08.06 2016-08-06 01:23:48 +07:00
7f2339c617 [ChangeLog] Actualize 2016-08-06 01:19:47 +07:00
8122e79fef [gamekings] Remove remnants 2016-08-06 00:12:37 +07:00
fe3ad1d456 [adultswim] Remove superfluous md5 from test 2016-08-06 00:02:05 +07:00
038a5e1a65 [adultswim] Add support for trailers (Closes #10235) 2016-08-06 00:00:05 +07:00
84bc23b41b [archiveorg] PEP 8 2016-08-05 23:16:19 +07:00
46933a15d6 [extractor/common] Support root JSON-LD lists (Closes #10203) 2016-08-05 23:14:32 +07:00
3859ebeee6 [tvplay] Capture and output native error message 2016-08-05 22:50:42 +07:00
d50aca41f8 [archiveorg] improve format extraction(closes #10219) 2016-08-05 16:42:15 +01:00
0ca057b965 [jwplatform] add support for playlist extraction and relative urls and improve audio detection 2016-08-05 16:42:15 +01:00
5ca968d0a6 [tvplay] Extract series metadata 2016-08-05 22:37:38 +07:00
f0d31c624e [tvplay] Add support for subtitles (Closes #10194) 2016-08-05 22:17:32 +07:00
08c655906c [5min] fix _VALID_URL(closes #10228) 2016-08-05 10:22:33 +01:00
5a993e1692 [natgeo] fix tests(closes #10229) 2016-08-05 10:13:26 +01:00
a7d2953073 [extractors] add tvp:embed import 2016-08-05 10:11:59 +01:00
fdd0b8f8e0 [tvp] extract video id from the webpage(fixes #7799) 2016-08-05 09:44:15 +01:00
f65dc41b72 [naver] extract upload date 2016-08-05 08:12:25 +01:00
962250f7ea [cbslocal] Fix timestamp parsing (closes #10213) 2016-08-05 11:44:50 +08:00
7dc2a74e0a [utils] Fix unified_timestamp for formats parsed by parsedate_tz() 2016-08-05 11:41:55 +08:00
b02b960c6b [naver] improve extraction(closes #8096) 2016-08-04 21:42:22 +01:00
4f427c4be8 [condenast] improve extraction 2016-08-04 18:30:56 +01:00
8a00ea567b [natgeo:episodeguide] Do not shadow url from outer scope 2016-08-04 23:21:04 +07:00
8895be01fc [5min] fix _VALID_URL 2016-08-04 16:55:12 +01:00
52e7fcfeb7 [engadget] Relax _VALID_URL 2016-08-04 16:34:47 +01:00
2396062c74 [5min] delegate extraction to AolIE
recently the 5min SenseHandler request return
HTTP Error 503: Service Unavailable error
2016-08-04 16:21:27 +01:00
14704aeff6 [kaltura] remove debugging line 2016-08-04 14:54:34 +01:00
3c2c3af059 [extractors] change imports for national geographic extractors 2016-08-04 12:20:56 +01:00
1891ea2d76 [nationalgeographic] Add support for National Geographic Episode Guide 2016-08-04 12:18:10 +01:00
1094074c04 [kaltura] extract subtitles and reduce requests 2016-08-04 09:39:06 +01:00
217d5ae013 [vodplatform] Add new extractor 2016-08-04 09:39:06 +01:00
8b40854529 [common] lower proto_preference of rtsp formats
Most of the time the RtspFD fail to download videos but it report
success of the download with this output:
[mpv] 0 bytes
[download] 100% of 0.00B
2016-08-04 09:39:06 +01:00
6bb0fbf9fb Revert "[README.md] Use full paths for all configuration files (#8863)"
This reverts commit 899d2bea63.
2016-08-04 09:54:28 +08:00
8d3b226b83 [gamekings] Remove extractor
Now covered by generic jwplayer
2016-08-03 22:06:10 +07:00
42b7a5afe0 [limelight] extract http formats 2016-08-03 13:12:51 +01:00
899d2bea63 [README.md] Use full paths for all configuration files (#8863) 2016-08-03 11:15:27 +08:00
9cb0e65d7e [ntvru] Fix extraction 2016-08-02 22:56:48 +07:00
b070564efb [extractor/common] Support multiple properties in _og_search_property 2016-08-02 22:55:14 +07:00
ce28252c48 [options] Add test that checks that --password=secret is hidden in verbose output 2016-08-02 17:03:46 +02:00
3aa9a73554 [options] Hide --password=secret in verbose output 2016-08-02 17:03:26 +02:00
6a9b3b61ea [comedycentral] Re-add shortnames
In cc99d4f826, the shortname feature got deleted by accident. Re-add it as a separate IE.
2016-08-02 14:02:31 +02:00
45408eb075 release 2016.08.01 2016-08-01 22:59:23 +07:00
eafc66855d [ChangeLog] Add recent changes 2016-08-01 22:56:01 +07:00
e03d3e6453 [cwtv] Add support for cwtvpr.com (Closes #10196) 2016-08-01 22:51:01 +07:00
a70e45f80a [limelight] keep videos marked as previewStream
e382b953f0 (commitcomment-18472915)
2016-08-01 16:25:41 +01:00
697655a7c0 [safari] Relax url regexes (Closes #10202) 2016-08-01 21:48:48 +07:00
e382b953f0 [limelight] skip preview and drm protected videos 2016-08-01 00:33:30 +01:00
116e7e0d04 [bloomberg] Support BPlayer() players (closes #10187) 2016-07-31 14:47:19 +08:00
cf03e34ad3 [yandexmusic:track] Fix extraction (Closes #10193) 2016-07-31 07:56:18 +07:00
2903137292 release 2016.07.30 2016-07-30 14:45:07 +07:00
9361f2169c [ChangeLog] Make extractor improvements' descriptions more concrete 2016-07-30 14:43:28 +07:00
35aa6c538f Add ChangeLog 2016-07-30 12:33:09 +08:00
fa9f1d16b8 [dailymotion:playlist] Carry long line 2016-07-29 22:47:34 +07:00
485fedf6fd [dailymotion:playlist] Optimize download archive processing 2016-07-29 22:45:41 +07:00
da0baba5c8 [rtve] Fix extraction for some videos
For example http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-descredito/3574098/.
2016-07-29 17:20:27 +02:00
bb9f3bfedf Revert "[rtve] Fix extraction (#10076)"
This reverts commit c39b2ed990.

Apparently outside of Spain using 'auth/resources' is required (#10097).
2016-07-29 17:14:04 +02:00
dbc0b39b91 [tv2] Improve extraction 2016-07-29 22:01:34 +07:00
481c5c5137 [tv2:article] Fix extraction (Closes #10188) 2016-07-29 21:43:17 +07:00
0cacae2807 [twitch:clips] Sort formats 2016-07-29 09:01:53 +07:00
d9d56deadf release 2016.07.28 2016-07-28 02:42:57 +07:00
74ba450a81 [twitch:clips] Fix extraction (Closes #9767) 2016-07-28 22:30:09 +07:00
db19df6ca0 [extractor/generic] Add test for #10179 2016-07-28 22:20:08 +07:00
fbdf8d15d1 [soundcloud] Add _extract_urls (#10179) 2016-07-28 22:16:05 +07:00
94aae01548 [extractor/generic] Extract all soundcloud embeds (Closes #10179) 2016-07-28 22:15:15 +07:00
39eef54cf0 [ard:mediathek] Skip unavailable test 2016-07-28 21:38:23 +07:00
05c8268c81 [shared] Modernize and make more robust 2016-07-27 23:39:02 +07:00
289a16b4f3 [shared] Respect redirect URL (Closes #10170) 2016-07-27 23:28:01 +07:00
7935926baa [devscripts/show-downloads-statistics] Add support for paging 2016-07-27 00:14:40 +07:00
dcbb07c35a release 2016.07.26.2 2016-07-26 23:56:53 +07:00
40090e8d51 [extractor/common] Improve is_suitable
In order to fix breakage introduced by a3aa814b77
2016-07-26 23:54:06 +07:00
3e050d51d4 [orf:oe1] Relax _VALID_URL 2016-07-26 23:14:04 +07:00
ced70c8640 [cbc] PEP 8 2016-07-26 23:08:08 +07:00
9a700deea4 [instagram] Remove duplicate field in test 2016-07-26 23:07:16 +07:00
dc35ba0eba [mgtv] Fix typo 2016-07-26 23:06:21 +07:00
88bd486b9a [cbc] Improve extraction for videos embedded with clipId 2016-07-26 22:58:50 +07:00
7f8b92e3cf [bigflix] Update tests 2016-07-26 21:44:53 +07:00
35f6e0ff36 [mtv.de] Skip 2 geo-restricted tests 2016-07-26 13:19:47 +08:00
326fa4e6e5 [generic] Skip an invalid test 2016-07-26 13:16:04 +08:00
c74299a72c [cmt] Detect unavailable videos and update _TESTS 2016-07-26 13:13:14 +08:00
10a1bb3a78 [mtv] Fix for videos with missing bitrates 2016-07-26 13:12:24 +08:00
4d3e543c73 Update extractors.py 2016-07-26 11:17:28 +08:00
05d1e7aaa9 [generic] Fix an MTV test and another test that breaks nosetests 2016-07-26 11:11:36 +08:00
a3aa814b77 Update _TESTS for MTV sites 2016-07-26 11:10:41 +08:00
5c32a77cad [nextmovie] Remove extractor
This domain name now redirects to mtv.com
2016-07-26 11:08:55 +08:00
14a28e705b [test/test_all_urls] Remove *.cc.com tests 2016-07-26 11:08:09 +08:00
cc99d4f826 [comedycentral] Remove IEs for *.cc.com except tosh.cc.com
All other subdomains now redirects to cc.com/* URLs
2016-07-26 11:06:50 +08:00
712c7530ff [mtv] Extract more metadata and more
1. Remove MTVIggyIE. All www.mtviggy.com URLs now redirects to
   www.mtv.com
2. Fix MTVDEIE
3. Return multiple URLs from _transform_rtmp_url. This is for
   tosh.cc.com
2016-07-26 11:03:43 +08:00
0a147785e8 [camdemy] Extract duration properly 2016-07-25 23:03:58 +07:00
59eaf69e33 [camdemy] Fix camdemy 2016-07-25 23:03:43 +07:00
e8be2943a7 [smotri] Modernize, make more robust and fix tests 2016-07-24 18:38:18 +07:00
8fdc538b46 release 2016.07.24 2016-07-24 11:39:50 +07:00
9513c1eb17 [tvp] Update dash format comment 2016-07-24 11:03:39 +07:00
ae6fff4e64 [onet] Enable dash formats 2016-07-24 10:43:05 +07:00
5a65668e25 [dcn] Enable dash formats 2016-07-24 10:35:55 +07:00
f75e6890db [telegraaf] Make hls non fatal 2016-07-24 10:29:26 +07:00
d9cb92c840 [telegraaf] Enable dash formats 2016-07-24 10:29:09 +07:00
94c04a3c79 [arkena] Enable dash formats 2016-07-24 10:28:11 +07:00
f094834857 [extractor/common] Add support for $ in SegmentTemplate in MPD manifests 2016-07-24 10:27:16 +07:00
111de00289 [DailyMail] Improve title and description extraction 2016-07-24 05:37:13 +07:00
b4a131e1a5 [facebook] Relax _VALID_URL (Closes #10151) 2016-07-24 04:36:49 +07:00
f1991ce928 [arkena] Skip dash formats 2016-07-23 18:07:55 +07:00
6548030a17 Credit @rvanbekkum for arkena (#8682) 2016-07-23 18:00:19 +07:00
3a8947650b [arkenaplay] Remove extractor 2016-07-23 17:57:55 +07:00
1979969f91 [extractor/generic] Add support for arkena embeds 2016-07-23 17:56:48 +07:00
0673741af3 [extractors] Add imports for arkena and lcp 2016-07-23 17:56:29 +07:00
c8e170b209 [lcp] Improve extraction 2016-07-23 17:56:11 +07:00
bbe1f3634a [arkena] Improve extraction (Closes #8682) 2016-07-23 17:55:54 +07:00
4671dd41b2 [arkena:lcp] Add extractors 2016-07-23 17:01:09 +07:00
f164b97123 [utils] Add another f4m mimetype to mimetype2ext 2016-07-23 16:48:59 +07:00
5275efe30d release 2016.07.22 2016-07-22 23:11:28 +07:00
b13647cf3c [eporner] Fix extraction (Closes #10139) 2016-07-22 23:04:13 +07:00
add7d2a0e2 [pornhub] Make error regex less ambiguous (Closes #10138) 2016-07-22 21:24:09 +07:00
e298d3a08c [youtube] Fix authentication (Closes #10140) 2016-07-22 21:05:39 +07:00
fd8c8c7dcd [youtube:shared] Relax _VALID_URL 2016-07-21 22:58:34 +07:00
9158af16cc [bbc.co.uk:iplayer:playlist] Add support for group URLs 2016-07-21 22:37:36 +07:00
c6668e4ad1 [bbc.co.uk:iplayer:playlist] Skip unavailable test 2016-07-21 22:34:55 +07:00
84e8cca48b [youjizz] Relax _VALID_URL (Closes #10131) 2016-07-20 22:41:13 +07:00
790b06b7d4 [odatv] Improve (Closes #9285) 2016-07-20 21:43:22 +07:00
740d7c49c2 [odatv] Add extractor 2016-07-20 21:42:05 +07:00
4e51ec5f57 [extractors] Add import for comedycentral.tv 2016-07-19 22:50:37 +07:00
05087d1b4c [bbc] Improve extraction from sxml playlists 2016-07-19 22:49:38 +07:00
a66a73ee90 [ard] Add test for rbb-online 2016-07-18 02:25:31 +07:00
8188b923db release 2016.07.17 2016-07-17 19:04:29 +07:00
d993a1354d [README.md] Make download URLs consistent 2016-07-17 18:58:47 +07:00
e8882e7043 [spike] Relax _VALID_URL and improve extraction (Closes #10106) 2016-07-17 18:34:25 +07:00
1056821799 [viki] Fix tests (Closes #10098) 2016-07-17 18:13:54 +07:00
890e6d3309 [viki] Lower m3u8 preference
http URLs are always provde the same or better quality
2016-07-17 18:12:03 +07:00
246080d378 [viki] Override m3u8 formats acodec 2016-07-17 18:10:16 +07:00
b1ea680270 Revert "[bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests"
This reverts commit 0385aa6199.
2016-07-17 17:29:36 +07:00
45550d1039 [comedycentraltv] Add extractor (Closes #10101) 2016-07-17 16:58:58 +07:00
7cdfc4c90f [mtvservices] Strip description 2016-07-17 16:56:39 +07:00
af21f56f98 [ard] Add support for rbb-online (Closes #10095) 2016-07-17 03:40:58 +07:00
1a8f0773b6 [streamable] Fix title extraction and improve (Closes #9122) 2016-07-17 02:01:00 +07:00
59cc5bd8bf [streamable] Add extractor 2016-07-17 01:35:09 +07:00
49bc16b95e [nintendo] Improve playlist extraction (Closes #9986) 2016-07-17 00:01:25 +07:00
a2f9ca1e67 [nintendo] Add extractor 2016-07-16 23:58:53 +07:00
371ddb14fe [extractor/generic] Change twitter:player embeds priority to lowest (Closes #10090) 2016-07-16 15:59:43 +07:00
998895dffa [cloudy] Drop videoraj.to
videoraj.ch is now a shoe-selling website, and videoraj.to domain name
is gone.
2016-07-16 15:37:54 +08:00
aadd3ce21f [cliphunter] Update _TESTS 2016-07-16 15:37:54 +08:00
ae7b846203 [cbsnews] Update _TESTS of CBSNewsLiveVideoIE 2016-07-16 15:37:54 +08:00
21ba7d0981 [cbc] Skip geo-restricted test case 2016-07-16 15:37:54 +08:00
691fbe7f98 release 2016.07.16 2016-07-16 02:20:00 +07:00
2e221ca3a8 [YoutubeDL] Fix incomplete formats check 2016-07-16 01:18:05 +07:00
317f7ab634 [YoutubeDL] Fix format selection with filters (Closes #10083) 2016-07-16 00:55:43 +07:00
23495d6a39 Revert "[ffmpeg] Fix embedding subtitles (#9063)"
This reverts commit ccff2c404d.

Fixes #10081.

The new approach breaks embedding subtitles into video-only or
audio-only files. FFMpeg provides a trick: add '?' after the argument of
'-map' so that a missing stream is ignored. For example:

opts = [
    '-map', '0:v?',
    '-c:v', 'copy',
    '-map', '0:a?',
    '-c:a', 'copy',
    # other options...
]

Unfortunately, such a format is not implemented in avconv, either.
I guess adding '-ignore_unknown' if self.basename == 'ffmpeg' is the
best solution. However, the example mentioned in #9063 no longer serves
problematic files, so I can't test it. I'll reopen #9063 and wait for
another example so that I can test '-ignore_unknown'.
2016-07-15 20:02:36 +08:00
224db034ab [syfy] fix extraction(closes #9087)(closes #3820)(closes #2388) 2016-07-14 23:59:47 +01:00
ad27649be3 [3qsdn] Restrict src JS regex 2016-07-15 03:36:50 +07:00
84571be645 [orf:tvthek] Remove test md5 2016-07-15 03:17:29 +07:00
7b0d333a7e Fix unit tests for m3u8 and RTSP extractors that require ffmpeg or mplayer 2016-07-15 03:06:23 +07:00
342f0c3682 [ninenow] correct test url 2016-07-14 14:19:18 +01:00
38e0f16a94 [ninenow] Add new extractor(closes #5181) 2016-07-14 14:16:11 +01:00
e910fe2fe4 [brightcove] skip ism manifests 2016-07-14 14:13:57 +01:00
233b58dec7 Add extractor for rtve.es/television (fixes #10076) 2016-07-13 21:02:34 +02:00
c39b2ed990 [rtve] Fix extraction (#10076)
For http://www.rtve.es/alacarta/videos/documentos-tv/documentos-tv-revolucion-del-movil/3069778/ using 'auth/resources' fails, and other URLs seem to work fine.
2016-07-13 20:23:27 +02:00
35ec86689c [bbc] extract only the original Unified Streaming Platform m3u8 manifests
0385aa6199 (commitcomment-18233275)
manifests with higher birate require more time to check formats
2016-07-13 18:01:14 +01:00
c485959034 release 2016.07.13 2016-07-13 23:58:01 +07:00
a0560d8ab8 [ellentv] Improve extraction (Closes #10067) 2016-07-13 22:42:53 +07:00
0385aa6199 [bbc] extract more and better qulities from Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
00f4764cb7 [common] extract vbr, abr and fps for Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
51c2cd0b83 [extractors] Add vk:wallpost extractor import 2016-07-13 21:53:23 +07:00
5f5a9d6158 [vk] Improve login 2016-07-13 21:52:52 +07:00
2d19fb5072 [vk:wallpost] Add extractor 2016-07-13 21:51:44 +07:00
9d865a1af6 [travis] Skip downloading srelay
SOCKS tests never run on Travis CI due to unknown reasons, and
downloading them broke some tests (e.g.
https://travis-ci.org/rg3/youtube-dl/builds/144306425)
2016-07-13 14:27:14 +08:00
41aa44259d [shahid] try to bypass geo restriction and extract more metadata(closes #10062) 2016-07-12 23:15:38 +01:00
381ff44756 [devscripts/generate-download] Remove MD5 and SHA1 2016-07-12 09:09:54 +02:00
7f29cf545a [youtube] Add YouTube Red paid video reference test (#10059) 2016-07-12 02:10:35 +07:00
7d1219f3e0 [tmz] delegate extraction to KalturaIE 2016-07-11 19:08:22 +01:00
f1b4af7d79 [beightcove:new] remove html tags from description 2016-07-11 19:06:50 +01:00
8a8590a617 [dbtv] delegate extraction to BrightcoveNewIE 2016-07-11 16:30:24 +01:00
4a7a5e41f7 [tvplay] improve extraction 2016-07-11 14:51:44 +01:00
2a49d01600 [playvid] Update _TESTS
Blocks https://travis-ci.org/rg3/youtube-dl/jobs/143809100
2016-07-11 15:15:28 +08:00
b99af8a51c [biobiochiletv] Fix extraction and update _TESTS 2016-07-11 13:23:57 +08:00
8e7020daef [rudo] Add new extractor
Used in biobiochile.tv
2016-07-11 13:19:25 +08:00
a26bcc61c1 release 2016.07.11 2016-07-11 03:17:12 +07:00
5c4dcf8172 [vidzi] Add support for embed URLs (Closes #10058) 2016-07-11 03:14:39 +07:00
e9fb6a4bbe [youtube] Relax TFA regexes 2016-07-11 03:08:38 +07:00
e2dbcaa1bf [vuclip] Fix extraction 2016-07-11 00:52:25 +08:00
ae01850165 [miomio] Fix _TESTS 2016-07-11 00:03:24 +08:00
c3baaedfc8 [miomio] Support new 'h5' player (closes #9605)
Depends on #8876
2016-07-10 23:46:48 +08:00
0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
39e9d524e5 Credit @nehalvpatel for roosterteeth (#9864) 2016-07-10 01:30:12 +07:00
865b087224 [roosterteeth] Improve (Closes #9864) 2016-07-10 01:30:12 +07:00
3121b25639 [roosterteeth] Add extractor 2016-07-10 01:30:12 +07:00
0286b85c79 release 2016.07.09.2 2016-07-09 22:22:24 +07:00
ab52bb5137 [animeondemand] Fix typo 2016-07-09 22:20:34 +07:00
61a98b8623 [lynda] Remove md5 from test (Closes #10047) 2016-07-09 21:29:11 +07:00
6daf34a045 [facebook] Fix typo and break when found video_data (Closes #10048) 2016-07-09 21:25:07 +07:00
c03adf90bd [generic] Add the test. Closes #1638 2016-07-09 14:39:01 +08:00
0ece114b7b [vimeo] Recognize non-standard embeds (#1638) 2016-07-09 14:38:27 +08:00
5b6a74856b Merge pull request #9288 from reyyed/issue#9063fix
[ffmpeg] Fix embedding subtitles (#9063)
2016-07-09 14:29:53 +08:00
ce43100a01 release 2016.07.09.1 2016-07-09 10:06:40 +07:00
8cc9b4016d [srmediathek] extend _VALID_URL(closes #9373) 2016-07-09 03:22:09 +01:00
31eeab9f41 [ard] fix f4m extraction and skip tests with 404 errors 2016-07-09 03:22:09 +01:00
9558dcec9c [youtube:user] Preserve user/c path segment 2016-07-09 08:37:19 +07:00
6e6b70d65f [extractor/generic] Properly comment out a test 2016-07-09 08:37:19 +07:00
d417fd88d0 release 2016.07.09 2016-07-09 07:16:47 +07:00
9e4f5dc1e9 [animeondemand] Pass num for episode based videos 2016-07-09 07:13:32 +07:00
1251565ee0 [options] Rollback old behavior for configuratio files' encoding
Until agreed with some solution
2016-07-09 07:12:52 +07:00
1f7258a367 [animeondemand] Add support for full length films (Closes #10031) 2016-07-09 06:57:04 +07:00
0af985069b [flipagram] Improve extraction (Closes #9898) 2016-07-09 03:31:17 +07:00
0de168f7ed [extractor/generic] Detect schema.org/VideoObject embeds 2016-07-09 03:29:07 +07:00
95b31e266b [extractor/common] Add expected_type in json ld routines 2016-07-09 03:28:04 +07:00
6b3a3098b5 [extractor/common] Extract more metadata for VideoObject in _json_ld 2016-07-09 03:27:11 +07:00
2de624fdd5 [extractor/common] Introduce filesize metafield for thumbnails 2016-07-09 03:24:36 +07:00
3fee7f636c [flipagram] Add extractor 2016-07-09 03:23:32 +07:00
89e2fff2b7 [mgtv] pass geo verification headers for api request 2016-07-08 20:18:25 +01:00
cedc70b292 [facebook] Fix invalid video being extracted (Closes #9851) 2016-07-09 00:28:07 +07:00
07d7689f2e [le] extract http formats 2016-07-08 15:35:20 +01:00
ae8cb5328d Merge branch 'JakubAdamWieczorek-polskie-radio' 2016-07-08 19:35:21 +08:00
2e32ac0b9a [polskieradio] Fix regex in _TESTS 2016-07-08 19:34:53 +08:00
672f01c370 Merge branch 'polskie-radio' of https://github.com/JakubAdamWieczorek/youtube-dl into JakubAdamWieczorek-polskie-radio 2016-07-08 19:33:28 +08:00
e2d616dd30 [polskieradio] Add thumbnails. 2016-07-08 13:23:00 +02:00
0ab7f4fe2b [nick] support nickjr.com (closes #7542) 2016-07-08 15:11:28 +08:00
29c4a07776 [lynda] Fix test 2016-07-08 03:33:53 +07:00
826e911e41 Merge branch 'master' of github.com:rg3/youtube-dl 2016-07-07 19:42:22 +02:00
30d22dae8e [options] Do not decode Unicode on Python 2.x
The configuration file contents are being returned as unicode now, so decoding them is no longer necessary.
(Run python2 with -3 to see the warning before this commit)
2016-07-07 19:41:00 +02:00
ec3518725b [compat] Fix test_cmdline_umlauts on Python 2.6
The original statement raises uncaught UnicodeWarning on Python 2.6
2016-07-07 22:30:58 +08:00
5f87d845eb [tweakers] fix info extraction(closes #9516) 2016-07-07 12:51:42 +01:00
571808a7aa document comments in configuration file (fixes #10024) 2016-07-07 12:12:21 +02:00
dfe5fa49ae [compat] Fix compat_shlex_split for non-ASCII input
Closes #9871
2016-07-07 17:37:29 +08:00
01a0c511eb [radiocanada] extract more formats 2016-07-07 03:46:12 +01:00
b3d30315ce Merge pull request #9597 from remitamine/toutv
[toutv] fix info extraction(closes #1792)(closes #2082)
2016-07-07 01:51:01 +01:00
882af14d7d [toutv] fix info extraction(closes #1792)(closes #2082) 2016-07-07 01:47:28 +01:00
47335a0efa [telecinco] fix info extraction 2016-07-06 23:09:13 +01:00
34bc2d9dfd release 2016.07.07 2016-07-07 01:54:29 +07:00
08c7af4afa [kamcord] Add extractor (Closes #10001) 2016-07-07 01:50:39 +07:00
f7291a0b7c [daum.net] Fix extraction for specific examples
Closes #9972
2016-07-07 01:26:14 +08:00
c65aa4e9e1 [brightcove:legacy] Support 'playlistTabs' and skip a dead test
Closes #9965
2016-07-07 01:13:37 +08:00
ad213a1d74 [francetv] Recognize more Dailymotion embedded videos
Closes #9955
2016-07-06 23:37:54 +08:00
43f1e4e41e [onet] Add MD5 checksum 2016-07-06 20:32:03 +08:00
54b0e909d5 [amp] Fix a typo 2016-07-06 20:10:47 +08:00
f8752b86ac [Onet,ClipRs] Add new extractor for onet.tv and use it for clip.rs
Closes #9950
2016-07-06 20:09:05 +08:00
84c237fb8a [utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
ab49d7a9fa use mimetype2ext to determine manifest ext in multiple extractors 2016-07-06 09:11:46 +01:00
b4173f1551 [utils] add mimetypes to determine manifest ext(m3u8, f4m, mpd) 2016-07-06 09:06:28 +01:00
2817b99cf2 [metacafe] fix info extraction(closes #8539)(closes #3253) 2016-07-06 02:19:55 +01:00
001fffd004 [spiegel:article] update test(closes #10018) 2016-07-06 00:16:41 +01:00
0e94b4713d release 2016.07.06 2016-07-06 00:54:23 +07:00
a6d3b89feb [prosiebensat1] Make downloading urls JSON non fatal 2016-07-06 00:52:48 +07:00
6c26815d63 [onionstudios] fix info extraction 2016-07-05 18:05:07 +01:00
73c4ac2c95 [youtube:channel] Improve channel id extraction and detect unavailable channels (Closes #10009) 2016-07-05 23:30:44 +07:00
84f214d840 [prosiebensat1] extract all formats 2016-07-05 17:11:45 +01:00
e3f88be7a9 [rtvnh] extract all formats 2016-07-05 14:45:39 +01:00
31af3e35e0 [sandia] remove unused imports 2016-07-05 13:39:24 +01:00
94a5cff91d [sendia] fix info extraction 2016-07-05 13:37:46 +01:00
77082c7b9e [slideshare] fix description extraction 2016-07-05 12:01:04 +01:00
252a1f75d2 [spiegel] improve info extraction 2016-07-05 11:46:25 +01:00
5abf513cf8 [stitcher] fix episode config extraction 2016-07-05 10:44:16 +01:00
c6054e3201 [xuite] Support videos with already encoded media id 2016-07-05 14:26:42 +08:00
4080530624 [youtube:shared] Recognize the new 'shared' URLs
Closes #10007
2016-07-05 13:15:05 +08:00
c25f1a9b63 release 2016.07.05 2016-07-05 06:32:46 +07:00
dfaa86b75e [test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
d9163ae3b6 [kaltura] fix extraction error for videos from multiple kaltura servers 2016-07-04 21:34:27 +01:00
dafafe7cf1 [la7] extract more info from a kaltura custom server 2016-07-04 17:59:58 +01:00
81953d1ae5 [kaltura] add support videos stored on custom kaltura servers(closes #5557) 2016-07-04 17:59:58 +01:00
3a212ed62e [iqiyi] Skip an unstable MD5 checksum 2016-07-04 11:25:46 +08:00
195f084542 [pornhub] Detect private videos (Closes #9987) 2016-07-04 03:27:00 +07:00
aa7a455b2e [README.md] Clarify configuration file may not exist by default 2016-07-04 01:24:33 +07:00
6a4e659c93 [yahoo] Recognize brightcove embed (Closes #9995) 2016-07-03 23:00:36 +07:00
40f3666f6b [test/test_http] Update tests for 38cce791c7 2016-07-03 23:50:55 +08:00
dd801bbe18 [brightcove] improve error detection 2016-07-03 16:37:22 +01:00
38cce791c7 Rename --cn-verfication-proxy to --geo-verification-proxy
And deprecate the former one

Since commit f138873900, this option is
not limited to China websites, so rename it.
2016-07-03 23:29:56 +08:00
bf3ae6a543 [devscripts/show-downloads-statictics] Add script for displaying downloads statistics 2016-07-03 22:20:14 +07:00
bff98341d5 release 2016.07.03.1 2016-07-03 21:28:55 +07:00
2644e911be [iqiyi] Fix extraction
See https://github.com/soimort/you-get/issues/1211#issuecomment-229011559
2016-07-03 22:19:56 +08:00
a5f67895d3 [nationalgeographic] restore http formats
there was a misunderstanding about the reason of 403 response
the problem happen only when the user use aria2c as a downloader
a1f6f5c768 (commitcomment-18107559)
2016-07-03 14:10:25 +01:00
15e4b6b758 [rai] Support an alternative form of embedded relinker URL
Closes #8551
2016-07-03 19:52:11 +08:00
2b28b892d8 [rai] Support videos with embedded content item ID (#8551) 2016-07-03 19:52:11 +08:00
7507fc98cb [README.md] Fix somes typo in coding conventions section 2016-07-03 18:35:28 +07:00
477b7a8474 [downloader/f4m] Fix for Rai live streams 2016-07-03 19:26:39 +08:00
034a884957 [rai] Support direct relinker URLs (closes #8552) 2016-07-03 19:26:39 +08:00
64436cb1a4 [nationalgeographic] skip download for national geographic channel tests(closes #9991) 2016-07-03 10:43:36 +01:00
f138873900 [rai] Fix extraction and update _TESTS
Closes #8617
Closes #9157
Closes #9232
2016-07-03 15:49:35 +08:00
e793338c88 [buzzfeed] Detect Facebook embed and update _TESTS
Closes #5701
2016-07-03 14:12:02 +08:00
369bb06206 [facebook] Improve embed detection (#5701) 2016-07-03 14:11:29 +08:00
2cb31d288e [history:topic] Relax _VALID_URL 2016-07-03 13:01:04 +07:00
c723d1cd8d [README.md] Update some codebase links 2016-07-03 11:35:13 +07:00
1f55234057 Add PULL_REQUEST_TEMPLATE.md 2016-07-03 11:31:49 +07:00
04006fae8d [README.md] Start writing youtube-dl coding conventions 2016-07-03 11:31:07 +07:00
4cb13d0d6a [hrti] Don't redefine variable in list comprehension 2016-07-02 23:02:14 +02:00
a1f6f5c768 [nationalgeographic] add support Adobe Pass auth 2016-07-02 21:24:22 +01:00
05c7feec77 [aenetworks] add support Adobe Pass auth 2016-07-02 21:24:22 +01:00
bf83024826 [theplatform] add basic support for Adobe Pass 2016-07-02 21:24:22 +01:00
a0cfd82dda release 2016.07.03 2016-07-03 03:19:22 +07:00
1b734adb2d [xtube] Fix extraction (Closes #9953, closes #9961) 2016-07-03 03:17:35 +07:00
9b724d7277 [extractors] Add hrti:playlist import 2016-07-03 02:25:39 +07:00
c3a5dd3b5d Credit @atopuzov for hrti (#9482) 2016-07-03 02:22:59 +07:00
e3755a624b [hrti] Improve and add support for playlists (Closes #9482) 2016-07-03 02:22:14 +07:00
95cf60e826 [utils] Add PUTRequest 2016-07-03 02:21:32 +07:00
6b03e1e25d [HRTi] Implement extractor for Croatian Radiotelevision 2016-07-03 02:20:41 +07:00
712b0b5b70 [la7.it] Fix the extractor 2016-07-02 23:49:03 +08:00
6a424391d9 [facebook] Make embed detection stricter to prevent false-positives 2016-07-02 23:15:55 +08:00
dbf0157a26 [generic] Add MD5 checksums 2016-07-02 21:58:07 +08:00
7deef1ba67 [generic] Support Wordpress "YouTube Video Importer" plugin
Closes #9938
2016-07-02 21:58:07 +08:00
fd6ca38262 [facebook] Improve Facebook embedded detection
Related to #9938.

Another example comes from 9834872bf6.
2016-07-02 21:58:07 +08:00
bdafd88da0 [vk] Extend _VALID_URLs to support new domain (Closes #9981) 2016-07-02 16:43:19 +07:00
7a1e71575e release 2016.07.02 2016-07-02 02:47:42 +07:00
ac2d8f54d1 [vine] Remove superfluous whitespace 2016-07-02 02:45:00 +07:00
14ff6baa0e [fusion] Improve 2016-07-02 02:44:37 +07:00
bb08101ec4 [Fusion] Add new extractor 2016-07-02 02:37:28 +07:00
bc4b2d75ba [pornhub] Add support for thumbzilla (Closes #8696) 2016-07-02 02:11:07 +07:00
35fc3021ba [periscope] Add another fallback source 2016-07-02 01:35:57 +07:00
347227237b [periscope] fix playlist extraction (#9967)
The JSON response changed and the extractor needed to be updated in order to gather the video IDs.
2016-07-02 01:29:11 +07:00
564dc3c6e8 [vine] Fix extraction (Closes #9970) 2016-07-02 01:24:57 +07:00
9f4576a7eb [twitch] Update usher URL (Closes #9975) 2016-07-01 23:16:43 +07:00
f11315e8d4 release 2016.07.01 2016-07-01 03:59:57 +07:00
0c2ac64bb8 [sixplay] Rename preference key to quality in format dict 2016-07-01 03:57:59 +07:00
a9eede3913 [test/compat] compat_shlex_split: test with newlines 2016-07-01 03:30:35 +07:00
9e29ef13a3 [options] Accept quoted string across multiple lines (#9940)
Like:

    -f "
    bestvideo+bestaudio/
    best
    "
2016-07-01 03:30:31 +07:00
eaaaaec042 [pornhub] Add more tests with removed videos 2016-07-01 03:18:27 +07:00
3cb3b60064 [pornhub] Relax removed message regex (Closes #9964) 2016-07-01 03:14:23 +07:00
044e3d91b5 [Pornhub] Fix error detection 2016-07-01 02:59:50 +07:00
c9e538a3b1 [ctvnews] use orderedSet, increase the number of items for playlists and use smaller bin list for test 2016-06-30 19:52:32 +01:00
76dad392f5 [meta] Clarify the source of uppod st decryption algorithm 2016-06-30 18:27:57 +01:00
9617b557aa [ctv] Add new extractor(closes #4077) 2016-06-30 18:22:35 +01:00
bf4fa24414 [ctvnews] Add new extractor(closes #2156) 2016-06-30 18:22:35 +01:00
20361b4f25 [rds] extract 9c9media formats 2016-06-30 18:22:35 +01:00
05a0068a76 [9c9media] Add new extractor 2016-06-30 18:22:35 +01:00
66a42309fa release 2016.06.30 2016-06-30 23:56:55 +07:00
fd94e2671a [meta] Add support for pladform embeds 2016-06-30 23:20:44 +07:00
8ff6697861 [pladform] Improve embed detection 2016-06-30 23:19:29 +07:00
eafa643715 [meta] Make duration and description optional
For iframe URLs
2016-06-30 23:06:13 +07:00
049da7cb6c [meta] Extend _VALID_URL 2016-06-30 23:04:18 +07:00
7dbeee7e22 [generic] make twitter:player extraction non fatal 2016-06-30 14:11:55 +01:00
93ad6c6bfa [sixplay] Add new extractor(closes #2183) 2016-06-30 13:50:49 +01:00
329179073b [generic] add generic support for twitter:player embeds 2016-06-30 12:01:30 +01:00
4d86d2008e [urplay] fix typo and check with flake8 2016-06-30 11:30:42 +01:00
ab47b6e881 [theatlantic] Add new extractor(closes #6611) 2016-06-30 04:08:56 +01:00
df43389ade [skysports] Add new extractor(closes #7066) 2016-06-30 02:54:21 +01:00
397b305cfe [meta] Add new extractor(closes #8789) 2016-06-30 00:21:03 +01:00
e496fa50cd [urplay] Add new extractor(closes #9332) 2016-06-29 20:19:31 +01:00
06a96da15b [eagleplatform] Improve embed detection and extract in separate routine (Closes #9926) 2016-06-29 23:01:34 +07:00
70157c2c43 [aenetworks] add support for movie pages 2016-06-29 16:55:17 +01:00
c58ed8563d [aenetworks] extract history topic playlist title 2016-06-29 16:18:16 +01:00
4c7821227c [aenetworks:historytopic] fix topic video url 2016-06-29 16:03:32 +01:00
42362fdb5e [aenetworks] add support for show and season for A&E Network sites and History topics(closes #9816) 2016-06-29 15:49:17 +01:00
97124e572d [arte:playlist] Fix test 2016-06-28 22:39:53 +07:00
32616c14cc [vrt] extract all formats 2016-06-28 14:02:03 +01:00
8174d0fe95 release 2016.06.27 2016-06-27 23:09:39 +07:00
8704778d95 [pbs] Check manually constructed http links (Closes #9921) 2016-06-27 23:06:42 +07:00
c287f2bc60 [extractor/generic] Use _extract_url for kaltura embeds (Closes #9922) 2016-06-27 22:45:26 +07:00
9ea5c04c0d [kaltura] Add _extract_url with fixed regex 2016-06-27 22:44:17 +07:00
fd7a7498a4 [test_all_urls] PEP 8 and change wording 2016-06-27 22:11:45 +07:00
e3a6747d8f New test-case: extractor names are supposed to be unique
@dstftw explained in
https://github.com/rg3/youtube-dl/pull/9918#issuecomment-228625878 that
extractor names are supposed to be unique. @dstftw has fixed the two
offending extractors, and here I add a test to ensure this does not
happen in the future.
2016-06-27 22:09:29 +07:00
f41ffc00d1 [skynewsarabia:article] Clarify IE_NAME 2016-06-27 05:08:09 +07:00
81fda15369 [sr:mediathek] Clarify IE_NAME 2016-06-27 05:07:12 +07:00
427cd050a3 [extractor/generic] Improve kaltura embed detection (Closes #9911) 2016-06-27 04:11:53 +07:00
b0c200f1ec [msn] Add test URL with non-alphanumeric characters 2016-06-26 22:03:36 +07:00
92747e664a release 2016.06.26 2016-06-26 21:15:24 +07:00
f1f336322d [msn] Fix extraction (Closes #8960, closes #9542) 2016-06-26 21:10:05 +07:00
bf8dd79045 [extractor/common] Fix sorting with custom field preference 2016-06-26 21:09:07 +07:00
c6781156aa [MSN] add new extractor 2016-06-26 21:07:59 +07:00
59bbe4911a [extractor/common] add helper method to extract html5 media entries 2016-06-26 14:04:08 +01:00
4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
f484c5fa25 [vidbit] Improve (Closes #9759) 2016-06-26 16:59:28 +07:00
88d9f6c0c4 [utils] Add support for name list in _html_search_meta 2016-06-26 16:57:14 +07:00
3c9c088f9c [Vidbit] Add new extractor 2016-06-26 16:52:52 +07:00
fc3996bfe1 [iqiyi] Remove codes for debugging 2016-06-26 15:45:41 +08:00
5b6ad8630c [iqiyi] Partially fix IqiyiIE
Use the HTML5 API. Only low-resolution formats available

Related: #9839

Thanks @zhangn1985 for the overall algorithm (soimort/you-get#1224)
2016-06-26 15:18:32 +08:00
30105f4ac0 [le] Move urshift() to utils.py 2016-06-26 15:17:26 +08:00
1143535d76 [utils] Add urshift()
Used in IqiyiIE and LeIE
2016-06-26 15:16:49 +08:00
7d52c052ef [generic] Fix test_Generic_76
Broken: https://travis-ci.org/rg3/youtube-dl/jobs/140251658
2016-06-26 11:56:27 +08:00
a2406fce3c Fix misspelling 2016-06-26 01:28:55 +07:00
3b34ab538c [svtplay] Extend _VALID_URL (#9900) 2016-06-26 00:29:53 +07:00
ac782306f1 [iqiyi] Mark broken 2016-06-26 00:25:41 +07:00
0c00e889f3 Credit @JakubAdamWieczorek for #9813 2016-06-25 23:35:57 +07:00
ce96ed05f4 [polskieradio] Add test with video 2016-06-25 23:31:21 +07:00
0463b77a1f [polskieradio] Improve extraction (Closes #9813) 2016-06-25 23:19:18 +07:00
2d185706ea [polskieradio] Add support for Polskie Radio.
Polskie Radio is the main Polish state-funded radio broadcasting service.
2016-06-25 23:19:18 +07:00
b72b44318c [utils] Add strip_or_none 2016-06-25 23:19:18 +07:00
46f59e89ea [utils] Add unified_timestamp 2016-06-25 23:19:18 +07:00
b4241e308e release 2016.06.25 2016-06-25 03:03:20 +07:00
3d4b08dfc7 [setup.py] Add file version information and quotes consistency (Closes #9878) 2016-06-25 02:50:12 +07:00
be49068d65 [youtube] Fix and skip some tests 2016-06-24 22:47:19 +07:00
525cedb971 [youtube] Relax URL expansion in description 2016-06-24 22:37:13 +07:00
de3c7fe0d4 [youtube] Fix 141 format tests 2016-06-24 22:27:55 +07:00
896cc72750 [mixcloud] View count and like count may be absent
Closes #9874
2016-06-24 17:26:12 +08:00
c1ff6e1ad0 [vimeo:review] Fix extraction for password-protected videos
Closes #9853
2016-06-24 16:48:37 +08:00
fee70322d7 [appletrailers] correct thumbnail fallback 2016-06-23 19:03:34 +01:00
8065d6c55f [dcn] extend _VALID_URL for awaan.ae and extract all available formats 2016-06-23 17:22:15 +01:00
494172d2e5 [appletrailers] extract info from an alternative source if available(closes #8422)(closes #8422) 2016-06-23 15:49:42 +01:00
6e3c2047f8 [tvp] extract all formats and detect erros 2016-06-23 04:36:16 +01:00
ccff2c404d [ffmpeg] Fix embedding subtitles (#9063)
Changed command line parameters for ffmpeg when embedding subtitles.
Changed to ‘-map 0:v -c:v copy -map 0:a -c:a copy’
2016-04-24 00:08:02 +08:00
440 changed files with 18324 additions and 7353 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.23.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.16*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.23.1** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.16**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.06.23.1 [debug] youtube-dl version 2016.10.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}
@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them. If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them. If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

27
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@ -0,0 +1,27 @@
## Please follow the guide below
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
- Use *Preview* tab to see how your *pull request* will actually look like
---
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] Improvement
- [ ] New extractor
- [ ] New feature
---
### Description of your *pull request* and other information
Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.

1
.gitignore vendored
View File

@ -29,6 +29,7 @@ updates_key.pem
*.m4a *.m4a
*.m4v *.m4v
*.mp3 *.mp3
*.3gp
*.part *.part
*.swp *.swp
test/testdata test/testdata

View File

@ -7,9 +7,6 @@ python:
- "3.4" - "3.4"
- "3.5" - "3.5"
sudo: false sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose script: nosetests test --verbose
notifications: notifications:
email: email:

13
AUTHORS
View File

@ -26,7 +26,7 @@ Albert Kim
Pierre Rudloff Pierre Rudloff
Huarong Huo Huarong Huo
Ismael Mejía Ismael Mejía
Steffan 'Ruirize' James Steffan Donal
Andras Elso Andras Elso
Jelle van der Waa Jelle van der Waa
Marcin Cieślak Marcin Cieślak
@ -175,3 +175,14 @@ Tomáš Čech
Déstin Reed Déstin Reed
Roman Tsiupa Roman Tsiupa
Artur Krysiak Artur Krysiak
Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh
Aleksander Nitecki
Sebastian Blunt
Matěj Cepl
Xie Yanbo
Philip Xu

View File

@ -12,7 +12,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose log only plain text is acceptable.** **Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?
@ -66,7 +66,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# DEVELOPER INSTRUCTIONS # DEVELOPER INSTRUCTIONS
@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (both GNU make and BSD make are supported) * make (only GNU make is supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`): After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git` 2. Check out the source code with:
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`: 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python ```python
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -143,16 +151,148 @@ After you have ensured this site is distributing it's content legally, you can f
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
```python
title = self._search_regex(
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
webpage, 'title', group='title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

821
ChangeLog Normal file
View File

@ -0,0 +1,821 @@
version 2016.10.16
Core
* [postprocessor/ffmpeg] Return correct filepath and ext in updated information
in FFmpegExtractAudioPP (#10879)
Extractors
+ [ruutu] Add support for supla.fi (#10849)
+ [theoperaplatform] Add support for theoperaplatform.eu (#10914)
* [lynda] Fix height for prioritized streams
+ [lynda] Add fallback extraction scenario
* [lynda] Switch to https (#10916)
+ [huajiao] New extractor (#10917)
* [cmt] Fix mgid extraction (#10813)
+ [safari:course] Add support for techbus.safaribooksonline.com
* [orf:tvthek] Fix extraction and modernize (#10898)
* [chirbit] Fix extraction of user profile pages
* [carambatv] Fix extraction
* [canalplus] Fix extraction for some videos
* [cbsinteractive] Fix extraction for cnet.com
* [parliamentliveuk] Lower case URLs are now recognized (#10912)
version 2016.10.12
Core
+ Support HTML media elements without child nodes
* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
Extractors
* [dailymotion] Fix extraction (#10901)
* [vimeo:review] Fix extraction (#10900)
* [nhl] Correctly handle invalid formats (#10713)
* [footyroom] Fix extraction (#10810)
* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
+ [hbo] Add support for episode pages (#10892)
* [allocine] Fix extraction (#10860)
+ [nextmedia] Recognize action news on AppleDaily
* [lego] Improve info extraction and bypass geo restriction (#10872)
version 2016.10.07
Extractors
+ [iprima] Detect geo restriction
* [facebook] Fix video extraction (#10846)
+ [commonprotocols] Support direct MMS links (#10838)
+ [generic] Add support for multiple vimeo embeds (#10862)
+ [nzz] Add support for nzz.ch (#4407)
+ [npo] Detect geo restriction
+ [npo] Add support for 2doc.nl (#10842)
+ [lego] Add support for lego.com (#10369)
+ [tonline] Add support for t-online.de (#10376)
* [techtalks] Relax URL regular expression (#10840)
* [youtube:live] Extend URL regular expression (#10839)
+ [theweatherchannel] Add support for weather.com (#7188)
+ [thisoldhouse] Add support for thisoldhouse.com (#10837)
+ [nhl] Add support for wch2016.com (#10833)
* [pornoxo] Use JWPlatform to improve metadata extraction
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors
+ [tube8] Extract categories and tags (#10579)
+ [pornhub] Extract categories and tags (#10499)
* [openload] Temporary fix (#10408)
+ [foxnews] Add support Fox News articles (#10598)
* [viafree] Improve video id extraction (#10615)
* [iwara] Fix extraction after relaunch (#10462, #3215)
+ [tfo] Add extractor for tfo.org
* [lrt] Fix audio extraction (#10566)
* [9now] Fix extraction (#10561)
+ [canalplus] Add support for c8.fr (#10577)
* [newgrounds] Fix uploader extraction (#10584)
+ [polskieradio:category] Add support for category lists (#10576)
+ [ketnet] Add extractor for ketnet.be (#10343)
+ [canvas] Add support for een.be (#10605)
+ [telequebec] Add extractor for telequebec.tv (#1999)
* [parliamentliveuk] Fix extraction (#9137)
version 2016.09.08
Extractors
+ [jwplatform] Extract height from format label
+ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
* [videomore] Fix extraction (#10592)
* [foxgay] Fix extraction (#10480)
+ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
* [gamestar] Fix metadata extraction (#10479)
* [puls4] Fix extraction (#10583)
+ [cctv] Add extractor for CCTV and CNTV (#8153)
+ [lci] Add extractor for lci.fr (#10573)
+ [wat] Extract DASH formats
+ [viafree] Improve video id detection (#10569)
+ [trutv] Add extractor for trutv.com (#10519)
+ [nick] Add support for nickelodeon.nl (#10559)
+ [abcotvs:clips] Add support for clips.abcotvs.com
+ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
+ [miaopai] Add extractor for miaopai.com (#10556)
* [gamestar] Fix metadata extraction (#10479)
+ [bilibili] Add support for episodes (#10190)
+ [tvnoe] Add extractor for tvnoe.cz (#10524)
version 2016.09.04.1
Core
* In DASH downloader if the first segment fails, abort the whole download
process to prevent throttling (#10497)
+ Add support for --skip-unavailable-fragments and --fragment retries in
hlsnative downloader (#10165, #10448).
+ Add support for --skip-unavailable-fragments in DASH downloader
+ Introduce --skip-unavailable-fragments option for fragment based downloaders
that allows to skip fragments unavailable due to a HTTP error
* Fix extraction of video/audio entries with src attribute in
_parse_html5_media_entries (#10540)
Extractors
* [theplatform] Relax URL regular expression (#10546)
* [youtube:playlist] Extend URL regular expression
* [rottentomatoes] Delegate extraction to internetvideoarchive extractor
* [internetvideoarchive] Extract all formats
* [pornvoisines] Fix extraction (#10469)
* [rottentomatoes] Fix extraction (#10467)
* [espn] Extend URL regular expression (#10549)
* [vimple] Extend URL regular expression (#10547)
* [youtube:watchlater] Fix extraction (#10544)
* [youjizz] Fix extraction (#10437)
+ [foxnews] Add support for FoxNews Insider (#10445)
+ [fc2] Recognize Flash player URLs (#10512)
version 2016.09.03
Core
* Restore usage of NAME attribute from EXT-X-MEDIA tag for formats codes in
_extract_m3u8_formats (#10522)
* Handle semicolon in mimetype2ext
Extractors
+ [youtube] Add support for rental videos' previews (#10532)
* [youtube:playlist] Fallback to video extraction for video/playlist URLs when
no playlist is actually served (#10537)
+ [drtv] Add support for dr.dk/nyheder (#10536)
+ [facebook:plugins:video] Add extractor (#10530)
+ [go] Add extractor for *.go.com sites
* [adobepass] Check for authz_token expiration (#10527)
* [nytimes] improve extraction
* [thestar] Fix extraction (#10465)
* [glide] Fix extraction (#10478)
- [exfm] Remove extractor (#10482)
* [youporn] Fix categories and tags extraction (#10521)
+ [curiositystream] Add extractor for app.curiositystream.com
- [thvideo] Remove extractor (#10464)
* [movingimage] Fix for the new site name (#10466)
+ [cbs] Add support for once formats (#10515)
* [limelight] Skip ism snd duplicate manifests
+ [porncom] Extract categories and tags (#10510)
+ [facebook] Extract timestamp (#10508)
+ [yahoo] Extract more formats
version 2016.08.31
Extractors
* [soundcloud] Fix URL regular expression to avoid clashes with sets (#10505)
* [bandcamp:album] Fix title extraction (#10455)
* [pyvideo] Fix extraction (#10468)
+ [ctv] Add support for tsn.ca, bnn.ca and thecomedynetwork.ca (#10016)
* [9c9media] Extract more metadata
* [9c9media] Fix multiple stacks extraction (#10016)
* [adultswim] Improve video info extraction (#10492)
* [vodplatform] Improve embed regular expression
- [played] Remove extractor (#10470)
+ [tbs] Add extractor for tbs.com and tntdrama.com (#10222)
+ [cartoonnetwork] Add extractor for cartoonnetwork.com (#10110)
* [adultswim] Rework in terms of turner extractor
* [cnn] Rework in terms of turner extractor
* [nba] Rework in terms of turner extractor
+ [turner] Add base extractor for Turner Broadcasting System based sites
* [bilibili] Fix extraction (#10375)
* [openload] Fix extraction (#10408)
version 2016.08.28
Core
+ Add warning message that ffmpeg doesn't support SOCKS
* Improve thumbnail sorting
+ Extract formats from #EXT-X-MEDIA tags in _extract_m3u8_formats
* Fill IV with leading zeros for IVs shorter than 16 octets in hlsnative
+ Add ac-3 to the list of audio codecs in parse_codecs
Extractors
* [periscope:user] Fix extraction (#10453)
* [douyutv] Fix extraction (#10153, #10318, #10444)
+ [nhk:vod] Add extractor for www3.nhk.or.jp on demand (#4437, #10424)
- [trutube] Remove extractor (#10438)
+ [usanetwork] Add extractor for usanetwork.com
* [crackle] Fix extraction (#10333)
* [spankbang] Fix description and uploader extraction (#10339)
* [discoverygo] Detect cable provider restricted videos (#10425)
+ [cbc] Add support for watch.cbc.ca
* [kickstarter] Silent the warning for og:description (#10415)
* [mtvservices:embedded] Fix extraction for the new 'edge' player (#10363)
version 2016.08.24.1
Extractors
+ [pluralsight] Add support for subtitles (#9681)
version 2016.08.24
Extractors
* [youtube] Fix authentication (#10392)
* [openload] Fix extraction (#10408)
+ [bravotv] Add support for Adobe Pass (#10407)
* [bravotv] Fix clip info extraction (#10407)
* [eagleplatform] Improve embedded videos detection (#10409)
* [awaan] Fix extraction
* [mtvservices:embedded] Update config URL
+ [abc:iview] Add extractor (#6148)
version 2016.08.22
Core
* Improve formats and subtitles extension auto calculation
+ Recognize full unit names in parse_filesize
+ Add support for m3u8 manifests in HTML5 multimedia tags
* Fix octal/hexadecimal number detection in js_to_json
Extractors
+ [ivi] Add support for 720p and 1080p
+ [charlierose] Add new extractor (#10382)
* [1tv] Fix extraction (#9249)
* [twitch] Renew authentication
* [kaltura] Improve subtitles extension calculation
+ [zingmp3] Add support for video clips
* [zingmp3] Fix extraction (#10041)
* [kaltura] Improve subtitles extraction (#10279)
* [cultureunplugged] Fix extraction (#10330)
+ [cnn] Add support for money.cnn.com (#2797)
* [cbsnews] Fix extraction (#10362)
* [cbs] Fix extraction (#10393)
+ [litv] Support 'promo' URLs (#10385)
* [snotr] Fix extraction (#10338)
* [n-tv.de] Fix extraction (#10331)
* [globo:article] Relax URL and video id regular expressions (#10379)
version 2016.08.19
Core
- Remove output template description from --help
* Recognize lowercase units in parse_filesize
Extractors
+ [porncom] Add extractor for porn.com (#2251, #10251)
+ [generic] Add support for DBTV embeds
* [vk:wallpost] Fix audio extraction for new site layout
* [vk] Fix authentication
+ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
+ [discoverygo] Add support for another GO network sites
version 2016.08.17
Core
+ Add _get_netrc_login_info
Extractors
* [mofosex] Extract all formats (#10335)
+ [generic] Add support for vbox7 embeds
+ [vbox7] Add support for embed URLs
+ [viafree] Add extractor (#10358)
+ [mtg] Add support for viafree URLs (#10358)
* [theplatform] Extract all subtitles per language
+ [xvideos] Fix HLS extraction (#10356)
+ [amcnetworks] Add extractor
+ [bbc:playlist] Add support for pagination (#10349)
+ [fxnetworks] Add extractor (#9462)
* [cbslocal] Fix extraction for SendtoNews-based videos
* [sendtonews] Fix extraction
* [jwplatform] Extract video id from JWPlayer data
- [zippcast] Remove extractor (#10332)
+ [viceland] Add extractor (#8799)
+ [adobepass] Add base extractor for Adobe Pass Authentication
* [life:embed] Improve extraction
* [vgtv] Detect geo restricted videos (#10348)
+ [uplynk] Add extractor
* [xiami] Fix extraction (#10342)
version 2016.08.13
Core
* Show progress for curl external downloader
* Forward more options to curl external downloader
Extractors
* [pbs] Fix description extraction
* [franceculture] Fix extraction (#10324)
* [pornotube] Fix extraction (#10322)
* [4tube] Fix metadata extraction (#10321)
* [imgur] Fix width and height extraction (#10325)
* [expotv] Improve extraction
+ [vbox7] Fix extraction (#10309)
- [tapely] Remove extractor (#10323)
* [muenchentv] Fix extraction (#10313)
+ [24video] Add support for .me and .xxx TLDs
* [24video] Fix comment count extraction
* [sunporno] Add support for embed URLs
* [sunporno] Fix metadata extraction (#10316)
+ [hgtv] Add extractor for hgtv.ca (#3999)
- [pbs] Remove request to unavailable API
+ [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301)
version 2016.08.12
Core
* Subtitles are now written as is. Newline conversions are disabled. (#10268)
+ Recognize more formats in unified_timestamp
Extractors
- [goldenmoustache] Remove extractor (#10298)
* [drtuber] Improve title extraction
* [drtuber] Make dislike count optional (#10297)
* [chirbit] Fix extraction (#10296)
* [francetvinfo] Relax URL regular expression
* [rtlnl] Relax URL regular expression (#10282)
* [formula1] Relax URL regular expression (#10283)
* [wat] Improve extraction (#10281)
* [ctsnews] Fix extraction
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core
* Add support for JSON-LD root list entries (#10203)
* Improve unified_timestamp
* Lower preference of RTSP formats in generic sorting
+ Add support for multiple properties in _og_search_property
* Improve password hiding from verbose output
Extractors
+ [adultswim] Add support for trailers (#10235)
* [archiveorg] Improve extraction (#10219)
+ [jwplatform] Add support for playlists
+ [jwplatform] Add support for relative URLs
* [jwplatform] Improve audio detection
+ [tvplay] Capture and output native error message
+ [tvplay] Extract series metadata
+ [tvplay] Add support for subtitles (#10194)
* [tvp] Improve extraction (#7799)
* [cbslocal] Fix timestamp parsing (#10213)
+ [naver] Add support for subtitles (#8096)
* [naver] Improve extraction
* [condenast] Improve extraction
* [engadget] Relax URL regular expression
* [5min] Fix extraction
+ [nationalgeographic] Add support for Episode Guide
+ [kaltura] Add support for subtitles
* [kaltura] Optimize network requests
+ [vodplatform] Add extractor for vod-platform.net
- [gamekings] Remove extractor
* [limelight] Extract HTTP formats
* [ntvru] Fix extraction
+ [comedycentral] Re-add :tds and :thedailyshow shortnames
version 2016.08.01
Fixed/improved extractors
- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
- [bloomberg] Support another form of player (#10187)
- [limelight] Skip DRM protected videos
- [safari] Relax regular expressions for URL matching (#10202)
- [cwtv] Add support for cwtvpr.com (#10196)
version 2016.07.30
Fixed/improved extractors
- [twitch:clips] Sort formats
- [tv2] Use m3u8_native
- [tv2:article] Fix video detection (#10188)
- rtve (#10076)
- [dailymotion:playlist] Optimize download archive processing (#10180)
version 2016.07.28
Fixed/improved extractors
- shared (#10170)
- soundcloud (#10179)
- twitch (#9767)
version 2016.07.26.2
Fixed/improved extractors
- smotri
- camdemy
- mtv
- comedycentral
- cmt
- cbc
- mgtv
- orf
version 2016.07.24
New extractors
- arkena (#8682)
- lcp (#8682)
Fixed/improved extractors
- facebook (#10151)
- dailymail
- telegraaf
- dcn
- onet
- tvp
Miscellaneous
- Support $Time$ in DASH manifests
version 2016.07.22
New extractors
- odatv (#9285)
Fixed/improved extractors
- bbc
- youjizz (#10131)
- youtube (#10140)
- pornhub (#10138)
- eporner (#10139)
version 2016.07.17
New extractors
- nintendo (#9986)
- streamable (#9122)
Fixed/improved extractors
- ard (#10095)
- mtv
- comedycentral (#10101)
- viki (#10098)
- spike (#10106)
Miscellaneous
- Improved twitter player detection (#10090)
version 2016.07.16
New extractors
- ninenow (#5181)
Fixed/improved extractors
- rtve (#10076)
- brightcove
- 3qsdn
- syfy (#9087, #3820, #2388)
- youtube (#10083)
Miscellaneous
- Fix subtitle embedding for video-only and audio-only files (#10081)
version 2016.07.13
New extractors
- rudo
Fixed/improved extractors
- biobiochiletv
- tvplay
- dbtv
- brightcove
- tmz
- youtube (#10059)
- shahid (#10062)
- vk
- ellentv (#10067)
version 2016.07.11
New Extractors
- roosterteeth (#9864)
Fixed/improved extractors
- miomio (#9605)
- vuclip
- youtube
- vidzi (#10058)
version 2016.07.09.2
Fixed/improved extractors
- vimeo (#1638)
- facebook (#10048)
- lynda (#10047)
- animeondemand
Fixed/improved features
- Embedding subtitles no longer throws an error with problematic inputs (#9063)
version 2016.07.09.1
Fixed/improved extractors
- youtube
- ard
- srmediatek (#9373)
version 2016.07.09
New extractors
- Flipagram (#9898)
Fixed/improved extractors
- telecinco
- toutv
- radiocanada
- tweakers (#9516)
- lynda
- nick (#7542)
- polskieradio (#10028)
- le
- facebook (#9851)
- mgtv
- animeondemand (#10031)
Fixed/improved features
- `--postprocessor-args` and `--downloader-args` now accepts non-ASCII inputs
on non-Windows systems
version 2016.07.07
New extractors
- kamcord (#10001)
Fixed/improved extractors
- spiegel (#10018)
- metacafe (#8539, #3253)
- onet (#9950)
- francetv (#9955)
- brightcove (#9965)
- daum (#9972)
version 2016.07.06
Fixed/improved extractors
- youtube (#10007, #10009)
- xuite
- stitcher
- spiegel
- slideshare
- sandia
- rtvnh
- prosiebensat1
- onionstudios
version 2016.07.05
Fixed/improved extractors
- brightcove
- yahoo (#9995)
- pornhub (#9997)
- iqiyi
- kaltura (#5557)
- la7
- Changed features
- Rename --cn-verfication-proxy to --geo-verification-proxy
Miscellaneous
- Add script for displaying downloads statistics
version 2016.07.03.1
Fixed/improved extractors
- theplatform
- aenetworks
- nationalgeographic
- hrti (#9482)
- facebook (#5701)
- buzzfeed (#5701)
- rai (#8617, #9157, #9232, #8552, #8551)
- nationalgeographic (#9991)
- iqiyi
version 2016.07.03
New extractors
- hrti (#9482)
Fixed/improved extractors
- vk (#9981)
- facebook (#9938)
- xtube (#9953, #9961)
version 2016.07.02
New extractors
- fusion (#9958)
Fixed/improved extractors
- twitch (#9975)
- vine (#9970)
- periscope (#9967)
- pornhub (#8696)
version 2016.07.01
New extractors
- 9c9media
- ctvnews (#2156)
- ctv (#4077)
Fixed/Improved extractors
- rds
- meta (#8789)
- pornhub (#9964)
- sixplay (#2183)
New features
- Accept quoted strings across multiple lines (#9940)

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete
@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR) install -d $(DESTDIR)$(BINDIR)
@ -90,11 +90,11 @@ fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py' _EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES) youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@ $(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \ @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \ --exclude '*.DS_Store' \
--exclude '*.kate-swp' \ --exclude '*.kate-swp' \
@ -107,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \ --exclude 'docs/_build' \
-- \ -- \
bin devscripts test youtube_dl docs \ bin devscripts test youtube_dl docs \
LICENSE README.md README.txt \ ChangeLog LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \ Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \ youtube-dl.zsh youtube-dl.fish setup.py \
youtube-dl youtube-dl

321
README.md
View File

@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type: To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget: If you do not have curl, you can alternatively use a recent wget:
@ -89,6 +89,8 @@ which means you can modify it, redistribute it or use it however you like.
--mark-watched Mark videos watched (YouTube only) --mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube only) --no-mark-watched Do not mark videos watched (YouTube only)
--no-color Do not emit color codes in output --no-color Do not emit color codes in output
--abort-on-unavailable-fragment Abort downloading when some fragment is not
available
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
@ -103,9 +105,9 @@ which means you can modify it, redistribute it or use it however you like.
(experimental) (experimental)
-6, --force-ipv6 Make all connections via IPv6 -6, --force-ipv6 Make all connections via IPv6
(experimental) (experimental)
--cn-verification-proxy URL Use this proxy to verify the IP address for --geo-verification-proxy URL Use this proxy to verify the IP address for
some Chinese sites. The default proxy some geo-restricted sites. The default
specified by --proxy (or none, if the proxy specified by --proxy (or none, if the
options is not present) is used for the options is not present) is used for the
actual downloading. (experimental) actual downloading. (experimental)
@ -173,7 +175,10 @@ which means you can modify it, redistribute it or use it however you like.
-R, --retries RETRIES Number of retries (default is 10), or -R, --retries RETRIES Number of retries (default is 10), or
"infinite". "infinite".
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH only) is 10), or "infinite" (DASH and hlsnative
only)
--skip-unavailable-fragments Skip unavailable fragments (DASH and
hlsnative only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --no-resize-buffer Do not automatically adjust the buffer
@ -201,32 +206,8 @@ which means you can modify it, redistribute it or use it however you like.
-a, --batch-file FILE File containing URLs to download ('-' for -a, --batch-file FILE File containing URLs to download ('-' for
stdin) stdin)
--id Use only video ID in file name --id Use only video ID in file name
-o, --output TEMPLATE Output filename template. Use %(title)s to -o, --output TEMPLATE Output filename template, see the "OUTPUT
get the title, %(uploader)s for the TEMPLATE" for all the info
uploader name, %(uploader_id)s for the
uploader nickname if different,
%(autonumber)s to get an automatically
incremented number, %(ext)s for the
filename extension, %(format)s for the
format description (like "22 - 1280x720" or
"HD"), %(format_id)s for the unique id of
the format (like YouTube's itags: "137"),
%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the
video id, %(playlist_title)s,
%(playlist_id)s, or %(playlist)s (=title if
present, ID otherwise) for the playlist the
video is in, %(playlist_index)s for the
position in the playlist. %(height)s and
%(width)s for the width and height of the
video format. %(resolution)s for a textual
description of the resolution of the video
format. %% for a literal percent. Use - to
output to stdout. Can also be used to
download to a different directory, for
example with -o '/my/downloads/%(uploader)s
/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specify the number of digits in --autonumber-size NUMBER Specify the number of digits in
%(autonumber)s when it is present in output %(autonumber)s when it is present in output
filename template or --auto-number option filename template or --auto-number option
@ -330,7 +311,15 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv bidirectional text support. Requires bidiv
or fribidi executable in PATH or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each --sleep-interval SECONDS Number of seconds to sleep before each
download. download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
## Video Format Options: ## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT -f, --format FORMAT Video format code, see the "FORMAT
@ -369,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
-n, --netrc Use .netrc authentication data -n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku) --video-password PASSWORD Video password (vimeo, smotri, youku)
## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
If this option is left out, youtube-dl will
ask interactively.
--ap-list-mso List all supported multiple-system
operators
## Post-processing Options: ## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files -x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or (requires ffmpeg or avconv and ffprobe or
@ -424,13 +424,22 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION # CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```
# Lines starting with # are comments
# Always extract audio
-x -x
# Do not copy the mtime
--no-mtime --no-mtime
# Use this proxy
--proxy 127.0.0.1:3128 --proxy 127.0.0.1:3128
# Save all videos under Movies directory in your home directory
-o ~/Movies/%(title)s.%(ext)s -o ~/Movies/%(title)s.%(ext)s
``` ```
@ -440,12 +449,12 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file ### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only: You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
``` ```
touch $HOME/.netrc touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc chmod a-rwx,u+rw $HOME/.netrc
``` ```
After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase: After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
``` ```
machine <extractor> login <login> password <password> machine <extractor> login <login> password <password>
``` ```
@ -541,13 +550,13 @@ Available for the media that is a track or a part of a music album:
- `disc_number`: Number of the disc or other physical medium the track belongs to - `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released - `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`. Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you. Output templates can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
To specify percent literal in output template use `%%`. To output to stdout use `-o -`. To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`. The current default template is `%(title)s-%(id)s.%(ext)s`.
@ -555,7 +564,7 @@ In some cases, you don't want special characters such as 中, spaces, or &, such
#### Output template and Windows batch files #### Output template and Windows batch files
If you are using output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`. If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
#### Output template examples #### Output template examples
@ -588,7 +597,7 @@ $ youtube-dl -o - BaW_jenozKc
By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**. By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more. But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download. The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
@ -596,21 +605,21 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific. The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file. You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
You can also use special names to select particular edge case format: You can also use special names to select particular edge case formats:
- `best`: Select best quality format represented by single file with video and audio - `best`: Select the best quality format represented by a single file with video and audio.
- `worst`: Select worst quality format represented by single file with video and audio - `worst`: Select the worst quality format represented by a single file with video and audio.
- `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available - `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available.
- `worstvideo`: Select worst quality video only format, may not be available - `worstvideo`: Select the worst quality video-only format. May not be available.
- `bestaudio`: Select best quality audio only format, may not be available - `bestaudio`: Select the best quality audio only-format. May not be available.
- `worstaudio`: Select worst quality audio only format, may not be available - `worstaudio`: Select the worst quality audio only-format. May not be available.
For example, to download worst quality video only format you can use `-f worstvideo`. For example, to download the worst quality video-only format you can use `-f worstvideo`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download. If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
@ -632,15 +641,15 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native` - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format - `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed. Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl. If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
@ -660,7 +669,11 @@ $ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol # Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]' $ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
``` ```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
# VIDEO SELECTION # VIDEO SELECTION
@ -741,7 +754,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it? ### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/). Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser. ### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
@ -823,10 +836,42 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
### How do I pass cookies to youtube-dl? ### How do I pass cookies to youtube-dl?
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare). Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
### How do I stream directly to media player?
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
### How do I download only new videos from a playlist?
Use download-archive feature. With this feature you should initially download the complete playlist with `--download-archive /path/to/download/archive/file.txt` that will record identifiers of all the videos in a special file. Each subsequent run with the same `--download-archive` will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file.
For example, at first,
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and create a file `archive.txt`. Each subsequent run will only download new videos if any:
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
### Should I add `--hls-prefer-native` into my config?
When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
### Can you add support for this anime video site, or site which shows current movies for free? ### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl. As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.
@ -857,7 +902,7 @@ If you want to find out whether a given URL is supported, simply call youtube-dl
# Why do I need to go through that much red tape when filing bugs? # Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was alrady reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl. Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current. youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
@ -878,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (both GNU make and BSD make are supported) * make (only GNU make is supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -890,9 +935,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`): After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git` 2. Check out the source code with:
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`: 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python ```python
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -936,19 +989,151 @@ After you have ensured this site is distributing it's content legally, you can f
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
```python
title = self._search_regex(
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
webpage, 'title', group='title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
# EMBEDDING YOUTUBE-DL # EMBEDDING YOUTUBE-DL
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new). youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
@ -1005,7 +1190,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS # BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)). Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this: **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
``` ```
@ -1021,7 +1206,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose log only plain text is acceptable.** **Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -1055,7 +1240,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?
@ -1075,7 +1260,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT # COPYRIGHT

View File

@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf: with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read() template = tmplf.read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest() sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version) template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL) template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum) template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0]) template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1]) template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])

View File

@ -1,8 +0,0 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re

View File

@ -54,17 +54,21 @@ def filter_options(readme):
if in_options: if in_options:
if line.lstrip().startswith('-'): if line.lstrip().startswith('-'):
option, description = re.split(r'\s{2,}', line.lstrip()) split = re.split(r'\s{2,}', line.lstrip())
split_option = option.split(' ') # Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]]) option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html # Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information. # for more information.
ret += '\n%s\n: %s\n' % (option, description) ret += '\n%s\n: %s\n' % (option, description)
else: continue
ret += line.lstrip() + '\n' ret += line.lstrip() + '\n'
else: else:
ret += line + '\n' ret += line + '\n'

View File

@ -60,6 +60,9 @@ if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; e
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..." /bin/echo -e "\n### First of all, testing..."
make clean make clean
if $skip_tests ; then if $skip_tests ; then
@ -71,9 +74,12 @@ fi
/bin/echo -e "\n### Changing version in version.py..." /bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..." /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version" git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..." /bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@ -0,0 +1,47 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import itertools
import json
import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import format_bytes
def format_size(bytes):
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
total_bytes = 0
for page in itertools.count(1):
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
).read().decode('utf-8'))
if not releases:
break
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))

View File

@ -1,4 +1,4 @@
# -*- coding: utf-8 -*- # coding: utf-8
# #
# youtube-dl documentation build configuration file, created by # youtube-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014. # sphinx-quickstart on Fri Mar 14 21:05:43 2014.

View File

@ -13,11 +13,16 @@
- **5min** - **5min**
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media**
- **9c9media:stack**
- **9gag** - **9gag**
- **9now.com.au**
- **abc.net.au** - **abc.net.au**
- **Abc7News** - **abc.net.au:iview**
- **abcnews** - **abcnews**
- **abcnews:video** - **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
- **abcotvs:clips**
- **AcademicEarth:Course** - **AcademicEarth:Course**
- **acast** - **acast**
- **acast:channel** - **acast:channel**
@ -29,11 +34,12 @@
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com - **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla** - **AirMozilla**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno** - **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **anitube.se** - **anitube.se**
- **AnySex** - **AnySex**
@ -45,7 +51,7 @@
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **ARD:mediathek**: Saarländischer Rundfunk - **Arkena**
- **arte.tv** - **arte.tv**
- **arte.tv:+7** - **arte.tv:+7**
- **arte.tv:cinema** - **arte.tv:cinema**
@ -64,6 +70,10 @@
- **audiomack** - **audiomack**
- **audiomack:album** - **audiomack:album**
- **auroravid**: AuroraVid - **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
- **awaan:video**
- **Azubu** - **Azubu**
- **AzubuLive** - **AzubuLive**
- **BaiduVideo**: 百度视频 - **BaiduVideo**: 百度视频
@ -76,9 +86,10 @@
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist** - **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist** - **bbc.co.uk:playlist**
- **BeatportPro** - **Beatport**
- **Beeg** - **Beeg**
- **BehindKink** - **BehindKink**
- **BellMedia**
- **Bet** - **Bet**
- **Bigflix** - **Bigflix**
- **Bild**: Bild.de - **Bild**: Bild.de
@ -100,6 +111,7 @@
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed** - **BuzzFeed**
- **BYUtv** - **BYUtv**
- **BYUtvEvent**
- **Camdemy** - **Camdemy**
- **CamdemyFolder** - **CamdemyFolder**
- **CamWithHer** - **CamWithHer**
@ -108,17 +120,22 @@
- **Canvas** - **Canvas**
- **CarambaTV** - **CarambaTV**
- **CarambaTVPage** - **CarambaTVPage**
- **CBC** - **CartoonNetwork**
- **CBCPlayer** - **cbc.ca**
- **cbc.ca:player**
- **cbc.ca:watch**
- **cbc.ca:watch:video**
- **CBS** - **CBS**
- **CBSInteractive** - **CBSInteractive**
- **CBSLocal** - **CBSLocal**
- **CBSNews**: CBS News - **cbsnews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports** - **CBSSports**
- **CCTV**
- **CDA** - **CDA**
- **CeskaTelevize** - **CeskaTelevize**
- **channel9**: Channel 9 - **channel9**: Channel 9
- **CharlieRose**
- **Chaturbate** - **Chaturbate**
- **Chilloutzone** - **Chilloutzone**
- **chirbit** - **chirbit**
@ -141,7 +158,8 @@
- **CollegeRama** - **CollegeRama**
- **ComCarCoff** - **ComCarCoff**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralShows**: The Daily Show / The Colbert Report - **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub** - **Coub**
- **Cracked** - **Cracked**
@ -153,8 +171,11 @@
- **CSNNE** - **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTVNews**
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **CultureUnplugged** - **CultureUnplugged**
- **curiositystream**
- **curiositystream:collection**
- **CWTV** - **CWTV**
- **DailyMail** - **DailyMail**
- **dailymotion** - **dailymotion**
@ -166,10 +187,6 @@
- **daum.net:playlist** - **daum.net:playlist**
- **daum.net:user** - **daum.net:user**
- **DBTV** - **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv** - **DctpTv**
- **DeezerPlaylist** - **DeezerPlaylist**
- **defense.gouv.fr** - **defense.gouv.fr**
@ -178,6 +195,7 @@
- **DigitallySpeaking** - **DigitallySpeaking**
- **Digiteka** - **Digiteka**
- **Discovery** - **Discovery**
- **DiscoveryGo**
- **Dotsub** - **Dotsub**
- **DouyuTV**: 斗鱼 - **DouyuTV**: 斗鱼
- **DPlay** - **DPlay**
@ -210,29 +228,32 @@
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape** - **EveryonesMixtape**
- **exfm**: ex.fm
- **ExpoTV** - **ExpoTV**
- **ExtremeTube** - **ExtremeTube**
- **EyedoTV** - **EyedoTV**
- **facebook** - **facebook**
- **FacebookPluginsVideo**
- **faz.net** - **faz.net**
- **fc2** - **fc2**
- **fc2:embed**
- **Fczenit** - **Fczenit**
- **features.aol.com** - **features.aol.com**
- **fernsehkritik.tv** - **fernsehkritik.tv**
- **Firstpost** - **Firstpost**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament) - **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom** - **FootyRoom**
- **Formula1** - **Formula1**
- **FOX** - **FOX**
- **Foxgay** - **Foxgay**
- **FoxNews**: Fox News and Fox Business Video - **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
- **foxnews:insider**
- **FoxSports** - **FoxSports**
- **france2.fr:generation-quoi** - **france2.fr:generation-quoi**
- **FranceCulture** - **FranceCulture**
- **FranceCultureEmission**
- **FranceInter** - **FranceInter**
- **francetv**: France 2, 3, 4, 5 and Ô - **francetv**: France 2, 3, 4, 5 and Ô
- **francetvinfo.fr** - **francetvinfo.fr**
@ -241,8 +262,9 @@
- **FreeVideo** - **FreeVideo**
- **Funimation** - **Funimation**
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion**
- **FXNetworks**
- **GameInformer** - **GameInformer**
- **Gamekings**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
- **Gamersyde** - **Gamersyde**
@ -257,9 +279,9 @@
- **Glide**: Glide mobile video messages (glide.me) - **Glide**: Glide mobile video messages (glide.me)
- **Globo** - **Globo**
- **GloboArticle** - **GloboArticle**
- **Go**
- **GodTube** - **GodTube**
- **GodTV** - **GodTV**
- **GoldenMoustache**
- **Golem** - **Golem**
- **GoogleDrive** - **GoogleDrive**
- **Goshgay** - **Goshgay**
@ -267,12 +289,16 @@
- **Groupon** - **Groupon**
- **Hark** - **Hark**
- **HBO** - **HBO**
- **HBOEpisode**
- **HearThisAt** - **HearThisAt**
- **Heise** - **Heise**
- **HellPorno** - **HellPorno**
- **Helsinki**: helsinki.fi - **Helsinki**: helsinki.fi
- **HentaiStigma** - **HentaiStigma**
- **HGTV**
- **hgtv.com:show**
- **HistoricFilms** - **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
- **HornBunny** - **HornBunny**
@ -280,6 +306,9 @@
- **HotStar** - **HotStar**
- **Howcast** - **Howcast**
- **HowStuffWorks** - **HowStuffWorks**
- **HRTi**
- **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post - **HuffPost**: Huffington Post
- **Hypem** - **Hypem**
- **Iconosquare** - **Iconosquare**
@ -301,18 +330,21 @@
- **ivi**: ivi.ru - **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV - **ivideon**: Ivideon TV
- **Iwara**
- **Izlesene** - **Izlesene**
- **JeuxVideo** - **JeuxVideo**
- **Jove** - **Jove**
- **jpopsuki.tv** - **jpopsuki.tv**
- **JWPlatform** - **JWPlatform**
- **Kaltura** - **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play - **KanalPlay**: Kanal 5/9/11 Play
- **Kankan** - **Kankan**
- **Karaoketv** - **Karaoketv**
- **KarriereVideos** - **KarriereVideos**
- **keek** - **keek**
- **KeezMovies** - **KeezMovies**
- **Ketnet**
- **KhanAcademy** - **KhanAcademy**
- **KickStarter** - **KickStarter**
- **KonserthusetPlay** - **KonserthusetPlay**
@ -326,11 +358,15 @@
- **kuwo:mv**: 酷我音乐 - MV - **kuwo:mv**: 酷我音乐 - MV
- **kuwo:singer**: 酷我音乐 - 歌手 - **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐 - **kuwo:song**: 酷我音乐
- **la7.tv** - **la7.it**
- **Laola1Tv** - **Laola1Tv**
- **LCI**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网 - **Le**: 乐视网
- **Learnr** - **Learnr**
- **Lecture2Go** - **Lecture2Go**
- **LEGO**
- **Lemonde** - **Lemonde**
- **LePlaylist** - **LePlaylist**
- **LetvCloud**: 乐视云 - **LetvCloud**: 乐视云
@ -356,13 +392,17 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **MakersChannel** - **MakersChannel**
- **MakerTV** - **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **media.ccc.de** - **media.ccc.de**
- **META**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV - **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca** - **Minhateca**
- **MinistryGrid** - **MinistryGrid**
- **Minoto** - **Minoto**
@ -384,10 +424,13 @@
- **MovieClips** - **MovieClips**
- **MovieFap** - **MovieFap**
- **Moviezine** - **Moviezine**
- **MovingImage**
- **MPORA** - **MPORA**
- **MTV** - **MSN**
- **mtg**: MTG services
- **mtv**
- **mtv.de** - **mtv.de**
- **mtviggy.com** - **mtv:video**
- **mtvservices:embedded** - **mtvservices:embedded**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **MusicPlayOn** - **MusicPlayOn**
@ -403,11 +446,13 @@
- **MyVidster** - **MyVidster**
- **n-tv.de** - **n-tv.de**
- **natgeo** - **natgeo**
- **natgeo:channel** - **natgeo:episodeguide**
- **natgeo:video**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **NBC** - **NBC**
- **NBCNews** - **NBCNews**
- **NBCOlympics**
- **NBCSports** - **NBCSports**
- **NBCSportsVPlayer** - **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk - **ndr**: NDR.de - Norddeutscher Rundfunk
@ -427,9 +472,9 @@
- **Newstube** - **Newstube**
- **NextMedia**: 蘋果日報 - **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞 - **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nextmovie.com**
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **NhkVod**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news - **nhl.com:news**: NHL news
- **nhl.com:videocenter** - **nhl.com:videocenter**
@ -438,6 +483,7 @@
- **nick.de** - **nick.de**
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
- **Nintendo**
- **njoy**: N-JOY - **njoy**: N-JOY
- **njoy:embed** - **njoy:embed**
- **Noco** - **Noco**
@ -464,10 +510,14 @@
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
- **NYTimesArticle** - **NYTimesArticle**
- **NZZ**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OdaTV**
- **Odnoklassniki** - **Odnoklassniki**
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -491,7 +541,6 @@
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **play.fm** - **play.fm**
- **played.to**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -501,8 +550,12 @@
- **plus.google**: Google Plus - **plus.google**: Google Plus
- **pluzz.francetv.fr** - **pluzz.francetv.fr**
- **podomatic** - **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **PornCom**
- **PornHd** - **PornHd**
- **PornHub** - **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist** - **PornHubPlaylist**
- **PornHubUserVideos** - **PornHubUserVideos**
- **Pornotube** - **Pornotube**
@ -540,9 +593,12 @@
- **revision3:embed** - **revision3:embed**
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RMCDecouverte**
- **RockstarGames** - **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes** - **RottenTomatoes**
- **Roxwel** - **Roxwel**
- **Rozhlas**
- **RTBF** - **RTBF**
- **rte**: Raidió Teilifís Éireann TV - **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio - **rte:radio**: Raidió Teilifís Éireann radio
@ -553,7 +609,9 @@
- **rtve.es:alacarta**: RTVE a la carta - **rtve.es:alacarta**: RTVE a la carta
- **rtve.es:infantil**: RTVE infantil - **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams - **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH** - **RTVNH**
- **Rudo**
- **RUHD** - **RUHD**
- **RulePorn** - **RulePorn**
- **rutube**: Rutube videos - **rutube**: Rutube videos
@ -586,8 +644,10 @@
- **Shared**: shared.sx and vivo.sx - **Shared**: shared.sx and vivo.sx
- **ShareSix** - **ShareSix**
- **Sina** - **Sina**
- **SixPlay**
- **skynewsarabia:article**
- **skynewsarabia:video** - **skynewsarabia:video**
- **skynewsarabia:video** - **SkySports**
- **Slideshare** - **Slideshare**
- **Slutload** - **Slutload**
- **smotri**: Smotri.com - **smotri**: Smotri.com
@ -596,6 +656,7 @@
- **smotri:user**: Smotri.com user videos - **smotri:user**: Smotri.com user videos
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **SonyLIV**
- **soundcloud** - **soundcloud**
- **soundcloud:playlist** - **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search - **soundcloud:search**: Soundcloud search
@ -619,12 +680,13 @@
- **SportBoxEmbed** - **SportBoxEmbed**
- **SportDeutschland** - **SportDeutschland**
- **Sportschau** - **Sportschau**
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR** - **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **Stitcher** - **Stitcher**
- **Streamable**
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
@ -634,10 +696,11 @@
- **SWRMediathek** - **SWRMediathek**
- **Syfy** - **Syfy**
- **SztvHu** - **SztvHu**
- **t-online.de**
- **Tagesschau** - **Tagesschau**
- **tagesschau:player** - **tagesschau:player**
- **Tapely**
- **Tass** - **Tass**
- **TBS**
- **TDSLifeway** - **TDSLifeway**
- **teachertube**: teachertube.com videos - **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos - **teachertube:user:collection**: teachertube.com user and collection videos
@ -652,19 +715,22 @@
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf** - **Telegraaf**
- **TeleMB** - **TeleMB**
- **TeleQuebec**
- **TeleTask** - **TeleTask**
- **Telewebion** - **Telewebion**
- **TF1** - **TF1**
- **TFO**
- **TheIntercept** - **TheIntercept**
- **theoperaplatform**
- **ThePlatform** - **ThePlatform**
- **ThePlatformFeed** - **ThePlatformFeed**
- **TheScene** - **TheScene**
- **TheSixtyOne** - **TheSixtyOne**
- **TheStar** - **TheStar**
- **TheWeatherChannel**
- **ThisAmericanLife** - **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **THVideo** - **ThisOldHouse**
- **THVideoPlaylist**
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **tlc.de** - **tlc.de**
- **TMZ** - **TMZ**
@ -672,13 +738,13 @@
- **TNAFlix** - **TNAFlix**
- **TNAFlixNetworkEmbed** - **TNAFlixNetworkEmbed**
- **toggle** - **toggle**
- **Tosh**: Tosh.0
- **tou.tv** - **tou.tv**
- **Toypics**: Toypics user profile - **Toypics**: Toypics user profile
- **ToypicsUser**: Toypics user profile - **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken) - **TrailerAddict** (Currently broken)
- **Trilulilu** - **Trilulilu**
- **trollvids** - **TruTV**
- **TruTube**
- **Tube8** - **Tube8**
- **TubiTv** - **TubiTv**
- **tudou** - **tudou**
@ -700,9 +766,10 @@
- **TVCArticle** - **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVNoe**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series** - **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers** - **Tweakers**
- **twitch:chapter** - **twitch:chapter**
- **twitch:clips** - **twitch:clips**
@ -718,7 +785,12 @@
- **udemy:course** - **udemy:course**
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **Unistra** - **Unistra**
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
- **URPlay**
- **USANetwork**
- **USAToday** - **USAToday**
- **ustream** - **ustream**
- **ustream:channel** - **ustream:channel**
@ -734,8 +806,11 @@
- **VevoPlaylist** - **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com** - **vh1.com**
- **Viafree**
- **Vice** - **Vice**
- **Viceland**
- **ViceShow** - **ViceShow**
- **Vidbit**
- **Viddler** - **Viddler**
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.mit.edu** - **video.mit.edu**
@ -775,8 +850,10 @@
- **vine:user** - **vine:user**
- **vk**: VK - **vk**: VK
- **vk:uservideos**: VK - User's Videos - **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
- **vlive** - **vlive**
- **Vodlocker** - **Vodlocker**
- **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**
- **Vporn** - **Vporn**
@ -784,6 +861,7 @@
- **VRT** - **VRT**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VyboryMos**
- **Walla** - **Walla**
- **washingtonpost** - **washingtonpost**
- **washingtonpost:article** - **washingtonpost:article**
@ -797,7 +875,7 @@
- **wholecloud**: WholeCloud - **wholecloud**: WholeCloud
- **Wimp** - **Wimp**
- **Wistia** - **Wistia**
- **WNL** - **wnl**: npo.nl and ntr.nl
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **wrzuta.pl:playlist** - **wrzuta.pl:playlist**
@ -843,6 +921,7 @@
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
@ -850,6 +929,4 @@
- **Zapiks** - **Zapiks**
- **ZDF** - **ZDF**
- **ZDFChannel** - **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums - **zingmp3**: mp3.zing.vn
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import print_function from __future__ import print_function
@ -21,25 +21,37 @@ try:
import py2exe import py2exe
except ImportError: except ImportError:
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe': if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
print("Cannot import py2exe", file=sys.stderr) print('Cannot import py2exe', file=sys.stderr)
exit(1) exit(1)
py2exe_options = { py2exe_options = {
"bundle_files": 1, 'bundle_files': 1,
"compressed": 1, 'compressed': 1,
"optimize": 2, 'optimize': 2,
"dist_dir": '.', 'dist_dir': '.',
"dll_excludes": ['w9xpopen.exe', 'crypt32.dll'], 'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
} }
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
DESCRIPTION = 'YouTube video downloader'
LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
py2exe_console = [{ py2exe_console = [{
"script": "./youtube_dl/__main__.py", 'script': './youtube_dl/__main__.py',
"dest_base": "youtube-dl", 'dest_base': 'youtube-dl',
'version': __version__,
'description': DESCRIPTION,
'comments': LONG_DESCRIPTION,
'product_name': 'youtube-dl',
'product_version': __version__,
}] }]
py2exe_params = { py2exe_params = {
'console': py2exe_console, 'console': py2exe_console,
'options': {"py2exe": py2exe_options}, 'options': {'py2exe': py2exe_options},
'zipfile': None 'zipfile': None
} }
@ -72,7 +84,7 @@ else:
params['scripts'] = ['bin/youtube-dl'] params['scripts'] = ['bin/youtube-dl']
class build_lazy_extractors(Command): class build_lazy_extractors(Command):
description = "Build the extractor lazy loading module" description = 'Build the extractor lazy loading module'
user_options = [] user_options = []
def initialize_options(self): def initialize_options(self):
@ -87,16 +99,11 @@ class build_lazy_extractors(Command):
dry_run=self.dry_run, dry_run=self.dry_run,
) )
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
setup( setup(
name='youtube_dl', name='youtube_dl',
version=__version__, version=__version__,
description='YouTube video downloader', description=DESCRIPTION,
long_description='Small command-line program to download videos from' long_description=LONG_DESCRIPTION,
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl', url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia', author='Ricardo Garcia',
author_email='ytdl@yt-dl.org', author_email='ytdl@yt-dl.org',
@ -112,17 +119,17 @@ setup(
# test_requires = ['nosetest'], # test_requires = ['nosetest'],
classifiers=[ classifiers=[
"Topic :: Multimedia :: Video", 'Topic :: Multimedia :: Video',
"Development Status :: 5 - Production/Stable", 'Development Status :: 5 - Production/Stable',
"Environment :: Console", 'Environment :: Console',
"License :: Public Domain", 'License :: Public Domain',
"Programming Language :: Python :: 2.6", 'Programming Language :: Python :: 2.6',
"Programming Language :: Python :: 2.7", 'Programming Language :: Python :: 2.7',
"Programming Language :: Python :: 3", 'Programming Language :: Python :: 3',
"Programming Language :: Python :: 3.2", 'Programming Language :: Python :: 3.2',
"Programming Language :: Python :: 3.3", 'Programming Language :: Python :: 3.3',
"Programming Language :: Python :: 3.4", 'Programming Language :: Python :: 3.4',
"Programming Language :: Python :: 3.5", 'Programming Language :: Python :: 3.5',
], ],
cmdclass={'build_lazy_extractors': build_lazy_extractors}, cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@ -11,7 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
class TestIE(InfoExtractor): class TestIE(InfoExtractor):
@ -48,6 +48,9 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property('foobar', html), 'Foo') self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar') self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar') self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
def test_html_search_meta(self): def test_html_search_meta(self):
ie = self.ie ie = self.ie
@ -66,6 +69,11 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('d', html), '4') self.assertEqual(ie._html_search_meta('d', html), '4')
self.assertEqual(ie._html_search_meta('e', html), '5') self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6') self.assertEqual(ie._html_search_meta('f', html), '6')
self.assertEqual(ie._html_search_meta(('a', 'b', 'c'), html), '1')
self.assertEqual(ie._html_search_meta(('c', 'b', 'a'), html), '3')
self.assertEqual(ie._html_search_meta(('z', 'x', 'c'), html), '3')
self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_download_json(self): def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json') uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')

View File

@ -335,6 +335,40 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1['format_id']) self.assertEqual(downloaded['format_id'], f1['format_id'])
def test_audio_only_extractor_format_selection(self):
# For extractors with incomplete formats (all formats are audio-only or
# video-only) best and worst should fallback to corresponding best/worst
# video-only or audio-only formats (as per
# https://github.com/rg3/youtube-dl/pull/5556)
formats = [
{'format_id': 'low', 'ext': 'mp3', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
{'format_id': 'high', 'ext': 'mp3', 'preference': 2, 'vcodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'high')
ydl = YDL({'format': 'worst'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'low')
def test_format_not_available(self):
formats = [
{'format_id': 'regular', 'ext': 'mp4', 'height': 360, 'url': TEST_URL},
{'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# This must fail since complete video-audio format does not match filter
# and extractor does not provide incomplete only formats (i.e. only
# video-only or audio-only).
ydl = YDL({'format': 'best[height>360]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_invalid_format_specs(self): def test_invalid_format_specs(self):
def assert_syntax_error(format_spec): def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec}) ydl = YDL({'format': format_spec})

View File

@ -6,6 +6,7 @@ from __future__ import unicode_literals
import os import os
import sys import sys
import unittest import unittest
import collections
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@ -100,8 +101,6 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch(':ytsubs', ['youtube:subscriptions']) self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions']) self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history']) self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
def test_vimeo_matching(self): def test_vimeo_matching(self):
self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel']) self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel'])
@ -130,6 +129,15 @@ class TestAllURLsMatching(unittest.TestCase):
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html', 'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo']) ['Yahoo'])
def test_no_duplicated_ie_names(self):
name_accu = collections.defaultdict(list)
for ie in self.ies:
name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
for (ie_name, ie_list) in name_accu.items():
self.assertEqual(
len(ie_list), 1,
'Multiple extractors with the same IE_NAME "%s" (%s)' % (ie_name, ', '.join(ie_list)))
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -87,6 +87,8 @@ class TestCompat(unittest.TestCase):
def test_compat_shlex_split(self): def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two']) self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
def test_compat_etree_fromstring(self): def test_compat_etree_fromstring(self):
xml = ''' xml = '''

View File

@ -87,7 +87,7 @@ class TestHTTP(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()}) ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port) r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase): class TestHTTPS(unittest.TestCase):
@ -111,7 +111,7 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True}) ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://localhost:%d/video.html' % self.port) r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name): def _build_proxy_handler(name):
@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
self.proxy_thread.daemon = True self.proxy_thread.daemon = True
self.proxy_thread.start() self.proxy_thread.start()
self.cn_proxy = compat_http_server.HTTPServer( self.geo_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('cn')) ('localhost', 0), _build_proxy_handler('geo'))
self.cn_port = http_server_port(self.cn_proxy) self.geo_port = http_server_port(self.geo_proxy)
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever) self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
self.cn_proxy_thread.daemon = True self.geo_proxy_thread.daemon = True
self.cn_proxy_thread.start() self.geo_proxy_thread.start()
def test_proxy(self): def test_proxy(self):
cn_proxy = 'localhost:{0}'.format(self.cn_port) geo_proxy = 'localhost:{0}'.format(self.geo_port)
ydl = YoutubeDL({ ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port), 'proxy': 'localhost:{0}'.format(self.port),
'cn_verification_proxy': cn_proxy, 'geo_verification_proxy': geo_proxy,
}) })
url = 'http://foo.com/bar' url = 'http://foo.com/bar'
response = ydl.urlopen(url).read().decode('utf-8') response = ydl.urlopen(url).read().decode('utf-8')
self.assertEqual(response, 'normal: {0}'.format(url)) self.assertEqual(response, 'normal: {0}'.format(url))
req = compat_urllib_request.Request(url) req = compat_urllib_request.Request(url)
req.add_header('Ytdl-request-proxy', cn_proxy) req.add_header('Ytdl-request-proxy', geo_proxy)
response = ydl.urlopen(req).read().decode('utf-8') response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url)) self.assertEqual(response, 'geo: {0}'.format(url))
def test_proxy_with_idn(self): def test_proxy_with_idn(self):
ydl = YoutubeDL({ ydl = YoutubeDL({

View File

@ -33,14 +33,18 @@ from youtube_dl.utils import (
ExtractorError, ExtractorError,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
get_element_by_class,
InAdvancePagedList, InAdvancePagedList,
intlist_to_bytes, intlist_to_bytes,
is_html, is_html,
js_to_json, js_to_json,
limit_length, limit_length,
mimetype2ext,
month_by_name,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
OnDemandPagedList, OnDemandPagedList,
orderedSet, orderedSet,
parse_age_limit,
parse_duration, parse_duration,
parse_filesize, parse_filesize,
parse_count, parse_count,
@ -60,11 +64,13 @@ from youtube_dl.utils import (
timeconvert, timeconvert,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp,
unsmuggle_url, unsmuggle_url,
uppercase_escape, uppercase_escape,
lowercase_escape, lowercase_escape,
url_basename, url_basename,
urlencode_postdata, urlencode_postdata,
urshift,
update_url_query, update_url_query,
version_tuple, version_tuple,
xpath_with_ns, xpath_with_ns,
@ -78,6 +84,7 @@ from youtube_dl.utils import (
cli_option, cli_option,
cli_valueless_option, cli_valueless_option,
cli_bool_option, cli_bool_option,
parse_codecs,
) )
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_chr, compat_chr,
@ -283,7 +290,30 @@ class TestUtil(unittest.TestCase):
'20150202') '20150202')
self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214') self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
self.assertEqual(unified_strdate('25-09-2014'), '20140925') self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None) self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
self.assertEqual(unified_timestamp('8/7/2009'), 1247011200)
self.assertEqual(unified_timestamp('Dec 14, 2012'), 1355443200)
self.assertEqual(unified_timestamp('2012/10/11 01:56:38 +0000'), 1349920598)
self.assertEqual(unified_timestamp('1968 12 10'), -33436800)
self.assertEqual(unified_timestamp('1968-12-10'), -33436800)
self.assertEqual(unified_timestamp('28/01/2014 21:00:00 +0100'), 1390939200)
self.assertEqual(
unified_timestamp('11/26/2014 11:30:00 AM PST', day_first=False),
1417001400)
self.assertEqual(
unified_timestamp('2/2/2015 6:47:40 PM', day_first=False),
1422902860)
self.assertEqual(unified_timestamp('Feb 14th 2016 5:45PM'), 1455471900)
self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -383,6 +413,12 @@ class TestUtil(unittest.TestCase):
self.assertEqual(res_url, url) self.assertEqual(res_url, url)
self.assertEqual(res_data, None) self.assertEqual(res_data, None)
smug_url = smuggle_url(url, {'a': 'b'})
smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
res_url, res_data = unsmuggle_url(smug_smug_url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
def test_shell_quote(self): def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')] args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""") self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@ -401,6 +437,20 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'), url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4') 'trailer.mp4')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self): def test_parse_duration(self):
self.assertEqual(parse_duration(None), None) self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None) self.assertEqual(parse_duration(False), None)
@ -579,6 +629,45 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar')) limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12)) self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_mimetype2ext(self):
self.assertEqual(mimetype2ext(None), None)
self.assertEqual(mimetype2ext('video/x-flv'), 'flv')
self.assertEqual(mimetype2ext('application/x-mpegURL'), 'm3u8')
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
'vcodec': 'avc1.77.30',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.2'), {
'vcodec': 'none',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
'vcodec': 'avc1.42001e',
'acodec': 'mp4a.40.5',
})
self.assertEqual(parse_codecs('avc3.640028'), {
'vcodec': 'avc3.640028',
'acodec': 'none',
})
self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
'vcodec': 'h264',
'acodec': 'aac',
})
def test_escape_rfc3986(self): def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]" reserved = "!*'();:@&=+$,/?#[]"
unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~' unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@ -643,6 +732,9 @@ class TestUtil(unittest.TestCase):
inp = '''{"foo":101}''' inp = '''{"foo":101}'''
self.assertEqual(js_to_json(inp), '''{"foo":101}''') self.assertEqual(js_to_json(inp), '''{"foo":101}''')
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
def test_js_to_json_edgecases(self): def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}") on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"}) self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@ -748,7 +840,10 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_filesize('2 MiB'), 2097152) self.assertEqual(parse_filesize('2 MiB'), 2097152)
self.assertEqual(parse_filesize('5 GB'), 5000000000) self.assertEqual(parse_filesize('5 GB'), 5000000000)
self.assertEqual(parse_filesize('1.2Tb'), 1200000000000) self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
self.assertEqual(parse_filesize('1,24 KB'), 1240) self.assertEqual(parse_filesize('1,24 KB'), 1240)
self.assertEqual(parse_filesize('1,24 kb'), 1240)
self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)
def test_parse_count(self): def test_parse_count(self):
self.assertEqual(parse_count(None), None) self.assertEqual(parse_count(None), None)
@ -899,6 +994,7 @@ The first line
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128']) self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({'retries': 10}, '--retries', 'retries'), ['--retries', '10'])
def test_cli_valueless_option(self): def test_cli_valueless_option(self):
self.assertEqual(cli_valueless_option( self.assertEqual(cli_valueless_option(
@ -959,5 +1055,17 @@ The first line
self.assertRaises(ValueError, encode_base_n, 0, 70) self.assertRaises(ValueError, encode_base_n, 0, 70)
self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table) self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
def test_urshift(self):
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)
def test_get_element_by_class(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -0,0 +1,70 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestVerboseOutput(unittest.TestCase):
def test_private_info_arg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
if __name__ == '__main__':
unittest.main()

View File

@ -1,10 +1,11 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import absolute_import, unicode_literals from __future__ import absolute_import, unicode_literals
import collections import collections
import contextlib import contextlib
import copy
import datetime import datetime
import errno import errno
import fileinput import fileinput
@ -130,6 +131,9 @@ class YoutubeDL(object):
username: Username for authentication purposes. username: Username for authentication purposes.
password: Password for authentication purposes. password: Password for authentication purposes.
videopassword: Password for accessing a video. videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead. usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout. verbose: Print additional info to stdout.
quiet: Do not print messages to stdout. quiet: Do not print messages to stdout.
@ -196,8 +200,8 @@ class YoutubeDL(object):
prefer_insecure: Use HTTP instead of HTTPS to retrieve information. prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube. At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use proxy: URL of the proxy server to use
cn_verification_proxy: URL of the proxy to use for IP address verification geo_verification_proxy: URL of the proxy to use for IP address verification
on Chinese sites. (Experimental) on geo-restricted sites. (Experimental)
socket_timeout: Time to wait for unresponsive hosts, in seconds socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi support, using fridibi
@ -248,7 +252,16 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to. source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging. youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download. sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit. listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit. list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of match_filter: A function that gets called with the info_dict of
@ -304,6 +317,11 @@ class YoutubeDL(object):
self.params.update(params) self.params.update(params)
self.cache = Cache(self) self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
if params.get('bidi_workaround', False): if params.get('bidi_workaround', False):
try: try:
import pty import pty
@ -1046,9 +1064,9 @@ class YoutubeDL(object):
if isinstance(selector, list): if isinstance(selector, list):
fs = [_build_selector_function(s) for s in selector] fs = [_build_selector_function(s) for s in selector]
def selector_function(formats): def selector_function(ctx):
for f in fs: for f in fs:
for format in f(formats): for format in f(ctx):
yield format yield format
return selector_function return selector_function
elif selector.type == GROUP: elif selector.type == GROUP:
@ -1056,17 +1074,17 @@ class YoutubeDL(object):
elif selector.type == PICKFIRST: elif selector.type == PICKFIRST:
fs = [_build_selector_function(s) for s in selector.selector] fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(formats): def selector_function(ctx):
for f in fs: for f in fs:
picked_formats = list(f(formats)) picked_formats = list(f(ctx))
if picked_formats: if picked_formats:
return picked_formats return picked_formats
return [] return []
elif selector.type == SINGLE: elif selector.type == SINGLE:
format_spec = selector.selector format_spec = selector.selector
def selector_function(formats): def selector_function(ctx):
formats = list(formats) formats = list(ctx['formats'])
if not formats: if not formats:
return return
if format_spec == 'all': if format_spec == 'all':
@ -1079,9 +1097,10 @@ class YoutubeDL(object):
if f.get('vcodec') != 'none' and f.get('acodec') != 'none'] if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats: if audiovideo_formats:
yield audiovideo_formats[format_idx] yield audiovideo_formats[format_idx]
# for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format # for extractors with incomplete formats (audio only (soundcloud)
elif (all(f.get('acodec') != 'none' for f in formats) or # or video only (imgur)) we will fallback to best/worst
all(f.get('vcodec') != 'none' for f in formats)): # {video,audio}-only format
elif ctx['incomplete_formats']:
yield formats[format_idx] yield formats[format_idx]
elif format_spec == 'bestaudio': elif format_spec == 'bestaudio':
audio_formats = [ audio_formats = [
@ -1155,17 +1174,18 @@ class YoutubeDL(object):
} }
video_selector, audio_selector = map(_build_selector_function, selector.selector) video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(formats): def selector_function(ctx):
formats = list(formats) for pair in itertools.product(
for pair in itertools.product(video_selector(formats), audio_selector(formats)): video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))):
yield _merge(pair) yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters] filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(formats): def final_selector(ctx):
ctx_copy = copy.deepcopy(ctx)
for _filter in filters: for _filter in filters:
formats = list(filter(_filter, formats)) ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats']))
return selector_function(formats) return selector_function(ctx_copy)
return final_selector return final_selector
stream = io.BytesIO(format_spec.encode('utf-8')) stream = io.BytesIO(format_spec.encode('utf-8'))
@ -1239,8 +1259,10 @@ class YoutubeDL(object):
info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}] info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}]
if thumbnails: if thumbnails:
thumbnails.sort(key=lambda t: ( thumbnails.sort(key=lambda t: (
t.get('preference'), t.get('width'), t.get('height'), t.get('preference') if t.get('preference') is not None else -1,
t.get('id'), t.get('url'))) t.get('width') if t.get('width') is not None else -1,
t.get('height') if t.get('height') is not None else -1,
t.get('id') if t.get('id') is not None else '', t.get('url')))
for i, t in enumerate(thumbnails): for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url']) t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'): if t.get('width') and t.get('height'):
@ -1282,7 +1304,7 @@ class YoutubeDL(object):
for subtitle_format in subtitle: for subtitle_format in subtitle:
if subtitle_format.get('url'): if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url']) subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format: if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower() subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False): if self.params.get('listsubtitles', False):
@ -1337,7 +1359,7 @@ class YoutubeDL(object):
note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '', note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
) )
# Automatically determine file extension if missing # Automatically determine file extension if missing
if 'ext' not in format: if format.get('ext') is None:
format['ext'] = determine_ext(format['url']).lower() format['ext'] = determine_ext(format['url']).lower()
# Automatically determine protocol if missing (useful for format # Automatically determine protocol if missing (useful for format
# selection purposes) # selection purposes)
@ -1372,7 +1394,34 @@ class YoutubeDL(object):
req_format_list.append('best') req_format_list.append('best')
req_format = '/'.join(req_format_list) req_format = '/'.join(req_format_list)
format_selector = self.build_format_selector(req_format) format_selector = self.build_format_selector(req_format)
formats_to_download = list(format_selector(formats))
# While in format selection we may need to have an access to the original
# format set in order to calculate some metrics or do some processing.
# For now we need to be able to guess whether original formats provided
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/rg3/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
# as well.
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/rg3/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
# all formats are audio-only
all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))
ctx = {
'formats': formats,
'incomplete_formats': incomplete_formats,
}
formats_to_download = list(format_selector(ctx))
if not formats_to_download: if not formats_to_download:
raise ExtractorError('requested format not available', raise ExtractorError('requested format not available',
expected=True) expected=True)
@ -1559,7 +1608,9 @@ class YoutubeDL(object):
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format)) self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
else: else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename) self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile: # Use newline='' to prevent conversion of newline characters
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_data) subfile.write(sub_data)
except (OSError, IOError): except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename) self.report_error('Cannot write subtitles file ' + sub_filename)

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -34,12 +34,14 @@ from .utils import (
setproctitle, setproctitle,
std_headers, std_headers,
write_string, write_string,
render_table,
) )
from .update import update_self from .update import update_self
from .downloader import ( from .downloader import (
FileDownloader, FileDownloader,
) )
from .extractor import gen_extractors, list_extractors from .extractor import gen_extractors, list_extractors
from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL from .YoutubeDL import YoutubeDL
@ -118,18 +120,26 @@ def _real_main(argv=None):
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES)) desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
write_string(desc + '\n', out=sys.stdout) write_string(desc + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
if opts.ap_list_mso:
table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
sys.exit(0)
# Conflicting, missing and erroneous options # Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None): if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error('using .netrc conflicts with giving username/password') parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None: if opts.password is not None and opts.username is None:
parser.error('account username missing\n') parser.error('account username missing\n')
if opts.ap_password is not None and opts.ap_username is None:
parser.error('TV Provider account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid): if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error('using output template conflicts with using title, video ID or auto number') parser.error('using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid: if opts.usetitle and opts.useid:
parser.error('using title conflicts with using video ID') parser.error('using title conflicts with using video ID')
if opts.username is not None and opts.password is None: if opts.username is not None and opts.password is None:
opts.password = compat_getpass('Type account password and press [Return]: ') opts.password = compat_getpass('Type account password and press [Return]: ')
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
if opts.ratelimit is not None: if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit) numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None: if numeric_limit is None:
@ -145,6 +155,18 @@ def _real_main(argv=None):
if numeric_limit is None: if numeric_limit is None:
parser.error('invalid max_filesize specified') parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
def parse_retries(retries): def parse_retries(retries):
if retries in ('inf', 'infinite'): if retries in ('inf', 'infinite'):
@ -244,8 +266,6 @@ def _real_main(argv=None):
postprocessors.append({ postprocessors.append({
'key': 'FFmpegEmbedSubtitle', 'key': 'FFmpegEmbedSubtitle',
}) })
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail: if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({ postprocessors.append({
@ -254,6 +274,10 @@ def _real_main(argv=None):
}) })
if not already_have_thumbnail: if not already_have_thumbnail:
opts.writethumbnail = True opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way. # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems. # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd: if opts.exec_cmd:
@ -261,12 +285,6 @@ def _real_main(argv=None):
'key': 'ExecAfterDownload', 'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_cmd, 'exec_cmd': opts.exec_cmd,
}) })
if opts.xattr_set_filesize:
try:
import xattr
xattr # Confuse flake8
except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available')
external_downloader_args = None external_downloader_args = None
if opts.external_downloader_args: if opts.external_downloader_args:
external_downloader_args = compat_shlex_split(opts.external_downloader_args) external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@ -283,6 +301,9 @@ def _real_main(argv=None):
'password': opts.password, 'password': opts.password,
'twofactor': opts.twofactor, 'twofactor': opts.twofactor,
'videopassword': opts.videopassword, 'videopassword': opts.videopassword,
'ap_mso': opts.ap_mso,
'ap_username': opts.ap_username,
'ap_password': opts.ap_password,
'quiet': (opts.quiet or any_getting or any_printing), 'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings, 'no_warnings': opts.no_warnings,
'forceurl': opts.geturl, 'forceurl': opts.geturl,
@ -308,6 +329,7 @@ def _real_main(argv=None):
'nooverwrites': opts.nooverwrites, 'nooverwrites': opts.nooverwrites,
'retries': opts.retries, 'retries': opts.retries,
'fragment_retries': opts.fragment_retries, 'fragment_retries': opts.fragment_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'buffersize': opts.buffersize, 'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer, 'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl, 'continuedl': opts.continue_dl,
@ -370,6 +392,7 @@ def _real_main(argv=None):
'source_address': opts.source_address, 'source_address': opts.source_address,
'call_home': opts.call_home, 'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval, 'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader, 'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails, 'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items, 'playlist_items': opts.playlist_items,
@ -382,6 +405,8 @@ def _real_main(argv=None):
'external_downloader_args': external_downloader_args, 'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args, 'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import binascii import binascii
@ -2594,15 +2595,19 @@ except ImportError: # Python < 3.3
return "'" + s.replace("'", "'\"'\"'") + "'" return "'" + s.replace("'", "'\"'\"'") + "'"
if sys.version_info >= (2, 7, 3): try:
args = shlex.split('中文')
assert (isinstance(args, list) and
isinstance(args[0], compat_str) and
args[0] == '中文')
compat_shlex_split = shlex.split compat_shlex_split = shlex.split
else: except (AssertionError, UnicodeEncodeError):
# Working around shlex issue with unicode strings on some python 2 # Working around shlex issue with unicode strings on some python 2
# versions (see http://bugs.python.org/issue1548891) # versions (see http://bugs.python.org/issue1548891)
def compat_shlex_split(s, comments=False, posix=True): def compat_shlex_split(s, comments=False, posix=True):
if isinstance(s, compat_str): if isinstance(s, compat_str):
s = s.encode('utf-8') s = s.encode('utf-8')
return shlex.split(s, comments, posix) return list(map(lambda s: s.decode('utf-8'), shlex.split(s, comments, posix)))
def compat_ord(c): def compat_ord(c):

View File

@ -4,6 +4,7 @@ import os
import re import re
import sys import sys
import time import time
import random
from ..compat import compat_os_name from ..compat import compat_os_name
from ..utils import ( from ..utils import (
@ -342,8 +343,10 @@ class FileDownloader(object):
}) })
return True return True
sleep_interval = self.params.get('sleep_interval') min_sleep_interval = self.params.get('sleep_interval')
if sleep_interval: if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval) self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval) time.sleep(sleep_interval)

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os import os
import re
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
@ -19,32 +18,32 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments' FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
base_url = info_dict['url'] segments = info_dict['fragments'][:1] if self.params.get(
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls'] 'test', False) else info_dict['fragments']
initialization_url = info_dict.get('initialization_url')
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0), 'total_frags': len(segments),
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
segments_filenames = [] segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def append_url_to_file(target_url, tmp_filename, segment_name): def process_segment(segment, tmp_filename, num):
segment_url = segment['url']
segment_name = 'Frag%d' % num
target_filename = '%s-%s' % (tmp_filename, segment_name) target_filename = '%s-%s' % (tmp_filename, segment_name)
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)}) success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') down, target_sanitized = sanitize_open(target_filename, 'rb')
@ -52,26 +51,27 @@ class DashSegmentsFD(FragmentFD):
down.close() down.close()
segments_filenames.append(target_sanitized) segments_filenames.append(target_sanitized)
break break
except (compat_urllib_error.HTTPError, ) as err: except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the # YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately # whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps # retried with the same request data this usually succeeds (1-2 attemps
# is usually enough) thus allowing to download the whole file successfully. # is usually enough) thus allowing to download the whole file successfully.
# So, we will retry all fragments that fail with 404 HTTP error for now. # To be future-proof we will retry all fragments that fail with any
if err.code != 404: # HTTP error.
raise
# Retry fragment
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(segment_name, count, fragment_retries) self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if not fatal:
self.report_skip_fragment(segment_name)
return True
self.report_error('giving up after %s fragment retries' % fragment_retries) self.report_error('giving up after %s fragment retries' % fragment_retries)
return False return False
return True
if initialization_url: for i, segment in enumerate(segments):
append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init') if not process_segment(segment, ctx['tmpfilename'], i):
for i, segment_url in enumerate(segment_urls): return False
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
self._finish_frag_download(ctx) self._finish_frag_download(ctx)

View File

@ -96,6 +96,12 @@ class CurlFD(ExternalFD):
cmd = [self.exe, '--location', '-o', tmpfilename] cmd = [self.exe, '--location', '-o', tmpfilename]
for key, val in info_dict['http_headers'].items(): for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)] cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._bool_option('--continue-at', 'continuedl', '-', '0')
cmd += self._valueless_option('--silent', 'noprogress')
cmd += self._valueless_option('--verbose', 'verbose')
cmd += self._option('--limit-rate', 'ratelimit')
cmd += self._option('--retry', 'retries')
cmd += self._option('--max-filesize', 'max_filesize')
cmd += self._option('--interface', 'source_address') cmd += self._option('--interface', 'source_address')
cmd += self._option('--proxy', 'proxy') cmd += self._option('--proxy', 'proxy')
cmd += self._valueless_option('--insecure', 'nocheckcertificate') cmd += self._valueless_option('--insecure', 'nocheckcertificate')
@ -103,6 +109,16 @@ class CurlFD(ExternalFD):
cmd += ['--', info_dict['url']] cmd += ['--', info_dict['url']]
return cmd return cmd
def _call_downloader(self, tmpfilename, info_dict):
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
self._debug_cmd(cmd)
# curl writes the progress to stderr so don't capture it.
p = subprocess.Popen(cmd)
p.communicate()
return p.returncode
class AxelFD(ExternalFD): class AxelFD(ExternalFD):
AVAILABLE_OPT = '-V' AVAILABLE_OPT = '-V'
@ -204,6 +220,12 @@ class FFmpegFD(ExternalFD):
if proxy: if proxy:
if not re.match(r'^[\da-zA-Z]+://', proxy): if not re.match(r'^[\da-zA-Z]+://', proxy):
proxy = 'http://%s' % proxy proxy = 'http://%s' % proxy
if proxy.startswith('socks'):
self.report_warning(
'%s does not support SOCKS proxies. Downloading is likely to fail. '
'Consider adding --hls-prefer-native to your command.' % self.get_basename())
# Since December 2015 ffmpeg supports -http_proxy option (see # Since December 2015 ffmpeg supports -http_proxy option (see
# http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd) # http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
# We could switch to the following code if we are able to detect version properly # We could switch to the following code if we are able to detect version properly

View File

@ -196,6 +196,11 @@ def build_fragments_list(boot_info):
first_frag_number = fragment_run_entry_table[0]['first'] first_frag_number = fragment_run_entry_table[0]['first']
fragments_counter = itertools.count(first_frag_number) fragments_counter = itertools.count(first_frag_number)
for segment, fragments_count in segment_run_table['segment_run']: for segment, fragments_count in segment_run_table['segment_run']:
# In some live HDS streams (for example Rai), `fragments_count` is
# abnormal and causing out-of-memory errors. It's OK to change the
# number of fragments for live streams as they are updated periodically
if fragments_count == 4294967295 and boot_info['live']:
fragments_count = 2
for _ in range(fragments_count): for _ in range(fragments_count):
res.append((segment, next(fragments_counter))) res.append((segment, next(fragments_counter)))
@ -329,7 +334,11 @@ class F4mFD(FragmentFD):
base_url = compat_urlparse.urljoin(man_url, media.attrib['url']) base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap_node = doc.find(_add_ns('bootstrapInfo')) bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url) # From Adobe F4M 3.0 spec:
# The <baseURL> element SHALL be the base URL for all relative
# (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
# URLs should be relative to the location of the containing document.
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
live = boot_info['live'] live = boot_info['live']
metadata_node = media.find(_add_ns('metadata')) metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None: if metadata_node is not None:

View File

@ -6,6 +6,7 @@ import time
from .common import FileDownloader from .common import FileDownloader
from .http import HttpFD from .http import HttpFD
from ..utils import ( from ..utils import (
error_to_compat_str,
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
) )
@ -22,13 +23,19 @@ class FragmentFD(FileDownloader):
Available options: Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH only) fragment_retries: Number of times to retry a fragment for HTTP error (DASH
and hlsnative only)
skip_unavailable_fragments:
Skip unavailable fragments (DASH and hlsnative only)
""" """
def report_retry_fragment(self, fragment_name, count, retries): def report_retry_fragment(self, err, fragment_name, count, retries):
self.to_screen( self.to_screen(
'[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...' '[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...'
% (fragment_name, count, self.format_retries(retries))) % (error_to_compat_str(err), fragment_name, count, self.format_retries(retries)))
def report_skip_fragment(self, fragment_name):
self.to_screen('[download] Skipping fragment %s...' % fragment_name)
def _prepare_and_start_frag_download(self, ctx): def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx) self._prepare_frag_download(ctx)

View File

@ -13,6 +13,7 @@ from .fragment import FragmentFD
from .external import FFmpegFD from .external import FFmpegFD
from ..compat import ( from ..compat import (
compat_urllib_error,
compat_urlparse, compat_urlparse,
compat_struct_pack, compat_struct_pack,
) )
@ -20,6 +21,7 @@ from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
parse_m3u8_attributes, parse_m3u8_attributes,
update_url_query,
) )
@ -29,7 +31,7 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative' FD_NAME = 'hlsnative'
@staticmethod @staticmethod
def can_download(manifest): def can_download(manifest, info_dict):
UNSUPPORTED_FEATURES = ( UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1] r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2] r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@ -51,6 +53,7 @@ class HlsFD(FragmentFD):
) )
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES] check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest) check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results) return all(check_results)
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
@ -60,7 +63,7 @@ class HlsFD(FragmentFD):
s = manifest.decode('utf-8', 'ignore') s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s): if not self.can_download(s, info_dict):
self.report_warning( self.report_warning(
'hlsnative has detected features it does not support, ' 'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg') 'extraction will be delegated to ffmpeg')
@ -82,6 +85,14 @@ class HlsFD(FragmentFD):
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
test = self.params.get('test', False)
extra_query = None
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
if extra_param_to_segment_url:
extra_query = compat_urlparse.parse_qs(extra_param_to_segment_url)
i = 0 i = 0
media_sequence = 0 media_sequence = 0
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
@ -94,13 +105,37 @@ class HlsFD(FragmentFD):
line line
if re.match(r'^https?://', line) if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line)) else compat_urlparse.urljoin(man_url, line))
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i) frag_name = 'Frag%d' % i
success = ctx['dl'].download(frag_filename, {'url': frag_url}) frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
if not success: if extra_query:
frag_url = update_url_query(frag_url, extra_query)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/rg3/youtube-dl/issues/10165,
# https://github.com/rg3/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_name)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence) iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
frag_content = AES.new( frag_content = AES.new(
@ -108,7 +143,7 @@ class HlsFD(FragmentFD):
ctx['dest_stream'].write(frag_content) ctx['dest_stream'].write(frag_content)
frags_filenames.append(frag_sanitized) frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test # We only download the first fragment during the test
if self.params.get('test', False): if test:
break break
i += 1 i += 1
media_sequence += 1 media_sequence += 1
@ -116,10 +151,12 @@ class HlsFD(FragmentFD):
decrypt_info = parse_m3u8_attributes(line[11:]) decrypt_info = parse_m3u8_attributes(line[11:])
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
if 'IV' in decrypt_info: if 'IV' in decrypt_info:
decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:]) decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:].zfill(32))
if not re.match(r'^https?://', decrypt_info['URI']): if not re.match(r'^https?://', decrypt_info['URI']):
decrypt_info['URI'] = compat_urlparse.urljoin( decrypt_info['URI'] = compat_urlparse.urljoin(
man_url, decrypt_info['URI']) man_url, decrypt_info['URI'])
if extra_query:
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read() decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'): elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:]) media_sequence = int(line[22:])

View File

@ -13,6 +13,9 @@ from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
sanitized_Request, sanitized_Request,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
) )
@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
if self.params.get('xattr_set_filesize', False) and data_len is not None: if self.params.get('xattr_set_filesize', False) and data_len is not None:
try: try:
import xattr write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len)) except (XAttrUnavailableError, XAttrMetadataError) as err:
except(OSError, IOError, ImportError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err)) self.report_error('unable to set filesize xattr: %s' % str(err))
try: try:

View File

@ -7,12 +7,13 @@ from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
int_or_none, int_or_none,
parse_iso8601,
) )
class ABCIE(InfoExtractor): class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au' IE_NAME = 'abc.net.au'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334', 'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@ -93,3 +94,59 @@ class ABCIE(InfoExtractor):
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
} }
class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'ZX9735A001S00',
'ext': 'mp4',
'title': 'Diaries Of A Broken Mind',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
'upload_date': '20161010',
'uploader_id': 'abc2',
'timestamp': 1476064920,
},
'skip': 'Video gone',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
self._sort_formats(formats)
subtitles = {}
src_vtt = stream.get('captions', {}).get('src-vtt')
if src_vtt:
subtitles['en'] = [{
'url': src_vtt,
'ext': 'vtt',
}]
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,
}

View File

@ -12,7 +12,7 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE): class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video' IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)' _VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932', 'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@ -49,7 +49,7 @@ class AbcNewsVideoIE(AMPIE):
class AbcNewsIE(InfoExtractor): class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews' IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)' _VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY', 'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',

View File

@ -1,13 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_iso8601 from ..utils import (
int_or_none,
parse_iso8601,
)
class Abc7NewsIE(InfoExtractor): class ABCOTVSIE(InfoExtractor):
_VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)' IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/', 'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
@ -15,7 +21,7 @@ class Abc7NewsIE(InfoExtractor):
'id': '472581', 'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers', 'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4', 'ext': 'mp4',
'title': 'East Bay museum celebrates history of synthesized music', 'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10', 'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075, 'timestamp': 1421123075,
@ -41,7 +47,7 @@ class Abc7NewsIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta( m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True) 'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4') formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
@ -66,3 +72,41 @@ class Abc7NewsIE(InfoExtractor):
'uploader': uploader, 'uploader': uploader,
'formats': formats, 'formats': formats,
} }
class ABCOTVSClipsIE(InfoExtractor):
IE_NAME = 'abcotvs:clips'
_VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'https://clips.abcotvs.com/kabc/video/214814',
'info_dict': {
'id': '214814',
'ext': 'mp4',
'title': 'SpaceX launch pad explosion destroys rocket, satellite',
'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
'upload_date': '20160901',
'timestamp': 1472756695,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['videoURL'].split('?')[0], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailURL'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('pubDate')),
'formats': formats,
}

File diff suppressed because it is too large Load Diff

View File

@ -3,16 +3,14 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .turner import TurnerBaseIE
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
float_or_none, int_or_none,
xpath_text,
) )
class AdultSwimIE(InfoExtractor): class AdultSwimIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?' _VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?'
_TESTS = [{ _TESTS = [{
@ -83,6 +81,21 @@ class AdultSwimIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# heroMetadata.trailer
'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
'info_dict': {
'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
'ext': 'mp4',
'title': 'Decker - Inside Decker: A New Hero',
'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
'duration': 249.008,
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}] }]
@staticmethod @staticmethod
@ -133,79 +146,56 @@ class AdultSwimIE(InfoExtractor):
if video_info is None: if video_info is None:
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path: if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video'] video_info = bootstrapped_data['slugged_video']
else: if not video_info:
raise ExtractorError('Unable to find video info') video_info = bootstrapped_data.get(
'heroMetadata', {}).get('trailer', {}).get('video')
if not video_info:
video_info = bootstrapped_data.get('onlineOriginals', [None])[0]
if not video_info:
raise ExtractorError('Unable to find video info')
show = bootstrapped_data['show'] show = bootstrapped_data['show']
show_title = show['title'] show_title = show['title']
stream = video_info.get('stream') stream = video_info.get('stream')
clips = [stream] if stream else video_info.get('clips') if stream and stream.get('videoPlaybackID'):
if not clips: segment_ids = [stream['videoPlaybackID']]
raise ExtractorError( elif video_info.get('clips'):
'This video is only available via cable service provider subscription that' segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
' is not currently supported. You may want to use --cookies.' elif video_info.get('videoPlaybackID'):
if video_info.get('auth') is True else 'Unable to find stream or clips', segment_ids = [video_info['videoPlaybackID']]
expected=True) else:
segment_ids = [clip['videoPlaybackID'] for clip in clips] if video_info.get('auth') is True:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
else:
raise ExtractorError('Unable to find stream or clips')
episode_id = video_info['id'] episode_id = video_info['id']
episode_title = video_info['title'] episode_title = video_info['title']
episode_description = video_info['description'] episode_description = video_info.get('description')
episode_duration = video_info.get('duration') episode_duration = int_or_none(video_info.get('duration'))
view_count = int_or_none(video_info.get('views'))
entries = [] entries = []
for part_num, segment_id in enumerate(segment_ids): for part_num, segment_id in enumerate(segment_ids):
segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id segement_info = self._extract_cvp_info(
'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id,
segment_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/adultswim/big',
'tokenizer_src': 'http://www.adultswim.com/astv/mvpd/processors/services/token_ipadAdobe.do',
},
})
segment_title = '%s - %s' % (show_title, episode_title) segment_title = '%s - %s' % (show_title, episode_title)
if len(segment_ids) > 1: if len(segment_ids) > 1:
segment_title += ' Part %d' % (part_num + 1) segment_title += ' Part %d' % (part_num + 1)
segement_info.update({
idoc = self._download_xml(
segment_url, segment_title,
'Downloading segment information', 'Unable to download segment information')
segment_duration = float_or_none(
xpath_text(idoc, './/trt', 'segment duration').strip())
formats = []
file_els = idoc.findall('.//files/file') or idoc.findall('./files/file')
unique_urls = []
unique_file_els = []
for file_el in file_els:
media_url = file_el.text
if not media_url or determine_ext(media_url) == 'f4m':
continue
if file_el.text not in unique_urls:
unique_urls.append(file_el.text)
unique_file_els.append(file_el)
for file_el in unique_file_els:
bitrate = file_el.attrib.get('bitrate')
ftype = file_el.attrib.get('type')
media_url = file_el.text
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, segment_title, 'mp4', preference=0,
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '%s_%s' % (bitrate, ftype),
'url': file_el.text.strip(),
# The bitrate may not be a number (for example: 'iphone')
'tbr': int(bitrate) if bitrate.isdigit() else None,
})
self._sort_formats(formats)
entries.append({
'id': segment_id, 'id': segment_id,
'title': segment_title, 'title': segment_title,
'formats': formats, 'description': episode_description,
'duration': segment_duration,
'description': episode_description
}) })
entries.append(segement_info)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -214,5 +204,6 @@ class AdultSwimIE(InfoExtractor):
'entries': entries, 'entries': entries,
'title': '%s - %s' % (show_title, episode_title), 'title': '%s - %s' % (show_title, episode_title),
'description': episode_description, 'description': episode_description,
'duration': episode_duration 'duration': episode_duration,
'view_count': view_count,
} }

View File

@ -2,23 +2,140 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
update_url_query, update_url_query,
unescapeHTML, unescapeHTML,
extract_attributes,
get_element_by_attribute,
)
from ..compat import (
compat_urlparse,
) )
class AENetworksIE(InfoExtractor): class AENetworksBaseIE(ThePlatformIE):
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])' _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
'info_dict': {
'id': '22253814',
'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True
}, {
'url': 'http://www.mylifetime.com/shows/project-runway-junior/season-1/episode-6',
'only_matching': True
}, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'fyi.tv': 'FYI',
}
def _real_extract(self, url):
domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id
webpage = self._download_webpage(url, display_id)
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
elif url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes['data-videoid']))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
query = {
'mbr': 'true',
'assetTypes': 'medium_video_s3'
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))
media_url = update_url_query(media_url, query)
media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
})
return info
class HistoryTopicIE(AENetworksBaseIE):
IE_NAME = 'history:topic'
IE_DESC = 'History.com Topic'
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false', 'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
'info_dict': { 'info_dict': {
'id': 'g12m5Gyt3fdR', 'id': '40700995724',
'ext': 'mp4', 'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day", 'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7', 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
@ -31,57 +148,61 @@ class AENetworksIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'expected_warnings': ['JSON-LD'],
}, { }, {
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/videos',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7', 'info_dict':
'info_dict': { {
'id': 'eg47EERs_JsZ', 'id': 'world-war-i-history',
'ext': 'mp4', 'title': 'World War I History',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
}, },
'add_ie': ['ThePlatform'], 'playlist_mincount': 24,
}, { }, {
'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry', 'url': 'http://www.history.com/topics/world-war-i-history/videos',
'only_matching': True 'only_matching': True,
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage', 'url': 'http://www.history.com/topics/world-war-i/world-war-i-history',
'only_matching': True 'only_matching': True,
}, { }, {
'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients', 'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches',
'only_matching': True 'only_matching': True,
}] }]
def _real_extract(self, url): def theplatform_url_result(self, theplatform_url, video_id, query):
page_type, video_id = re.match(self._VALID_URL, url).groups() return {
webpage = self._download_webpage(url, video_id)
video_url_re = [
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
query = {'mbr': 'true'}
if page_type == 'shows':
query['assetTypes'] = 'medium_video_s3'
if 'switch=hds' in video_url:
query['switch'] = 'hls'
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'id': video_id,
'url': smuggle_url( 'url': smuggle_url(
update_url_query(video_url, query), update_url_query(theplatform_url, query),
{ {
'sig': { 'sig': {
'key': 'crazyjava', 'key': self._THEPLATFORM_KEY,
'secret': 's3cr3t'}, 'secret': self._THEPLATFORM_SECRET,
},
'force_smil_url': True 'force_smil_url': True
}), }),
}) 'ie_key': 'ThePlatform',
return info }
def _real_extract(self, url):
topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
if video_display_id:
webpage = self._download_webpage(url, video_display_id)
release_url, video_id = re.search(r"_videoPlayer.play\('([^']+)'\s*,\s*'[^']+'\s*,\s*'(\d+)'\)", webpage).groups()
release_url = unescapeHTML(release_url)
return self.theplatform_url_result(
release_url, video_id, {
'mbr': 'true',
'switch': 'hls'
})
else:
webpage = self._download_webpage(url, topic_id)
entries = []
for episode_item in re.findall(r'<a.+?data-release-url="[^"]+"[^>]*>', webpage):
video_attributes = extract_attributes(episode_item)
entries.append(self.theplatform_url_result(
video_attributes['data-release-url'], video_attributes['data-id'], {
'mbr': 'true',
'switch': 'hls'
}))
return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))

View File

@ -1,64 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {
'id': '36015',
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['aptomaVideoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
formats = []
for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
p = fmt['paths'][0]
formats.append({
'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
'ext': 'mp4',
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'tbr': int_or_none(fmt.get('bitrate')),
'protocol': 'http',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json.get('imageUrl'),
'description': internal_meta_json.get('shortPreamble'),
'timestamp': int_or_none(internal_meta_json.get('timePublished')),
'duration': int_or_none(internal_meta_json.get('duration')),
'view_count': int_or_none(internal_meta_json.get('views')),
}

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor): class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html', 'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@ -1,29 +1,26 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
remove_end,
qualities, qualities,
unescapeHTML, url_basename,
xpath_element,
) )
class AllocineIE(InfoExtractor): class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?' _VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html', 'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269', 'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': { 'info_dict': {
'id': '19546517', 'id': '19546517',
'display_id': '18635087',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF', 'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:abcd09ce503c6560512c14ebfdb720d2', 'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
@ -31,64 +28,82 @@ class AllocineIE(InfoExtractor):
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0', 'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': { 'info_dict': {
'id': '19540403', 'id': '19540403',
'display_id': '19540403',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF', 'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway', 'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html', 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce', 'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': { 'info_dict': {
'id': '19544709', 'id': '19544709',
'display_id': '19544709',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF', 'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:601d15393ac40f249648ef000720e7e3', 'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/video-19550147/', 'url': 'http://www.allocine.fr/video/video-19550147/',
'only_matching': True, 'md5': '3566c0668c0235e2d224fd8edb389f67',
'info_dict': {
'id': '19550147',
'ext': 'mp4',
'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
'thumbnail': 're:http://.*\.jpg',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
typ = mobj.group('typ')
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
if typ == 'film': formats = []
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xpath_element(xml, './/AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd']) quality = qualities(['ld', 'md', 'hd'])
formats = [] model = self._html_search_regex(
for k, v in video.items(): r'data-model="([^"]+)"', webpage, 'data model', default=None)
if re.match(r'.+_path', k): if model:
format_id = k.split('_')[0] model_data = self._parse_json(model, display_id)
for video_url in model_data['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
'url': v, 'url': video_url,
}) })
title = model_data['title']
else:
video_id = display_id
media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
for key, value in media_data['video'].items():
if not key.endswith('Path'):
continue
format_id = key[:-len('Path')]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': value,
})
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': video['videoTitle'], 'display_id': display_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),

View File

@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from ..utils import (
update_url_query,
parse_age_limit,
int_or_none,
)
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'info_dict': {
'id': 's3MX01Nl4vPH',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
}, {
'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
'only_matching': True,
}, {
'url': 'http://www.ifc.com/movies/chaos',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'subtitles': subtitles,
'formats': formats,
'age_limit': parse_age_limit(parse_age_limit(rating)),
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:
title = '%s - %s' % (series, title)
info.update({
'title': title,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
})
return info

View File

@ -5,6 +5,8 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
mimetype2ext,
determine_ext,
) )
@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
if isinstance(media_content, dict): if isinstance(media_content, dict):
media_content = [media_content] media_content = [media_content]
for media_data in media_content: for media_data in media_content:
media = media_data['@attributes'] media = media_data.get('@attributes', {})
media_type = media['type'] media_url = media.get('url')
if media_type in ('video/f4m', 'application/f4m+xml'): if not media_url:
continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124', media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
elif media_type == 'application/x-mpegURL': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False)) media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else: else:
formats.append({ formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'), 'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'], 'url': media['url'],
'tbr': int_or_none(media.get('bitrate')), 'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')), 'filesize': int_or_none(media.get('fileSize')),
'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply' _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand' _NETRC_MACHINE = 'animeondemand'
_TESTS = [{ _TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161', 'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': { 'info_dict': {
'id': '161', 'id': '161',
@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
}, },
'playlist_mincount': 4, 'playlist_mincount': 4,
}, { }, {
# Film wording is used instead of Episode # Film wording is used instead of Episode, ger/jap, Dub/OmU
'url': 'https://www.anime-on-demand.de/anime/39', 'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True, 'only_matching': True,
}, { }, {
# Episodes without titles # Episodes without titles, jap, OmU
'url': 'https://www.anime-on-demand.de/anime/162', 'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True, 'only_matching': True,
}, { }, {
# ger/jap, Dub/OmU, account required # ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169', 'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True, 'only_matching': True,
}, {
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}] }]
def _login(self): def _login(self):
@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
entries = [] entries = []
for num, episode_html in enumerate(re.findall( def extract_info(html, video_id, num=None):
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1): title, description = [None] * 2
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
formats = [] formats = []
for input_ in re.findall( for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html): r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
attributes = extract_attributes(input_) attributes = extract_attributes(input_)
playlist_urls = [] playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'): for playlist_key in ('data-playlist', 'data-otherplaylist'):
@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(lang) format_id_list.append(lang)
if kind: if kind:
format_id_list.append(kind) format_id_list.append(kind)
if not format_id_list: if not format_id_list and num is not None:
format_id_list.append(compat_str(num)) format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list) format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note))) format_note = ', '.join(filter(None, (kind, lang_note)))
@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
}) })
formats.extend(file_formats) formats.extend(file_formats)
if formats: return {
self._sort_formats(formats) 'title': title,
'description': description,
'formats': formats,
}
def extract_entries(html, video_id, common_info, num=None):
info = extract_info(html, video_id, num)
if info['formats']:
self._sort_formats(info['formats'])
f = common_info.copy() f = common_info.copy()
f.update({ f.update(info)
'title': title,
'description': description,
'formats': formats,
})
entries.append(f) entries.append(f)
# Extract teaser only when full episode is not available # Extract teaser/trailer only when full episode is not available
if not formats: if not info['formats']:
m = re.search( m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<', r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
episode_html) html)
if m: if m:
f = common_info.copy() f = common_info.copy()
f.update({ f.update({
'id': '%s-teaser' % f['id'], 'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'), 'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')), 'url': compat_urlparse.urljoin(url, m.group('href')),
}) })
entries.append(f) entries.append(f)
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
def extract_film(html, video_id):
common_info = {
'id': anime_id,
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
extract_episodes(webpage)
if not entries:
extract_film(webpage, anime_id)
return self.playlist_result(entries, anime_id, anime_title, anime_description) return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016', 'title': 'What To Watch - February 17, 2016',
}, },
'add_ie': ['FiveMin'], 'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,8 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://www.aparat.com/v/wP8On', 'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1', 'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': { 'info_dict': {
'id': 'wP8On', 'id': 'wP8On',
'ext': 'mp4', 'ext': 'mp4',
@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at # Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id # http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work # but the URL in there does not work
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' + embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
video_id + '/vt/frame')
webpage = self._download_webpage(embed_url, video_id) webpage = self._download_webpage(embed_url, video_id)
video_urls = [video_url.replace('\\/', '/') for video_url in re.findall( file_list = self._parse_json(self._search_regex(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)] r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, video_url in enumerate(video_urls): for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url) req = HEADRequest(video_url)
res = self._request_webpage( res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False) req, video_id, note='Testing video URL %d' % i, errnote=False)

View File

@ -7,6 +7,8 @@ from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_duration,
unified_strdate,
) )
@ -16,7 +18,8 @@ class AppleTrailersIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/', 'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
'info_dict': { 'info_dict': {
'id': 'manofsteel', 'id': '5111',
'title': 'Man of Steel',
}, },
'playlist': [ 'playlist': [
{ {
@ -70,6 +73,15 @@ class AppleTrailersIE(InfoExtractor):
'id': 'blackthorn', 'id': 'blackthorn',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
'expected_warnings': ['Unable to download JSON metadata'],
}, {
# json data only available from http://trailers.apple.com/trailers/feeds/data/15881.json
'url': 'http://trailers.apple.com/trailers/fox/kungfupanda3/',
'info_dict': {
'id': '15881',
'title': 'Kung Fu Panda 3',
},
'playlist_mincount': 4,
}, { }, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/', 'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True, 'only_matching': True,
@ -85,6 +97,45 @@ class AppleTrailersIE(InfoExtractor):
movie = mobj.group('movie') movie = mobj.group('movie')
uploader_id = mobj.group('company') uploader_id = mobj.group('company')
webpage = self._download_webpage(url, movie)
film_id = self._search_regex(r"FilmId\s*=\s*'(\d+)'", webpage, 'film id')
film_data = self._download_json(
'http://trailers.apple.com/trailers/feeds/data/%s.json' % film_id,
film_id, fatal=False)
if film_data:
entries = []
for clip in film_data.get('clips', []):
clip_title = clip['title']
formats = []
for version, version_data in clip.get('versions', {}).items():
for size, size_data in version_data.get('sizes', {}).items():
src = size_data.get('src')
if not src:
continue
formats.append({
'format_id': '%s-%s' % (version, size),
'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
'width': int_or_none(size_data.get('width')),
'height': int_or_none(size_data.get('height')),
'language': version[:2],
})
self._sort_formats(formats)
entries.append({
'id': movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', clip_title).lower(),
'formats': formats,
'title': clip_title,
'thumbnail': clip.get('screen') or clip.get('thumb'),
'duration': parse_duration(clip.get('runtime') or clip.get('faded')),
'upload_date': unified_strdate(clip.get('posted')),
'uploader_id': uploader_id,
})
page_data = film_data.get('page', {})
return self.playlist_result(entries, film_id, page_data.get('movie_title'))
playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc') playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')
def fix_html(s): def fix_html(s):

View File

@ -1,67 +1,65 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .jwplatform import JWPlatformBaseIE
from ..utils import unified_strdate from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(InfoExtractor): class ArchiveOrgIE(JWPlatformBaseIE):
IE_NAME = 'archive.org' IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos' IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$' _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect', 'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db', 'md5': '8af1d4cf447933ed3c7f4871162602db',
'info_dict': { 'info_dict': {
'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect', 'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'ext': 'ogv', 'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1', 'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:1780b464abaca9991d8968c877bb53ed', 'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210', 'upload_date': '19681210',
'uploader': 'SRI International' 'uploader': 'SRI International'
} }
}, { }, {
'url': 'https://archive.org/details/Cops1922', 'url': 'https://archive.org/details/Cops1922',
'md5': '18f2a19e6d89af8425671da1cf3d4e04', 'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
'info_dict': { 'info_dict': {
'id': 'Cops1922', 'id': 'Cops1922',
'ext': 'ogv', 'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)', 'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:70f72ee70882f713d4578725461ffcc3', 'description': 'md5:b4544662605877edd99df22f9620d858',
} }
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
json_url = url + ('&' if '?' in url else '?') + 'output=json' def get_optional(metadata, field):
data = self._download_json(json_url, video_id) return metadata.get(field, [None])[0]
def get_optional(data_dict, field): metadata = self._download_json(
return data_dict['metadata'].get(field, [None])[0] 'http://archive.org/details/' + video_id, video_id, query={
'output': 'json',
title = get_optional(data, 'title') })['metadata']
description = get_optional(data, 'description') info.update({
uploader = get_optional(data, 'creator') 'title': get_optional(metadata, 'title') or info.get('title'),
upload_date = unified_strdate(get_optional(data, 'date')) 'description': clean_html(get_optional(metadata, 'description')),
})
formats = [ if info.get('_type') != 'playlist':
{ info.update({
'format': fdata['format'], 'uploader': get_optional(metadata, 'creator'),
'url': 'http://' + data['server'] + data['dir'] + fn, 'upload_date': unified_strdate(get_optional(metadata, 'date')),
'file_size': int(fdata['size']), })
} return info
for fn, fdata in data['files'].items()
if 'Video' in fdata['format']]
self._sort_formats(formats)
return {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'upload_date': upload_date,
'thumbnail': data.get('misc', {}).get('image'),
}

View File

@ -13,13 +13,14 @@ from ..utils import (
parse_duration, parse_duration,
unified_strdate, unified_strdate,
xpath_text, xpath_text,
update_url_query,
) )
from ..compat import compat_etree_fromstring from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor): class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek' IE_NAME = 'ARD:mediathek'
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114', 'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
@ -34,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916', 'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e', 'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@ -44,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1', 'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252, 'duration': 5252,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
# audio # audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086', 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@ -55,9 +58,22 @@ class ARDMediathekIE(InfoExtractor):
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef', 'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240, 'duration': 3240,
}, },
'skip': 'HTTP Error 404: Not Found',
}, { }, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht', 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True, 'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
}] }]
def _extract_media_info(self, media_info_url, webpage, video_id): def _extract_media_info(self, media_info_url, webpage, video_id):
@ -113,11 +129,14 @@ class ARDMediathekIE(InfoExtractor):
continue continue
if ext == 'f4m': if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', update_url_query(stream_url, {
video_id, preference=-1, f4m_id='hds', fatal=False)) 'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False)) stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else: else:
if server and server.startswith('rtmp'): if server and server.startswith('rtmp'):
f = { f = {
@ -219,7 +238,7 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = { _TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html', 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35', 'md5': 'd216c3a86493f9322545e045ddc3eb35',
@ -231,7 +250,8 @@ class ARDIE(InfoExtractor):
'title': 'Die Story im Ersten: Mission unter falscher Flagge', 'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804', 'upload_date': '20140804',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
} },
'skip': 'HTTP Error 404: Not Found',
} }
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
parse_iso8601,
strip_jsonp,
)
class ArkenaIE(InfoExtractor):
_VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
_TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
}, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/config/avp/v1/player/media/327336/darkmatter/131064/?callbackMethod=jQuery1111002221189684892677_1469227595972',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
# See https://support.arkena.com/display/PLAY/Ways+to+embed+your+video
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
account_id = mobj.group('account_id')
playlist = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
% (video_id, account_id),
video_id, transform_source=strip_jsonp)['Playlist'][0]
media_info = playlist['MediaInfo']
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = []
for kind_case, kind_formats in media_files.items():
kind = kind_case.lower()
for f in kind_formats:
f_url = f.get('Url')
if not f_url:
continue
is_live = f.get('Live') == 'true'
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
if kind == 'm3u8' or 'm3u8' in exts:
formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=kind, fatal=False, live=is_live))
elif kind == 'flash' or 'f4m' in exts:
formats.extend(self._extract_f4m_formats(
f_url, video_id, f4m_id=kind, fatal=False))
elif kind == 'dash' or 'mpd' in exts:
formats.extend(self._extract_mpd_formats(
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)
formats.append({
'url': f_url,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
'tbr': tbr,
})
self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'is_live': is_live,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -410,6 +410,22 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
return self._extract_from_json_url(json_url, video_id, lang) return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE): class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist' IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)' _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
@ -419,6 +435,7 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
'info_dict': { 'info_dict': {
'id': 'PL-013263', 'id': 'PL-013263',
'title': 'Areva & Uramin', 'title': 'Areva & Uramin',
'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
}, { }, {

View File

@ -0,0 +1,185 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_str,
)
from ..utils import (
int_or_none,
parse_iso8601,
smuggle_url,
unsmuggle_url,
urlencode_postdata,
)
class AWAANIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
if video_id and int(video_id) > 0:
return self.url_result(
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
elif season_id and int(season_id) > 0:
return self.url_result(smuggle_url(
'http://awaan.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'AWAANSeason')
else:
return self.url_result(
'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')
class AWAANBaseIE(InfoExtractor):
def _parse_video_data(self, video_data, video_id, is_live):
title = video_data.get('title_en') or video_data['title_ar']
img = video_data.get('img')
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
}
class AWAANVideoIE(AWAANBaseIE):
IE_NAME = 'awaan:video'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'md5': '5f61c33bfc7794315c671a62d43116aa',
'info_dict':
{
'id': '17375',
'ext': 'mp4',
'title': 'رحلة العمر : الحلقة 1',
'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
'uploader_id': '71',
},
}, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
video_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(video_data, video_id, False)
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloVideo',
})
return info
class AWAANLiveIE(AWAANBaseIE):
IE_NAME = 'awaan:live'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
_TEST = {
'url': 'http://awaan.ae/live/6/dubai-tv',
'info_dict': {
'id': '6',
'ext': 'mp4',
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
channel_id = self._match_id(url)
channel_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info
class AWAANSeasonIE(InfoExtractor):
IE_NAME = 'awaan:season'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
'info_dict':
{
'id': '7910',
'title': 'محاضرات الشيخ الشعراوي',
},
'playlist_mincount': 27,
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
show_id, season_id = re.match(self._VALID_URL, url).groups()
data = {}
if season_id:
data['season'] = season_id
show_id = smuggled_data.get('show_id')
if show_id is None:
season = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
season_id, headers={'Origin': 'http://awaan.ae'})
show_id = season['id']
data['show_id'] = show_id
show = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
show_id, data=urlencode_postdata(data), headers={
'Origin': 'http://awaan.ae',
'Content-Type': 'application/x-www-form-urlencoded'
})
if not season_id:
season_id = show['default_season']
for season in show['seasons']:
if season['id'] == season_id:
title = season.get('title_en') or season['title_ar']
entries = []
for video in show['videos']:
video_id = compat_str(video['id'])
entries.append(self.url_result(
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))
return self.playlist_result(entries, season_id, title)

View File

@ -103,7 +103,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor): class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$' _VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
_TEST = { _TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen', 'url': 'http://www.azubu.tv/MarsTVMDLen',

View File

@ -162,6 +162,15 @@ class BandcampAlbumIE(InfoExtractor):
'uploader_id': 'dotscale', 'uploader_id': 'dotscale',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, {
# with escaped quote in title
'url': 'https://jstrecords.bandcamp.com/album/entropy-ep',
'info_dict': {
'title': '"Entropy" EP',
'uploader_id': 'jstrecords',
'id': 'entropy-ep',
},
'playlist_mincount': 3,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -176,8 +185,11 @@ class BandcampAlbumIE(InfoExtractor):
entries = [ entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key()) self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths] for t_path in tracks_paths]
title = self._search_regex( title = self._html_search_regex(
r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False) r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False)
if title:
title = title.replace(r'\"', '"')
return { return {
'_type': 'playlist', '_type': 'playlist',
'uploader_id': uploader_id, 'uploader_id': uploader_id,

View File

@ -2,19 +2,23 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import itertools
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
dict_get,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
try_get,
unescapeHTML, unescapeHTML,
) )
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_HTTPError, compat_HTTPError,
compat_urlparse,
) )
@ -229,51 +233,6 @@ class BBCCoUkIE(InfoExtractor):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist') asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')] return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier or kind or protocol,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist): def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS) return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@ -294,46 +253,6 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media): def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection') return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id): def _get_subtitles(self, media, programme_id):
subtitles = {} subtitles = {}
for connection in self._extract_connections(media): for connection in self._extract_connections(media):
@ -379,13 +298,87 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id): def _process_media_selector(self, media_selection, programme_id):
formats = [] formats = []
subtitles = None subtitles = None
urls = []
for media in self._extract_medias(media_selection): for media in self._extract_medias(media_selection):
kind = media.get('kind') kind = media.get('kind')
if kind == 'audio': if kind in ('video', 'audio'):
formats.extend(self._extract_audio(media, programme_id)) bitrate = int_or_none(media.get('bitrate'))
elif kind == 'video': encoding = media.get('encoding')
formats.extend(self._extract_video(media, programme_id)) service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions': elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id) subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles return formats, subtitles
@ -590,6 +583,7 @@ class BBCIE(BBCCoUkIE):
'id': '150615_telabyad_kentin_cogu', 'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4', 'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde", 'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334, 'timestamp': 1434397334,
'upload_date': '20150615', 'upload_date': '20150615',
}, },
@ -603,6 +597,7 @@ class BBCIE(BBCCoUkIE):
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw', 'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción', 'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',
'timestamp': 1434713142, 'timestamp': 1434713142,
'upload_date': '20150619', 'upload_date': '20150619',
}, },
@ -652,6 +647,23 @@ class BBCIE(BBCCoUkIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
} }
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, { }, {
# single video with playlist.sxml URL in playlist param # single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409', 'url': 'http://www.bbc.com/sport/0/football/33653409',
@ -749,7 +761,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None) json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
timestamp = json_ld_info.get('timestamp') timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title') playlist_title = json_ld_info.get('title')
@ -818,8 +830,29 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani) # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {}) playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist: if playlist:
entries.append(self._extract_from_playlist_sxml( entry = None
playlist.get('progressiveDownloadUrl'), playlist_id, timestamp)) for key in ('streaming', 'progressiveDownload'):
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries: if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@ -852,6 +885,50 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),
@ -869,7 +946,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage)) r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries: if entries:
return self.playlist_result( return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries], [self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
playlist_id, playlist_title, playlist_description) playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511) # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
@ -951,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor): class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article' IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles' IE_DESC = 'BBC articles'
@ -981,27 +1058,43 @@ class BBCCoUkArticleIE(InfoExtractor):
class BBCCoUkPlaylistBaseIE(InfoExtractor): class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _entries(self, webpage, url, playlist_id):
single_page = 'page' in compat_urlparse.parse_qs(
compat_urlparse.urlparse(url).query)
for page_num in itertools.count(2):
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
yield self.url_result(
self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
if single_page:
return
next_page = self._search_regex(
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage, 'next page url', default=None, group='url')
if not next_page:
break
webpage = self._download_webpage(
compat_urlparse.urljoin(url, next_page), playlist_id,
'Downloading page %d' % page_num, page_num)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
title, description = self._extract_title_and_description(webpage) title, description = self._extract_title_and_description(webpage)
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(
self._entries(webpage, url, playlist_id),
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE): class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist' IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s' _URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)' _VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TEST = { _TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v', 'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': { 'info_dict': {
'id': 'b05rcz9v', 'id': 'b05rcz9v',
@ -1009,7 +1102,17 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.', 'description': 'French thriller serial about a missing teenager.',
}, },
'playlist_mincount': 6, 'playlist_mincount': 6,
} 'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
'id': 'p02tcc32',
'title': 'Bohemian Icons',
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}]
def _extract_title_and_description(self, webpage): def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False) title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
@ -1032,6 +1135,24 @@ class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.', 'description': 'French thriller serial about a missing teenager.',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, {
# multipage playlist, explicit page
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 24,
}, {
# multipage playlist, all pages
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 142,
}, { }, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06', 'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True, 'only_matching': True,

View File

@ -8,10 +8,10 @@ from ..compat import compat_str
from ..utils import int_or_none from ..utils import int_or_none
class BeatportProIE(InfoExtractor): class BeatportIE(InfoExtractor):
_VALID_URL = r'https?://pro\.beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.|pro\.)?beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://pro.beatport.com/track/synesthesia-original-mix/5379371', 'url': 'https://beatport.com/track/synesthesia-original-mix/5379371',
'md5': 'b3c34d8639a2f6a7f734382358478887', 'md5': 'b3c34d8639a2f6a7f734382358478887',
'info_dict': { 'info_dict': {
'id': '5379371', 'id': '5379371',
@ -20,7 +20,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Froxic - Synesthesia (Original Mix)', 'title': 'Froxic - Synesthesia (Original Mix)',
}, },
}, { }, {
'url': 'https://pro.beatport.com/track/love-and-war-original-mix/3756896', 'url': 'https://beatport.com/track/love-and-war-original-mix/3756896',
'md5': 'e44c3025dfa38c6577fbaeb43da43514', 'md5': 'e44c3025dfa38c6577fbaeb43da43514',
'info_dict': { 'info_dict': {
'id': '3756896', 'id': '3756896',
@ -29,7 +29,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Wolfgang Gartner - Love & War (Original Mix)', 'title': 'Wolfgang Gartner - Love & War (Original Mix)',
}, },
}, { }, {
'url': 'https://pro.beatport.com/track/birds-original-mix/4991738', 'url': 'https://beatport.com/track/birds-original-mix/4991738',
'md5': 'a1fd8e8046de3950fd039304c186c05f', 'md5': 'a1fd8e8046de3950fd039304c186c05f',
'info_dict': { 'info_dict': {
'id': '4991738', 'id': '4991738',

View File

@ -0,0 +1,75 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
},
'expected_warnings': ['HTTP Error 404'],
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
}, {
'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
'only_matching': True,
}, {
'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
'ie_key': 'NineCNineMedia',
}

View File

@ -2,7 +2,6 @@ from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor from .mtv import MTVServicesInfoExtractor
from ..utils import unified_strdate from ..utils import unified_strdate
from ..compat import compat_urllib_parse_urlencode
class BetIE(MTVServicesInfoExtractor): class BetIE(MTVServicesInfoExtractor):
@ -53,9 +52,9 @@ class BetIE(MTVServicesInfoExtractor):
_FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player" _FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
def _get_feed_query(self, uri): def _get_feed_query(self, uri):
return compat_urllib_parse_urlencode({ return {
'uuid': uri, 'uuid': uri,
}) }
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid') return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')

View File

@ -11,22 +11,13 @@ from ..compat import compat_urllib_parse_unquote
class BigflixIE(InfoExtractor): class BigflixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
'info_dict': {
'id': '16537',
'ext': 'mp4',
'title': 'Singham Returns',
'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
}
}, {
# 2 formats # 2 formats
'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070', 'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
'info_dict': { 'info_dict': {
'id': '16070', 'id': '16070',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Madarasapatinam', 'title': 'Madarasapatinam',
'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca', 'description': 'md5:9f0470b26a4ba8e824c823b5d95c2f6b',
'formats': 'mincount:2', 'formats': 'mincount:2',
}, },
'params': { 'params': {

View File

@ -1,205 +1,101 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar import hashlib
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_parse_qs
compat_etree_fromstring,
compat_str,
compat_parse_qs,
compat_xml_parse_error,
)
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
float_or_none, float_or_none,
xpath_text, unified_timestamp,
urlencode_postdata,
) )
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
_TESTS = [{ _TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788', 'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': { 'info_dict': {
'id': '1554319', 'id': '1074402',
'ext': 'flv', 'ext': 'mp4',
'title': '【金坷垃】金泡沫', 'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923', 'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067, 'duration': 308.315,
'timestamp': 1398012660, 'timestamp': 1398012660,
'upload_date': '20140420', 'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg', 'thumbnail': 're:^https?://.+\.jpg',
'uploader': '菊子桑', 'uploader': '菊子桑',
'uploader_id': '156160', 'uploader_id': '156160',
}, },
}, { }
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'flv',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',
'uploader_id': '610729',
},
'params': {
# Just to test metadata extraction
'skip_download': True,
},
'expected_warnings': ['upload time'],
}]
# BiliBili blocks keys from time to time. The current key is extracted from _APP_KEY = '6f90a59ac58a4123'
# the Android client _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
# TODO: find the sign algorithm used in the flash player
_APP_KEY = '86385cdc024c0f6c'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
params = compat_parse_qs(self._search_regex( if 'anime/v' not in url:
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)', cid = compat_parse_qs(self._search_regex(
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'], [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
webpage, 'player parameters')) r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
cid = params['cid'][0] webpage, 'player parameters'))['cid'][0]
info_xml_str = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play',
cid, query={'appkey': self._APP_KEY, 'cid': cid},
note='Downloading video info page')
err_msg = None
durls = None
info_xml = None
try:
info_xml = compat_etree_fromstring(info_xml_str.encode('utf-8'))
except compat_xml_parse_error:
info_json = self._parse_json(info_xml_str, video_id, fatal=False)
err_msg = (info_json or {}).get('error_text')
else: else:
err_msg = xpath_text(info_xml, './message') js = self._download_json(
'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
cid = js['result']['cid']
if info_xml is not None: payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
durls = info_xml.findall('./durl') sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
if not durls:
if err_msg: video_info = self._download_json(
raise ExtractorError('%s said: %s' % (self.IE_NAME, err_msg), expected=True) 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
else: video_id, note='Downloading video info page')
raise ExtractorError('No videos found!')
entries = [] entries = []
for durl in durls: for idx, durl in enumerate(video_info['durl']):
size = xpath_text(durl, ['./filesize', './size'])
formats = [{ formats = [{
'url': durl.find('./url').text, 'url': durl['url'],
'filesize': int_or_none(size), 'filesize': int_or_none(durl['size']),
}] }]
for backup_url in durl.findall('./backup_url/url'): for backup_url in durl.get('backup_url', []):
formats.append({ formats.append({
'url': backup_url.text, 'url': backup_url,
# backup URLs have lower priorities # backup URLs have lower priorities
'preference': -2 if 'hd.mp4' in backup_url.text else -3, 'preference': -2 if 'hd.mp4' in backup_url else -3,
}) })
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
'id': '%s_part%s' % (cid, xpath_text(durl, './order')), 'id': '%s_part%s' % (video_id, idx),
'duration': int_or_none(xpath_text(durl, './length'), 1000), 'duration': float_or_none(durl.get('length'), 1000),
'formats': formats, 'formats': formats,
}) })
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title') title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
datetime_str = self._html_search_regex( timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False) r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
timestamp = None thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
if datetime_str:
timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
# TODO 'view_count' requires deobfuscating Javascript # TODO 'view_count' requires deobfuscating Javascript
info = { info = {
'id': compat_str(cid), 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'timestamp': timestamp, 'timestamp': timestamp,
'thumbnail': self._html_search_meta('thumbnailUrl', webpage), 'thumbnail': thumbnail,
'duration': float_or_none(xpath_text(info_xml, './timelength'), scale=1000), 'duration': float_or_none(video_info.get('timelength'), scale=1000),
} }
uploader_mobj = re.search( uploader_mobj = re.search(

View File

@ -2,11 +2,15 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import remove_end from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor): class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml' _VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{ _TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria', 'uploader': 'Fernando Atria',
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, { }, {
# different uploader layout # different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml', 'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
},
}, { }, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True, 'only_matching': True,
@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
if not rudo_url:
raise ExtractorError('No videos found')
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV') title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>', r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
return { return {
'_type': 'url_transparent',
'url': rudo_url,
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
'formats': formats,
} }

View File

@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки', 'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov', 'uploader': 'Dmitry Kotov',
} },
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -20,6 +21,18 @@ class BloombergIE(InfoExtractor):
'params': { 'params': {
'format': 'best[format_id^=hds]', 'format': 'best[format_id^=hds]',
}, },
}, {
# video ID in BPlayer(...)
'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
'info_dict': {
'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
'ext': 'flv',
'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
'description': 'Hello World, Episode 1: New Zealands freaky AI babies, robot exoskeletons, and a virtual you.',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, { }, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True, 'only_matching': True,
@ -33,7 +46,11 @@ class BloombergIE(InfoExtractor):
webpage = self._download_webpage(url, name) webpage = self._download_webpage(url, name)
video_id = self._search_regex( video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1', r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url') webpage, 'id', group='url', default=None)
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
video_id = bplayer_data['id']
title = re.sub(': Video$', '', self._og_search_title(webpage)) title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json( embed_info = self._download_json(

View File

@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor): class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung' IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/' _VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = { _TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr', 'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@ -1,31 +1,74 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .adobepass import AdobePassIE
from ..utils import smuggle_url from ..utils import (
smuggle_url,
update_url_query,
int_or_none,
)
class BravoTVIE(InfoExtractor): class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)' _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale', 'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': 'd60cdf68904e854fac669bd26cccf801', 'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
'info_dict': { 'info_dict': {
'id': 'LitrBdX64qLn', 'id': 'zHyk1_HU_mPy',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Last Chance Kitchen Returns', 'title': 'LCK Ep 12: Fishy Finale',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13', 'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV', 'uploader': 'NBCU-BRAV',
'upload_date': '20160302',
'timestamp': 1456945320,
} }
} }, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid') settings = self._parse_json(self._search_regex(
release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid') r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
return self.url_result(smuggle_url( display_id)
'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid), info = {}
{'force_smil_url': True}), 'ThePlatform', release_pid) query = {
'mbr': 'true',
}
account_pid, release_pid = [None] * 2
tve = settings.get('sharedTVE')
if tve:
query['manifest'] = 'm3u'
account_pid = 'HNK2IC'
release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('adobePass', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
else:
shared_playlist = settings['shared_playlist']
account_pid = shared_playlist['account_pid']
metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
release_pid = metadata['release_pid']
info.update({
'title': metadata['title'],
'description': metadata.get('description'),
'season_number': int_or_none(metadata.get('season_num')),
'episode_number': int_or_none(metadata.get('episode_num')),
})
query['switch'] = 'progressive'
info.update({
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})
return info

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -26,6 +26,8 @@ from ..utils import (
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query, update_url_query,
clean_html,
mimetype2ext,
) )
@ -90,6 +92,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df', 'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada', 'uploader': 'National Ballet of Canada',
}, },
'skip': 'Video gone',
}, },
{ {
# test flv videos served by akamaihd.net # test flv videos served by akamaihd.net
@ -108,7 +111,7 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
}, },
{ {
# playlist test # playlist with 'videoList'
# from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players # from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL', 'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
'info_dict': { 'info_dict': {
@ -117,6 +120,15 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, },
{
# playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
'title': 'Lesson 08',
},
'playlist_mincount': 10,
},
] ]
FLV_VCODECS = { FLV_VCODECS = {
1: 'SORENSON', 1: 'SORENSON',
@ -298,13 +310,19 @@ class BrightcoveLegacyIE(InfoExtractor):
info_url, player_key, 'Downloading playlist information') info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info) json_data = json.loads(playlist_info)
if 'videoList' not in json_data: if 'videoList' in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist') raise ExtractorError('Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']] videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'], return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_info['mediaCollectionDTO']['displayName']) playlist_title=playlist_dto['displayName'])
def _extract_video_info(self, video_info): def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id']) video_id = compat_str(video_info['id'])
@ -528,14 +546,16 @@ class BrightcoveNewIE(InfoExtractor):
formats = [] formats = []
for source in json_data.get('sources', []): for source in json_data.get('sources', []):
container = source.get('container') container = source.get('container')
source_type = source.get('type') ext = mimetype2ext(source.get('type'))
src = source.get('src') src = source.get('src')
if source_type == 'application/x-mpegURL' or container == 'M2TS': if ext == 'ism':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml': elif ext == 'mpd':
if not src: if not src:
continue continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False)) formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
@ -551,7 +571,7 @@ class BrightcoveNewIE(InfoExtractor):
'tbr': tbr, 'tbr': tbr,
'filesize': int_or_none(source.get('size')), 'filesize': int_or_none(source.get('size')),
'container': container, 'container': container,
'ext': container.lower(), 'ext': ext or container.lower(),
} }
if width == 0 and height == 0: if width == 0 and height == 0:
f.update({ f.update({
@ -585,6 +605,13 @@ class BrightcoveNewIE(InfoExtractor):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
@ -594,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'], 'url': text_track['src'],
}) })
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._live_title(title) if is_live else title,
'description': json_data.get('description'), 'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'), 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000), 'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')), 'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id, 'uploader_id': account_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'tags': json_data.get('tags', []), 'tags': json_data.get('tags', []),
'is_live': is_live,
} }

View File

@ -5,6 +5,7 @@ import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .facebook import FacebookIE
class BuzzFeedIE(InfoExtractor): class BuzzFeedIE(InfoExtractor):
@ -20,11 +21,11 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'aVCR29aE_OQ', 'id': 'aVCR29aE_OQ',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Angry Ram destroys a punching bag..',
'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
'upload_date': '20141024', 'upload_date': '20141024',
'uploader_id': 'Buddhanz1', 'uploader_id': 'Buddhanz1',
'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl', 'uploader': 'Angry Ram',
'uploader': 'Buddhanz',
'title': 'Angry Ram destroys a punching bag',
} }
}] }]
}, { }, {
@ -41,13 +42,30 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'mVmBL8B-In0', 'id': 'mVmBL8B-In0',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Munchkin the Teddy Bear gets her exercise',
'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
'upload_date': '20141124', 'upload_date': '20141124',
'uploader_id': 'CindysMunchkin', 'uploader_id': 'CindysMunchkin',
'description': 're:© 2014 Munchkin the',
'uploader': 're:^Munchkin the', 'uploader': 're:^Munchkin the',
'title': 're:Munchkin the Teddy Bear gets her exercise',
}, },
}] }]
}, {
'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
'info_dict': {
'id': 'the-most-adorable-crash-landing-ever',
'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
'description': 'This gosling knows how to stick a landing.',
},
'playlist': [{
'md5': '763ca415512f91ca62e4621086900a23',
'info_dict': {
'id': '971793786185728',
'ext': 'mp4',
'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
'uploader': 'Calgary Outdoor Centre-University of Calgary',
},
}],
'add_ie': ['Facebook'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -66,6 +84,10 @@ class BuzzFeedIE(InfoExtractor):
continue continue
entries.append(self.url_result(video['url'])) entries.append(self.url_result(video['url']))
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url:
entries.append(self.url_result(facebook_url))
return { return {
'_type': 'playlist', '_type': 'playlist',
'id': playlist_id, 'id': playlist_id,

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -8,15 +7,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor): class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
_TEST = { _TESTS = [{
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5', 'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': { 'info_dict': {
'id': 'studio-c-season-5-episode-5', 'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5', 'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486, 'duration': 1486.486,
}, },
@ -24,28 +23,71 @@ class BYUtvIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala': ooyala_id = self._search_regex(
return { r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
'_type': 'url_transparent', webpage, 'ooyala id', group='id')
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'], title = self._search_regex(
'id': video_id, r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
'title': ep['title'], 'title').strip()
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'), return {
} '_type': 'url_transparent',
else: 'ie_key': 'Ooyala',
raise ExtractorError('Unsupported provider %s' % ep['provider']) 'url': 'ooyala:%s' % ooyala_id,
'id': video_id,
'title': title,
}

View File

@ -1,7 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -10,8 +9,10 @@ from ..compat import (
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
parse_iso8601, clean_html,
parse_duration,
str_to_int, str_to_int,
unified_strdate,
) )
@ -26,14 +27,14 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ch1-1 Introduction, Signals (02-23-2012)', 'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': '',
'creator': 'ss11spring', 'creator': 'ss11spring',
'duration': 1591,
'upload_date': '20130114', 'upload_date': '20130114',
'timestamp': 1358154556,
'view_count': int, 'view_count': int,
} }
}, { }, {
# With non-empty description # With non-empty description
# webpage returns "No permission or not login"
'url': 'http://www.camdemy.com/media/13885', 'url': 'http://www.camdemy.com/media/13885',
'md5': '4576a3bb2581f86c61044822adbd1249', 'md5': '4576a3bb2581f86c61044822adbd1249',
'info_dict': { 'info_dict': {
@ -41,70 +42,77 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'EverCam + Camdemy QuickStart', 'title': 'EverCam + Camdemy QuickStart',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:050b62f71ed62928f8a35f1a41e186c9', 'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
'creator': 'evercam', 'creator': 'evercam',
'upload_date': '20140620', 'duration': 318,
'timestamp': 1403271569,
} }
}, { }, {
# External source # External source (YouTube)
'url': 'http://www.camdemy.com/media/14842', 'url': 'http://www.camdemy.com/media/14842',
'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
'info_dict': { 'info_dict': {
'id': '2vsYQzNIsJo', 'id': '2vsYQzNIsJo',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Excel 2013 Tutorial - How to add Password Protection',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'upload_date': '20130211', 'upload_date': '20130211',
'uploader': 'Hun Kim', 'uploader': 'Hun Kim',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'uploader_id': 'hunkimtutorials', 'uploader_id': 'hunkimtutorials',
'title': 'Excel 2013 Tutorial - How to add Password Protection', },
} 'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
src_from = self._html_search_regex( src_from = self._html_search_regex(
r"<div class='srcFrom'>Source: <a title='([^']+)'", page, r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1",
'external source', default=None) webpage, 'external source', default=None, group='url')
if src_from: if src_from:
return self.url_result(src_from) return self.url_result(src_from)
oembed_obj = self._download_json( oembed_obj = self._download_json(
'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id) 'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
title = oembed_obj['title']
thumb_url = oembed_obj['thumbnail_url'] thumb_url = oembed_obj['thumbnail_url']
video_folder = compat_urlparse.urljoin(thumb_url, 'video/') video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
file_list_doc = self._download_xml( file_list_doc = self._download_xml(
compat_urlparse.urljoin(video_folder, 'fileList.xml'), compat_urlparse.urljoin(video_folder, 'fileList.xml'),
video_id, 'Filelist XML') video_id, 'Downloading filelist XML')
file_name = file_list_doc.find('./video/item/fileName').text file_name = file_list_doc.find('./video/item/fileName').text
video_url = compat_urlparse.urljoin(video_folder, file_name) video_url = compat_urlparse.urljoin(video_folder, file_name)
timestamp = parse_iso8601(self._html_search_regex( # Some URLs return "No permission or not login" in a webpage despite being
r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<", # freely available via oembed JSON URL (e.g. http://www.camdemy.com/media/13885)
page, 'creation time', fatal=False), upload_date = unified_strdate(self._search_regex(
delimiter=' ', timezone=datetime.timedelta(hours=8)) r'>published on ([^<]+)<', webpage,
view_count = str_to_int(self._html_search_regex( 'upload date', default=None))
r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<", view_count = str_to_int(self._search_regex(
page, 'view count', fatal=False)) r'role=["\']viewCnt["\'][^>]*>([\d,.]+) views',
webpage, 'view count', default=None))
description = self._html_search_meta(
'description', webpage, default=None) or clean_html(
oembed_obj.get('description'))
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'title': oembed_obj['title'], 'title': title,
'thumbnail': thumb_url, 'thumbnail': thumb_url,
'description': self._html_search_meta('description', page), 'description': description,
'creator': oembed_obj['author_name'], 'creator': oembed_obj.get('author_name'),
'duration': oembed_obj['duration'], 'duration': parse_duration(oembed_obj.get('duration')),
'timestamp': timestamp, 'upload_date': upload_date,
'view_count': view_count, 'view_count': view_count,
} }
class CamdemyFolderIE(InfoExtractor): class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# links with trailing slash # links with trailing slash
'url': 'http://www.camdemy.com/folder/450', 'url': 'http://www.camdemy.com/folder/450',

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -6,11 +6,13 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
dict_get,
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
unified_strdate,
qualities,
int_or_none, int_or_none,
qualities,
remove_end,
unified_strdate,
) )
@ -23,6 +25,7 @@ class CanalplusIE(InfoExtractor):
(?:(?:www|m)\.)?canalplus\.fr| (?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr| (?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv| (?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv| (?:www\.)?d17\.tv|
(?:www\.)?itele\.fr (?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?| )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
@ -35,53 +38,53 @@ class CanalplusIE(InfoExtractor):
'canalplus': 'cplus', 'canalplus': 'cplus',
'piwiplus': 'teletoon', 'piwiplus': 'teletoon',
'd8': 'd8', 'd8': 'd8',
'c8': 'd8',
'd17': 'd17', 'd17': 'd17',
'itele': 'itele', 'itele': 'itele',
} }
_TESTS = [{ _TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814', 'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': { 'info_dict': {
'id': '1192814', 'id': '1405510',
'display_id': 'pid1830-c-zapping',
'ext': 'mp4', 'ext': 'mp4',
'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014", 'title': 'Zapping - 02/07/2016',
'description': "Toute l'année 2014 dans un Zapping exceptionnel !", 'description': 'Le meilleur de toutes les chaînes, tous les jours',
'upload_date': '20150105', 'upload_date': '20160702',
}, },
}, { }, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190', 'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
'info_dict': { 'info_dict': {
'id': '1108190', 'id': '1108190',
'ext': 'flv', 'display_id': 'pid1405-le-labyrinthe-boing-super-ranger',
'title': 'Le labyrinthe - Boing super ranger', 'ext': 'mp4',
'title': 'BOING SUPER RANGER - Ep : Le labyrinthe',
'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff', 'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
'upload_date': '20140724', 'upload_date': '20140724',
}, },
'skip': 'Only works from France', 'skip': 'Only works from France',
}, { }, {
'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231', 'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html',
'md5': '4b47b12b4ee43002626b97fad8fb1de5',
'info_dict': { 'info_dict': {
'id': '1390231', 'id': '1420213',
'display_id': 'pid6318-videos-integrales',
'ext': 'mp4', 'ext': 'mp4',
'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité", 'title': 'TPMP ! Même le matin - Les 35H de Baba - 14/10/2016',
'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6', 'description': 'md5:f96736c1b0ffaa96fd5b9e60ad871799',
'upload_date': '20160512', 'upload_date': '20161014',
},
'params': {
'skip_download': True,
}, },
'skip': 'Only works from France',
}, { }, {
'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224', 'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'info_dict': { 'info_dict': {
'id': '1398334', 'id': '1420176',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4', 'ext': 'mp4',
'title': "L'invité de Bruce Toussaint du 07/06/2016 - ", 'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324', 'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20160607', 'upload_date': '20161014',
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'http://m.canalplus.fr/?vid=1398231', 'url': 'http://m.canalplus.fr/?vid=1398231',
@ -93,18 +96,17 @@ class CanalplusIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]] site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group # Beware, some subclasses do not define an id group
display_id = mobj.group('display_id') or video_id display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
if video_id is None: webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(url, display_id) video_id = self._search_regex(
video_id = self._search_regex( [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'], r'id=["\']canal_video_player(?P<id>\d+)'],
webpage, 'video id', group='id') webpage, 'video id', group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id) info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON') video_data = self._download_json(info_url, video_id, 'Downloading video JSON')

View File

@ -1,11 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import float_or_none from ..utils import float_or_none
class CanvasIE(InfoExtractor): class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week', 'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c', 'md5': 'ea838375a547ac787d4064d8c7860a6c',
@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 3788.06,
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
site_id, display_id = mobj.group('site_id'), mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = self._search_regex( title = (self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>', r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title(webpage) webpage, 'title', default=None) or self._og_search_title(
webpage)).strip()
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id') r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
data = self._download_json( data = self._download_json(
'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id) 'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), display_id)
formats = [] formats = []
for target in data['targetUrls']: for target in data['targetUrls']:

View File

@ -9,6 +9,8 @@ from ..utils import (
try_get, try_get,
) )
from .videomore import VideomoreIE
class CarambaTVIE(InfoExtractor): class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)' _VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
@ -62,14 +64,16 @@ class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)' _VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = { _TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/', 'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': '', 'md5': 'a49fb0ec2ad66503eeb46aac237d3c86',
'info_dict': { 'info_dict': {
'id': '191910501', 'id': '475222',
'ext': 'mp4', 'ext': 'flv',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)', 'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg',
'duration': 2678.31, # duration reported by videomore is incorrect
'duration': int,
}, },
'add_ie': [VideomoreIE.ie_key()],
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -77,6 +81,16 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage)
if videomore_url:
title = self._og_search_title(webpage)
return {
'_type': 'url_transparent',
'url': videomore_url,
'ie_key': VideomoreIE.ie_key(),
'title': title,
}
video_url = self._og_search_property('video:iframe', webpage, default=None) video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url: if not video_url:

View File

@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html',
'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c',
'ext': 'mp4',
'title': 'Starfire the Cat Lady',
'description': 'Robin decides to become a cat so that Starfire will finally love him.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

View File

@ -4,13 +4,24 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
smuggle_url, smuggle_url,
try_get,
xpath_text,
xpath_element,
xpath_with_ns,
find_xpath_attr,
parse_iso8601,
parse_age_limit,
int_or_none,
ExtractorError,
) )
class CBCIE(InfoExtractor): class CBCIE(InfoExtractor):
IE_NAME = 'cbc.ca'
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# with mediaId # with mediaId
@ -25,8 +36,22 @@ class CBCIE(InfoExtractor):
'upload_date': '20160203', 'upload_date': '20160203',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# with clipId # with clipId, feed available via tpfeed.cbc.ca and feed.theplatform.com
'url': 'http://www.cbc.ca/22minutes/videos/22-minutes-update/22-minutes-update-episode-4',
'md5': '162adfa070274b144f4fdc3c3b8207db',
'info_dict': {
'id': '2414435309',
'ext': 'mp4',
'title': '22 Minutes Update: What Not To Wear Quebec',
'description': "This week's latest Canadian top political story is What Not To Wear Quebec.",
'upload_date': '20131025',
'uploader': 'CBCC-NEW',
'timestamp': 1382717907,
},
}, {
# with clipId, feed only available via tpfeed.cbc.ca
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live', 'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12', 'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': { 'info_dict': {
@ -64,6 +89,7 @@ class CBCIE(InfoExtractor):
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
}], }],
'skip': 'Geo-restricted to Canada',
}] }]
@classmethod @classmethod
@ -81,9 +107,15 @@ class CBCIE(InfoExtractor):
media_id = player_info.get('mediaId') media_id = player_info.get('mediaId')
if not media_id: if not media_id:
clip_id = player_info['clipId'] clip_id = player_info['clipId']
media_id = self._download_json( feed = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id, 'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
clip_id)['entries'][0]['id'].split('/')[-1] clip_id, fatal=False)
if feed:
media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
if not media_id:
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else: else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)] entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
@ -91,6 +123,7 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor): class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)' _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193', 'url': 'http://www.cbc.ca/player/play/2683190193',
@ -104,6 +137,7 @@ class CBCPlayerIE(InfoExtractor):
'upload_date': '20160210', 'upload_date': '20160210',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/ # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896', 'url': 'http://www.cbc.ca/player/play/2657631896',
@ -143,3 +177,165 @@ class CBCPlayerIE(InfoExtractor):
}), }),
'id': video_id, 'id': video_id,
} }
class CBCWatchBaseIE(InfoExtractor):
_device_id = None
_device_token = None
_API_BASE_URL = 'https://api-cbc.cloud.clearleap.com/cloffice/client/'
_NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
result = self._download_xml(url, video_id, headers={
'X-Clearleap-DeviceId': self._device_id,
'X-Clearleap-DeviceToken': self._device_token,
})
error_message = xpath_text(result, 'userMessage') or xpath_text(result, 'systemMessage')
if error_message:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message))
return result
def _real_initialize(self):
if not self._device_id or not self._device_token:
device = self._downloader.cache.load('cbcwatch', 'device') or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if not self._device_id or not self._device_token:
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'id': self._device_id,
'token': self._device_token,
})
def _parse_rss_feed(self, rss):
channel = xpath_element(rss, 'channel', fatal=True)
def _add_ns(path):
return xpath_with_ns(path, self._NS_MAP)
entries = []
for item in channel.findall('item'):
guid = xpath_text(item, 'guid', fatal=True)
title = xpath_text(item, 'title', fatal=True)
media_group = xpath_element(item, _add_ns('media:group'), fatal=True)
content = xpath_element(media_group, _add_ns('media:content'), fatal=True)
content_url = content.attrib['url']
thumbnails = []
for thumbnail in media_group.findall(_add_ns('media:thumbnail')):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail.get('profile'),
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
timestamp = None
release_date = find_xpath_attr(
item, _add_ns('media:credit'), 'role', 'releaseDate')
if release_date is not None:
timestamp = parse_iso8601(release_date.text)
entries.append({
'_type': 'url_transparent',
'url': content_url,
'id': guid,
'title': title,
'description': xpath_text(item, 'description'),
'timestamp': timestamp,
'duration': int_or_none(content.get('duration')),
'age_limit': parse_age_limit(xpath_text(item, _add_ns('media:rating'))),
'episode': xpath_text(item, _add_ns('clearleap:episode')),
'episode_number': int_or_none(xpath_text(item, _add_ns('clearleap:episodeInSeason'))),
'series': xpath_text(item, _add_ns('clearleap:series')),
'season_number': int_or_none(xpath_text(item, _add_ns('clearleap:season'))),
'thumbnails': thumbnails,
'ie_key': 'CBCWatchVideo',
})
return self.playlist_result(
entries, xpath_text(channel, 'guid'),
xpath_text(channel, 'title'),
xpath_text(channel, 'description'))
class CBCWatchVideoIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch:video'
_VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
def _real_extract(self, url):
video_id = self._match_id(url)
result = self._call_api(url, video_id)
m3u8_url = xpath_text(result, 'url', fatal=True)
formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
if len(formats) < 2:
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
# Despite metadata in m3u8 all video+audio formats are
# actually video-only (no audio)
for f in formats:
if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
f['acodec'] = 'none'
self._sort_formats(formats)
info = {
'id': video_id,
'title': video_id,
'formats': formats,
}
rss = xpath_element(result, 'rss')
if rss:
info.update(self._parse_rss_feed(rss)['entries'][0])
del info['url']
del info['_type']
del info['ie_key']
return info
class CBCWatchIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch'
_VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
_TESTS = [{
'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
'info_dict': {
'id': '38e815a-009e3ab12e4',
'ext': 'mp4',
'title': 'Customer (Dis)Service',
'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
'upload_date': '20160219',
'timestamp': 1455840000,
},
'params': {
# m3u8 download
'skip_download': True,
'format': 'bestvideo',
},
'skip': 'Geo-restricted to Canada',
}, {
'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
'info_dict': {
'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
'title': 'Arthur',
'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
},
'playlist_mincount': 30,
'skip': 'Geo-restricted to Canada',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)

View File

@ -4,6 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
find_xpath_attr, find_xpath_attr,
xpath_element,
xpath_text,
update_url_query,
) )
@ -17,19 +20,6 @@ class CBSBaseIE(ThePlatformFeedIE):
}] }]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else [] } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info(
'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
'series': entry.get('cbs$SeriesTitle'),
'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
'episode': entry.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
}, {
'StreamPack': {
'manifest': 'm3u',
}
})
class CBSIE(CBSBaseIE): class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)' _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
@ -38,7 +28,6 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': { 'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_', 'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks', 'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!', 'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
@ -47,7 +36,10 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127', 'upload_date': '20131127',
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
}, },
'expected_warnings': ['Failed to download m3u8 information'], 'params': {
# m3u8 download
'skip_download': True,
},
'_skip': 'Blocked outside the US', '_skip': 'Blocked outside the US',
}, { }, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/', 'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@ -56,8 +48,53 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info
def _real_extract(self, url): def _real_extract(self, url):
content_id = self._match_id(url) content_id = self._match_id(url)
return self._extract_video_info('byGuid=%s' % content_id, content_id) return self._extract_video_info(content_id)

View File

@ -63,7 +63,7 @@ class CBSInteractiveIE(ThePlatformIE):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex( data_json = self._html_search_regex(
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'", r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
webpage, 'data json') webpage, 'data json')
data = self._parse_json(data_json, display_id) data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0] vdata = data.get('video') or data['videos'][0]
@ -80,9 +80,6 @@ class CBSInteractiveIE(ThePlatformIE):
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId']) media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {} formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items(): for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']: if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue continue
@ -94,7 +91,7 @@ class CBSInteractiveIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id) info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({ info.update({
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,

View File

@ -1,12 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import unified_timestamp
class CBSLocalIE(AnvatoIE): class CBSLocalIE(AnvatoIE):
@ -43,13 +41,8 @@ class CBSLocalIE(AnvatoIE):
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/', 'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': { 'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588', 'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
}, },
'playlist_count': 9,
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
@ -62,19 +55,15 @@ class CBSLocalIE(AnvatoIE):
sendtonews_url = SendtoNewsIE._extract_url(webpage) sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url: if sendtonews_url:
info_dict = { return self.url_result(
'_type': 'url_transparent', compat_urlparse.urljoin(url, sendtonews_url),
'url': compat_urlparse.urljoin(url, sendtonews_url), ie=SendtoNewsIE.ie_key())
}
else: info_dict = self._extract_anvato_videos(webpage, display_id)
info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex( time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False) r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None timestamp = unified_timestamp(time_str)
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
info_dict.update({ info_dict.update({
'display_id': display_id, 'display_id': display_id,

View File

@ -1,14 +1,15 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .cbs import CBSBaseIE from .cbs import CBSIE
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
) )
class CBSNewsIE(CBSBaseIE): class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News' IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@ -26,6 +27,7 @@ class CBSNewsIE(CBSBaseIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Subscribers only',
}, },
{ {
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/', 'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
@ -34,7 +36,8 @@ class CBSNewsIE(CBSBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack', 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7', 'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '19700101', 'upload_date': '20140404',
'timestamp': 1396650660,
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205, 'duration': 205,
@ -62,43 +65,43 @@ class CBSNewsIE(CBSBaseIE):
item = video_info['item'] if 'item' in video_info else video_info item = video_info['item'] if 'item' in video_info else video_info
guid = item['mpxRefId'] guid = item['mpxRefId']
return self._extract_video_info('byGuid=%s' % guid, guid) return self._extract_video_info(guid)
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos' IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = { _TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/', 'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': { 'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh', 'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv', 'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH', 'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334, 'duration': 334,
}, },
'skip': 'Video gone',
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex( formats = self._extract_akamai_formats(video_info['url'], display_id)
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story'] self._sort_formats(formats)
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return { return {
'id': video_id, 'id': display_id,
'display_id': display_id,
'title': video_info['headline'], 'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'), 'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')), 'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats, 'formats': formats,
} }

View File

@ -4,7 +4,7 @@ from .cbs import CBSBaseIE
class CBSSportsIE(CBSBaseIE): class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast', 'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
@ -23,6 +23,9 @@ class CBSSportsIE(CBSBaseIE):
} }
}] }]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self._extract_video_info('byId=%s' % video_id, video_id) return self._extract_video_info('byId=%s' % video_id, video_id)

View File

@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CCTVIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:.+?\.)?
(?:
cctv\.(?:com|cn)|
cntv\.cn
)/
(?:
video/[^/]+/(?P<id>[0-9a-f]{32})|
\d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
)'''
_TESTS = [{
'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
'md5': '819c7b49fc3927d529fb4cd555621823',
'info_dict': {
'id': '454368eb19ad44a1925bf1eb96140a61',
'ext': 'mp4',
'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
}
}, {
'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
'only_matching': True,
}, {
'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
webpage, 'video_id')
api_data = self._download_json(
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
return {
'id': video_id,
'title': api_data['title'],
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
}

View File

@ -1,4 +1,4 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -17,7 +17,7 @@ from ..utils import (
class CeskaTelevizeIE(InfoExtractor): class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$' _VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {

View File

@ -0,0 +1,51 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class CharlieRoseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://charlierose.com/videos/27996',
'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
'info_dict': {
'id': '27996',
'ext': 'mp4',
'title': 'Remembering Zaha Hadid',
'thumbnail': 're:^https?://.*\.jpg\?\d+',
'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
}, {
'url': 'https://charlierose.com/videos/27996',
'only_matching': True,
}]
_PLAYER_BASE = 'https://charlierose.com/video/player/%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)
title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')
info_dict = self._parse_html5_media_entries(
self._PLAYER_BASE % video_id, webpage, video_id,
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info_dict['formats'])
self._remove_duplicate_formats(info_dict['formats'])
info_dict.update({
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
})
return info_dict

View File

@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Room is offline',
}, { }, {
'url': 'https://en.chaturbate.com/siswet19/', 'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True, 'only_matching': True,

View File

@ -1,30 +1,34 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import parse_duration
parse_duration,
int_or_none,
)
class ChirbitIE(InfoExtractor): class ChirbitIE(InfoExtractor):
IE_NAME = 'chirbit' IE_NAME = 'chirbit'
_VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://chirb.it/PrIPv5', 'url': 'http://chirb.it/be2abG',
'md5': '9847b0dad6ac3e074568bf2cfb197de8',
'info_dict': { 'info_dict': {
'id': 'PrIPv5', 'id': 'be2abG',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Фасадстрой', 'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
'duration': 52, 'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
'view_count': int, 'duration': 306,
'comment_count': int, },
'params': {
'skip_download': True,
} }
}, { }, {
'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5', 'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://chirb.it/wp/MN58c2',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -33,38 +37,40 @@ class ChirbitIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
'http://chirb.it/%s' % audio_id, audio_id) 'http://chirb.it/%s' % audio_id, audio_id)
audio_url = self._search_regex( data_fd = self._search_regex(
r'"setFile"\s*,\s*"([^"]+)"', webpage, 'audio url') r'data-fd=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'data fd', group='url')
# Reverse engineered from https://chirb.it/js/chirbit.player.js (look
# for soundURL)
audio_url = base64.b64decode(
data_fd[::-1].encode('ascii')).decode('utf-8')
title = self._search_regex( title = self._search_regex(
r'itemprop="name">([^<]+)', webpage, 'title') r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
duration = parse_duration(self._html_search_meta( description = self._search_regex(
'duration', webpage, 'duration', fatal=False)) r'<h3>Description</h3>\s*<pre[^>]*>([^<]+)</pre>',
view_count = int_or_none(self._search_regex( webpage, 'description', default=None)
r'itemprop="playCount"\s*>(\d+)', webpage, duration = parse_duration(self._search_regex(
'listen count', fatal=False)) r'class=["\']c-length["\'][^>]*>([^<]+)',
comment_count = int_or_none(self._search_regex( webpage, 'duration', fatal=False))
r'>(\d+) Comments?:', webpage,
'comment count', fatal=False))
return { return {
'id': audio_id, 'id': audio_id,
'url': audio_url, 'url': audio_url,
'title': title, 'title': title,
'description': description,
'duration': duration, 'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
} }
class ChirbitProfileIE(InfoExtractor): class ChirbitProfileIE(InfoExtractor):
IE_NAME = 'chirbit:profile' IE_NAME = 'chirbit:profile'
_VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
_TEST = { _TEST = {
'url': 'http://chirbit.com/ScarletBeauty', 'url': 'http://chirbit.com/ScarletBeauty',
'info_dict': { 'info_dict': {
'id': 'ScarletBeauty', 'id': 'ScarletBeauty',
'title': 'Chirbits by ScarletBeauty',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
} }
@ -72,13 +78,10 @@ class ChirbitProfileIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
profile_id = self._match_id(url) profile_id = self._match_id(url)
rss = self._download_xml( webpage = self._download_webpage(url, profile_id)
'http://chirbit.com/rss/%s' % profile_id, profile_id)
entries = [ entries = [
self.url_result(audio_url.text, 'Chirbit') self.url_result(self._proto_relative_url('//chirb.it/' + video_id))
for audio_url in rss.findall('./channel/item/link')] for _, video_id in re.findall(r'<input[^>]+id=([\'"])copy-btn-(?P<id>[0-9a-zA-Z]+)\1', webpage)]
title = rss.find('./channel/title').text return self.playlist_result(entries, profile_id)
return self.playlist_result(entries, profile_id, title)

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
@ -10,15 +11,15 @@ from ..utils import (
class ClipfishIE(InfoExtractor): class ClipfishIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/', 'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
'md5': '79bc922f3e8a9097b3d68a93780fd475', 'md5': '720563e467b86374c194bdead08d207d',
'info_dict': { 'info_dict': {
'id': '3966754', 'id': '4343170',
'ext': 'mp4', 'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer', 'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
'description': 'Video zu FIFA 14: E3 2013 Trailer', 'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
'upload_date': '20130611', 'upload_date': '20161005',
'duration': 82, 'duration': 1291,
'view_count': int, 'view_count': int,
} }
} }
@ -50,10 +51,14 @@ class ClipfishIE(InfoExtractor):
'tbr': int_or_none(video_info.get('bitrate')), 'tbr': int_or_none(video_info.get('bitrate')),
}) })
descr = video_info.get('descr')
if descr:
descr = descr.strip()
return { return {
'id': video_id, 'id': video_id,
'title': video_info['title'], 'title': video_info['title'],
'description': video_info.get('descr'), 'description': descr,
'formats': formats, 'formats': formats,
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'), 'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
'duration': int_or_none(video_info.get('media_length')), 'duration': int_or_none(video_info.get('media_length')),

View File

@ -23,7 +23,7 @@ class CliphunterIE(InfoExtractor):
(?P<id>[0-9]+)/ (?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?]) (?P<seo>.+?)(?:$|[#\?])
''' '''
_TEST = { _TESTS = [{
'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo', 'url': 'http://www.cliphunter.com/w/1012420/Fun_Jynx_Maze_solo',
'md5': 'b7c9bbd4eb3a226ab91093714dcaa480', 'md5': 'b7c9bbd4eb3a226ab91093714dcaa480',
'info_dict': { 'info_dict': {
@ -32,8 +32,19 @@ class CliphunterIE(InfoExtractor):
'title': 'Fun Jynx Maze solo', 'title': 'Fun Jynx Maze solo',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18, 'age_limit': 18,
} },
} 'skip': 'Video gone',
}, {
'url': 'http://www.cliphunter.com/w/2019449/ShesNew__My_booty_girlfriend_Victoria_Paradices_pussy_filled_with_jizz',
'md5': '55a723c67bfc6da6b0cfa00d55da8a27',
'info_dict': {
'id': '2019449',
'ext': 'mp4',
'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)

View File

@ -1,16 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .onet import OnetBaseIE
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor): class ClipRsIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+' _VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = { _TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732', 'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
@ -27,64 +21,13 @@ class ClipRsIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( mvp_id = self._search_mvp_id(webpage)
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json( info_dict = self._extract_from_id(mvp_id, webpage)
'http://qi.ckm.onetapi.pl/', video_id, info_dict['display_id'] = display_id
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error') return info_dict
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@ -6,7 +6,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_urlencode,
compat_HTTPError, compat_HTTPError,
) )
from ..utils import ( from ..utils import (
@ -17,37 +16,26 @@ from ..utils import (
class CloudyIE(InfoExtractor): class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch' _IE_DESC = 'cloudy.ec'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/ https?://(?:www\.)?cloudy\.ec/
(?:v/|embed\.php\?id=) (?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+) (?P<id>[A-Za-z0-9]+)
''' '''
_EMBED_URL = 'http://www.%s/embed.php?id=%s' _EMBED_URL = 'http://www.cloudy.ec/embed.php?id=%s'
_API_URL = 'http://www.%s/api/player.api.php?%s' _API_URL = 'http://www.cloudy.ec/api/player.api.php'
_MAX_TRIES = 2 _MAX_TRIES = 2
_TESTS = [ _TEST = {
{ 'url': 'https://www.cloudy.ec/v/af511e2527aac',
'url': 'https://www.cloudy.ec/v/af511e2527aac', 'md5': '5cb253ace826a42f35b4740539bedf07',
'md5': '5cb253ace826a42f35b4740539bedf07', 'info_dict': {
'info_dict': { 'id': 'af511e2527aac',
'id': 'af511e2527aac', 'ext': 'flv',
'ext': 'flv', 'title': 'Funny Cats and Animals Compilation june 2013',
'title': 'Funny Cats and Animals Compilation june 2013',
}
},
{
'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': {
'id': '47f399fd8bb60',
'ext': 'flv',
'title': 'Burning a New iPhone 5 with Gasoline - Will it Survive?',
}
} }
] }
def _extract_video(self, video_host, video_id, file_key, error_url=None, try_num=0): def _extract_video(self, video_id, file_key, error_url=None, try_num=0):
if try_num > self._MAX_TRIES - 1: if try_num > self._MAX_TRIES - 1:
raise ExtractorError('Unable to extract video URL', expected=True) raise ExtractorError('Unable to extract video URL', expected=True)
@ -64,9 +52,8 @@ class CloudyIE(InfoExtractor):
'errorUrl': error_url, 'errorUrl': error_url,
}) })
data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
player_data = self._download_webpage( player_data = self._download_webpage(
data_url, video_id, 'Downloading player data') self._API_URL, video_id, 'Downloading player data', query=form)
data = compat_parse_qs(player_data) data = compat_parse_qs(player_data)
try_num += 1 try_num += 1
@ -88,7 +75,7 @@ class CloudyIE(InfoExtractor):
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]: if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]:
self.report_warning('Invalid video URL, requesting another', video_id) self.report_warning('Invalid video URL, requesting another', video_id)
return self._extract_video(video_host, video_id, file_key, video_url, try_num) return self._extract_video(video_id, file_key, video_url, try_num)
return { return {
'id': video_id, 'id': video_id,
@ -98,14 +85,13 @@ class CloudyIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_host = mobj.group('host')
video_id = mobj.group('id') video_id = mobj.group('id')
url = self._EMBED_URL % (video_host, video_id) url = self._EMBED_URL % video_id
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
file_key = self._search_regex( file_key = self._search_regex(
[r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'], [r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'],
webpage, 'file_key') webpage, 'file_key')
return self._extract_video(video_host, video_id, file_key) return self._extract_video(video_id, file_key)

View File

@ -1,9 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id) player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex( config = self._parse_json(self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page, r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration') 'configuration'), video_id)
config = json.loads(config_json)
video_info = config['videoInfo'] video_info = config['videoInfo']
sources = config['sources'] sources = config['sources']

View File

@ -1,10 +1,12 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVIE from .mtv import MTVIE
from ..utils import ExtractorError
class CMTIE(MTVIE): class CMTIE(MTVIE):
IE_NAME = 'cmt.com' IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)' _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/' _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [{ _TESTS = [{
@ -16,7 +18,32 @@ class CMTIE(MTVIE):
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"', 'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots', 'description': 'Blame It All On My Roots',
}, },
'skip': 'Video not available',
}, {
'url': 'http://www.cmt.com/videos/misc/1504699/still-the-king-ep-109-in-3-minutes.jhtml#id=1739908',
'md5': 'e61a801ca4a183a466c08bd98dccbb1c',
'info_dict': {
'id': '1504699',
'ext': 'mp4',
'title': 'Still The King Ep. 109 in 3 Minutes',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9.',
'timestamp': 1469421000.0,
'upload_date': '20160725',
},
}, { }, {
'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172', 'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
'only_matching': True, 'only_matching': True,
}] }]
@classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
if 'error_not_available.swf' in rtmp_video_url:
raise ExtractorError(
'%s said: video is not available' % cls.IE_NAME, expected=True)
return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
def _extract_mgid(self, webpage):
return self._search_regex(
r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
webpage, 'mgid', group='mgid')

View File

@ -3,15 +3,12 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from .turner import TurnerBaseIE
int_or_none, from ..utils import url_basename
parse_duration,
url_basename,
)
class CNNIE(InfoExtractor): class CNNIE(TurnerBaseIE):
_VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/ _VALID_URL = r'''(?x)https?://(?:(?P<sub_domain>edition|www|money)\.)?cnn\.com/(?:video/(?:data/.+?|\?)/)?videos?/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))''' (?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''
_TESTS = [{ _TESTS = [{
@ -25,6 +22,7 @@ class CNNIE(InfoExtractor):
'duration': 135, 'duration': 135,
'upload_date': '20130609', 'upload_date': '20130609',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29', 'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
'md5': 'b5cc60c60a3477d185af8f19a2a26f4e', 'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
@ -34,7 +32,8 @@ class CNNIE(InfoExtractor):
'title': "Student's epic speech stuns new freshmen", 'title': "Student's epic speech stuns new freshmen",
'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"", 'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
'upload_date': '20130821', 'upload_date': '20130821',
} },
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html', 'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',
'md5': 'f14d02ebd264df951feb2400e2c25a1b', 'md5': 'f14d02ebd264df951feb2400e2c25a1b',
@ -44,80 +43,61 @@ class CNNIE(InfoExtractor):
'title': 'Nashville Ep. 1: Hand crafted skateboards', 'title': 'Nashville Ep. 1: Hand crafted skateboards',
'description': 'md5:e7223a503315c9f150acac52e76de086', 'description': 'md5:e7223a503315c9f150acac52e76de086',
'upload_date': '20141222', 'upload_date': '20141222',
} },
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'http://money.cnn.com/video/news/2016/08/19/netflix-stunning-stats.cnnmoney/index.html',
'md5': '52a515dc1b0f001cd82e4ceda32be9d1',
'info_dict': {
'id': '/video/news/2016/08/19/netflix-stunning-stats.cnnmoney',
'ext': 'mp4',
'title': '5 stunning stats about Netflix',
'description': 'Did you know that Netflix has more than 80 million members? Here are five facts about the online video distributor that you probably didn\'t know.',
'upload_date': '20160819',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk', 'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg', 'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://edition.cnn.com/videos/arts/2016/04/21/olympic-games-cultural-a-z-brazil.cnn',
'only_matching': True,
}] }]
_CONFIG = {
# http://edition.cnn.com/.element/apps/cvp/3.0/cfg/spider/cnn/expansion/config.xml
'edition': {
'data_src': 'http://edition.cnn.com/video/data/3.0/video/%s/index.xml',
'media_src': 'http://pmd.cdn.turner.com/cnn/big',
},
# http://money.cnn.com/.element/apps/cvp2/cfg/config.xml
'money': {
'data_src': 'http://money.cnn.com/video/data/4.0/video/%s.xml',
'media_src': 'http://ht3.cdn.turner.com/money/big',
},
}
def _extract_timestamp(self, video_data):
# TODO: fix timestamp extraction
return None
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) sub_domain, path, page_title = re.match(self._VALID_URL, url).groups()
path = mobj.group('path') if sub_domain not in ('money', 'edition'):
page_title = mobj.group('title') sub_domain = 'edition'
info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path config = self._CONFIG[sub_domain]
info = self._download_xml(info_url, page_title) return self._extract_cvp_info(
config['data_src'] % path, page_title, {
formats = [] 'default': {
rex = re.compile(r'''(?x) 'media_src': config['media_src'],
(?P<width>[0-9]+)x(?P<height>[0-9]+) }
(?:_(?P<bitrate>[0-9]+)k)? })
''')
for f in info.findall('files/file'):
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % (f.text.strip())
fdct = {
'format_id': f.attrib['bitrate'],
'url': video_url,
}
mf = rex.match(f.attrib['bitrate'])
if mf:
fdct['width'] = int(mf.group('width'))
fdct['height'] = int(mf.group('height'))
fdct['tbr'] = int_or_none(mf.group('bitrate'))
else:
mf = rex.search(f.text)
if mf:
fdct['width'] = int(mf.group('width'))
fdct['height'] = int(mf.group('height'))
fdct['tbr'] = int_or_none(mf.group('bitrate'))
else:
mi = re.match(r'ios_(audio|[0-9]+)$', f.attrib['bitrate'])
if mi:
if mi.group(1) == 'audio':
fdct['vcodec'] = 'none'
fdct['ext'] = 'm4a'
else:
fdct['tbr'] = int(mi.group(1))
formats.append(fdct)
self._sort_formats(formats)
thumbnails = [{
'height': int(t.attrib['height']),
'width': int(t.attrib['width']),
'url': t.text,
} for t in info.findall('images/image')]
metas_el = info.find('metas')
upload_date = (
metas_el.attrib.get('version') if metas_el is not None else None)
duration_el = info.find('length')
duration = parse_duration(duration_el.text)
return {
'id': info.attrib['id'],
'title': info.find('headline').text,
'formats': formats,
'thumbnails': thumbnails,
'description': info.find('description').text,
'duration': duration,
'upload_date': upload_date,
}
class CNNBlogsIE(InfoExtractor): class CNNBlogsIE(InfoExtractor):
@ -132,6 +112,7 @@ class CNNBlogsIE(InfoExtractor):
'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.', 'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.',
'upload_date': '20140209', 'upload_date': '20140209',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
'add_ie': ['CNN'], 'add_ie': ['CNN'],
} }
@ -146,7 +127,7 @@ class CNNBlogsIE(InfoExtractor):
class CNNArticleIE(InfoExtractor): class CNNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)' _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!videos?/)'
_TEST = { _TEST = {
'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/', 'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
'md5': '689034c2a3d9c6dc4aa72d65a81efd01', 'md5': '689034c2a3d9c6dc4aa72d65a81efd01',
@ -154,9 +135,10 @@ class CNNArticleIE(InfoExtractor):
'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn', 'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Obama: Cyberattack not an act of war', 'title': 'Obama: Cyberattack not an act of war',
'description': 'md5:51ce6750450603795cad0cdfbd7d05c5', 'description': 'md5:0a802a40d2376f60e6b04c8d5bcebc4b',
'upload_date': '20141221', 'upload_date': '20141221',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
'add_ie': ['CNN'], 'add_ie': ['CNN'],
} }

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor

View File

@ -1,17 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .mtv import MTVServicesInfoExtractor from .mtv import MTVServicesInfoExtractor
from ..compat import ( from .common import InfoExtractor
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
float_or_none,
unified_strdate,
)
class ComedyCentralIE(MTVServicesInfoExtractor): class ComedyCentralIE(MTVServicesInfoExtractor):
@ -26,8 +16,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
'info_dict': { 'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354', 'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'ext': 'mp4', 'ext': 'mp4',
'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother', 'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.', 'description': 'After a certain point, breastfeeding becomes c**kblocking.',
'timestamp': 1376798400,
'upload_date': '20130818',
}, },
}, { }, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview', 'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
@ -35,241 +27,92 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
}] }]
class ComedyCentralShowsIE(MTVServicesInfoExtractor): class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'The Daily Show / The Colbert Report' IE_DESC = 'Tosh.0'
# urls can be abbreviations like :thedailyshow _VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
# urls for episodes like: _FEED_URL = 'http://tosh.cc.com/feeds/mrss'
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
(?P<interview>
extended-interviews/(?P<interID>[0-9a-z]+)/
(?:playlist_tds_extended_)?(?P<interview_title>[^/?#]*?)
(?:/[^/?#]?|[?#]|$))))
'''
_TESTS = [{ _TESTS = [{
'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
'info_dict': {
'id': 'ab9ab3e7-5a98-4dbe-8b21-551dc0523d55',
'ext': 'mp4',
'upload_date': '20121213',
'description': 'Kristen Stewart learns to let loose in "On the Road."',
'uploader': 'thedailyshow',
'title': 'thedailyshow kristen-stewart part 1',
}
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/b6364d/sarah-chayes-extended-interview',
'info_dict': {
'id': 'sarah-chayes-extended-interview',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'title': 'thedailyshow Sarah Chayes Extended Interview',
},
'playlist': [
{
'info_dict': {
'id': '0baad492-cbec-4ec1-9e50-ad91c291127f',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 1',
},
},
{
'info_dict': {
'id': '1e4fb91b-8ce7-4277-bd7c-98c9f1bbd283',
'ext': 'mp4',
'upload_date': '20150129',
'description': 'Carnegie Endowment Senior Associate Sarah Chayes discusses how corrupt institutions function throughout the world in her book "Thieves of State: Why Corruption Threatens Global Security."',
'uploader': 'thedailyshow',
'title': 'thedailyshow sarah-chayes-extended-interview part 2',
},
},
],
'params': {
'skip_download': True,
},
}, {
'url': 'http://thedailyshow.cc.com/extended-interviews/xm3fnq/andrew-napolitano-extended-interview',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/29w6fx/-realhumanpraise-for-fox-news',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/videos/gh6urb/neil-degrasse-tyson-pt--1?xrs=eml_col_031114',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
'only_matching': True,
}, {
'url': 'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/video-playlists/t6d9sg/the-daily-show-20038-highlights/be3cwo',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
'only_matching': True,
}, {
'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
'only_matching': True,
}, {
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans', 'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': 're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'only_matching': True, 'only_matching': True,
}] }]
_available_formats = ['3500', '2200', '1700', '1200', '750', '400'] @classmethod
def _transform_rtmp_url(cls, rtmp_video_url):
new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
return new_urls
_video_extensions = {
'3500': 'mp4', class ComedyCentralTVIE(MTVServicesInfoExtractor):
'2200': 'mp4', _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
'1700': 'mp4', _TESTS = [{
'1200': 'mp4', 'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
'750': 'mp4', 'info_dict': {
'400': 'mp4', 'id': 'local_playlist-f99b626bdfe13568579a',
} 'ext': 'flv',
_video_dimensions = { 'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
'3500': (1280, 720), },
'2200': (960, 540), 'params': {
'1700': (768, 432), # rtmp download
'1200': (640, 360), 'skip_download': True,
'750': (512, 288), },
'400': (384, 216), }, {
} 'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
if mobj.group('shortname'): webpage = self._download_webpage(url, video_id)
return self.url_result('http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes')
if mobj.group('clip'): mrss_url = self._search_regex(
if mobj.group('videotitle'): r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
epTitle = mobj.group('videotitle') webpage, 'mrss url', group='url')
elif mobj.group('showname') == 'thedailyshow':
epTitle = mobj.group('tdstitle')
else:
epTitle = mobj.group('cntitle')
dlNewest = False
elif mobj.group('interview'):
epTitle = mobj.group('interview_title')
dlNewest = False
else:
dlNewest = not mobj.group('episode')
if dlNewest:
epTitle = mobj.group('showname')
else:
epTitle = mobj.group('episode')
show_name = mobj.group('showname')
webpage, htmlHandle = self._download_webpage_handle(url, epTitle) return self._get_videos_info_from_url(mrss_url, video_id)
if dlNewest:
url = htmlHandle.geturl()
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
if mobj is None:
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = (mobj.group('episode') or mobj.group('videotitle')).rpartition('/')[-1]
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
# The Colbert Report embeds the information in a without
# a URL prefix; so extract the alternate reference
# and then add the URL prefix manually.
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video|playlist).*?:.*?)"', webpage) class ComedyCentralShortnameIE(InfoExtractor):
if len(altMovieParams) == 0: _VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
raise ExtractorError('unable to find Flash URL in webpage ' + url) _TESTS = [{
else: 'url': ':tds',
mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])] 'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}]
uri = mMovieParams[0][1] def _real_extract(self, url):
# Correct cc.com in uri video_id = self._match_id(url)
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri) shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri})) 'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')
title = idoc.find('./channel/title').text
description = idoc.find('./channel/description').text
entries = []
item_els = idoc.findall('.//item')
for part_num, itemEl in enumerate(item_els):
upload_date = unified_strdate(itemEl.findall('./pubDate')[0].text)
thumbnail = itemEl.find('.//{http://search.yahoo.com/mrss/}thumbnail').attrib.get('url')
content = itemEl.find('.//{http://search.yahoo.com/mrss/}content')
duration = float_or_none(content.attrib.get('duration'))
mediagen_url = content.attrib['url']
guid = itemEl.find('./guid').text.rpartition(':')[-1]
cdoc = self._download_xml(
mediagen_url, epTitle,
'Downloading configuration for segment %d / %d' % (part_num + 1, len(item_els)))
turls = []
for rendition in cdoc.findall('.//rendition'):
finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
turls.append(finfo)
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'format_id': 'vhttp-%s' % format,
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm'),
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
self._sort_formats(formats)
subtitles = self._extract_subtitles(cdoc, guid)
virtual_id = show_name + ' ' + epTitle + ' part ' + compat_str(part_num + 1)
entries.append({
'id': guid,
'title': virtual_id,
'formats': formats,
'uploader': show_name,
'upload_date': upload_date,
'duration': duration,
'thumbnail': thumbnail,
'description': description,
'subtitles': subtitles,
})
return {
'_type': 'playlist',
'id': epTitle,
'entries': entries,
'title': show_name + ' ' + title,
'description': description,
} }
return self.url_result(shortcut_map[video_id])

View File

@ -21,6 +21,7 @@ from ..compat import (
compat_os_name, compat_os_name,
compat_str, compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -44,6 +45,7 @@ from ..utils import (
sanitized_Request, sanitized_Request,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp,
url_basename, url_basename,
xpath_element, xpath_element,
xpath_text, xpath_text,
@ -54,6 +56,8 @@ from ..utils import (
update_Request, update_Request,
update_url_query, update_url_query,
parse_m3u8_attributes, parse_m3u8_attributes,
extract_attributes,
parse_codecs,
) )
@ -84,6 +88,9 @@ class InfoExtractor(object):
Potential fields: Potential fields:
* url Mandatory. The URL of the video file * url Mandatory. The URL of the video file
* manifest_url
The URL of the manifest file in case of
fragmented media (DASH, hls, hds)
* ext Will be calculated from URL if missing * ext Will be calculated from URL if missing
* format A human-readable description of the format * format A human-readable description of the format
("mp4 container with h264/opus"). ("mp4 container with h264/opus").
@ -112,6 +119,11 @@ class InfoExtractor(object):
download, lower-case. download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe", "http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments". "m3u8", "m3u8_native" or "http_dash_segments".
* fragments A list of fragments of the fragmented media,
with the following entries:
* "url" (mandatory) - fragment's URL
* "duration" (optional, int or float)
* "filesize" (optional, int)
* preference Order number of this format. If this field is * preference Order number of this format. If this field is
present and not None, the formats get sorted present and not None, the formats get sorted
by this field, regardless of all other values. by this field, regardless of all other values.
@ -161,6 +173,7 @@ class InfoExtractor(object):
* "height" (optional, int) * "height" (optional, int)
* "resolution" (optional, string "{width}x{height"}, * "resolution" (optional, string "{width}x{height"},
deprecated) deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image. thumbnail: Full URL to a video thumbnail image.
description: Full video description. description: Full video description.
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
@ -658,35 +671,48 @@ class InfoExtractor(object):
else: else:
return res return res
def _get_login_info(self): def _get_netrc_login_info(self, netrc_machine=None):
username = None
password = None
netrc_machine = netrc_machine or self._NETRC_MACHINE
if self._downloader.params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(netrc_machine)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError(
'No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(
'parsing .netrc: %s' % error_to_compat_str(err))
return username, password
def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
""" """
Get the login info as (username, password) Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value First look for the manually specified credentials using username_option
and password_option as keys in params dictionary. If no such credentials
available look in the netrc file using the netrc_machine or _NETRC_MACHINE
value.
If there's no info available, return (None, None) If there's no info available, return (None, None)
""" """
if self._downloader is None: if self._downloader is None:
return (None, None) return (None, None)
username = None
password = None
downloader_params = self._downloader.params downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data # Attempt to use provided username and password or .netrc data
if downloader_params.get('username') is not None: if downloader_params.get(username_option) is not None:
username = downloader_params['username'] username = downloader_params[username_option]
password = downloader_params['password'] password = downloader_params[password_option]
elif downloader_params.get('usenetrc', False): else:
try: username, password = self._get_netrc_login_info(netrc_machine)
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
return (username, password) return username, password
def _get_tfa_info(self, note='two-factor verification code'): def _get_tfa_info(self, note='two-factor verification code'):
""" """
@ -723,9 +749,14 @@ class InfoExtractor(object):
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop) [^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs): def _og_search_property(self, prop, html, name=None, **kargs):
if not isinstance(prop, (list, tuple)):
prop = [prop]
if name is None: if name is None:
name = 'OpenGraph %s' % prop name = 'OpenGraph %s' % prop[0]
escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs) og_regexes = []
for p in prop:
og_regexes.extend(self._og_regexes(p))
escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
if escaped is None: if escaped is None:
return None return None
return unescapeHTML(escaped) return unescapeHTML(escaped)
@ -749,10 +780,12 @@ class InfoExtractor(object):
return self._og_search_property('url', html, **kargs) return self._og_search_property('url', html, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs): def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
if not isinstance(name, (list, tuple)):
name = [name]
if display_name is None: if display_name is None:
display_name = name display_name = name[0]
return self._html_search_regex( return self._html_search_regex(
self._meta_regex(name), [self._meta_regex(n) for n in name],
html, display_name, fatal=fatal, group='content', **kwargs) html, display_name, fatal=fatal, group='content', **kwargs)
def _dc_search_uploader(self, html): def _dc_search_uploader(self, html):
@ -801,56 +834,82 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html, return self._html_search_meta('twitter:player', html,
'twitter card player') 'twitter card player')
def _search_json_ld(self, html, video_id, **kwargs): def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex( json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>', r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs) html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT)
if not json_ld: if not json_ld:
return {} return default if default is not NO_DEFAULT else {}
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True)) # JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True): def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str): if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal) json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld: if not json_ld:
return {} return {}
info = {} info = {}
if json_ld.get('@context') == 'http://schema.org': if not isinstance(json_ld, (list, tuple, dict)):
item_type = json_ld.get('@type') return info
if item_type == 'TVEpisode': if isinstance(json_ld, dict):
info.update({ json_ld = [json_ld]
'episode': unescapeHTML(json_ld.get('name')), for e in json_ld:
'episode_number': int_or_none(json_ld.get('episodeNumber')), if e.get('@context') == 'http://schema.org':
'description': unescapeHTML(json_ld.get('description')), item_type = e.get('@type')
}) if expected_type is not None and expected_type != item_type:
part_of_season = json_ld.get('partOfSeason') return info
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason': if item_type == 'TVEpisode':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber')) info.update({
part_of_series = json_ld.get('partOfSeries') 'episode': unescapeHTML(e.get('name')),
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries': 'episode_number': int_or_none(e.get('episodeNumber')),
info['series'] = unescapeHTML(part_of_series.get('name')) 'description': unescapeHTML(e.get('description')),
elif item_type == 'Article': })
info.update({ part_of_season = e.get('partOfSeason')
'timestamp': parse_iso8601(json_ld.get('datePublished')), if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
'title': unescapeHTML(json_ld.get('headline')), info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
'description': unescapeHTML(json_ld.get('articleBody')), part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
}) if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
})
elif item_type == 'VideoObject':
info.update({
'url': e.get('contentUrl'),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl'),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
break
return dict((k, v) for k, v in info.items() if v is not None) return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod @staticmethod
def _hidden_inputs(html): def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html) html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
hidden_inputs = {} hidden_inputs = {}
for input in re.findall(r'(?i)<input([^>]+)>', html): for input in re.findall(r'(?i)(<input[^>]+>)', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input): attrs = extract_attributes(input)
if not input:
continue continue
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input) if attrs.get('type') not in ('hidden', 'submit'):
if not name:
continue continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input) name = attrs.get('name') or attrs.get('id')
if not value: value = attrs.get('value')
continue if name and value is not None:
hidden_inputs[name.group('value')] = value.group('value') hidden_inputs[name] = value
return hidden_inputs return hidden_inputs
def _form_hidden_inputs(self, form_id, html): def _form_hidden_inputs(self, form_id, html):
@ -876,7 +935,11 @@ class InfoExtractor(object):
f['ext'] = determine_ext(f['url']) f['ext'] = determine_ext(f['url'])
if isinstance(field_preference, (list, tuple)): if isinstance(field_preference, (list, tuple)):
return tuple(f.get(field) if f.get(field) is not None else -1 for field in field_preference) return tuple(
f.get(field)
if f.get(field) is not None
else ('' if field == 'format_id' else -1)
for field in field_preference)
preference = f.get('preference') preference = f.get('preference')
if preference is None: if preference is None:
@ -884,7 +947,8 @@ class InfoExtractor(object):
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5 preference -= 0.5
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1 protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if f.get('vcodec') == 'none': # audio only if f.get('vcodec') == 'none': # audio only
preference -= 50 preference -= 50
@ -1087,6 +1151,7 @@ class InfoExtractor(object):
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'url': manifest_url, 'url': manifest_url,
'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None, 'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr, 'tbr': tbr,
'width': width, 'width': width,
@ -1101,7 +1166,7 @@ class InfoExtractor(object):
'url': m3u8_url, 'url': m3u8_url,
'ext': ext, 'ext': ext,
'protocol': 'm3u8', 'protocol': 'm3u8',
'preference': preference - 1 if preference else -1, 'preference': preference - 100 if preference else -100,
'resolution': 'multiple', 'resolution': 'multiple',
'format_note': 'Quality selection URL', 'format_note': 'Quality selection URL',
} }
@ -1111,13 +1176,6 @@ class InfoExtractor(object):
m3u8_id=None, note=None, errnote=None, m3u8_id=None, note=None, errnote=None,
fatal=True, live=False): fatal=True, live=False):
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
res = self._download_webpage_handle( res = self._download_webpage_handle(
m3u8_url, video_id, m3u8_url, video_id,
note=note or 'Downloading m3u8 information', note=note or 'Downloading m3u8 information',
@ -1128,6 +1186,13 @@ class InfoExtractor(object):
m3u8_doc, urlh = res m3u8_doc, urlh = res
m3u8_url = urlh.geturl() m3u8_url = urlh.geturl()
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
# We should try extracting formats only from master playlists [1], i.e. # We should try extracting formats only from master playlists [1], i.e.
# playlists that describe available qualities. On the other hand media # playlists that describe available qualities. On the other hand media
# playlists [2] should be returned as is since they contain just the media # playlists [2] should be returned as is since they contain just the media
@ -1149,37 +1214,57 @@ class InfoExtractor(object):
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
}] }]
last_info = None last_info = {}
last_media = None last_media = {}
for line in m3u8_doc.splitlines(): for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'): if line.startswith('#EXT-X-STREAM-INF:'):
last_info = parse_m3u8_attributes(line) last_info = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'): elif line.startswith('#EXT-X-MEDIA:'):
last_media = parse_m3u8_attributes(line) media = parse_m3u8_attributes(line)
media_type = media.get('TYPE')
if media_type in ('VIDEO', 'AUDIO'):
media_url = media.get('URI')
if media_url:
format_id = []
for v in (media.get('GROUP-ID'), media.get('NAME')):
if v:
format_id.append(v)
formats.append({
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'language': media.get('LANGUAGE'),
'vcodec': 'none' if media_type == 'AUDIO' else None,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
})
else:
# When there is no URI in EXT-X-MEDIA let this tag's
# data be used by regular URI lines below
last_media = media
elif line.startswith('#') or not line.strip(): elif line.startswith('#') or not line.strip():
continue continue
else: else:
if last_info is None: tbr = int_or_none(last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH'), scale=1000)
formats.append({'url': format_url(line)})
continue
tbr = int_or_none(last_info.get('BANDWIDTH'), scale=1000)
format_id = [] format_id = []
if m3u8_id: if m3u8_id:
format_id.append(m3u8_id) format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') not in ('SUBTITLES', 'CLOSED-CAPTIONS') else None
# Despite specification does not mention NAME attribute for # Despite specification does not mention NAME attribute for
# EXT-X-STREAM-INF it still sometimes may be present # EXT-X-STREAM-INF it still sometimes may be present
stream_name = last_info.get('NAME') or last_media_name stream_name = last_info.get('NAME') or last_media.get('NAME')
# Bandwidth of live streams may differ over time thus making # Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided # format_id unpredictable. So it's better to keep provided
# format_id intact. # format_id intact.
if not live: if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats))) format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
manifest_url = format_url(line.strip())
f = { f = {
'format_id': '-'.join(format_id), 'format_id': '-'.join(format_id),
'url': format_url(line.strip()), 'url': manifest_url,
'manifest_url': manifest_url,
'tbr': tbr, 'tbr': tbr,
'ext': ext, 'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
} }
@ -1188,29 +1273,20 @@ class InfoExtractor(object):
width_str, height_str = resolution.split('x') width_str, height_str = resolution.split('x')
f['width'] = int(width_str) f['width'] = int(width_str)
f['height'] = int(height_str) f['height'] = int(height_str)
codecs = last_info.get('CODECS') # Unified Streaming Platform
if codecs: mobj = re.search(
vcodec, acodec = [None] * 2 r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
va_codecs = codecs.split(',') if mobj:
if len(va_codecs) == 1: abr, vbr = mobj.groups()
# Audio only entries usually come with single codec and abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
# no resolution. For more robustness we also check it to
# be mp4 audio.
if not resolution and va_codecs[0].startswith('mp4a'):
vcodec, acodec = 'none', va_codecs[0]
else:
vcodec = va_codecs[0]
else:
vcodec, acodec = va_codecs[:2]
f.update({ f.update({
'acodec': acodec, 'vbr': vbr,
'vcodec': vcodec, 'abr': abr,
}) })
if last_media is not None: f.update(parse_codecs(last_info.get('CODECS')))
f['m3u8_media'] = last_media
last_media = None
formats.append(f) formats.append(f)
last_info = {} last_info = {}
last_media = {}
return formats return formats
@staticmethod @staticmethod
@ -1457,9 +1533,17 @@ class InfoExtractor(object):
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group() mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
return self._parse_mpd_formats( return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict) compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}): def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
"""
Parse formats from MPD manifest.
References:
1. MPEG-DASH Standard, ISO/IEC 23009-1:2014(E),
http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
2. https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
"""
if mpd_doc.get('type') == 'dynamic': if mpd_doc.get('type') == 'dynamic':
return [] return []
@ -1473,34 +1557,52 @@ class InfoExtractor(object):
def extract_multisegment_info(element, ms_parent_info): def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy() ms_info = ms_parent_info.copy()
# As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
# common attributes and elements. We will only extract relevant
# for us.
def extract_common(source):
segment_timeline = source.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
start_number = source.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
timescale = source.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
segment_list = element.find(_add_ns('SegmentList')) segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None: if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL')) segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e: if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e] ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else: else:
segment_template = element.find(_add_ns('SegmentTemplate')) segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None: if segment_template is not None:
start_number = segment_template.get('startNumber') extract_common(segment_template)
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
for s in s_e:
ms_info['total_number'] += 1 + int(s.get('r', '0'))
else:
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
media_template = segment_template.get('media') media_template = segment_template.get('media')
if media_template: if media_template:
ms_info['media_template'] = media_template ms_info['media_template'] = media_template
@ -1508,11 +1610,14 @@ class InfoExtractor(object):
if initialization: if initialization:
ms_info['initialization_url'] = initialization ms_info['initialization_url'] = initialization
else: else:
initialization = segment_template.find(_add_ns('Initialization')) extract_Initialization(segment_template)
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
return ms_info return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration')) mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = [] formats = []
for period in mpd_doc.findall(_add_ns('Period')): for period in mpd_doc.findall(_add_ns('Period')):
@ -1530,7 +1635,7 @@ class InfoExtractor(object):
continue continue
representation_attrib = adaptation_set.attrib.copy() representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib) representation_attrib.update(representation.attrib)
# According to page 41 of ISO/IEC 29001-1:2014, @mimeType is mandatory # According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
mime_type = representation_attrib['mimeType'] mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0] content_type = mime_type.split('/')[0]
if content_type == 'text': if content_type == 'text':
@ -1555,6 +1660,7 @@ class InfoExtractor(object):
f = { f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id, 'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url, 'url': base_url,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type), 'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')), 'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')), 'height': int_or_none(representation_attrib.get('height')),
@ -1569,33 +1675,88 @@ class InfoExtractor(object):
} }
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info) representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info: if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template'] media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id) media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template) media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template) media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
media_template.replace('$$', '$') media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [
media_template % { # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
'Number': segment_number, # can't be used at the same time
'Bandwidth': representation_attrib.get('bandwidth')} if '%(Number' in media_template and 's' not in representation_ms_info:
for segment_number in range( segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'], representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])] representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info: else:
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url():
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
'Number': segment_number,
}
representation_ms_info['fragments'].append({
'url': segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += segment_d
add_segment_url()
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
s_num = 0
for segment_url in representation_ms_info['segment_urls']:
s = representation_ms_info['s'][s_num]
for r in range(s.get('r', 0) + 1):
fragments.append({
'url': segment_url,
'duration': float_or_none(s['d'], representation_ms_info['timescale']),
})
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({ f.update({
'segment_urls': representation_ms_info['segment_urls'], 'fragments': [],
'protocol': 'http_dash_segments', 'protocol': 'http_dash_segments',
}) })
if 'initialization_url' in representation_ms_info: if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id) initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'): if not f.get('url'):
f['url'] = initialization_url f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
try: try:
existing_format = next( existing_format = next(
fo for fo in formats fo for fo in formats
@ -1610,6 +1771,140 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type) self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
return {}
ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
if ctr:
mimetype, codecs = ctr.groups()
f = parse_codecs(codecs)
f['ext'] = mimetype2ext(mimetype)
return f
return {}
def _media_formats(src, cur_media_type):
full_url = absolute_url(src)
if determine_ext(full_url) == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
else:
is_plain_url = True
formats = [{
'url': full_url,
'vcodec': 'none' if cur_media_type == 'audio' else None,
}]
return is_plain_url, formats
entries = []
media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = {
'formats': [],
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
_, formats = _media_formats(src, media_type)
media_info['formats'].extend(formats)
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
src = source_attributes.get('src')
if not src:
continue
is_plain_url, formats = _media_formats(src, media_type)
if is_plain_url:
f = parse_content_type(source_attributes.get('type'))
f.update(formats[0])
media_info['formats'].append(f)
else:
media_info['formats'].extend(formats)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
http_base_url = 'http' + url_base
formats = []
if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',
video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
else:
for protocol in ('rtmp', 'rtsp'):
if protocol not in skip_protocols:
formats.append({
'url': protocol + url_base,
'format_id': protocol,
'protocol': protocol,
})
return formats
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()
@ -1670,7 +1965,7 @@ class InfoExtractor(object):
any_restricted = False any_restricted = False
for tc in self.get_testcases(include_onlymatching=False): for tc in self.get_testcases(include_onlymatching=False):
if 'playlist' in tc: if tc.get('playlist', []):
tc = tc['playlist'][0] tc = tc['playlist'][0]
is_restricted = age_restricted( is_restricted = age_restricted(
tc.get('info_dict', {}).get('age_limit'), age_limit) tc.get('info_dict', {}).get('age_limit'), age_limit)
@ -1723,6 +2018,19 @@ class InfoExtractor(object):
def _mark_watched(self, *args, **kwargs): def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses') raise NotImplementedError('This method must be implemented by subclasses')
def geo_verification_headers(self):
headers = {}
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
if geo_verification_proxy:
headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers
def _generic_id(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
def _generic_title(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """

View File

@ -1,13 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import url_basename
class RtmpIE(InfoExtractor): class RtmpIE(InfoExtractor):
@ -23,8 +19,8 @@ class RtmpIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0]) video_id = self._generic_id(url)
title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]) title = self._generic_title(url)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
@ -34,3 +30,31 @@ class RtmpIE(InfoExtractor):
'format_id': compat_urlparse.urlparse(url).scheme, 'format_id': compat_urlparse.urlparse(url).scheme,
}], }],
} }
class MmsIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'(?i)mms://.+'
_TEST = {
# Direct MMS link
'url': 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
'info_dict': {
'id': 'MilesReid(0709)',
'ext': 'wmv',
'title': 'MilesReid(0709)',
},
'params': {
'skip_download': True, # rtsp downloads, requiring mplayer or mpv
},
}
def _real_extract(self, url):
video_id = self._generic_id(url)
title = self._generic_title(url)
return {
'id': video_id,
'title': title,
'url': url,
}

View File

@ -5,13 +5,17 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
orderedSet, orderedSet,
remove_end, remove_end,
extract_attributes,
mimetype2ext,
determine_ext,
int_or_none,
parse_iso8601,
) )
@ -58,6 +62,9 @@ class CondeNastIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '3D Printed Speakers Lit With LED', 'title': '3D Printed Speakers Lit With LED',
'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.', 'description': 'Check out these beautiful 3D printed LED speakers. You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
'uploader': 'wired',
'upload_date': '20130314',
'timestamp': 1363219200,
} }
}, { }, {
# JS embed # JS embed
@ -67,70 +74,93 @@ class CondeNastIE(InfoExtractor):
'id': '55f9cf8b61646d1acf00000c', 'id': '55f9cf8b61646d1acf00000c',
'ext': 'mp4', 'ext': 'mp4',
'title': '3D printed TSA Travel Sentry keys really do open TSA locks', 'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica',
'upload_date': '20150916',
'timestamp': 1442434955,
} }
}] }]
def _extract_series(self, url, webpage): def _extract_series(self, url, webpage):
title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>', title = self._html_search_regex(
webpage, 'series title', flags=re.DOTALL) r'(?s)<div class="cne-series-info">.*?<h1>(.+?)</h1>',
webpage, 'series title')
url_object = compat_urllib_parse_urlparse(url) url_object = compat_urllib_parse_urlparse(url)
base_url = '%s://%s' % (url_object.scheme, url_object.netloc) base_url = '%s://%s' % (url_object.scheme, url_object.netloc)
m_paths = re.finditer(r'<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', m_paths = re.finditer(
webpage, flags=re.DOTALL) r'(?s)<p class="cne-thumb-title">.*?<a href="(/watch/.+?)["\?]', webpage)
paths = orderedSet(m.group(1) for m in m_paths) paths = orderedSet(m.group(1) for m in m_paths)
build_url = lambda path: compat_urlparse.urljoin(base_url, path) build_url = lambda path: compat_urlparse.urljoin(base_url, path)
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths] entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title) return self.playlist_result(entries, playlist_title=title)
def _extract_video(self, webpage, url_type): def _extract_video(self, webpage, url_type):
if url_type != 'embed': query = {}
description = self._html_search_regex( params = self._search_regex(
[ r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
r'<div class="cne-video-description">(.+?)</div>', if params:
r'<div class="video-post-content">(.+?)</div>', query.update({
], 'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
webpage, 'description', fatal=False, flags=re.DOTALL) 'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
})
else: else:
description = None params = extract_attributes(self._search_regex(
params = self._search_regex(r'var params = {(.+?)}[;,]', webpage, r'(<[^>]+data-js="video-player"[^>]+>)',
'player params', flags=re.DOTALL) webpage, 'player params element'))
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id') query.update({
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id') 'videoId': params['data-video'],
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target') 'playerId': params['data-player'],
data = compat_urllib_parse_urlencode({'videoId': video_id, 'target': params['id'],
'playerId': player_id, })
'target': target, video_id = query['videoId']
}) video_info = None
base_info_url = self._search_regex(r'url = [\'"](.+?)[\'"][,;]', info_page = self._download_webpage(
webpage, 'base info url', 'http://player.cnevids.com/player/video.js',
default='http://player.cnevids.com/player/loader.js?') video_id, 'Downloading video info', query=query, fatal=False)
info_url = base_info_url + data if info_page:
info_page = self._download_webpage(info_url, video_id, video_info = self._parse_json(self._search_regex(
'Downloading video info') r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info') else:
video_info = self._parse_json(video_info, video_id) info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
title = video_info['title']
formats = [{ formats = []
'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']), for fdata in video_info.get('sources', [{}])[0]:
'url': fdata['src'], src = fdata.get('src')
'ext': fdata['type'].split('/')[-1], if not src:
'quality': 1 if fdata['quality'] == 'high' else 0, continue
} for fdata in video_info['sources'][0]] ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
quality = fdata.get('quality')
formats.append({
'format_id': ext + ('-%s' % quality if quality else ''),
'url': src,
'ext': ext,
'quality': 1 if quality == 'high' else 0,
})
self._sort_formats(formats) self._sort_formats(formats)
return { info = self._search_json_ld(
webpage, video_id, fatal=False) if url_type != 'embed' else {}
info.update({
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': video_info['title'], 'title': title,
'thumbnail': video_info['poster_frame'], 'thumbnail': video_info.get('poster_frame'),
'description': description, 'uploader': video_info.get('brand'),
} 'duration': int_or_none(video_info.get('duration')),
'tags': video_info.get('tags'),
'series': video_info.get('series_title'),
'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')),
})
return info
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) site, url_type, item_id = re.match(self._VALID_URL, url).groups()
site = mobj.group('site')
url_type = mobj.group('type')
item_id = mobj.group('id')
# Convert JS embed to regular embed # Convert JS embed to regular embed
if url_type == 'embedjs': if url_type == 'embedjs':

Some files were not shown because too many files have changed in this diff Show More