Compare commits

...

640 Commits

Author SHA1 Message Date
8956d6608a release 2016.11.02 2016-11-02 02:39:36 +07:00
3365ea8929 [extractor/common] Remove unused code 2016-11-02 02:34:23 +07:00
a18aeee803 [ChangeLog] Actualize 2016-11-02 02:33:17 +07:00
1616f9b452 [extractor/common] Fix typo 2016-11-02 02:30:25 +07:00
02dc0a36b7 [utils] Introduce base_url 2016-11-02 02:30:18 +07:00
639e3b5c99 extract ISM formats in some of the extractors 2016-11-02 01:54:45 +07:00
b2758123c5 add Basic support for Smooth Streaming protocol(#8118) 2016-11-02 01:54:45 +07:00
f449c061d0 [nicknight] Improve extraction (closes #10769) 2016-11-02 01:35:53 +07:00
9c82bba05d [nickde] Improve extraction 2016-11-02 01:29:05 +07:00
e3577722b0 [nicknight] Add extractor 2016-11-02 01:25:59 +07:00
b82c33dd67 [extractor/common] Improve mpd base URL extraction (closes #10909, closes #11079) 2016-11-01 01:15:46 +07:00
e5a088dc4b [utils] Fix --match-filter for int-like strings (closes #11082) 2016-10-31 23:32:08 +07:00
2c6da7df4a release 2016.10.31 2016-10-31 01:36:53 +07:00
7e7a028aa4 [README.md] Fix a typo 2016-10-31 01:12:36 +07:00
e70a5e6566 release 2016.10.30 2016-10-30 18:24:49 +07:00
3bf55be466 [ChangeLog] Actualize 2016-10-30 18:19:29 +07:00
a901fc5fc2 [vessel] Add tests for #11068 2016-10-30 18:17:15 +07:00
cae6bc0118 [vessel] Improve video id extraction 2016-10-30 18:14:51 +07:00
d9ee2e5cf6 [facebook] Remove SWF params so that 1080P are detected
Closes #11073

In the provided link, SWF params give up to 720P, and VideoConfig
gives 1080P for both best and bestvideo. I guess all Facebook videos
supports HTML5 now, so I remove the old detection for SWF params
2016-10-30 18:20:55 +08:00
e1a0b3b81c [imgur] Recognize /r/ URLs (closes #11071) 2016-10-30 17:02:03 +08:00
2a048f9878 [beeg] Fix extraction (closes #11069) 2016-10-30 05:27:50 +07:00
ea331f40e6 Credit @Thor77 for jamendo (#10934) 2016-10-30 05:10:31 +07:00
f02700a1fa [openload] Fix extraction (#10408)
Thanks @TwelveCharzz again for studying openload codes
2016-10-29 18:00:30 +08:00
f3517569f6 [gvsearch] Modernize and fix page result request (closes #11051) 2016-10-28 23:19:59 +07:00
c725333d41 [ard] Fix typo 2016-10-27 03:59:00 +07:00
a5a8877f9c [adultswim] Fix extraction (closes #10979) 2016-10-27 02:16:48 +08:00
43c53a1700 [nobelprize] Add new extractor(closes #9999) 2016-10-26 18:15:23 +01:00
ec8705117a [hornbunny] Fix extraction (#10981) 2016-10-27 00:10:51 +08:00
3d8d44c7b1 [tvp] improve video id extraction(closes #10585) 2016-10-26 16:47:22 +01:00
88839f4380 release 2016.10.26 2016-10-26 19:55:09 +07:00
83e9374464 [ChangeLog] Actualize 2016-10-26 19:53:44 +07:00
773017c648 [rentv] Move rentv test from generic extractor and add only matching tests 2016-10-26 19:52:43 +07:00
777d90dc28 [rentv] Add new extractor(closes #10620) 2016-10-26 10:07:35 +01:00
3791d84acc [ard] Detect unavailable videos (closes #11018) 2016-10-25 21:21:47 +07:00
9305a0dc60 [vk] Fix extraction (closes #11022) 2016-10-25 21:05:29 +07:00
94e08950e3 release 2016.10.25 2016-10-25 03:19:36 +07:00
ee824a8d06 [ChangeLog] Actualize
[ci skip]
2016-10-25 02:51:07 +07:00
d3b6b3b95b [jamendo] Improve 2016-10-25 02:46:48 +07:00
b17422753f [jamendo] Add extractor 2016-10-25 02:43:03 +07:00
b0b28b8241 [ChangeLog] Actualize 2016-10-25 01:53:41 +07:00
81cb7a5978 Credit @azuwis for pandatv (#10736) 2016-10-25 01:51:46 +07:00
d2e96a8ed4 [pandatv] Extract m3u8, document reverse source and PEP 8 2016-10-25 01:51:37 +07:00
2e7c8cab55 [pandatv] Add new extractor 2016-10-25 01:46:02 +07:00
d7d4481c6a [movieclips] Fix _VALID_URL 2016-10-24 23:54:42 +07:00
5ace137bf4 [dotsub] Support vimeo embed (closes #10964) 2016-10-24 15:13:33 +08:00
9dde0e04e6 [litv] Fix extraction (#11006) 2016-10-23 23:23:40 +08:00
f16f8505b1 [vimeo] Delegate ondemand redirects to ondemand extractor (closes #10994) 2016-10-23 18:48:50 +07:00
9dc13a6780 [vivo] Fix extraction (closes #11003) 2016-10-23 18:07:56 +07:00
9aa929d337 [twitch:stream] Add support for rebroadcasts (closes #10995) 2016-10-23 17:20:45 +07:00
425f3fdfcb [pluralsight] Fix subtitles conversion (closes #10990) 2016-10-22 21:15:39 +07:00
e034cbc581 Merge branch 'johnhawkinson-stdin2' 2016-10-22 13:10:27 +08:00
5378f8ce0d [ChangeLog] Update for #10996 2016-10-22 13:08:56 +08:00
b64d04c119 [utils] Clarify for redirecting STDIN in get_exe_version() 2016-10-22 13:04:05 +08:00
00ca755231 [get_exe_version] Do version probes with <&-
When doing version probes for ffmpeg, do the
equivalent of calling it as:

    ffmpeg -version <&-

Where <&- is shell syntax for closing stdin before calling the
program. This is roughly equivalent to </dev/null without actually
opening /dev/null.

This prevents ffmpeg -version from hanging when run in the background.
Fixes #955.

The reason is that ffmpeg tries to manipulate stdin to set up terminal
characteristic, and that causes the kernel to suspend the parent
process (youtube-dl).

Note that closing stdin is achieved by calling subprocess.Popen() with
stdin set to subprocess.PIPE and without passing any input to
Popen.communicate(). This is somewhat subtle.
2016-10-22 00:34:08 -04:00
69c2d42bd7 release 2016.10.21.1 2016-10-21 04:57:28 +07:00
062e2769a3 [ChangeLog] Actualize 2016-10-21 04:53:26 +07:00
859447a28d [adobepass] PEP 8 2016-10-21 04:38:14 +07:00
f8ae2c7f30 [pluralsight] Process all clip URLs (closes #10984) 2016-10-21 04:35:32 +07:00
9ce0077485 release 2016.10.21 2016-10-21 03:08:42 +07:00
0ebb86bd18 [ChangeLog] Actualize 2016-10-21 03:07:03 +07:00
9df6b03caf [pluralsight] Adapt to new API (closes #10972) 2016-10-21 03:00:03 +07:00
8e2915d70b Revert "[postprocessor/embedthumbnail] Allow mkv to embed thumbnails"
This reverts commit 7360db05b4.

This commit was added as an attempt to fix #6046. Unfortunately, the fix
is completely wrong. As reported on #10359, embedded thumbnails are not
displayed in VLC, and Se7en on IRC reports that the embedded thumbnail
misleads mpv as well.

The correct way is using -attachment of ffmpeg, while the current
run_ffmpeg_multiple_files API can't handle it cleanly.
2016-10-20 15:07:19 +08:00
19e447150d [ChangeLog] Update for #10971 2016-10-20 04:19:33 +08:00
ad9fd84004 Merge pull request #10971 from kasper93/openload
[openload] Fix extraction.
2016-10-20 04:18:27 +08:00
60633ae9a0 [openload] Fix extraction.
Fixes #10408
2016-10-19 22:00:29 +02:00
a81dc82151 Credit @raleeper for the Comcast support (#10819) 2016-10-19 20:39:52 +01:00
9218a6b4f5 Merge pull request #10819 from raleeper/adobepass
[adobepass] Add Comcast
2016-10-19 20:16:24 +01:00
02af6ec707 [natgeo] extract m3u8 formats(closes #10959) 2016-10-19 19:39:41 +01:00
05b7996cab [ChangeLog] Fix typos and add unreleased version header 2016-10-20 00:00:53 +07:00
46f6052950 [adobepass] Add Comcast with fixed _download_webpage calls 2016-10-19 09:56:26 -07:00
c8802041dd release 2016.10.19 2016-10-19 23:55:16 +07:00
c7911009a0 [ChangeLog] Actualize 2016-10-19 23:53:37 +07:00
2b96b06bf0 [vidzi] Fix extraction (closes #10908, closes #10952) 2016-10-19 23:31:58 +07:00
06b3fe2926 [utils] Expose PACKED_CODES_RE 2016-10-19 23:28:49 +07:00
2c6743bf0f [README.md] Change 'guys' to 'people'
Folks at Ubuntu aren't all male so calling them 'guys' is a little odd.
2016-10-19 22:32:17 +07:00
efb6242916 [urplay] add supprt for urskola.se and fix subtitle extraction(closes #10915) 2016-10-19 15:05:39 +01:00
0384932e3d [extractor/common] try to extract non smil wowza mpd manifests 2016-10-19 14:57:12 +01:00
edd6074cea [extractor/common] detect f4m audio only formats 2016-10-19 14:42:48 +01:00
791d29dbf8 [orf] add subtitles support(closes #10939) 2016-10-19 11:34:46 +01:00
481cc7335c [youtube] Fix --no-playlist behavior for youtu.be/id URLs (closes #10896) 2016-10-19 03:27:18 +07:00
853a71b628 [nrk] Improve _VALID_URL 2016-10-19 03:02:14 +07:00
e2628fb6a0 [nrk] Relax _VALID_URL (closes #10928) 2016-10-19 02:59:44 +07:00
df4939b1cd [nytimes] Fix typo 2016-10-17 22:16:23 +07:00
0b94dbb115 [postprocessor/ffmpeg] PEP 8 2016-10-16 19:16:47 +07:00
8d76bdf12b [extractor/common] Mention podcast in series fields section 2016-10-16 18:37:17 +07:00
8204bacf1d Credit @johnhawkinson for nytimes podcasts (#10926) 2016-10-16 18:21:42 +07:00
47da782337 [nytimes] Improve (closes #10926) 2016-10-16 18:21:02 +07:00
74324a7ac2 [nytimes] Add support for podcasts 2016-10-16 17:31:55 +07:00
b0dfcab60a [pluralsight] Relax _VALID_URL (closes #10941) 2016-10-16 17:20:32 +07:00
bbd7706898 release 2016.10.16 2016-10-16 03:23:05 +07:00
112740e79f [ChangeLog] Actualize 2016-10-16 03:09:27 +07:00
c0b1e88895 [huajiao] Improve feed regex 2016-10-16 03:02:41 +07:00
7cdfbbf9b8 [extractors] Change import for theoperaplatform extractor 2016-10-16 02:55:38 +07:00
ac943d48d3 [Beatport] Update extractor name and tests 2016-10-16 02:33:43 +07:00
73498a8921 [ruutu] Add support for supla.fi 2016-10-16 02:31:56 +07:00
9187ee4f19 [README.md] Improve grammar 2016-10-16 02:26:06 +07:00
2273e2c530 [postprocessor/ffmpeg] Return correct filepath and ext in updated information in FFmpegExtractAudioPP
Return correct audio's filepath and ext instead of the video's when extracting audio and audio file already exists.
2016-10-16 02:12:03 +07:00
4b492e3579 [theoperaplatform] Rename, fix _VALID_URL and fix test 2016-10-16 00:24:06 +07:00
9c4258bcec [theoperaplatform] Add extractor 2016-10-16 00:21:15 +07:00
ea8aefd1d7 [lynda] Fix height for prioritized streams 2016-10-16 00:08:46 +07:00
6edfc40a0e [lynda] Add fallback extraction scenario 2016-10-16 00:07:40 +07:00
68d9561ca1 [lynda] Switch to https (closes #10916) 2016-10-15 23:56:09 +07:00
cfc0e7c82b Credit @pyx for the Huajiao extractor (#10917) 2016-10-15 16:48:35 +08:00
4102e64051 Merge branch 'pyx-huajiao' 2016-10-15 14:56:00 +08:00
f605242bfc [ChangeLog] Update for #10917 2016-10-15 14:55:36 +08:00
d32fa0f12c [huajiao] Coding style 2016-10-15 14:53:53 +08:00
a347a0d088 Merge branch 'huajiao' of https://github.com/pyx/youtube-dl into pyx-huajiao 2016-10-15 14:53:05 +08:00
77c5b98dcd [crunchyroll] Skip an invalid _TEST 2016-10-15 14:36:07 +08:00
88ebefc054 [cmt] Fix mgid extraction (closes #10813)
The example in #10813 requires TV provider authentication in Firefox,
while youtube-dl can download it directly with an US proxy.

I'm not sure whether the mgid fix is cmt-specific or it applies to all
mtv-based sites. I keep it in cmt.py until similar patterns are found in
other websites.
2016-10-15 14:27:15 +08:00
2e638d7bca Made optional fields optional 2016-10-14 14:12:06 -04:00
a26b174c61 [safari:course] Add support for techbus.safaribooksonline.com 2016-10-15 00:29:33 +07:00
73c801d660 [orf:tvthek] Fix extraction and modernize (closes #10898) 2016-10-14 23:43:09 +07:00
dff5107b68 README.md: fix alrady typo 2016-10-14 23:21:08 +07:00
8c3e448e80 [clipfish] Update _TEST; the old one is gone 2016-10-15 00:12:21 +08:00
2ecbd2ad6f [chirbit:profile] Fix extraction 2016-10-15 00:01:46 +08:00
62a0b86e4f [carambatv] Fix extraction
The video requested in #9815 now has videomore embeds.
2016-10-14 23:43:18 +08:00
146969e05b [videomore] Support <iframe> embed videos
Seen in CarambaTVPage
2016-10-14 23:42:11 +08:00
e2004ccaf7 [canalplus] Fix video_id and update _TESTS
Some tests are gone, and some redirect to different videos
2016-10-14 20:26:12 +08:00
a5f8473145 [cbsinteractive] Fix extraction for cnet.com 2016-10-14 18:20:01 +08:00
b7f59a3bf6 [huajiao] Add new extractor 2016-10-13 21:51:26 -04:00
580d411931 [parliamentliveuk] Recognize lower case URLs
Closes #10912

Seems parliamentliveuk matches URLs case-insentive. For example this URL
also works:
http://parliamentlive.tv/EvEnt/Index/3F24936f-130f-40bf-9a5d-b3d6479da6a4
2016-10-14 00:44:28 +08:00
5c4bfd4da5 release 2016.10.12 2016-10-12 21:30:05 +07:00
7104ae799c [ChangeLog] Actualize 2016-10-12 21:25:04 +07:00
bcd6276520 [downloader/common] Remove debug output 2016-10-12 21:22:33 +07:00
591e384552 [streamable] Remove debug output 2016-10-12 21:22:12 +07:00
9feb1c9731 [dailymotion] Fix extraction and update _TESTS
Closes #10901

Seems all videos use player V5 syntax now
2016-10-12 21:45:49 +08:00
a093cfc78b [vimeo:review] Fix extraction (#10900)
Now Vimeo Review videos uses React. Thanks @davekaro for analyzing the
problem!
2016-10-12 01:48:06 +08:00
6f20b65e72 [test/test_http] Update tests
After switching to HTML5 extraction helpers in generic.py, the result
info_dict is always a playlist.
2016-10-12 01:41:41 +08:00
cea364f70c [extractor/common] Support HTML media elements without child nodes 2016-10-12 01:40:28 +08:00
55642487f0 [nhl] Skip invalid m3u8 formats (closes #10713) 2016-10-11 20:50:52 +08:00
3d643f4cec [hbo] Add HBOEpisodeIE (#10892) 2016-10-11 17:46:52 +08:00
c452e69d3d [footyroom] Fix extraction and update _TESTS (closes #10810) 2016-10-11 17:46:13 +08:00
555787d717 [streamable] Add helper for extracting embedded videos 2016-10-11 17:44:35 +08:00
f165ca70eb [abc.net.au:iview] Fix for non-series videos (closes #10895) 2016-10-11 12:53:27 +08:00
27b8d2ee95 [hbo] Add display_id and another test (#10892) 2016-10-11 12:41:44 +08:00
71cdcb2331 [hbo] Support episode pages (closes #10892) 2016-10-11 12:30:35 +08:00
176006a120 [allocine] Fix for /video/ videos (closes #10860) 2016-10-09 19:42:42 +08:00
65f4c1de3d [allocine] Fix extraction (closes #10860)
I change the URL of the third test case, because now the original URL
does not contain a video anymore, and there's no easy to get the real
URL from the /film/ one.
2016-10-09 18:58:15 +08:00
b0082629a9 [nextmedia] Support action news (動新聞) on Apple Daily 2016-10-09 18:42:15 +08:00
8204c73352 [Makefile] Fix for GNU make < 4 (closes #9387)
Shell assignment operator in BSD make != is ported to GNU make in
version 4.0, so 3.x doesn't work. I choose to drop BSD make support as
installing GNU make on *BSD systems is easier than installing newer GNU
make.
2016-10-09 18:24:45 +08:00
2b51dac1f9 [slutload] Fix test and simplify 2016-10-09 01:17:38 +07:00
f68901e50a [reverbnation] Eliminate code duplication in thumbnails extraction 2016-10-09 01:02:35 +07:00
3adb9d119e [reverbnation] Modernize 2016-10-09 01:00:38 +07:00
1dd58e14d8 [lego] improve info extraction and bypass geo restriction(closes #10872) 2016-10-08 08:33:18 +01:00
dd4291f729 release 2016.10.07 2016-10-07 22:25:30 +07:00
888f8d6ba4 [ChangeLog] Actualize 2016-10-07 22:23:16 +07:00
f475e88121 [vimeo] PEP 8
[ci skip]
2016-10-07 22:15:26 +07:00
3c6b3bf221 [iprima] detect geo restriction 2016-10-07 15:53:16 +01:00
38588ab977 [facebook] Fix for new handleServerJS syntax (closes #10846)
According to the dump file in #10846, handleServerJS() now accepts
an optional second argument. It's a string from available dump files.
2016-10-07 20:04:49 +08:00
85bcdd081c [extractors] Add MmsIE 2016-10-07 19:31:26 +08:00
9dcd6fd3aa [generic,commonprotocols] Move mms suuport from GenericIE
And use _generic_* helpers in those extractors
2016-10-07 19:24:22 +08:00
98763ee354 [extractor/common] Add id and title helpers for generic IEs 2016-10-07 19:20:53 +08:00
3d83a1ae92 [generic] Support direct MMS links (closes #10838) 2016-10-07 17:50:45 +08:00
c0a7b9b348 Revert "[Makefilea] Fix for GNU make < 4"
This reverts commit 831a34caa2.

The reverted commit breaks lazy extractors.
2016-10-07 16:03:34 +08:00
831a34caa2 [Makefilea] Fix for GNU make < 4
Closes #9387

The shell assignment operator != was introduced in GNU make 4.0, or
specifically the commit in [1]. This fix removes such usages and
fallback to a more portable syntax. Tested with:

* GNU make 3.82 on CentOS 7.2
* bmake 20150910 on CentOS 7.2, source RPM from Fedora 24 [2]
* GNU make 4.2.1 on Arch Linux (Arch official package)
* bmake 20160926 on Arch Linux (Arch official package)
* GNU make 3.82 on Arch Linux (Compiled from source)
* Apple bsdmake-24 on macOS Sierra, binary package from Homebrew

Thanks @bdeyal for the feedback of the first tests

[1] http://git.savannah.gnu.org/cgit/make.git/commit/?id=b34438bee83ee906a23b881f257e684a0993b9b1
[2] http://koji.fedoraproject.org/koji/buildinfo?buildID=716769
2016-10-07 03:28:41 +08:00
09b9c45e24 [generic] Add support for multiple vimeo embeds (Closes #10862) 2016-10-06 23:22:52 +07:00
33898fb19c [nzz] Add new extractor(#4407) 2016-10-06 10:45:57 +01:00
017eb82934 [npo] detect geo restriction 2016-10-05 18:27:02 +01:00
b1d798887e [npo] Add support for 2doc.nl (Closes #10842) 2016-10-05 23:43:08 +07:00
0a33bb2cb2 Rename "Steffan 'Ruirize' James" to "Steffan Donal"
Legal name change!
2016-10-05 03:32:14 +07:00
185744f92f [lego] Add new extractor(closes #10369) 2016-10-04 10:30:57 +01:00
7232e54813 [tonline] Add new extractor(#10376) 2016-10-04 08:00:25 +01:00
6eb5503b12 [techtalks] Relax _VALID_URL 2016-10-04 02:54:36 +07:00
539c881bfc [techtalks] Allow URL-s with name part omitted. 2016-10-04 02:52:33 +07:00
c1b2a0858c [youtube:live] Extend _VALID_URL (Closes #10839) 2016-10-04 02:10:23 +07:00
215ff6e0f3 [theweatherchannel] Add new extractor(closes #7188) 2016-10-03 18:20:34 +01:00
dcdb292fdd Unify coding cookie 2016-10-03 23:44:29 +07:00
c1084ddb0c [thisoldhouse] Add new extractor(closes #10837) 2016-10-03 15:27:09 +01:00
ee5de4e38e [nhl] Add support for wch2016.com (Closes #10833) 2016-10-03 00:54:02 +07:00
25291b979a Merge pull request #10829 from TRox1972/pornoxo_improve
[pornoxo] Use JWPlatform to improve metadata extraction
2016-10-02 20:19:34 +08:00
567a5996ca [pornoxo] Use JWPlatform to improve metadata extraction 2016-10-02 13:07:02 +02:00
6c152ce20f release 2016.10.02 2016-10-02 15:58:00 +07:00
26406d33c7 [ChangeLog] Actualize 2016-10-02 15:56:33 +07:00
703b3afa93 [amcnetworks] Skip a restricted _TEST 2016-10-02 14:25:06 +08:00
99ed78c79e [jwplatform] Support DASH streams 2016-10-02 14:07:49 +08:00
fd15264172 [jwplatform] Support old-style jwplayer playlists 2016-10-02 13:47:06 +08:00
bd26441205 [utils] Fix xattr error handling 2016-10-02 03:03:41 +08:00
b19e275d99 [__init__] Fix lost xattr if --embed-thumbnail used
Reported at
https://github.com/rg3/youtube-dl/issues/9054#issuecomment-250451823
2016-10-02 02:12:14 +08:00
f6ba581f89 [byutv:event] Add extractor 2016-10-02 00:50:07 +07:00
6d2549fb4f [byutv] Fix id and display id 2016-10-02 00:44:54 +07:00
4da4516973 [byutv] Rely on _match_id and _parse_json 2016-10-02 00:41:18 +07:00
e1e97c2446 [periscope:user] Fix extraction (Closes #10820) 2016-10-01 22:50:47 +07:00
53a7e3d287 [utils] Support xattr as well as pyxattr
Closes #9054

There are two xattr packages in Python, pyxattr [1] and xattr [2]. They
have different APIs.

In old days pyxattr supports Linux only and xattr supports Linux, Mac,
FreeBSD and Solaris, and pyxattr supports Linux only. Recently pyxattr
adds support for Mac OS X. [3]

An old version of [2] is shipped with Mac OS X. However, some Linux
distributions have pyxattr only, for example PLD-Linux [4] and old Arch
Linux. [5] As a result, supporting both is the way to go.

[1] https://github.com/iustin/pyxattr
[2] https://github.com/xattr/xattr
[3] https://github.com/iustin/pyxattr/pull/9
[4] https://github.com/rg3/youtube-dl/issues/5498
[5] https://git.archlinux.org/svntogit/community.git/commit/?id=427c4c76401e386d865ccddea4fbfdc74df80492
    https://git.archlinux.org/svntogit/community.git/commit/?id=59b40da7b69622a6761d364a8b07909e9cccaa56
    python-xattr is added on 2016/06/29 while pyxattr is there for more
    than 6 years
2016-10-01 20:13:04 +08:00
d54739a2e6 [downloader/http] xattr values should be bytes 2016-10-01 19:58:13 +08:00
63e0fd5bcc Merge pull request #10818 from TRox1972/criterion_match_id
[criterion] Rely on _match_id, improve regex and add thumbnail to test
2016-10-01 19:49:18 +08:00
9c51a24642 [criterion] Rely on _match_id, improve regex and add thumbnail to test 2016-10-01 13:46:48 +02:00
9bd7bd0b80 [twitch] Skip a 404 test 2016-10-01 16:38:47 +08:00
4a76b73c6c Merge pull request #10817 from TRox1972/clubic_match_id
[clubic] Rely on _match_id and _parse_json
2016-10-01 16:20:12 +08:00
e295618f9e [dctp] Fix extraction (closes #10734) 2016-10-01 15:22:48 +08:00
d7753d1948 [downloader/http] Use write_xattr function for --xattr-set-filesize 2016-10-01 14:47:20 +08:00
eaf9b22f94 [clubic] Rely on _match_id and _parse_json 2016-09-30 20:03:25 +02:00
a1001f47fc [instagram] PEP 8 2016-10-01 00:16:08 +07:00
1609782258 [Instagram] Extract video dimensions 2016-10-01 00:13:34 +07:00
de6babf922 [tvland] Extend _VALID_URL (Closes #10812) 2016-09-30 22:30:34 +07:00
b0582fc806 [vgtv] Add support for tv.aftonbladet.se (Closes #10800) 2016-09-30 00:15:09 +07:00
af33dd8ee7 [aftonbladet] Remove extractor 2016-09-30 00:13:03 +07:00
70d7b323b6 [vk] Improve view count extraction 2016-09-29 23:52:29 +07:00
a7ee8a00f4 [vk] Extract timestamp (Closes #10760) 2016-09-29 23:52:29 +07:00
c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
3aa3953d28 [vk] Fix date and view count extraction. 2016-09-29 23:52:29 +07:00
efa97bdcf1 Move write_xattr to utils.py
There are some other places that use xattr functions. It's better to
move it to a common place so that others can use it.
2016-09-30 00:28:32 +08:00
475f8a4580 [vk] Add support for running live streams (Closes #10799) 2016-09-29 23:21:39 +07:00
93aa0b6318 [vk] Add support for finished live streams (#10799) 2016-09-29 23:04:10 +07:00
0ce26ef228 Merge pull request #10788 from TRox1972/instagram_comments
[Instagram] Extract comments
2016-09-29 21:54:39 +08:00
0d72ff9c51 [leeco] Recognize more Le Sports URLs (#10794) 2016-09-29 21:39:35 +08:00
a56e74e271 [Instagram] Extract comments 2016-09-28 19:32:40 +02:00
f533490bb7 [ketnet] Extract mzsource formats (#10770) 2016-09-28 22:58:25 +07:00
8bfda726c2 [limelight:media] improve http formats extraction 2016-09-28 16:34:27 +01:00
8f0cf20ab9 release 2016.09.27 2016-09-27 23:09:46 +07:00
c8f45f763c [ChangeLog] Remove duplicate 2016-09-27 23:03:00 +07:00
dd2cffeeec [ChangeLog] Actualize 2016-09-27 22:43:35 +07:00
cdfcc4ce95 [mtv] Improve _VALID_URL 2016-09-27 22:27:10 +07:00
e384552590 [vk] Add support for dailymotion embeds
Fixes #10661
2016-09-27 21:58:14 +07:00
1a2fbe322e [periscope] Treat timed_out state as finished stream 2016-09-27 21:55:51 +07:00
f9dd86a112 [npo] Clarify IE_NAMEs (Closes #10775) 2016-09-27 21:37:33 +07:00
2342733f85 fix tests related to 1978540a5122c53012e17a78841f3da0df77fd34(closes #10774) 2016-09-27 15:31:25 +01:00
93933c9819 [awaan:video] fix test(closes #10773) 2016-09-27 15:31:25 +01:00
d75d9e343e [einthusan] Fix extraction (closes #10714) 2016-09-27 14:38:41 +08:00
72c3d02d29 [promptfile] Improve and modernize 2016-09-26 23:39:54 +07:00
d3dbb46330 [promptfile] Fix extraction (Closes #10634) 2016-09-26 23:20:58 +07:00
fffb9cff94 [kaltura] Speed up embed regexes (#10764) 2016-09-26 22:15:58 +07:00
d3c97bad61 Ignore and cleanup 3gp files 2016-09-26 14:14:37 +08:00
2d5b4af007 [extractors] Add import for anderetijden extractor 2016-09-25 23:30:57 +07:00
f1ee462c82 [PULL_REQUEST_TEMPLATE.md] Fix typo 2016-09-25 22:38:36 +07:00
5742c18bc1 [npo] Add support for anderetijden.nl (Closes #10754) 2016-09-25 22:26:14 +07:00
ddb19772d5 [vpro] Fix playlist title extraction and update tests 2016-09-25 22:26:06 +07:00
a3d8b38168 [npo] Generalize playlist extractors 2016-09-25 22:26:00 +07:00
e590b7ff9e [PULL_REQUEST_TEMPLATE.md] Add checkable Improvement options PR's purpose 2016-09-25 18:09:46 +07:00
f3625cc4ca [PULL_REQUEST_TEMPLATE.md] Add Unlicense notice 2016-09-25 18:08:35 +07:00
2d3d29976b [youtube] Change test URLs from http to https 2016-09-25 17:45:24 +07:00
493353c7fd [prosiebensat1] Add support for advopedia 2016-09-25 06:25:57 +07:00
0a078550b9 [prosiebensat1] Improve _VALID_URL 2016-09-25 06:19:17 +07:00
f92bb612c6 [mwave] Relax _VALID_URLs (Closes #10735, closes #10748) 2016-09-25 06:14:32 +07:00
ddde91952f [prosiebensat1] Fix playlist support (Closes #10745) 2016-09-25 05:36:18 +07:00
63c583eb2c [prosiebensat1] Add support for sat1gold (#10745) 2016-09-25 04:43:10 +07:00
7fd57de6fb [cbsnews:livevideo] fix extraction and extract m3u8 formats 2016-09-24 22:01:33 +01:00
e71a450956 [common] add hdcore sign to akamai f4m formats 2016-09-24 21:55:53 +01:00
27e99078d3 [brightcove:new] add support for live streams 2016-09-24 15:39:48 +01:00
6f126d903f [download/hls] Delegate downloading to ffmpeg for live streams 2016-09-24 15:39:47 +01:00
7518a61d41 [soundcloud] Fix typo in playlist base class name 2016-09-24 19:29:49 +07:00
8e45e1cc4d [soundcloud] Generalize playlist entries extraction (#10733) 2016-09-24 19:18:01 +07:00
f0bc5a8609 [twitter] Support Periscope embeds (closes #10737)
Also update _TESTS
2016-09-24 20:00:29 +08:00
a54ffb8aa7 [mtv] add common IE_NAME prefix for MTVIE and MTVVideoIE 2016-09-24 10:50:14 +01:00
8add4bfecb [mtv] add support for new website urls(closes #8169)(closes #9808) 2016-09-24 10:42:20 +01:00
0711995bca [openload] Support subtitles (closes #10625) 2016-09-24 14:27:08 +08:00
5968d7d2fe [extractor/common] Improved support for HTML5 subtitles
Ref: #10625

In a strict sense, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and I guess there will be more
examples, so I add it to common.py.

Also allow extracting information for subtitles-only <video> or <audio>
tags, which is the case of openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
2016-09-24 14:20:42 +08:00
e6332059ac release 2016.09.24 2016-09-24 02:16:47 +07:00
8eec691e8a [ChangeLog] Actualize 2016-09-24 02:12:49 +07:00
24628cf7db [soundcloud:playlist] Provide video id for playlist entries (Closes #10733) 2016-09-24 02:01:01 +07:00
71ad00c09f [prosiebensat1] Add support for kabeleinsdoku (Closes #10732) 2016-09-23 21:08:16 +07:00
45cae3b021 [cbs] extract info from thunder videoPlayerService(closes #10728) 2016-09-22 19:28:22 +01:00
4ddcb5999d [openload] Fix extraction (closes #10408, closes #10727)
Thanks to @daniel100097 for providing a working version
2016-09-23 01:47:51 +08:00
628406db96 [Makefile] Cleanup files from fragment-based downloaders 2016-09-23 01:13:56 +08:00
e3d6bdc8fc [ustream] Support HLS streams (closes #10698) 2016-09-23 01:11:13 +08:00
0a439c5c4c [udemy] Stringify video id 2016-09-22 21:48:53 +07:00
1978540a51 [ooyala] extract all hls formats 2016-09-21 21:49:52 +01:00
12f211d0cb [videomore] Fix embed regex 2016-09-21 22:51:36 +07:00
3a5a18705f [adobepass] add support MSO that depend on watchTVeverywhere(closes #10709) 2016-09-21 15:57:27 +01:00
1ae0ae5db0 [cartoonnetwork] add support Adobe Pass auth 2016-09-20 18:52:00 +01:00
f62a77b99a [soundcloud] Modernize 2016-09-20 21:56:57 +07:00
4bfd294e2f [soundcloud] Extract license metadata 2016-09-20 21:56:57 +07:00
e33a7253b2 [fox] add support for Adobe Pass auth(closes #8584) 2016-09-20 15:52:23 +01:00
c38f06818d add support for Adobe Pass auth in tbs,tnt and trutv extractors(fixes #10642)(closes #10222)(closes #10519) 2016-09-20 11:55:30 +01:00
cb57386873 release 2016.09.19 2016-09-19 02:58:32 +07:00
59fd8f931d [ChangeLog] Actualize 2016-09-19 02:57:14 +07:00
70b4cf9b1b [crunchyroll] Check if already logged in (Closes #10700) 2016-09-19 02:50:06 +07:00
cc764a6da8 [twitch:stream] Remove fallback to profile extraction when stream is offline
Main page does not contain profile videos anymore
2016-09-18 19:10:18 +07:00
d8dbf8707d [thisav] Improve title extraction (closes #10682)
I didn't add a test case as the one in #10682 looks like a copyrighted
product.
2016-09-18 18:35:38 +08:00
a1da888d0c [vyborymos] Improve station info extraction 2016-09-18 17:30:55 +07:00
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (manifest may contain unfragmented data in this case url will be used for direct media URL and manifest_url for manifest itself correspondingly)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
f5e008d134 release 2016.09.15 2016-09-15 23:46:11 +07:00
e6bf3621e7 [ChangeLog] Actualize 2016-09-15 23:31:16 +07:00
490b755769 Improve some id regexes 2016-09-15 23:12:58 +07:00
1dec2c8a0e [adobepass] Change mvpd cache section name
In order to better emphasize it's relation to Adobe Pass
2016-09-15 22:47:45 +07:00
dcce092e0a [extractor/common] Simplify _get_netrc_login_info and carry long lines 2016-09-15 22:35:12 +07:00
32443dd346 [extractor/common] Update _get_login_info's comment 2016-09-15 22:34:29 +07:00
2133565cec [extractor/common] Simplify _get_login_info 2016-09-15 22:26:37 +07:00
1da50aa34e [YoutubeDL] Improve Adobe Pass options' wording 2016-09-15 22:24:55 +07:00
d2522b86ac [options] Actually print Adobe Pass options sections in --help 2016-09-15 22:18:31 +07:00
537f753399 [options] Improve Adobe Pass wording 2016-09-15 22:17:17 +07:00
c849836854 [utils] Improve _hidden_inputs 2016-09-15 21:54:48 +07:00
eb5b1fc021 [crunchyroll] Fix authentication (Closes #10655) 2016-09-15 21:53:35 +07:00
95be29e1c6 [twitch] Fix api calls (Closes #10654, closes #10660) 2016-09-15 20:58:02 +07:00
c035dba19e [bellmedia] add support for more sites 2016-09-15 08:12:12 +01:00
87148bb711 [adobepass] rename --ap-mso-list option to --ap-list-mso 2016-09-14 20:21:09 +01:00
797c636bcb [ap] improve adobe pass names and parse error handling 2016-09-14 18:58:47 +01:00
0002962f3f [franceinter] Improve extraction (Closes #10538) 2016-09-14 23:59:38 +07:00
3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
a942d6cb48 [utils,franceinter] Add french months' names and fix extraction
Update of the "FranceInter" radio extractor : webpages HTML structure
had changed, the extractor didn't work. So I updated this extractor to
get the mp3 URL and all details.
2016-09-14 23:59:38 +07:00
961516bfd1 [kwuo:song] Improve error detection (closes #10650) 2016-09-15 00:56:15 +08:00
6db354a9f4 [kuwo] Update _TESTS 2016-09-15 00:53:04 +08:00
353f340e11 [go] fix typo 2016-09-14 17:22:42 +01:00
014b7e6b25 [go] add support for free full episodes(#10439) 2016-09-14 17:08:25 +01:00
925194022c Improve some _VALID_URLs 2016-09-14 22:47:21 +07:00
b690ea15eb [viafree] Fix test 2016-09-14 22:45:23 +07:00
5712c0f426 [adobepass] remove unnecessary option 2016-09-14 16:37:21 +01:00
86d68f906e [bilibili] Fix extraction for videos without backup_url (#10647) 2016-09-14 22:11:49 +08:00
4875ff6847 [bilibili] Remove copyrighted test cases
I can't find any English or Chinese material that claims BiliBili has
bought legal redistribution permissions for copyrighted products from
copyrighted holders.

References for removed test cases:
"刀语": https://en.wikipedia.org/wiki/Katanagatari, by White Fox
"哆啦A梦": https://en.wikipedia.org/wiki/Doraemon, by Shin-Ei Animation
"岳父岳母真难当": https://en.wikipedia.org/wiki/Serial_(Bad)_Weddings, by Les films du 24
"混沌武士": https://en.wikipedia.org/wiki/Samurai_Champloo, by Manglobe

I shouldn't have added them to _TESTS
2016-09-14 22:09:43 +08:00
1b6712ab23 [adobepass] add specific options for adobe pass authentication
- add --ap-username and --ap-password option to specify
TV provider username and password in the cmd line
- add --ap-retries option to limit the number of retries
- add --list-ap-msi-ids to list the supported TV Providers
2016-09-13 22:16:01 +01:00
8414c2da31 [adobepass] PEP 8 2016-09-13 23:22:16 +07:00
45396dd2ed [nhk] Fix extraction (Closes #10633) 2016-09-13 23:20:25 +07:00
7a7309219c [adobepass] add an option to specify mso_id and support for ROGERS TV Provider(closes #10606) 2016-09-12 23:39:35 +01:00
fcba157e80 [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 23:29:43 +07:00
a6ccc3e518 [safari] Improve ids regexes (#10617) 2016-09-12 23:05:52 +07:00
1d16035bb4 [kaltura] Improve audio detection 2016-09-12 22:43:45 +07:00
e8bcd982cc [kaltura] Skip chun format 2016-09-12 22:33:00 +07:00
a5ff05df1a [extractor/generic] Add vimeo embed that requires Referer passed 2016-09-12 21:49:31 +07:00
d002e91986 [vimeo:ondemand] Pass Referer along with embed URL (#10624) 2016-09-12 21:48:45 +07:00
546edb2efa [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 21:01:31 +07:00
be45730226 [nbc] Add new extractor for NBC Olympics (#10295, #10361) 2016-09-12 02:55:15 +08:00
ee7e672eb0 [tube8] Remove proxy settings from test 2016-09-11 23:46:50 +07:00
0307d6fba6 release 2016.09.11.1 2016-09-11 23:33:20 +07:00
fc150cba1d [devscripts/release.sh] Add missing fi 2016-09-11 23:32:01 +07:00
d667ab7fad [ChangeLog] Actualize 2016-09-11 23:30:18 +07:00
eb87d4545a [devscripts/release.sh] Add ChangeLog reminder prompt 2016-09-11 23:29:25 +07:00
1c81476cbb release 2016.09.11 2016-09-11 23:20:09 +07:00
bc9186c882 [tvplay] Remove unused import 2016-09-11 22:51:12 +07:00
6599c72527 [tube8] Extract categories and tags (Closes #10579) 2016-09-11 22:50:36 +07:00
6bb05b32a9 [pornhub] Extract categories and tags (closes #10499) 2016-09-11 19:22:51 +08:00
fea74acad8 [foxnews] Revert to old extractor names 2016-09-11 18:54:24 +08:00
f01115c933 [openload] Temporary fix (#10408) 2016-09-11 18:36:59 +08:00
2cdbc06a1f [foxnews] Support Fox News Articles (closes #10598) 2016-09-11 18:32:45 +08:00
2cb93afcd8 [viafree] Improve video id extraction (Closes #10615) 2016-09-11 14:59:14 +07:00
bfcda07a27 [abc:iview] Skip the test. They are removed soon 2016-09-11 04:06:00 +08:00
001a5fd3d7 [iwara] Fix extraction after relaunch
Closes #10462, closes #3215
2016-09-11 03:02:00 +08:00
1e35999c1e [tfo] Add new extractor 2016-09-10 19:43:31 +01:00
2512b17493 [lrt] Fix audio extraction (Closes #10566) 2016-09-11 01:27:20 +07:00
56c0ead4d3 [9now] Improve video data extraction (Closes #10561) 2016-09-11 00:42:13 +07:00
7324243750 [9now] Fix extraction 2016-09-11 00:16:29 +07:00
84a18e9b90 [polskieradio:category] Improve extraction 2016-09-10 22:01:49 +07:00
b29f842e0e [canalplus] Add support for c8.fr (Closes #10577) 2016-09-10 20:46:45 +07:00
f009fcac0d Merge branch 'master' of github.com:rg3/youtube-dl 2016-09-10 19:21:03 +07:00
6c3affcb18 [newgrounds] Fix uploader extraction
Closes #10584

Also change test URLs to HTTPS, as proposed by
@stepshal in #10593.

Closes #10593
2016-09-10 20:09:09 +08:00
1e19ff2984 Merge branch 'polskie-radio-programme' of https://github.com/JakubAdamWieczorek/youtube-dl 2016-09-10 00:42:36 +07:00
c6129feb7f [ketnet] Add extractor (Closes #10343) 2016-09-09 23:20:45 +07:00
bb5ebd4453 [canvas] Add support for een.be (Closes #10605) 2016-09-09 22:16:21 +07:00
cb9cbd84ed [extractors] add import for TeleQuebecIE 2016-09-08 22:55:27 +01:00
4d5726b0d7 [telequebec] Add new extractor(closes #1999) 2016-09-08 22:53:44 +01:00
4614ad7b59 [parliamentliveuk] fix extraction(closes #9137) 2016-09-08 20:46:12 +01:00
b717837190 release 2016.09.08 2016-09-08 23:46:14 +07:00
2abad67e52 [ChangeLog] Actualize 2016-09-08 23:32:16 +07:00
ad0e2b3359 [abcotvs] Add support for ABC Owned Television Stations 2016-09-08 23:15:58 +07:00
37720844f6 [jwplatform] Extract height from label 2016-09-08 22:53:20 +07:00
6cfcb8ac36 [tvnoe] Do not capture unused groups in _VALID_URL 2016-09-08 22:53:20 +07:00
7a979da8cb [yahoo] Look for Brightcove Legacy Studio embeds(closes #9345) 2016-09-08 16:44:22 +01:00
2fdc7b0e04 [viafree] PEP 8 2016-09-08 22:40:02 +07:00
010d034fca [videomore] Fix extraction (Closes #10592) 2016-09-08 22:38:49 +07:00
02e552886f Merge pull request #10596 from stepshal/r_prefix
Add missing r prefix for _VALID_URLs
2016-09-08 18:31:09 +08:00
25042f7372 Add missing r prefix for _VALID_URLs 2016-09-08 17:04:57 +07:00
3f612f0767 Fix _VALID_URLs further (#10594) 2016-09-08 17:39:29 +08:00
17bf6e71cc Merge pull request #10594 from stepshal/https_support
Add support for https for rest of the exctractors.
2016-09-08 17:28:46 +08:00
881f35479d Credit @xyb for miaopai extractor (#10556) 2016-09-08 17:22:43 +08:00
89f257d6e5 Add support for https for rest of the exctractors. 2016-09-08 13:52:22 +07:00
e78a5428b6 [foxgay] Fix extraction (closes #10480) 2016-09-08 02:01:09 +08:00
6656a82481 [rmcdecouverte] Add new extractor(closes #9709) 2016-09-07 17:33:22 +01:00
d7e794928d [tlc] fix query string parsing 2016-09-07 17:33:22 +01:00
9c27188988 Merge branch 'xyb-miaopai' 2016-09-08 00:31:06 +08:00
b84d311d53 [ChangeLog] Update for #10556 2016-09-08 00:29:55 +08:00
f87feb4b68 [miaopai] Coding style (#10556) 2016-09-08 00:28:33 +08:00
2841bdcebb Merge branch 'miaopai' of https://github.com/xyb/youtube-dl into xyb-miaopai 2016-09-08 00:08:02 +08:00
84b91dd4e3 [gamestar] Fix metadata extraction (closes #10479) 2016-09-07 23:07:50 +08:00
92c9c2a88b [moevideo] Skip another removed test (#10474) 2016-09-07 22:21:59 +08:00
9d54b02bae [puls4] fix extraction(closes #10583) 2016-09-07 14:43:20 +01:00
846d8b76a0 [cctv] Add new extractor(closes #8153) 2016-09-07 10:11:09 +01:00
aa3f9fe695 Explain why and why not to specify --hls-prefer-native
This has been asked at http://stackoverflow.com/questions/39357037/what-does-youtube-dl-option-hls-prefer-native-do-any-downside-adding-to-youtu
2016-09-07 10:38:59 +02:00
8258f4457c [lci] Add new extractor(closes #10573) 2016-09-06 20:47:42 +01:00
948cd5b72d [wat] extract dash formats 2016-09-06 20:44:45 +01:00
8d3737cda7 [polskieradio] Add support for downloading whole programmes.
This extends the Polskie Radio (the Polish national radio) extractor to
enable the user to download all the broadcasts of a single programme.
2016-09-06 21:34:44 +02:00
155bc674c4 [viafree] Improve video id detection (Closes #10569) 2016-09-07 00:41:31 +07:00
c33c962adf [trutv] Add new extractor(#10519) 2016-09-06 15:56:17 +01:00
bdcc046d12 [turner] use android secure hls host and catch token extraction errors 2016-09-06 15:53:03 +01:00
a493f10208 using _parse_html5_media_entries to parse video tag 2016-09-05 23:08:33 +08:00
f3eeaacb4e [nick] Add test for #10559 2016-09-05 21:42:41 +07:00
b4d6a85d60 [nick] Add support for nickelodeon.nl (Closes #10559) 2016-09-05 21:33:14 +07:00
0b36a96212 [abcotvs] extend _VALID_URL and add support for clips.abcotvs.com(closes #9551) 2016-09-05 13:41:21 +01:00
bc22a79694 Credit @mcepl for #10524 2016-09-05 16:44:06 +08:00
340e31ca74 Merge branch 'PeterDing-bilibili' 2016-09-05 13:55:07 +08:00
973dee491f [ChangeLog] Update for #10190 2016-09-05 13:54:35 +08:00
1f85029d82 [bilibili] Simplify 2016-09-05 13:53:58 +08:00
95be19d436 [miaopai] Add new extractor 2016-09-05 13:53:09 +08:00
95843da529 Merge branch 'bilibili' of https://github.com/PeterDing/youtube-dl into PeterDing-bilibili 2016-09-05 13:47:24 +08:00
abf2c79f95 Merge branch 'mcepl-tvnoe' 2016-09-05 13:39:51 +08:00
b49ad71ce1 [ChangeLog] Update for #10524 2016-09-05 13:38:55 +08:00
9127e1533d [tvnoe] PEP8 and coding style 2016-09-05 13:37:36 +08:00
78e762d23c Add new extractor for TV Noe (Czech Christian TV).
Fixes #10520
2016-09-04 19:06:40 +02:00
4809490108 release 2016.09.04.1 2016-09-04 20:58:28 +07:00
8112bfeaba [ChangeLog] Actualize 2016-09-04 20:57:18 +07:00
d9606d9b6c release 2016.09.04 2016-09-04 20:51:48 +07:00
433af6ad30 [theplatform] fix player regex(closes #10546) 2016-09-04 14:24:41 +01:00
feaa5ad787 [youtube:playlist] Extend _VALID_URL 2016-09-04 20:12:34 +07:00
100bd86a68 [rottentomatoes] delegate extraction to InternetVideoArchiveIE 2016-09-04 11:45:29 +01:00
0def758782 [internetvideoarchive] extract all formats 2016-09-04 11:45:29 +01:00
919cf1a62f [downloader/dash] Abort if the first segment fails
Closes #10497, Closes #10542
2016-09-04 17:32:29 +08:00
b29cd56591 [pornovoisines] Fix extraction (closes #10469) 2016-09-04 17:01:39 +08:00
622638512b [rottentomatoes] Fix extraction
Closes #10467
2016-09-04 16:25:59 +08:00
37c7490ac6 [espn] Extend _VALID_URL (Closes #10549) 2016-09-04 04:59:46 +07:00
091624f9da [vimple] Extend _VALID_URL (Closes #10547) 2016-09-04 03:39:13 +07:00
7e5dc339de [youtube:watchlater] Fix extraction (Closes #10544) 2016-09-04 00:29:01 +07:00
4a69fa04e0 [downloader/dash] Abort download immediately after giving up on some fragment 2016-09-03 17:51:48 +07:00
2e99cd30c3 [downloader/dash:hls] Report exact fragment error on retry 2016-09-03 17:51:48 +07:00
25afc2a783 [downloader/dash:hls] Respect --fragment-retries and --skip-unavailable-fragments (Closes #10165, closes #10448) 2016-09-03 17:51:48 +07:00
9603b66012 Introduce --skip-unavailable-fragments 2016-09-03 17:51:48 +07:00
45aab4d30b [youjizz] Fix extraction. The site has moved to HTML5
Closes #10437
2016-09-03 18:37:36 +08:00
ed2bfe93aa [fc2:embed] Add ie_key 2016-09-03 18:22:00 +08:00
cdc783510b [foxnews:insider] Add new extractor
Closes #10445
2016-09-03 18:16:19 +08:00
cf0efe9636 [fc2:embed] New extractor for Flash player URLs
Closes #10512
2016-09-03 17:25:03 +08:00
dedb177029 Fix parsing of HTML5 media elements
This fixes an error in _parse_html5_media_entries in case
an audio or video tag directly uses a src attribute insted
of <source> elements in it's body.
2016-09-03 16:09:35 +07:00
86c3bbbced release 2016.09.03 2016-09-03 01:46:41 +07:00
4b3a607658 [ChangeLog] Actualize 2016-09-03 01:45:17 +07:00
3a7d35b982 Credit @C4K3 for #10536 2016-09-03 01:42:33 +07:00
6496ccb413 [youtube] Add support for rental videos' previews (Closes #10532) 2016-09-03 01:17:15 +07:00
3fcce30289 [drtv] Update tests 2016-09-02 23:53:17 +07:00
c2b2c7e138 [utils] Add quicktime to mimetype2ext 2016-09-02 23:50:42 +07:00
dacb3a864a [youtube:playlist] Fallback to video extraction for video/playlist URLs when playlist is broken (Closes #10537) 2016-09-02 23:43:20 +07:00
6066d03db0 [drtv] Modernize and make more robust 2016-09-02 23:02:15 +07:00
6562d34a8c [utils] Improve mimetype2ext 2016-09-02 22:57:48 +07:00
5e9e3d0f6b [drtv] Add support for dr.dk/nyheder
It's the same video player, the only difference is that the video player
is loaded differently, and certain metadata (title and description) is
not available under dr.dk/mu, so make it by default get that from some
of the html meta tags.

Skip the dr.dk/tv test

dr.dk/tv videos are only available for between 7 and 90 days due to
Danish law, and in certain cases may be readded. Skip this test as it is
no longer available.
2016-09-02 22:20:36 +07:00
349fc5c705 [facebook:plugins:video] Add extractor (Closes #10530) 2016-09-02 21:13:50 +07:00
2c3e0af93e [go] Add new extractor 2016-09-02 09:53:04 +01:00
6150502e47 [adobepass] check for authz_token expiration(#10527) 2016-09-01 22:29:20 +01:00
b207d5ebd4 [curiositystream] don't cache auth token 2016-09-01 19:46:58 +01:00
4191779dcd [nytimes] improve extraction 2016-09-01 19:08:29 +01:00
f97ec8bcb9 [glide] Remove unused import 2016-09-01 23:46:58 +07:00
8276d3b87a [thestar] Fix extraction (Closes #10465) 2016-09-01 23:46:15 +07:00
af95ee94b4 [glide] Fix extraction (Closes #10478) 2016-09-01 23:38:49 +07:00
8fb6af6bba [exfm] Remove extractor (Closes #10482) 2016-09-01 23:32:28 +07:00
f6af0f888b [youporn] Fix categories and tags extraction (Closes #10521) 2016-09-01 23:15:01 +07:00
e816c9d158 [extractor/common] Simplify _extract_m3u8_formats 2016-09-01 22:18:16 +07:00
9250181f37 [extractor/common] Restore NAME usage from EXT-X-MEDIA tag for formats codes in _extract_m3u8_formats (Closes #10522) 2016-09-01 21:37:25 +07:00
f096ec2625 [curiositystream] Add new extractor 2016-09-01 13:37:09 +01:00
4c8ab6fd71 [thvideo] Remove extractor. Website down.
Closes #10464

According to a screenshot in http://tieba.baidu.com/p/4691302183,
thvideo.tv is shut down "temporarily". I see no clues that it will be up
again, so I remove it here.
2016-09-01 17:04:41 +08:00
05d4612947 [movingimage] Adapt to the new domain name and fix extraction
Closes #10466
2016-09-01 16:58:16 +08:00
746a695b36 [myvidster] Update _TESTS (closes #10473) 2016-09-01 16:42:35 +08:00
165c54e97d [southpark.cc.com:español] Skip geo-restricted _TESTS
Breaks https://travis-ci.org/rg3/youtube-dl/jobs/156728175
2016-09-01 16:28:03 +08:00
2896dd73bc [cbs] extract once formats(closes #10515) 2016-09-01 08:00:13 +01:00
f8fd510eb4 [limelight] skip ism manifests and reduce requests 2016-08-31 18:32:15 +01:00
7a3e849f6e [porncom] Extract categories and tags (Closes #10510) 2016-08-31 22:23:55 +07:00
196c6ba067 [facebook] Extract timestamp (Closes #10508) 2016-08-31 22:12:37 +07:00
165620e320 [yahoo] extract more and better formats 2016-08-30 21:49:28 +01:00
4fd350611c release 2016.08.31 2016-08-31 02:39:39 +07:00
263fef43de [ChangeLog] Actualize 2016-08-31 02:37:40 +07:00
a249ab83cb [pyvideo] Remove debugging code 2016-08-31 01:56:58 +07:00
f7043ef39c [soundcloud] Fix _VALID_URL clashes with sets (Closes #10505) 2016-08-31 01:56:15 +07:00
64fc49aba0 [bandcamp:album] Fix title extraction (Closes #10455) 2016-08-31 00:29:49 +07:00
245023a861 [pyvideo] Fix extraction (Closes #10468) 2016-08-30 23:51:18 +07:00
3c77a54d5d [turner] keep video id intact 2016-08-30 10:46:48 +01:00
da30a20a4d [turner,cnn] move a check for wrong timestamp to CNNIE 2016-08-29 19:26:53 +01:00
1fe48afea5 [cnn] update _TEST for CNNBlogsIE and CNNArticleIE(closes #10489) 2016-08-29 18:24:16 +01:00
42e05be867 [ctv] add support for (tsn,bnn,thecomedynetwork).ca websites(#10016) 2016-08-29 18:24:16 +01:00
fe45b0e060 [9c9media] fix multiple stacks extraction and extract more metadata(#10016) 2016-08-29 18:24:16 +01:00
a06e1498aa [kusi] Update test 2016-08-29 22:54:33 +07:00
5a80e7b43a [turner] Skip invalid subtitles' URLs 2016-08-29 22:44:15 +07:00
3fb2a23029 [adultswim] Extract video info from onlineOriginals (Closes #10492) 2016-08-29 22:40:35 +07:00
7be15d4097 [bilibili] Support episodes
[extractor/bilibili] add md5 for testing

[extractor/bilibili] remove unnecessary headers

[extractor/bilibili] correct _TESTS; find thumbnail for episode

[extractor/bilibili] [Fix] restore removed tests
2016-08-29 23:31:08 +08:00
cd10b3ea63 [turner] Extract all formats 2016-08-29 22:13:49 +07:00
547993dcd0 [turner] Fix subtitles extraction 2016-08-29 21:52:41 +07:00
6c9b71bc08 [downloader/external] Recommend --hls-prefer-native for SOCKS users
Related: #10490
2016-08-29 19:05:38 +08:00
93b8404599 [generic,vodplatform] improve embed regex 2016-08-29 07:57:20 +01:00
9ba1e1dcc0 [played] Remove extractor (Closes #10470) 2016-08-29 08:26:07 +07:00
b8079a40bc [turner] fix secure m3u8 formats downloading 2016-08-28 17:51:53 +01:00
5bc8a73af6 [cartoonnetwork] make extraction work for more videos in the website
some videos require `networkName=CN2` to be present in the feed url
2016-08-28 17:08:26 +01:00
b3eaeded12 [tbs] Add new extractor(#10222) 2016-08-28 16:51:09 +01:00
ec65b391cb [cartoonnetwork] Add new extractor(#10110) 2016-08-28 16:51:09 +01:00
2982514072 [turner,nba,cnn,adultswim] add base extractor to parse cvp feeds 2016-08-28 16:51:09 +01:00
98908bcf7c [openload] Update algorithm again (#10408) 2016-08-28 22:49:46 +08:00
04b32c8f96 [bilibili] Fix extraction (closes #10375)
Thanks @gdkchan for the algorithm
2016-08-28 22:06:31 +08:00
40eec6b15c [openload] Fix extraction (closes #10408)
Thanks to @yokrysty again!
2016-08-28 20:27:52 +08:00
39efc6e3e0 [generic] Update some _TESTS 2016-08-28 15:46:11 +08:00
1198fe14a1 release 2016.08.28 2016-08-28 07:24:08 +07:00
71e90766b5 [README.md] Fix typo in download archive FAQ entry 2016-08-28 07:09:03 +07:00
d7aae610f6 [ChangeLog] Actualize 2016-08-28 07:00:15 +07:00
92c27a0dbf [periscope:user] Fix extraction (Closes #10453) 2016-08-28 02:35:49 +07:00
d181cff685 Merge branch 'steven7851-patch-2' 2016-08-27 01:17:12 +08:00
3b4b82d4ce [douyutv] Simplify 2016-08-27 01:16:39 +08:00
545ef4f531 Merge branch 'patch-2' of https://github.com/steven7851/youtube-dl into steven7851-patch-2 2016-08-26 22:29:46 +08:00
906b87cf5f [crackle] Revert to template-based thumbnail extraction
To reduce to number of HTTP requests
2016-08-26 19:58:47 +08:00
b281aad2dc [douyutv] Use new api
use lapi for flv info, and html5 api for room info
#10153 #10318
2016-08-26 07:32:54 +08:00
6b18a24e6e [tnaflix] Fix extraction (Closes #10434) 2016-08-26 05:57:52 +07:00
c9de980106 Credit @Xender for nhk:vod (#10424) 2016-08-26 04:49:52 +07:00
f9b373afda [nhk:vod] Improve extraction (Closes #10424) 2016-08-26 04:48:40 +07:00
298a120ab7 [nhk] Add extractor for VoD. 2016-08-26 04:15:51 +07:00
e3faecde30 [trutube] Remove extractor (Closes #10438) 2016-08-26 03:43:13 +07:00
a0f071a50d [usanetwork] Add new extractor 2016-08-25 19:41:31 +01:00
20bad91d76 [downloader/external] Clarify that ffmpeg doesn't support SOCKS
Ref: #10304
2016-08-25 22:38:06 +08:00
b54a2da433 [crackle] Fix extraction and update _TESTS (closes #10333) 2016-08-25 22:22:31 +08:00
dc2c37f316 [spankbang] Fix description and uploader (closes #10339) 2016-08-25 20:47:35 +08:00
c1f62dd338 [README] Clean up grammar in --download-archive paragraph 2016-08-25 14:45:01 +02:00
5a3efcd27c [README.md] Add FAQ entry for download archive 2016-08-25 18:57:31 +07:00
4c8f9c2577 [README.md] Add comments in sample configuration for clarity 2016-08-25 18:27:15 +07:00
f26a298247 [README.md] Use en-US URL in cookies FAQ entry 2016-08-25 18:19:41 +07:00
ea01cdbf61 [README.md] Clarify how to export cookies from browser for cookies FAQ entry 2016-08-25 18:17:45 +07:00
6a76b53355 [README.md] Quote URL in streaming to player FAQ entry 2016-08-25 18:05:01 +07:00
d37708fc86 [YoutubeDL] check only for None Value in thumbnails sorting 2016-08-25 11:53:47 +01:00
5c13c28566 raise unexpected error when no stream found 2016-08-25 09:55:23 +01:00
f70e9229e6 [discoverygo] detect when video needs authentication(closes #10425) 2016-08-25 09:11:23 +01:00
30afe4aeb2 [cbc] Add support for watch.cbc.ca 2016-08-25 08:49:44 +01:00
75fa990dc6 [YoutubeDL] add fallback value for thumbnails values in thumbnails sorting 2016-08-25 08:49:44 +01:00
f39ffc5877 [common] extract formats from #EXT-X-MEDIA tags 2016-08-25 08:49:44 +01:00
07ea9c9b05 [downloader/hls] fill IV with zeros for IVs shorter than 16-octet 2016-08-25 08:49:44 +01:00
073ac1225f [utils] add ac-3 to the list of audio codecs in parse_codecs 2016-08-25 08:49:44 +01:00
0c6422cdd6 [README.md] Add FAQ entry for streaming to player 2016-08-25 07:34:55 +07:00
08773689f3 [kickstarter] Silent the warning for og:description
Closes #10415
2016-08-25 01:29:32 +08:00
0c75abbb7b [mtvservices:embedded] Use another endpoint to get feed URL
Closes #10363

In the original mtvservices:embedded test case, config.xml is still used
to get the feed URL. Some other examples, including test_Generic_40
(http://www.vulture.com/2016/06/new-key-peele-sketches-released.html),
and the video mentioned in #10363, use another endpoint to get the feed
URL. The 'index.html' approach works for the original test case, too. So
I didn't keep the old approach.
2016-08-24 23:58:22 +08:00
97653f81b2 [bilibili] Mark as broken
Bilibili now uses emscripten, which is very difficult for reverse
engineering. I don't expect it to be fixed in near future, so I mark
it as broken.

Ref: #10375
2016-08-24 21:28:00 +08:00
d38b27dd9b release 2016.08.24.1 2016-08-24 10:11:04 +07:00
6d94cbd2f4 [ChangeLog] Actualize 2016-08-24 10:07:06 +07:00
30317f4887 [pluralsight] Modernize and make more robust 2016-08-24 08:52:12 +07:00
8c3e35dd44 [pluralsight] Add support for subtitles (Closes #9681) 2016-08-24 08:41:52 +07:00
c86f51ee38 release 2016.08.24 2016-08-24 01:38:46 +07:00
6e52bbb413 [ChangeLog] Actualize 2016-08-24 01:36:27 +07:00
05bddcc512 [youtube] Fix authentication (2) (Closes #10392) 2016-08-24 01:29:50 +07:00
1212e9972f [youtube] Fix authentication (#10392) 2016-08-24 00:25:21 +07:00
ccb6570e9e [syfy,bravotv] restrict drupal settings regex 2016-08-23 17:31:35 +01:00
18b6216150 [openload] Fix extraction (closes #10408)
Thanks @yokrysty for the algorithm
2016-08-23 21:55:58 +08:00
fb009b7f53 [bravotv] correct clip info extraction and add support for adobe pass auth(closes #10407) 2016-08-23 10:29:52 +01:00
3083e4dc07 [eagleplatform] Improve detection of embedded videos (Closes #10409) 2016-08-23 07:22:14 +07:00
7367bdef23 [awaan] fix extraction, modernize, rename the extractors and add test for live stream 2016-08-22 23:10:06 +01:00
ad31642584 [nrk,abc:iview] use _extract_akamai_formats 2016-08-22 07:54:08 +01:00
c7c43a93ba [common] add helper method to extract akamai m3u8 and f4m formats 2016-08-22 07:49:34 +01:00
96229e5f95 [mtvservices:embedded] Update config URL
All starts from #10363. The test case in mtvservices:embedded uses
config.xml, while the video from #10363 and the test case in generic.py
is broken. Both uses index.html for fetching the feed URL.
2016-08-22 13:56:09 +08:00
55d119e2a1 [abc:iview] Add new extractor(closes #6148) 2016-08-22 00:07:17 +01:00
6d2679ee26 release 2016.08.22 2016-08-22 04:17:34 +07:00
afbab5688e [ChangeLog] Actualize 2016-08-22 04:15:46 +07:00
3d897cc791 [ivi] Fix episode number extraction 2016-08-22 03:34:27 +07:00
cf143c4d97 [ivi] Add support for 720p and 1080p 2016-08-22 03:31:33 +07:00
ad120ae1c5 [extractor/common] Change the default m3u8 protocol in HTML5
Helper functions should have consistent default values
2016-08-22 02:26:07 +08:00
d0fa172e5f [firsttv] keep a test videos with multiple formats 2016-08-21 19:13:43 +01:00
f97f9f71e5 Merge branch 'TRox1972-charlierose' 2016-08-22 02:11:43 +08:00
526656726b [charlierose] Simplify and improve 2016-08-22 02:06:47 +08:00
9b8c554ea7 [firsttv] fix extraction(closes #9249) 2016-08-21 17:56:25 +01:00
d13bfc07b7 Merge branch 'charlierose' of https://github.com/TRox1972/youtube-dl into TRox1972-charlierose 2016-08-22 00:48:35 +08:00
efe470e261 [twitch] Renew authentication 2016-08-21 22:45:50 +07:00
e3f6b56909 [twitch] Refactor API calls 2016-08-21 22:09:29 +07:00
b1e676fde8 [twitch] Modernize 2016-08-21 21:28:02 +07:00
92d4cfa358 [kaltura] Fallback ext calculation on caption's format 2016-08-21 21:01:01 +07:00
3d47ee0a9e [zingmp3] fix extraction and add support for video clips(closes #10041) 2016-08-21 14:09:48 +01:00
d164a0d41b [README.md] Add a format selection example using comma
Ref: #10399
2016-08-21 20:00:48 +08:00
db29af6d36 [charlierose] Add new extractor 2016-08-21 11:29:48 +02:00
2c6acdfd2d [kaltura] Add test for #10279 2016-08-21 08:37:01 +07:00
fddaa76a59 [kaltura] Assume ttml to be default subtitles' extension 2016-08-21 08:28:36 +07:00
a809446750 [kaltura] Add subtitles support when entry_id is unknown beforehand (Closes #10279) 2016-08-21 08:28:36 +07:00
d8f30a7e66 [kaltura] Remove unused code 2016-08-21 08:28:36 +07:00
5b1d85754e [YoutubeDL] Autocalculate ext when ext is None 2016-08-21 08:28:36 +07:00
e25586e471 [cultureunplugged] fix extraction(closes #10330) 2016-08-20 20:02:49 +01:00
292a2301bf [cnn] add support for money.cnn.com videos(closes #2797) 2016-08-20 19:00:25 +01:00
dabe15701b [cbs, cbsnews] fix extraction(fixes #10393) 2016-08-20 13:25:32 +01:00
4245f55880 [dotsub] Replace test (Closes #10386) 2016-08-20 06:18:20 +07:00
5b9d187cc6 [imdb] Improve title extraction and make thumbnail non-fatal 2016-08-20 04:50:39 +07:00
39e1c4f08c [litv] Support 'promo' URLs (closes #10385) 2016-08-20 00:52:37 +08:00
19f35402c5 [snotr] Fix extraction (closes #10338) 2016-08-20 00:18:22 +08:00
70852b47ca [utils] Recognize units with full names in parse_filename
Reference: https://en.wikipedia.org/wiki/Template:Quantities_of_bytes
2016-08-20 00:17:26 +08:00
a9a3b4a081 [miomio] Adapt to the new API and update _TESTS
The test case is from #9680
2016-08-20 00:08:23 +08:00
ecc90093f9 [vuclip] Adapt to the new API and update _TEST 2016-08-19 23:56:09 +08:00
520251c093 [extractor/common] Recognize m3u8 manifests in HTML5 multimedia tags 2016-08-19 23:53:47 +08:00
55af45fcab [radiobremen] Update _TEST (closes #10337) 2016-08-19 23:12:30 +08:00
b82232036a [n-tv.de] Fix extraction (closes #10331) 2016-08-19 20:39:28 +08:00
e4659b4547 [utils] Correct octal/hexadecimal number detection in js_to_json 2016-08-19 20:37:17 +08:00
9e5751b9fe [globo:article] Relax _VALID_URL and video id regex (Closes #10379) 2016-08-19 01:13:45 +07:00
bd1bcd3ea0 release 2016.08.19 2016-08-19 00:15:12 +07:00
93a63b36f1 [ChangeLog] Actualize 2016-08-19 00:13:24 +07:00
8b2dc4c328 [options] Remove output template description from --help
Same reasons as for --format
2016-08-18 23:59:13 +07:00
850837b67a [porncom] Add extractor (Closes #2251, closes #10251) 2016-08-18 23:52:41 +07:00
13585d7682 [utils] Recognize lowercase units in parse_filesize 2016-08-18 23:32:00 +07:00
fd3ec986a4 [generic] Fix dbtv test (Closes #10364) 2016-08-18 21:35:41 +07:00
b0d578ff7b [dbtv] Relax embed regex 2016-08-18 21:30:55 +07:00
b0c8f2e9c8 [DBTV:generic] Add support for embeds 2016-08-18 21:29:27 +07:00
51815886a9 [vk:wallpost] Fix audio extraction 2016-08-18 06:14:05 +07:00
08a42f9c74 [vk] Fix authentication on python3 2016-08-18 05:22:23 +07:00
e15ad9ef09 [keezmovies] PEP 8 2016-08-18 04:39:31 +07:00
4e9fee1015 [hgtvcom:show] Add extractor (Closes #10365) 2016-08-18 04:37:14 +07:00
7273e5849b [discoverygo] extend _VALID_URL to support other networks 2016-08-17 11:03:09 +01:00
b505e98784 [extremetube] Revert display_id 2016-08-17 07:02:13 +07:00
92cd9fd565 [keezmovies] Make display_id optional 2016-08-17 07:01:32 +07:00
b3d7dce429 release 2016.08.17 2016-08-17 06:21:21 +07:00
a44694ab4e [ChangeLog] Actualize 2016-08-17 06:19:22 +07:00
ab19b46b88 [extremetube] Modernize 2016-08-17 06:02:12 +07:00
8804f10e6b [tube8] Modernize 2016-08-17 05:46:45 +07:00
6be17c0870 [mofosex] Extract all formats and modernize (Closes #10335) 2016-08-17 05:45:49 +07:00
8652770bd2 [keezmovies] Improve and modernize 2016-08-17 05:44:46 +07:00
2a1321a272 [vbox7:generic] Add support for vbox7 embeds 2016-08-17 01:02:59 +07:00
9c0fa60bf3 [vbox7] Add support for embed URLs 2016-08-17 00:42:02 +07:00
502d87c546 [mtg] Improve view count extraction 2016-08-17 00:32:28 +07:00
b35b0d73d8 [viafree] Add extractor (Closes #10358) 2016-08-17 00:21:30 +07:00
6e7e4a6edf [mtg] Add support for viafree URLs (#10358) 2016-08-17 00:19:43 +07:00
53fef319f1 [fxnetworks] extend _VALID_URL to support simpsonsworld.com 2016-08-16 16:22:34 +01:00
2cabee2a7d [amcnetworks] fix typo 2016-08-16 16:22:34 +01:00
11f502fac1 [theplatform] extract subtitles with multiple formats from the metadata 2016-08-16 16:22:34 +01:00
98affc1a48 [xvideos] Fix test 2016-08-16 21:20:15 +07:00
70a2829fee [xvideos] Fix HLS extraction (Closes #10356) 2016-08-16 21:17:52 +07:00
837e56c8ee [amcnetworks] extract episode metadata 2016-08-16 14:49:32 +01:00
b5ddee8c77 [amcnetworks] Add new extractor 2016-08-16 13:44:01 +01:00
fb64adcbd3 [adobepass] PEP 8 2016-08-16 04:45:21 +07:00
4f640f2890 [bbc:playlist] Fix tests 2016-08-16 04:43:10 +07:00
254e64a20a [bbc:playlist] Add support for pagination (Closes #10349) 2016-08-16 04:36:23 +07:00
818ac213eb [adobepass] add IE suffix to the extractor and remove duplicate constant 2016-08-15 21:36:34 +01:00
cbef4d5c9f [fxnetworks] add test and check geo restriction 2016-08-15 17:10:45 +01:00
bf90c46790 [fxnetworks] Add new extractor(closes #9462) 2016-08-15 16:34:32 +01:00
69eb4d699f [cbsnews] Remove invalid tests. CBS Live videos gets deleted soon. 2016-08-15 20:29:22 +08:00
6d8ec8c3b7 [ChangeLog] Update for CBSLocal and related changes 2016-08-15 13:39:43 +08:00
760845ce99 [cbslocal] Adapt to SendtoNewsIE 2016-08-15 13:37:37 +08:00
5c2d087221 [sendtonews] Fix extraction 2016-08-15 13:31:08 +08:00
b6c4e36728 [jwplatform] Parse video_id from JWPlayer data
And remove a mysterious comma from 115c65793a
2016-08-15 13:29:01 +08:00
1a57b8c18c [zippcast] Remove extractor (Closes #10332)
ZippCast is shut down
2016-08-15 08:25:24 +07:00
24eb13b1c6 [uplynk,viceland] update tests and change uplynk extractors names 2016-08-14 22:45:43 +01:00
525e0316c0 [adobepass] fix check for pendingLogout errors 2016-08-14 21:25:43 +01:00
7e60ce9cf7 [adobepass] clear cache in case of pendingLogout errors 2016-08-14 21:24:33 +01:00
e811bcf8f8 [viceland] raise ExtractorError for errors other than HTTP 400 2016-08-14 20:13:35 +01:00
6103f59095 [viceland] remove outdated comment 2016-08-14 19:08:35 +01:00
9fa5789279 [viceland] fix info extraction(closes #8799) 2016-08-14 19:04:23 +01:00
d2ac04674d [viceland] Add new extractor(#8799) 2016-08-14 18:04:50 +01:00
1fd6e30988 [adobepass] create separate class for adobe pass authentication 2016-08-14 18:04:50 +01:00
884cdb6cd9 [life:embed] Improve extraction 2016-08-14 20:49:11 +07:00
9771b1f901 [theplatform] use _get_netrc_login_info and fix session expiration check(#10345) 2016-08-14 11:55:28 +01:00
2118fdd1a9 [common] add separate method for getting netrc ligin info 2016-08-14 11:55:28 +01:00
320d597c21 [vgtv] Detect geo restricted videos (#10348) 2016-08-14 16:25:14 +07:00
aaf44a2f47 [uplynk] Add new extractor 2016-08-13 22:53:41 +01:00
fafabc0712 Update ChangeLog for #10342
[skip ci]
2016-08-14 02:33:15 +08:00
409760a932 Merge pull request #10342 from muphil/patch-1
[xiami] bug fix for extractor xiami.py
2016-08-14 02:30:50 +08:00
phi
097eba019d bug fix for extractor xiami.py
Before applying this patch, when downloading resources from xiami.com, it crashes with these:
Traceback (most recent call last):
  File "/home/phi/.local/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 433, in main
    _real_main(argv)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/__init__.py", line 423, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 1786, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 691, in extract_info
    ie_result = ie.extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 347, in extract
    return self._real_extract(url)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 116, in _real_extract
    return self._extract_tracks(self._match_id(url))[0]
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/xiami.py", line 43, in _extract_tracks
    '%s/%s%s' % (self._API_BASE_URL, item_id, '/type/%s' % typ if typ else ''), item_id)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 562, in _download_json
    json_string, video_id, transform_source=transform_source, fatal=fatal)
  File "/home/phi/.local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 568, in _parse_json
    return json.loads(json_string)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'NoneType'

This patch solves exactly this problem.
2016-08-14 02:18:59 +08:00
358 changed files with 11443 additions and 4217 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.13** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.02**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.13 [debug] youtube-dl version 2016.11.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}
@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them. If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them. If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@ -10,8 +10,13 @@
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections - [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests - [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*? ### What is the purpose of your *pull request*?
- [ ] Bug fix - [ ] Bug fix
- [ ] Improvement
- [ ] New extractor - [ ] New extractor
- [ ] New feature - [ ] New feature

1
.gitignore vendored
View File

@ -29,6 +29,7 @@ updates_key.pem
*.m4a *.m4a
*.m4v *.m4v
*.mp3 *.mp3
*.3gp
*.part *.part
*.swp *.swp
test/testdata test/testdata

11
AUTHORS
View File

@ -26,7 +26,7 @@ Albert Kim
Pierre Rudloff Pierre Rudloff
Huarong Huo Huarong Huo
Ismael Mejía Ismael Mejía
Steffan 'Ruirize' James Steffan Donal
Andras Elso Andras Elso
Jelle van der Waa Jelle van der Waa
Marcin Cieślak Marcin Cieślak
@ -181,3 +181,12 @@ Nehal Patel
Rob van Bekkum Rob van Bekkum
Petr Zvoníček Petr Zvoníček
Pratyush Singh Pratyush Singh
Aleksander Nitecki
Sebastian Blunt
Matěj Cepl
Xie Yanbo
Philip Xu
John Hawkinson
Rich Leeper
Zhong Jianxin
Thor77

View File

@ -12,7 +12,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose log only plain text is acceptable.** **Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -66,7 +66,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# DEVELOPER INSTRUCTIONS # DEVELOPER INSTRUCTIONS
@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (both GNU make and BSD make are supported) * make (only GNU make is supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -167,19 +167,19 @@ In any case, thank you very much for your contributions!
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code. This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros. Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields ### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl: For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier) - `id` (media identifier)
- `title` (media title) - `title` (media title)
- `url` (media download URL) or `formats` - `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken. In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. [Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example #### Example
@ -199,7 +199,7 @@ Assume at this point `meta`'s layout is:
} }
``` ```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python ```python
description = meta.get('summary') # correct description = meta.get('summary') # correct
@ -211,7 +211,7 @@ and not like:
description = meta['summary'] # incorrect description = meta['summary'] # incorrect
``` ```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data). The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance: Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
@ -231,21 +231,21 @@ description = self._search_regex(
webpage, 'description', default=None) webpage, 'description', default=None)
``` ```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present. On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks ### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable. When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example #### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like: Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python ```python
title = meta['title'] title = meta['title']
``` ```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected. If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario: Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
@ -282,7 +282,7 @@ title = self._search_regex(
webpage, 'title', group='title') webpage, 'title', group='title')
``` ```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute: Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like: The code definitely should not look like:

516
ChangeLog
View File

@ -1,3 +1,518 @@
version 2016.11.02
Core
+ Add basic support for Smooth Streaming protocol (#8118, #10969)
* Improve MPD manifest base URL extraction (#10909, #11079)
* Fix --match-filter for int-like strings (#11082)
Extractors
+ [mva] Add support for ISM formats
+ [msn] Add support for ISM formats
+ [onet] Add support for ISM formats
+ [tvp] Add support for ISM formats
+ [nicknight] Add support for nicknight sites (#10769)
version 2016.10.30
Extractors
* [facebook] Improve 1080P video detection (#11073)
* [imgur] Recognize /r/ URLs (#11071)
* [beeg] Fix extraction (#11069)
* [openload] Fix extraction (#10408)
* [gvsearch] Modernize and fix search request (#11051)
* [adultswim] Fix extraction (#10979)
+ [nobelprize] Add support for nobelprize.org (#9999)
* [hornbunny] Fix extraction (#10981)
* [tvp] Improve video id extraction (#10585)
version 2016.10.26
Extractors
+ [rentv] Add support for ren.tv (#10620)
+ [ard] Detect unavailable videos (#11018)
* [vk] Fix extraction (#11022)
version 2016.10.25
Core
* Running youtube-dl in the background is fixed (#10996, #10706, #955)
Extractors
+ [jamendo] Add support for jamendo.com (#10132, #10736)
+ [pandatv] Add support for panda.tv (#10736)
+ [dotsub] Support Vimeo embed (#10964)
* [litv] Fix extraction
+ [vimeo] Delegate ondemand redirects to ondemand extractor (#10994)
* [vivo] Fix extraction (#11003)
+ [twitch:stream] Add support for rebroadcasts (#10995)
* [pluralsight] Fix subtitles conversion (#10990)
version 2016.10.21.1
Extractors
+ [pluralsight] Process all clip URLs (#10984)
version 2016.10.21
Core
- Disable thumbnails embedding in mkv
+ Add support for Comcast multiple-system operator (#10819)
Extractors
* [pluralsight] Adapt to new API (#10972)
* [openload] Fix extraction (#10408, #10971)
+ [natgeo] Extract m3u8 formats (#10959)
version 2016.10.19
Core
+ [utils] Expose PACKED_CODES_RE
+ [extractor/common] Extract non smil wowza mpd manifests
+ [extractor/common] Detect f4m audio-only formats
Extractors
* [vidzi] Fix extraction (#10908, #10952)
* [urplay] Fix subtitles extraction
+ [urplay] Add support for urskola.se (#10915)
+ [orf] Add subtitles support (#10939)
* [youtube] Fix --no-playlist behavior for youtu.be/id URLs (#10896)
* [nrk] Relax URL regular expression (#10928)
+ [nytimes] Add support for podcasts (#10926)
* [pluralsight] Relax URL regular expression (#10941)
version 2016.10.16
Core
* [postprocessor/ffmpeg] Return correct filepath and ext in updated information
in FFmpegExtractAudioPP (#10879)
Extractors
+ [ruutu] Add support for supla.fi (#10849)
+ [theoperaplatform] Add support for theoperaplatform.eu (#10914)
* [lynda] Fix height for prioritized streams
+ [lynda] Add fallback extraction scenario
* [lynda] Switch to https (#10916)
+ [huajiao] New extractor (#10917)
* [cmt] Fix mgid extraction (#10813)
+ [safari:course] Add support for techbus.safaribooksonline.com
* [orf:tvthek] Fix extraction and modernize (#10898)
* [chirbit] Fix extraction of user profile pages
* [carambatv] Fix extraction
* [canalplus] Fix extraction for some videos
* [cbsinteractive] Fix extraction for cnet.com
* [parliamentliveuk] Lower case URLs are now recognized (#10912)
version 2016.10.12
Core
+ Support HTML media elements without child nodes
* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
Extractors
* [dailymotion] Fix extraction (#10901)
* [vimeo:review] Fix extraction (#10900)
* [nhl] Correctly handle invalid formats (#10713)
* [footyroom] Fix extraction (#10810)
* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
+ [hbo] Add support for episode pages (#10892)
* [allocine] Fix extraction (#10860)
+ [nextmedia] Recognize action news on AppleDaily
* [lego] Improve info extraction and bypass geo restriction (#10872)
version 2016.10.07
Extractors
+ [iprima] Detect geo restriction
* [facebook] Fix video extraction (#10846)
+ [commonprotocols] Support direct MMS links (#10838)
+ [generic] Add support for multiple vimeo embeds (#10862)
+ [nzz] Add support for nzz.ch (#4407)
+ [npo] Detect geo restriction
+ [npo] Add support for 2doc.nl (#10842)
+ [lego] Add support for lego.com (#10369)
+ [tonline] Add support for t-online.de (#10376)
* [techtalks] Relax URL regular expression (#10840)
* [youtube:live] Extend URL regular expression (#10839)
+ [theweatherchannel] Add support for weather.com (#7188)
+ [thisoldhouse] Add support for thisoldhouse.com (#10837)
+ [nhl] Add support for wch2016.com (#10833)
* [pornoxo] Use JWPlatform to improve metadata extraction
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors
+ [tube8] Extract categories and tags (#10579)
+ [pornhub] Extract categories and tags (#10499)
* [openload] Temporary fix (#10408)
+ [foxnews] Add support Fox News articles (#10598)
* [viafree] Improve video id extraction (#10615)
* [iwara] Fix extraction after relaunch (#10462, #3215)
+ [tfo] Add extractor for tfo.org
* [lrt] Fix audio extraction (#10566)
* [9now] Fix extraction (#10561)
+ [canalplus] Add support for c8.fr (#10577)
* [newgrounds] Fix uploader extraction (#10584)
+ [polskieradio:category] Add support for category lists (#10576)
+ [ketnet] Add extractor for ketnet.be (#10343)
+ [canvas] Add support for een.be (#10605)
+ [telequebec] Add extractor for telequebec.tv (#1999)
* [parliamentliveuk] Fix extraction (#9137)
version 2016.09.08
Extractors
+ [jwplatform] Extract height from format label
+ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
* [videomore] Fix extraction (#10592)
* [foxgay] Fix extraction (#10480)
+ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
* [gamestar] Fix metadata extraction (#10479)
* [puls4] Fix extraction (#10583)
+ [cctv] Add extractor for CCTV and CNTV (#8153)
+ [lci] Add extractor for lci.fr (#10573)
+ [wat] Extract DASH formats
+ [viafree] Improve video id detection (#10569)
+ [trutv] Add extractor for trutv.com (#10519)
+ [nick] Add support for nickelodeon.nl (#10559)
+ [abcotvs:clips] Add support for clips.abcotvs.com
+ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
+ [miaopai] Add extractor for miaopai.com (#10556)
* [gamestar] Fix metadata extraction (#10479)
+ [bilibili] Add support for episodes (#10190)
+ [tvnoe] Add extractor for tvnoe.cz (#10524)
version 2016.09.04.1
Core
* In DASH downloader if the first segment fails, abort the whole download
process to prevent throttling (#10497)
+ Add support for --skip-unavailable-fragments and --fragment retries in
hlsnative downloader (#10165, #10448).
+ Add support for --skip-unavailable-fragments in DASH downloader
+ Introduce --skip-unavailable-fragments option for fragment based downloaders
that allows to skip fragments unavailable due to a HTTP error
* Fix extraction of video/audio entries with src attribute in
_parse_html5_media_entries (#10540)
Extractors
* [theplatform] Relax URL regular expression (#10546)
* [youtube:playlist] Extend URL regular expression
* [rottentomatoes] Delegate extraction to internetvideoarchive extractor
* [internetvideoarchive] Extract all formats
* [pornvoisines] Fix extraction (#10469)
* [rottentomatoes] Fix extraction (#10467)
* [espn] Extend URL regular expression (#10549)
* [vimple] Extend URL regular expression (#10547)
* [youtube:watchlater] Fix extraction (#10544)
* [youjizz] Fix extraction (#10437)
+ [foxnews] Add support for FoxNews Insider (#10445)
+ [fc2] Recognize Flash player URLs (#10512)
version 2016.09.03
Core
* Restore usage of NAME attribute from EXT-X-MEDIA tag for formats codes in
_extract_m3u8_formats (#10522)
* Handle semicolon in mimetype2ext
Extractors
+ [youtube] Add support for rental videos' previews (#10532)
* [youtube:playlist] Fallback to video extraction for video/playlist URLs when
no playlist is actually served (#10537)
+ [drtv] Add support for dr.dk/nyheder (#10536)
+ [facebook:plugins:video] Add extractor (#10530)
+ [go] Add extractor for *.go.com sites
* [adobepass] Check for authz_token expiration (#10527)
* [nytimes] improve extraction
* [thestar] Fix extraction (#10465)
* [glide] Fix extraction (#10478)
- [exfm] Remove extractor (#10482)
* [youporn] Fix categories and tags extraction (#10521)
+ [curiositystream] Add extractor for app.curiositystream.com
- [thvideo] Remove extractor (#10464)
* [movingimage] Fix for the new site name (#10466)
+ [cbs] Add support for once formats (#10515)
* [limelight] Skip ism snd duplicate manifests
+ [porncom] Extract categories and tags (#10510)
+ [facebook] Extract timestamp (#10508)
+ [yahoo] Extract more formats
version 2016.08.31
Extractors
* [soundcloud] Fix URL regular expression to avoid clashes with sets (#10505)
* [bandcamp:album] Fix title extraction (#10455)
* [pyvideo] Fix extraction (#10468)
+ [ctv] Add support for tsn.ca, bnn.ca and thecomedynetwork.ca (#10016)
* [9c9media] Extract more metadata
* [9c9media] Fix multiple stacks extraction (#10016)
* [adultswim] Improve video info extraction (#10492)
* [vodplatform] Improve embed regular expression
- [played] Remove extractor (#10470)
+ [tbs] Add extractor for tbs.com and tntdrama.com (#10222)
+ [cartoonnetwork] Add extractor for cartoonnetwork.com (#10110)
* [adultswim] Rework in terms of turner extractor
* [cnn] Rework in terms of turner extractor
* [nba] Rework in terms of turner extractor
+ [turner] Add base extractor for Turner Broadcasting System based sites
* [bilibili] Fix extraction (#10375)
* [openload] Fix extraction (#10408)
version 2016.08.28
Core
+ Add warning message that ffmpeg doesn't support SOCKS
* Improve thumbnail sorting
+ Extract formats from #EXT-X-MEDIA tags in _extract_m3u8_formats
* Fill IV with leading zeros for IVs shorter than 16 octets in hlsnative
+ Add ac-3 to the list of audio codecs in parse_codecs
Extractors
* [periscope:user] Fix extraction (#10453)
* [douyutv] Fix extraction (#10153, #10318, #10444)
+ [nhk:vod] Add extractor for www3.nhk.or.jp on demand (#4437, #10424)
- [trutube] Remove extractor (#10438)
+ [usanetwork] Add extractor for usanetwork.com
* [crackle] Fix extraction (#10333)
* [spankbang] Fix description and uploader extraction (#10339)
* [discoverygo] Detect cable provider restricted videos (#10425)
+ [cbc] Add support for watch.cbc.ca
* [kickstarter] Silent the warning for og:description (#10415)
* [mtvservices:embedded] Fix extraction for the new 'edge' player (#10363)
version 2016.08.24.1
Extractors
+ [pluralsight] Add support for subtitles (#9681)
version 2016.08.24
Extractors
* [youtube] Fix authentication (#10392)
* [openload] Fix extraction (#10408)
+ [bravotv] Add support for Adobe Pass (#10407)
* [bravotv] Fix clip info extraction (#10407)
* [eagleplatform] Improve embedded videos detection (#10409)
* [awaan] Fix extraction
* [mtvservices:embedded] Update config URL
+ [abc:iview] Add extractor (#6148)
version 2016.08.22
Core
* Improve formats and subtitles extension auto calculation
+ Recognize full unit names in parse_filesize
+ Add support for m3u8 manifests in HTML5 multimedia tags
* Fix octal/hexadecimal number detection in js_to_json
Extractors
+ [ivi] Add support for 720p and 1080p
+ [charlierose] Add new extractor (#10382)
* [1tv] Fix extraction (#9249)
* [twitch] Renew authentication
* [kaltura] Improve subtitles extension calculation
+ [zingmp3] Add support for video clips
* [zingmp3] Fix extraction (#10041)
* [kaltura] Improve subtitles extraction (#10279)
* [cultureunplugged] Fix extraction (#10330)
+ [cnn] Add support for money.cnn.com (#2797)
* [cbsnews] Fix extraction (#10362)
* [cbs] Fix extraction (#10393)
+ [litv] Support 'promo' URLs (#10385)
* [snotr] Fix extraction (#10338)
* [n-tv.de] Fix extraction (#10331)
* [globo:article] Relax URL and video id regular expressions (#10379)
version 2016.08.19
Core
- Remove output template description from --help
* Recognize lowercase units in parse_filesize
Extractors
+ [porncom] Add extractor for porn.com (#2251, #10251)
+ [generic] Add support for DBTV embeds
* [vk:wallpost] Fix audio extraction for new site layout
* [vk] Fix authentication
+ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
+ [discoverygo] Add support for another GO network sites
version 2016.08.17
Core
+ Add _get_netrc_login_info
Extractors
* [mofosex] Extract all formats (#10335)
+ [generic] Add support for vbox7 embeds
+ [vbox7] Add support for embed URLs
+ [viafree] Add extractor (#10358)
+ [mtg] Add support for viafree URLs (#10358)
* [theplatform] Extract all subtitles per language
+ [xvideos] Fix HLS extraction (#10356)
+ [amcnetworks] Add extractor
+ [bbc:playlist] Add support for pagination (#10349)
+ [fxnetworks] Add extractor (#9462)
* [cbslocal] Fix extraction for SendtoNews-based videos
* [sendtonews] Fix extraction
* [jwplatform] Extract video id from JWPlayer data
- [zippcast] Remove extractor (#10332)
+ [viceland] Add extractor (#8799)
+ [adobepass] Add base extractor for Adobe Pass Authentication
* [life:embed] Improve extraction
* [vgtv] Detect geo restricted videos (#10348)
+ [uplynk] Add extractor
* [xiami] Fix extraction (#10342)
version 2016.08.13 version 2016.08.13
Core Core
@ -23,6 +538,7 @@ Extractors
+ [pbs] Add support for high quality HTTP formats + [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301) + [crunchyroll] Add support for HLS formats (#10301)
version 2016.08.12 version 2016.08.12
Core Core

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete
@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR) install -d $(DESTDIR)$(BINDIR)
@ -90,7 +90,7 @@ fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py' _EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES) youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@ $(PYTHON) devscripts/make_lazy_extractors.py $@

178
README.md
View File

@ -89,6 +89,8 @@ which means you can modify it, redistribute it or use it however you like.
--mark-watched Mark videos watched (YouTube only) --mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube only) --no-mark-watched Do not mark videos watched (YouTube only)
--no-color Do not emit color codes in output --no-color Do not emit color codes in output
--abort-on-unavailable-fragment Abort downloading when some fragment is not
available
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
@ -173,7 +175,10 @@ which means you can modify it, redistribute it or use it however you like.
-R, --retries RETRIES Number of retries (default is 10), or -R, --retries RETRIES Number of retries (default is 10), or
"infinite". "infinite".
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH only) is 10), or "infinite" (DASH and hlsnative
only)
--skip-unavailable-fragments Skip unavailable fragments (DASH and
hlsnative only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --no-resize-buffer Do not automatically adjust the buffer
@ -201,32 +206,8 @@ which means you can modify it, redistribute it or use it however you like.
-a, --batch-file FILE File containing URLs to download ('-' for -a, --batch-file FILE File containing URLs to download ('-' for
stdin) stdin)
--id Use only video ID in file name --id Use only video ID in file name
-o, --output TEMPLATE Output filename template. Use %(title)s to -o, --output TEMPLATE Output filename template, see the "OUTPUT
get the title, %(uploader)s for the TEMPLATE" for all the info
uploader name, %(uploader_id)s for the
uploader nickname if different,
%(autonumber)s to get an automatically
incremented number, %(ext)s for the
filename extension, %(format)s for the
format description (like "22 - 1280x720" or
"HD"), %(format_id)s for the unique id of
the format (like YouTube's itags: "137"),
%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the
video id, %(playlist_title)s,
%(playlist_id)s, or %(playlist)s (=title if
present, ID otherwise) for the playlist the
video is in, %(playlist_index)s for the
position in the playlist. %(height)s and
%(width)s for the width and height of the
video format. %(resolution)s for a textual
description of the resolution of the video
format. %% for a literal percent. Use - to
output to stdout. Can also be used to
download to a different directory, for
example with -o '/my/downloads/%(uploader)s
/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specify the number of digits in --autonumber-size NUMBER Specify the number of digits in
%(autonumber)s when it is present in output %(autonumber)s when it is present in output
filename template or --auto-number option filename template or --auto-number option
@ -377,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
-n, --netrc Use .netrc authentication data -n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku) --video-password PASSWORD Video password (vimeo, smotri, youku)
## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
If this option is left out, youtube-dl will
ask interactively.
--ap-list-mso List all supported multiple-system
operators
## Post-processing Options: ## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files -x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or (requires ffmpeg or avconv and ffprobe or
@ -436,11 +428,19 @@ You can configure youtube-dl by placing any supported command line option to a c
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```
-x
--no-mtime
--proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s
# Lines starting with # are comments # Lines starting with # are comments
# Always extract audio
-x
# Do not copy the mtime
--no-mtime
# Use this proxy
--proxy 127.0.0.1:3128
# Save all videos under Movies directory in your home directory
-o ~/Movies/%(title)s.%(ext)s
``` ```
Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`. Note that options in configuration file are just the same options aka switches used in regular command line calls thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
@ -449,12 +449,12 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file ### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only: You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
``` ```
touch $HOME/.netrc touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc chmod a-rwx,u+rw $HOME/.netrc
``` ```
After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase: After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
``` ```
machine <extractor> login <login> password <password> machine <extractor> login <login> password <password>
``` ```
@ -550,13 +550,13 @@ Available for the media that is a track or a part of a music album:
- `disc_number`: Number of the disc or other physical medium the track belongs to - `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released - `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`. Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you. Output templates can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
To specify percent literal in output template use `%%`. To output to stdout use `-o -`. To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`. The current default template is `%(title)s-%(id)s.%(ext)s`.
@ -564,7 +564,7 @@ In some cases, you don't want special characters such as 中, spaces, or &, such
#### Output template and Windows batch files #### Output template and Windows batch files
If you are using output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`. If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
#### Output template examples #### Output template examples
@ -597,7 +597,7 @@ $ youtube-dl -o - BaW_jenozKc
By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**. By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more. But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download. The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
@ -605,21 +605,21 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific. The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file. You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
You can also use special names to select particular edge case format: You can also use special names to select particular edge case formats:
- `best`: Select best quality format represented by single file with video and audio - `best`: Select the best quality format represented by a single file with video and audio.
- `worst`: Select worst quality format represented by single file with video and audio - `worst`: Select the worst quality format represented by a single file with video and audio.
- `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available - `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available.
- `worstvideo`: Select worst quality video only format, may not be available - `worstvideo`: Select the worst quality video-only format. May not be available.
- `bestaudio`: Select best quality audio only format, may not be available - `bestaudio`: Select the best quality audio only-format. May not be available.
- `worstaudio`: Select worst quality audio only format, may not be available - `worstaudio`: Select the worst quality audio only-format. May not be available.
For example, to download worst quality video only format you can use `-f worstvideo`. For example, to download the worst quality video-only format you can use `-f worstvideo`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download. If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
@ -641,15 +641,15 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native` - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format - `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed. Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl. If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
@ -669,7 +669,11 @@ $ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol # Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]' $ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
``` ```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
# VIDEO SELECTION # VIDEO SELECTION
@ -724,7 +728,7 @@ Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos. YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update. If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging people](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number` ### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
@ -750,7 +754,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it? ### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/). Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser. ### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
@ -832,10 +836,42 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
### How do I pass cookies to youtube-dl? ### How do I pass cookies to youtube-dl?
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare). Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
### How do I stream directly to media player?
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
### How do I download only new videos from a playlist?
Use download-archive feature. With this feature you should initially download the complete playlist with `--download-archive /path/to/download/archive/file.txt` that will record identifiers of all the videos in a special file. Each subsequent run with the same `--download-archive` will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file.
For example, at first,
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and create a file `archive.txt`. Each subsequent run will only download new videos if any:
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
### Should I add `--hls-prefer-native` into my config?
When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
### Can you add support for this anime video site, or site which shows current movies for free? ### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl. As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.
@ -866,7 +902,7 @@ If you want to find out whether a given URL is supported, simply call youtube-dl
# Why do I need to go through that much red tape when filing bugs? # Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was alrady reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl. Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current. youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
@ -887,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (both GNU make and BSD make are supported) * make (only GNU make is supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -969,19 +1005,19 @@ In any case, thank you very much for your contributions!
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code. This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros. Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields ### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl: For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier) - `id` (media identifier)
- `title` (media title) - `title` (media title)
- `url` (media download URL) or `formats` - `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken. In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. [Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example #### Example
@ -1001,7 +1037,7 @@ Assume at this point `meta`'s layout is:
} }
``` ```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python ```python
description = meta.get('summary') # correct description = meta.get('summary') # correct
@ -1013,7 +1049,7 @@ and not like:
description = meta['summary'] # incorrect description = meta['summary'] # incorrect
``` ```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data). The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance: Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
@ -1033,21 +1069,21 @@ description = self._search_regex(
webpage, 'description', default=None) webpage, 'description', default=None)
``` ```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present. On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks ### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable. When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example #### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like: Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python ```python
title = meta['title'] title = meta['title']
``` ```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected. If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario: Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
@ -1084,7 +1120,7 @@ title = self._search_regex(
webpage, 'title', group='title') webpage, 'title', group='title')
``` ```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute: Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like: The code definitely should not look like:
@ -1154,7 +1190,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS # BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)). Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this: **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
``` ```
@ -1170,7 +1206,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose log only plain text is acceptable.** **Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -1224,7 +1260,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT # COPYRIGHT

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re

View File

@ -60,6 +60,9 @@ if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; e
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..." /bin/echo -e "\n### First of all, testing..."
make clean make clean
if $skip_tests ; then if $skip_tests ; then

View File

@ -1,4 +1,4 @@
# -*- coding: utf-8 -*- # coding: utf-8
# #
# youtube-dl documentation build configuration file, created by # youtube-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014. # sphinx-quickstart on Fri Mar 14 21:05:43 2014.

View File

@ -13,12 +13,16 @@
- **5min** - **5min**
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media**
- **9c9media:stack**
- **9gag** - **9gag**
- **9now.com.au** - **9now.com.au**
- **abc.net.au** - **abc.net.au**
- **Abc7News** - **abc.net.au:iview**
- **abcnews** - **abcnews**
- **abcnews:video** - **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
- **abcotvs:clips**
- **AcademicEarth:Course** - **AcademicEarth:Course**
- **acast** - **acast**
- **acast:channel** - **acast:channel**
@ -30,11 +34,12 @@
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com - **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla** - **AirMozilla**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno** - **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **anitube.se** - **anitube.se**
- **AnySex** - **AnySex**
@ -65,6 +70,10 @@
- **audiomack** - **audiomack**
- **audiomack:album** - **audiomack:album**
- **auroravid**: AuroraVid - **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
- **awaan:video**
- **Azubu** - **Azubu**
- **AzubuLive** - **AzubuLive**
- **BaiduVideo**: 百度视频 - **BaiduVideo**: 百度视频
@ -77,9 +86,10 @@
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist** - **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist** - **bbc.co.uk:playlist**
- **BeatportPro** - **Beatport**
- **Beeg** - **Beeg**
- **BehindKink** - **BehindKink**
- **BellMedia**
- **Bet** - **Bet**
- **Bigflix** - **Bigflix**
- **Bild**: Bild.de - **Bild**: Bild.de
@ -101,6 +111,7 @@
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed** - **BuzzFeed**
- **BYUtv** - **BYUtv**
- **BYUtvEvent**
- **Camdemy** - **Camdemy**
- **CamdemyFolder** - **CamdemyFolder**
- **CamWithHer** - **CamWithHer**
@ -109,17 +120,22 @@
- **Canvas** - **Canvas**
- **CarambaTV** - **CarambaTV**
- **CarambaTVPage** - **CarambaTVPage**
- **CBC** - **CartoonNetwork**
- **CBCPlayer** - **cbc.ca**
- **cbc.ca:player**
- **cbc.ca:watch**
- **cbc.ca:watch:video**
- **CBS** - **CBS**
- **CBSInteractive** - **CBSInteractive**
- **CBSLocal** - **CBSLocal**
- **CBSNews**: CBS News - **cbsnews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports** - **CBSSports**
- **CCTV**
- **CDA** - **CDA**
- **CeskaTelevize** - **CeskaTelevize**
- **channel9**: Channel 9 - **channel9**: Channel 9
- **CharlieRose**
- **Chaturbate** - **Chaturbate**
- **Chilloutzone** - **Chilloutzone**
- **chirbit** - **chirbit**
@ -155,10 +171,11 @@
- **CSNNE** - **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTV**
- **CTVNews** - **CTVNews**
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **CultureUnplugged** - **CultureUnplugged**
- **curiositystream**
- **curiositystream:collection**
- **CWTV** - **CWTV**
- **DailyMail** - **DailyMail**
- **dailymotion** - **dailymotion**
@ -170,10 +187,6 @@
- **daum.net:playlist** - **daum.net:playlist**
- **daum.net:user** - **daum.net:user**
- **DBTV** - **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv** - **DctpTv**
- **DeezerPlaylist** - **DeezerPlaylist**
- **defense.gouv.fr** - **defense.gouv.fr**
@ -215,13 +228,14 @@
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape** - **EveryonesMixtape**
- **exfm**: ex.fm
- **ExpoTV** - **ExpoTV**
- **ExtremeTube** - **ExtremeTube**
- **EyedoTV** - **EyedoTV**
- **facebook** - **facebook**
- **FacebookPluginsVideo**
- **faz.net** - **faz.net**
- **fc2** - **fc2**
- **fc2:embed**
- **Fczenit** - **Fczenit**
- **features.aol.com** - **features.aol.com**
- **fernsehkritik.tv** - **fernsehkritik.tv**
@ -234,7 +248,9 @@
- **Formula1** - **Formula1**
- **FOX** - **FOX**
- **Foxgay** - **Foxgay**
- **FoxNews**: Fox News and Fox Business Video - **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
- **foxnews:insider**
- **FoxSports** - **FoxSports**
- **france2.fr:generation-quoi** - **france2.fr:generation-quoi**
- **FranceCulture** - **FranceCulture**
@ -247,6 +263,7 @@
- **Funimation** - **Funimation**
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion** - **Fusion**
- **FXNetworks**
- **GameInformer** - **GameInformer**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
@ -262,6 +279,7 @@
- **Glide**: Glide mobile video messages (glide.me) - **Glide**: Glide mobile video messages (glide.me)
- **Globo** - **Globo**
- **GloboArticle** - **GloboArticle**
- **Go**
- **GodTube** - **GodTube**
- **GodTV** - **GodTV**
- **Golem** - **Golem**
@ -271,12 +289,14 @@
- **Groupon** - **Groupon**
- **Hark** - **Hark**
- **HBO** - **HBO**
- **HBOEpisode**
- **HearThisAt** - **HearThisAt**
- **Heise** - **Heise**
- **HellPorno** - **HellPorno**
- **Helsinki**: helsinki.fi - **Helsinki**: helsinki.fi
- **HentaiStigma** - **HentaiStigma**
- **HGTV** - **HGTV**
- **hgtv.com:show**
- **HistoricFilms** - **HistoricFilms**
- **history:topic**: History.com Topic - **history:topic**: History.com Topic
- **hitbox** - **hitbox**
@ -288,6 +308,7 @@
- **HowStuffWorks** - **HowStuffWorks**
- **HRTi** - **HRTi**
- **HRTiPlaylist** - **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post - **HuffPost**: Huffington Post
- **Hypem** - **Hypem**
- **Iconosquare** - **Iconosquare**
@ -309,7 +330,10 @@
- **ivi**: ivi.ru - **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV - **ivideon**: Ivideon TV
- **Iwara**
- **Izlesene** - **Izlesene**
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo** - **JeuxVideo**
- **Jove** - **Jove**
- **jpopsuki.tv** - **jpopsuki.tv**
@ -322,6 +346,7 @@
- **KarriereVideos** - **KarriereVideos**
- **keek** - **keek**
- **KeezMovies** - **KeezMovies**
- **Ketnet**
- **KhanAcademy** - **KhanAcademy**
- **KickStarter** - **KickStarter**
- **KonserthusetPlay** - **KonserthusetPlay**
@ -337,11 +362,13 @@
- **kuwo:song**: 酷我音乐 - **kuwo:song**: 酷我音乐
- **la7.it** - **la7.it**
- **Laola1Tv** - **Laola1Tv**
- **LCI**
- **Lcp** - **Lcp**
- **LcpPlay** - **LcpPlay**
- **Le**: 乐视网 - **Le**: 乐视网
- **Learnr** - **Learnr**
- **Lecture2Go** - **Lecture2Go**
- **LEGO**
- **Lemonde** - **Lemonde**
- **LePlaylist** - **LePlaylist**
- **LetvCloud**: 乐视云 - **LetvCloud**: 乐视云
@ -367,6 +394,8 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **MakersChannel** - **MakersChannel**
- **MakerTV** - **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **media.ccc.de** - **media.ccc.de**
@ -375,6 +404,7 @@
- **Metacritic** - **Metacritic**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV - **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca** - **Minhateca**
- **MinistryGrid** - **MinistryGrid**
- **Minoto** - **Minoto**
@ -396,10 +426,13 @@
- **MovieClips** - **MovieClips**
- **MovieFap** - **MovieFap**
- **Moviezine** - **Moviezine**
- **MovingImage**
- **MPORA** - **MPORA**
- **MSN** - **MSN**
- **MTV** - **mtg**: MTG services
- **mtv**
- **mtv.de** - **mtv.de**
- **mtv:video**
- **mtvservices:embedded** - **mtvservices:embedded**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **MusicPlayOn** - **MusicPlayOn**
@ -421,6 +454,7 @@
- **NBA** - **NBA**
- **NBC** - **NBC**
- **NBCNews** - **NBCNews**
- **NBCOlympics**
- **NBCSports** - **NBCSports**
- **NBCSportsVPlayer** - **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk - **ndr**: NDR.de - Norddeutscher Rundfunk
@ -442,18 +476,20 @@
- **NextMediaActionNews**: 蘋果日報 - 動新聞 - **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **NhkVod**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news - **nhl.com:news**: NHL news
- **nhl.com:videocenter** - **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category - **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com** - **nick.com**
- **nick.de** - **nick.de**
- **nicknight**
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
- **NineCNineMedia**
- **Nintendo** - **Nintendo**
- **njoy**: N-JOY - **njoy**: N-JOY
- **njoy:embed** - **njoy:embed**
- **NobelPrize**
- **Noco** - **Noco**
- **Normalboots** - **Normalboots**
- **NosVideo** - **NosVideo**
@ -478,6 +514,7 @@
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
- **NYTimesArticle** - **NYTimesArticle**
- **NZZ**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OdaTV** - **OdaTV**
- **Odnoklassniki** - **Odnoklassniki**
@ -494,6 +531,7 @@
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
- **Patreon** - **Patreon**
@ -508,7 +546,6 @@
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **play.fm** - **play.fm**
- **played.to**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -520,6 +557,8 @@
- **podomatic** - **podomatic**
- **Pokemon** - **Pokemon**
- **PolskieRadio** - **PolskieRadio**
- **PolskieRadioCategory**
- **PornCom**
- **PornHd** - **PornHd**
- **PornHub**: PornHub and Thumbzilla - **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist** - **PornHubPlaylist**
@ -552,6 +591,8 @@
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedTube** - **RedTube**
- **RegioTV** - **RegioTV**
- **RENTV**
- **RENTVArticle**
- **Restudy** - **Restudy**
- **Reuters** - **Reuters**
- **ReverbNation** - **ReverbNation**
@ -559,6 +600,7 @@
- **revision3:embed** - **revision3:embed**
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RMCDecouverte**
- **RockstarGames** - **RockstarGames**
- **RoosterTeeth** - **RoosterTeeth**
- **RottenTomatoes** - **RottenTomatoes**
@ -606,7 +648,7 @@
- **ServingSys** - **ServingSys**
- **Sexu** - **Sexu**
- **Shahid** - **Shahid**
- **Shared**: shared.sx and vivo.sx - **Shared**: shared.sx
- **ShareSix** - **ShareSix**
- **Sina** - **Sina**
- **SixPlay** - **SixPlay**
@ -648,7 +690,6 @@
- **sr:mediathek**: Saarländischer Rundfunk - **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR** - **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **Stitcher** - **Stitcher**
@ -662,9 +703,11 @@
- **SWRMediathek** - **SWRMediathek**
- **Syfy** - **Syfy**
- **SztvHu** - **SztvHu**
- **t-online.de**
- **Tagesschau** - **Tagesschau**
- **tagesschau:player** - **tagesschau:player**
- **Tass** - **Tass**
- **TBS**
- **TDSLifeway** - **TDSLifeway**
- **teachertube**: teachertube.com videos - **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos - **teachertube:user:collection**: teachertube.com user and collection videos
@ -679,19 +722,22 @@
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf** - **Telegraaf**
- **TeleMB** - **TeleMB**
- **TeleQuebec**
- **TeleTask** - **TeleTask**
- **Telewebion** - **Telewebion**
- **TF1** - **TF1**
- **TFO**
- **TheIntercept** - **TheIntercept**
- **theoperaplatform**
- **ThePlatform** - **ThePlatform**
- **ThePlatformFeed** - **ThePlatformFeed**
- **TheScene** - **TheScene**
- **TheSixtyOne** - **TheSixtyOne**
- **TheStar** - **TheStar**
- **TheWeatherChannel**
- **ThisAmericanLife** - **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **THVideo** - **ThisOldHouse**
- **THVideoPlaylist**
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **tlc.de** - **tlc.de**
- **TMZ** - **TMZ**
@ -705,8 +751,7 @@
- **ToypicsUser**: Toypics user profile - **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken) - **TrailerAddict** (Currently broken)
- **Trilulilu** - **Trilulilu**
- **trollvids** - **TruTV**
- **TruTube**
- **Tube8** - **Tube8**
- **TubiTv** - **TubiTv**
- **tudou** - **tudou**
@ -728,10 +773,10 @@
- **TVCArticle** - **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVNoe**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska - **tvp:embed**: Telewizja Polska
- **tvp:series** - **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers** - **Tweakers**
- **twitch:chapter** - **twitch:chapter**
- **twitch:clips** - **twitch:clips**
@ -748,8 +793,11 @@
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **Unistra** - **Unistra**
- **uol.com.br** - **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
- **URPlay** - **URPlay**
- **USANetwork**
- **USAToday** - **USAToday**
- **ustream** - **ustream**
- **ustream:channel** - **ustream:channel**
@ -765,7 +813,9 @@
- **VevoPlaylist** - **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com** - **vh1.com**
- **Viafree**
- **Vice** - **Vice**
- **Viceland**
- **ViceShow** - **ViceShow**
- **Vidbit** - **Vidbit**
- **Viddler** - **Viddler**
@ -805,6 +855,7 @@
- **Vimple**: Vimple - one-click video hosting - **Vimple**: Vimple - one-click video hosting
- **Vine** - **Vine**
- **vine:user** - **vine:user**
- **Vivo**: vivo.sx
- **vk**: VK - **vk**: VK
- **vk:uservideos**: VK - User's Videos - **vk:uservideos**: VK - User's Videos
- **vk:wallpost** - **vk:wallpost**
@ -818,6 +869,7 @@
- **VRT** - **VRT**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VyboryMos**
- **Walla** - **Walla**
- **washingtonpost** - **washingtonpost**
- **washingtonpost:article** - **washingtonpost:article**
@ -831,7 +883,7 @@
- **wholecloud**: WholeCloud - **wholecloud**: WholeCloud
- **Wimp** - **Wimp**
- **Wistia** - **Wistia**
- **WNL** - **wnl**: npo.nl and ntr.nl
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **wrzuta.pl:playlist** - **wrzuta.pl:playlist**
@ -885,6 +937,4 @@
- **Zapiks** - **Zapiks**
- **ZDF** - **ZDF**
- **ZDFChannel** - **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums - **zingmp3**: mp3.zing.vn
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import print_function from __future__ import print_function

View File

@ -605,6 +605,7 @@ class TestYoutubeDL(unittest.TestCase):
'extractor': 'TEST', 'extractor': 'TEST',
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42',
} }
second = { second = {
'id': '2', 'id': '2',
@ -614,6 +615,7 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 10, 'duration': 10,
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43',
} }
videos = [first, second] videos = [first, second]
@ -650,6 +652,10 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('playlist_id = 42')
res = get_videos(f)
self.assertEqual(res, ['1'])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),

View File

@ -87,7 +87,7 @@ class TestHTTP(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()}) ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port) r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase): class TestHTTPS(unittest.TestCase):
@ -111,7 +111,7 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True}) ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://localhost:%d/video.html' % self.port) r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name): def _build_proxy_handler(name):

View File

@ -39,6 +39,8 @@ from youtube_dl.utils import (
is_html, is_html,
js_to_json, js_to_json,
limit_length, limit_length,
mimetype2ext,
month_by_name,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
OnDemandPagedList, OnDemandPagedList,
orderedSet, orderedSet,
@ -67,6 +69,7 @@ from youtube_dl.utils import (
uppercase_escape, uppercase_escape,
lowercase_escape, lowercase_escape,
url_basename, url_basename,
base_url,
urlencode_postdata, urlencode_postdata,
urshift, urshift,
update_url_query, update_url_query,
@ -290,6 +293,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('25-09-2014'), '20140925') self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227') self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None) self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
def test_unified_timestamps(self): def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600) self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@ -310,6 +314,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200) self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None) self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500) self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -433,6 +438,13 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'), url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4') 'trailer.mp4')
def test_base_url(self):
self.assertEqual(base_url('http://foo.de/'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar/'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
def test_parse_age_limit(self): def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None) self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None) self.assertEqual(parse_age_limit(False), None)
@ -625,6 +637,22 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar')) limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12)) self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_mimetype2ext(self):
self.assertEqual(mimetype2ext(None), None)
self.assertEqual(mimetype2ext('video/x-flv'), 'flv')
self.assertEqual(mimetype2ext('application/x-mpegURL'), 'm3u8')
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)
def test_parse_codecs(self): def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {}) self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), { self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
@ -712,6 +740,9 @@ class TestUtil(unittest.TestCase):
inp = '''{"foo":101}''' inp = '''{"foo":101}'''
self.assertEqual(js_to_json(inp), '''{"foo":101}''') self.assertEqual(js_to_json(inp), '''{"foo":101}''')
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
def test_js_to_json_edgecases(self): def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}") on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"}) self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@ -817,7 +848,10 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_filesize('2 MiB'), 2097152) self.assertEqual(parse_filesize('2 MiB'), 2097152)
self.assertEqual(parse_filesize('5 GB'), 5000000000) self.assertEqual(parse_filesize('5 GB'), 5000000000)
self.assertEqual(parse_filesize('1.2Tb'), 1200000000000) self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
self.assertEqual(parse_filesize('1,24 KB'), 1240) self.assertEqual(parse_filesize('1,24 KB'), 1240)
self.assertEqual(parse_filesize('1,24 kb'), 1240)
self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)
def test_parse_count(self): def test_parse_count(self):
self.assertEqual(parse_count(None), None) self.assertEqual(parse_count(None), None)

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import absolute_import, unicode_literals from __future__ import absolute_import, unicode_literals
@ -131,6 +131,9 @@ class YoutubeDL(object):
username: Username for authentication purposes. username: Username for authentication purposes.
password: Password for authentication purposes. password: Password for authentication purposes.
videopassword: Password for accessing a video. videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead. usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout. verbose: Print additional info to stdout.
quiet: Do not print messages to stdout. quiet: Do not print messages to stdout.
@ -1256,8 +1259,10 @@ class YoutubeDL(object):
info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}] info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}]
if thumbnails: if thumbnails:
thumbnails.sort(key=lambda t: ( thumbnails.sort(key=lambda t: (
t.get('preference'), t.get('width'), t.get('height'), t.get('preference') if t.get('preference') is not None else -1,
t.get('id'), t.get('url'))) t.get('width') if t.get('width') is not None else -1,
t.get('height') if t.get('height') is not None else -1,
t.get('id') if t.get('id') is not None else '', t.get('url')))
for i, t in enumerate(thumbnails): for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url']) t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'): if t.get('width') and t.get('height'):
@ -1299,7 +1304,7 @@ class YoutubeDL(object):
for subtitle_format in subtitle: for subtitle_format in subtitle:
if subtitle_format.get('url'): if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url']) subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format: if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower() subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False): if self.params.get('listsubtitles', False):
@ -1354,7 +1359,7 @@ class YoutubeDL(object):
note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '', note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
) )
# Automatically determine file extension if missing # Automatically determine file extension if missing
if 'ext' not in format: if format.get('ext') is None:
format['ext'] = determine_ext(format['url']).lower() format['ext'] = determine_ext(format['url']).lower()
# Automatically determine protocol if missing (useful for format # Automatically determine protocol if missing (useful for format
# selection purposes) # selection purposes)
@ -1653,7 +1658,7 @@ class YoutubeDL(object):
video_ext, audio_ext = audio.get('ext'), video.get('ext') video_ext, audio_ext = audio.get('ext'), video.get('ext')
if video_ext and audio_ext: if video_ext and audio_ext:
COMPATIBLE_EXTS = ( COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v'), ('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
('webm') ('webm')
) )
for exts in COMPATIBLE_EXTS: for exts in COMPATIBLE_EXTS:

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -34,12 +34,14 @@ from .utils import (
setproctitle, setproctitle,
std_headers, std_headers,
write_string, write_string,
render_table,
) )
from .update import update_self from .update import update_self
from .downloader import ( from .downloader import (
FileDownloader, FileDownloader,
) )
from .extractor import gen_extractors, list_extractors from .extractor import gen_extractors, list_extractors
from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL from .YoutubeDL import YoutubeDL
@ -118,18 +120,26 @@ def _real_main(argv=None):
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES)) desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
write_string(desc + '\n', out=sys.stdout) write_string(desc + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
if opts.ap_list_mso:
table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
sys.exit(0)
# Conflicting, missing and erroneous options # Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None): if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error('using .netrc conflicts with giving username/password') parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None: if opts.password is not None and opts.username is None:
parser.error('account username missing\n') parser.error('account username missing\n')
if opts.ap_password is not None and opts.ap_username is None:
parser.error('TV Provider account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid): if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error('using output template conflicts with using title, video ID or auto number') parser.error('using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid: if opts.usetitle and opts.useid:
parser.error('using title conflicts with using video ID') parser.error('using title conflicts with using video ID')
if opts.username is not None and opts.password is None: if opts.username is not None and opts.password is None:
opts.password = compat_getpass('Type account password and press [Return]: ') opts.password = compat_getpass('Type account password and press [Return]: ')
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
if opts.ratelimit is not None: if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit) numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None: if numeric_limit is None:
@ -155,6 +165,8 @@ def _real_main(argv=None):
parser.error('max sleep interval must be greater than or equal to min sleep interval') parser.error('max sleep interval must be greater than or equal to min sleep interval')
else: else:
opts.max_sleep_interval = opts.sleep_interval opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
def parse_retries(retries): def parse_retries(retries):
if retries in ('inf', 'infinite'): if retries in ('inf', 'infinite'):
@ -254,8 +266,6 @@ def _real_main(argv=None):
postprocessors.append({ postprocessors.append({
'key': 'FFmpegEmbedSubtitle', 'key': 'FFmpegEmbedSubtitle',
}) })
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail: if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({ postprocessors.append({
@ -264,6 +274,10 @@ def _real_main(argv=None):
}) })
if not already_have_thumbnail: if not already_have_thumbnail:
opts.writethumbnail = True opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way. # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems. # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd: if opts.exec_cmd:
@ -271,12 +285,6 @@ def _real_main(argv=None):
'key': 'ExecAfterDownload', 'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_cmd, 'exec_cmd': opts.exec_cmd,
}) })
if opts.xattr_set_filesize:
try:
import xattr
xattr # Confuse flake8
except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available')
external_downloader_args = None external_downloader_args = None
if opts.external_downloader_args: if opts.external_downloader_args:
external_downloader_args = compat_shlex_split(opts.external_downloader_args) external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@ -293,6 +301,9 @@ def _real_main(argv=None):
'password': opts.password, 'password': opts.password,
'twofactor': opts.twofactor, 'twofactor': opts.twofactor,
'videopassword': opts.videopassword, 'videopassword': opts.videopassword,
'ap_mso': opts.ap_mso,
'ap_username': opts.ap_username,
'ap_password': opts.ap_password,
'quiet': (opts.quiet or any_getting or any_printing), 'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings, 'no_warnings': opts.no_warnings,
'forceurl': opts.geturl, 'forceurl': opts.geturl,
@ -318,6 +329,7 @@ def _real_main(argv=None):
'nooverwrites': opts.nooverwrites, 'nooverwrites': opts.nooverwrites,
'retries': opts.retries, 'retries': opts.retries,
'fragment_retries': opts.fragment_retries, 'fragment_retries': opts.fragment_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'buffersize': opts.buffersize, 'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer, 'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl, 'continuedl': opts.continue_dl,

View File

@ -7,6 +7,7 @@ from .http import HttpFD
from .rtmp import RtmpFD from .rtmp import RtmpFD
from .dash import DashSegmentsFD from .dash import DashSegmentsFD
from .rtsp import RtspFD from .rtsp import RtspFD
from .ism import IsmFD
from .external import ( from .external import (
get_external_downloader, get_external_downloader,
FFmpegFD, FFmpegFD,
@ -24,6 +25,7 @@ PROTOCOL_MAP = {
'rtsp': RtspFD, 'rtsp': RtspFD,
'f4m': F4mFD, 'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD, 'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
} }

View File

@ -346,7 +346,6 @@ class FileDownloader(object):
min_sleep_interval = self.params.get('sleep_interval') min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval: if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval) max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval) sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval) self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval) time.sleep(sleep_interval)

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os import os
import re
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
@ -19,32 +18,32 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments' FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
base_url = info_dict['url'] segments = info_dict['fragments'][:1] if self.params.get(
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls'] 'test', False) else info_dict['fragments']
initialization_url = info_dict.get('initialization_url')
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0), 'total_frags': len(segments),
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
segments_filenames = [] segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def append_url_to_file(target_url, tmp_filename, segment_name): def process_segment(segment, tmp_filename, num):
segment_url = segment['url']
segment_name = 'Frag%d' % num
target_filename = '%s-%s' % (tmp_filename, segment_name) target_filename = '%s-%s' % (tmp_filename, segment_name)
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)}) success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') down, target_sanitized = sanitize_open(target_filename, 'rb')
@ -52,26 +51,27 @@ class DashSegmentsFD(FragmentFD):
down.close() down.close()
segments_filenames.append(target_sanitized) segments_filenames.append(target_sanitized)
break break
except (compat_urllib_error.HTTPError, ) as err: except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the # YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately # whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps # retried with the same request data this usually succeeds (1-2 attemps
# is usually enough) thus allowing to download the whole file successfully. # is usually enough) thus allowing to download the whole file successfully.
# So, we will retry all fragments that fail with 404 HTTP error for now. # To be future-proof we will retry all fragments that fail with any
if err.code != 404: # HTTP error.
raise
# Retry fragment
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(segment_name, count, fragment_retries) self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if not fatal:
self.report_skip_fragment(segment_name)
return True
self.report_error('giving up after %s fragment retries' % fragment_retries) self.report_error('giving up after %s fragment retries' % fragment_retries)
return False return False
return True
if initialization_url: for i, segment in enumerate(segments):
append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init') if not process_segment(segment, ctx['tmpfilename'], i):
for i, segment_url in enumerate(segment_urls): return False
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
self._finish_frag_download(ctx) self._finish_frag_download(ctx)

View File

@ -220,6 +220,12 @@ class FFmpegFD(ExternalFD):
if proxy: if proxy:
if not re.match(r'^[\da-zA-Z]+://', proxy): if not re.match(r'^[\da-zA-Z]+://', proxy):
proxy = 'http://%s' % proxy proxy = 'http://%s' % proxy
if proxy.startswith('socks'):
self.report_warning(
'%s does not support SOCKS proxies. Downloading is likely to fail. '
'Consider adding --hls-prefer-native to your command.' % self.get_basename())
# Since December 2015 ffmpeg supports -http_proxy option (see # Since December 2015 ffmpeg supports -http_proxy option (see
# http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd) # http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
# We could switch to the following code if we are able to detect version properly # We could switch to the following code if we are able to detect version properly

View File

@ -6,6 +6,7 @@ import time
from .common import FileDownloader from .common import FileDownloader
from .http import HttpFD from .http import HttpFD
from ..utils import ( from ..utils import (
error_to_compat_str,
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
) )
@ -22,13 +23,19 @@ class FragmentFD(FileDownloader):
Available options: Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH only) fragment_retries: Number of times to retry a fragment for HTTP error (DASH
and hlsnative only)
skip_unavailable_fragments:
Skip unavailable fragments (DASH and hlsnative only)
""" """
def report_retry_fragment(self, fragment_name, count, retries): def report_retry_fragment(self, err, fragment_name, count, retries):
self.to_screen( self.to_screen(
'[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...' '[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...'
% (fragment_name, count, self.format_retries(retries))) % (error_to_compat_str(err), fragment_name, count, self.format_retries(retries)))
def report_skip_fragment(self, fragment_name):
self.to_screen('[download] Skipping fragment %s...' % fragment_name)
def _prepare_and_start_frag_download(self, ctx): def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx) self._prepare_frag_download(ctx)

View File

@ -13,6 +13,7 @@ from .fragment import FragmentFD
from .external import FFmpegFD from .external import FFmpegFD
from ..compat import ( from ..compat import (
compat_urllib_error,
compat_urlparse, compat_urlparse,
compat_struct_pack, compat_struct_pack,
) )
@ -20,6 +21,7 @@ from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
parse_m3u8_attributes, parse_m3u8_attributes,
update_url_query,
) )
@ -29,7 +31,7 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative' FD_NAME = 'hlsnative'
@staticmethod @staticmethod
def can_download(manifest): def can_download(manifest, info_dict):
UNSUPPORTED_FEATURES = ( UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1] r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2] r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@ -51,6 +53,7 @@ class HlsFD(FragmentFD):
) )
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES] check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest) check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results) return all(check_results)
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
@ -60,7 +63,7 @@ class HlsFD(FragmentFD):
s = manifest.decode('utf-8', 'ignore') s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s): if not self.can_download(s, info_dict):
self.report_warning( self.report_warning(
'hlsnative has detected features it does not support, ' 'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg') 'extraction will be delegated to ffmpeg')
@ -82,6 +85,14 @@ class HlsFD(FragmentFD):
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
test = self.params.get('test', False)
extra_query = None
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
if extra_param_to_segment_url:
extra_query = compat_urlparse.parse_qs(extra_param_to_segment_url)
i = 0 i = 0
media_sequence = 0 media_sequence = 0
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
@ -94,13 +105,37 @@ class HlsFD(FragmentFD):
line line
if re.match(r'^https?://', line) if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line)) else compat_urlparse.urljoin(man_url, line))
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i) frag_name = 'Frag%d' % i
frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
if extra_query:
frag_url = update_url_query(frag_url, extra_query)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(frag_filename, {'url': frag_url}) success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success: if not success:
return False return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb') down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read() frag_content = down.read()
down.close() down.close()
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/rg3/youtube-dl/issues/10165,
# https://github.com/rg3/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_name)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence) iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
frag_content = AES.new( frag_content = AES.new(
@ -108,7 +143,7 @@ class HlsFD(FragmentFD):
ctx['dest_stream'].write(frag_content) ctx['dest_stream'].write(frag_content)
frags_filenames.append(frag_sanitized) frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test # We only download the first fragment during the test
if self.params.get('test', False): if test:
break break
i += 1 i += 1
media_sequence += 1 media_sequence += 1
@ -116,10 +151,12 @@ class HlsFD(FragmentFD):
decrypt_info = parse_m3u8_attributes(line[11:]) decrypt_info = parse_m3u8_attributes(line[11:])
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
if 'IV' in decrypt_info: if 'IV' in decrypt_info:
decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:]) decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:].zfill(32))
if not re.match(r'^https?://', decrypt_info['URI']): if not re.match(r'^https?://', decrypt_info['URI']):
decrypt_info['URI'] = compat_urlparse.urljoin( decrypt_info['URI'] = compat_urlparse.urljoin(
man_url, decrypt_info['URI']) man_url, decrypt_info['URI'])
if extra_query:
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read() decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'): elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:]) media_sequence = int(line[22:])

View File

@ -13,6 +13,9 @@ from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
sanitized_Request, sanitized_Request,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
) )
@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
if self.params.get('xattr_set_filesize', False) and data_len is not None: if self.params.get('xattr_set_filesize', False) and data_len is not None:
try: try:
import xattr write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len)) except (XAttrUnavailableError, XAttrMetadataError) as err:
except(OSError, IOError, ImportError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err)) self.report_error('unable to set filesize xattr: %s' % str(err))
try: try:

View File

@ -0,0 +1,273 @@
from __future__ import unicode_literals
import os
import time
import struct
import binascii
import io
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
)
u8 = struct.Struct(b'>B')
u88 = struct.Struct(b'>Bx')
u16 = struct.Struct(b'>H')
u1616 = struct.Struct(b'>Hxx')
u32 = struct.Struct(b'>I')
u64 = struct.Struct(b'>Q')
s88 = struct.Struct(b'>bx')
s16 = struct.Struct(b'>h')
s1616 = struct.Struct(b'>hxx')
s32 = struct.Struct(b'>i')
unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
TRACK_ENABLED = 0x1
TRACK_IN_MOVIE = 0x2
TRACK_IN_PREVIEW = 0x4
SELF_CONTAINED = 0x1
def box(box_type, payload):
return u32.pack(8 + len(payload)) + box_type + payload
def full_box(box_type, version, flags, payload):
return box(box_type, u8.pack(version) + u32.pack(flags)[1:] + payload)
def write_piff_header(stream, params):
track_id = params['track_id']
fourcc = params['fourcc']
duration = params['duration']
timescale = params.get('timescale', 10000000)
language = params.get('language', 'und')
height = params.get('height', 0)
width = params.get('width', 0)
is_audio = width == 0 and height == 0
creation_time = modification_time = int(time.time())
ftyp_payload = b'isml' # major brand
ftyp_payload += u32.pack(1) # minor version
ftyp_payload += b'piff' + b'iso2' # compatible brands
stream.write(box(b'ftyp', ftyp_payload)) # File Type Box
mvhd_payload = u64.pack(creation_time)
mvhd_payload += u64.pack(modification_time)
mvhd_payload += u32.pack(timescale)
mvhd_payload += u64.pack(duration)
mvhd_payload += s1616.pack(1) # rate
mvhd_payload += s88.pack(1) # volume
mvhd_payload += u16.pack(0) # reserved
mvhd_payload += u32.pack(0) * 2 # reserved
mvhd_payload += unity_matrix
mvhd_payload += u32.pack(0) * 6 # pre defined
mvhd_payload += u32.pack(0xffffffff) # next track id
moov_payload = full_box(b'mvhd', 1, 0, mvhd_payload) # Movie Header Box
tkhd_payload = u64.pack(creation_time)
tkhd_payload += u64.pack(modification_time)
tkhd_payload += u32.pack(track_id) # track id
tkhd_payload += u32.pack(0) # reserved
tkhd_payload += u64.pack(duration)
tkhd_payload += u32.pack(0) * 2 # reserved
tkhd_payload += s16.pack(0) # layer
tkhd_payload += s16.pack(0) # alternate group
tkhd_payload += s88.pack(1 if is_audio else 0) # volume
tkhd_payload += u16.pack(0) # reserved
tkhd_payload += unity_matrix
tkhd_payload += u1616.pack(width)
tkhd_payload += u1616.pack(height)
trak_payload = full_box(b'tkhd', 1, TRACK_ENABLED | TRACK_IN_MOVIE | TRACK_IN_PREVIEW, tkhd_payload) # Track Header Box
mdhd_payload = u64.pack(creation_time)
mdhd_payload += u64.pack(modification_time)
mdhd_payload += u32.pack(timescale)
mdhd_payload += u64.pack(duration)
mdhd_payload += u16.pack(((ord(language[0]) - 0x60) << 10) | ((ord(language[1]) - 0x60) << 5) | (ord(language[2]) - 0x60))
mdhd_payload += u16.pack(0) # pre defined
mdia_payload = full_box(b'mdhd', 1, 0, mdhd_payload) # Media Header Box
hdlr_payload = u32.pack(0) # pre defined
hdlr_payload += b'soun' if is_audio else b'vide' # handler type
hdlr_payload += u32.pack(0) * 3 # reserved
hdlr_payload += (b'Sound' if is_audio else b'Video') + b'Handler\0' # name
mdia_payload += full_box(b'hdlr', 0, 0, hdlr_payload) # Handler Reference Box
if is_audio:
smhd_payload = s88.pack(0) # balance
smhd_payload = u16.pack(0) # reserved
media_header_box = full_box(b'smhd', 0, 0, smhd_payload) # Sound Media Header
else:
vmhd_payload = u16.pack(0) # graphics mode
vmhd_payload += u16.pack(0) * 3 # opcolor
media_header_box = full_box(b'vmhd', 0, 1, vmhd_payload) # Video Media Header
minf_payload = media_header_box
dref_payload = u32.pack(1) # entry count
dref_payload += full_box(b'url ', 0, SELF_CONTAINED, b'') # Data Entry URL Box
dinf_payload = full_box(b'dref', 0, 0, dref_payload) # Data Reference Box
minf_payload += box(b'dinf', dinf_payload) # Data Information Box
stsd_payload = u32.pack(1) # entry count
sample_entry_payload = u8.pack(0) * 6 # reserved
sample_entry_payload += u16.pack(1) # data reference index
if is_audio:
sample_entry_payload += u32.pack(0) * 2 # reserved
sample_entry_payload += u16.pack(params.get('channels', 2))
sample_entry_payload += u16.pack(params.get('bits_per_sample', 16))
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u1616.pack(params['sampling_rate'])
if fourcc == 'AACL':
smaple_entry_box = box(b'mp4a', sample_entry_payload)
else:
sample_entry_payload = sample_entry_payload
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u32.pack(0) * 3 # pre defined
sample_entry_payload += u16.pack(width)
sample_entry_payload += u16.pack(height)
sample_entry_payload += u1616.pack(0x48) # horiz resolution 72 dpi
sample_entry_payload += u1616.pack(0x48) # vert resolution 72 dpi
sample_entry_payload += u32.pack(0) # reserved
sample_entry_payload += u16.pack(1) # frame count
sample_entry_payload += u8.pack(0) * 32 # compressor name
sample_entry_payload += u16.pack(0x18) # depth
sample_entry_payload += s16.pack(-1) # pre defined
codec_private_data = binascii.unhexlify(params['codec_private_data'])
if fourcc in ('H264', 'AVC1'):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1] # avc profile indication
avcc_payload += sps[2] # profile compatibility
avcc_payload += sps[3] # avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps
avcc_payload += u8.pack(1) # number of pps
avcc_payload += u16.pack(len(pps))
avcc_payload += pps
sample_entry_payload += box(b'avcC', avcc_payload) # AVC Decoder Configuration Record
smaple_entry_box = box(b'avc1', sample_entry_payload) # AVC Simple Entry
stsd_payload += smaple_entry_box
stbl_payload = full_box(b'stsd', 0, 0, stsd_payload) # Sample Description Box
stts_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stts', 0, 0, stts_payload) # Decoding Time to Sample Box
stsc_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stsc', 0, 0, stsc_payload) # Sample To Chunk Box
stco_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stco', 0, 0, stco_payload) # Chunk Offset Box
minf_payload += box(b'stbl', stbl_payload) # Sample Table Box
mdia_payload += box(b'minf', minf_payload) # Media Information Box
trak_payload += box(b'mdia', mdia_payload) # Media Box
moov_payload += box(b'trak', trak_payload) # Track Box
mehd_payload = u64.pack(duration)
mvex_payload = full_box(b'mehd', 1, 0, mehd_payload) # Movie Extends Header Box
trex_payload = u32.pack(track_id) # track id
trex_payload += u32.pack(1) # default sample description index
trex_payload += u32.pack(0) # default sample duration
trex_payload += u32.pack(0) # default sample size
trex_payload += u32.pack(0) # default sample flags
mvex_payload += full_box(b'trex', 0, 0, trex_payload) # Track Extends Box
moov_payload += box(b'mvex', mvex_payload) # Movie Extends Box
stream.write(box(b'moov', moov_payload)) # Movie Box
def extract_box_data(data, box_sequence):
data_reader = io.BytesIO(data)
while True:
box_size = u32.unpack(data_reader.read(4))[0]
box_type = data_reader.read(4)
if box_type == box_sequence[0]:
box_data = data_reader.read(box_size - 8)
if len(box_sequence) == 1:
return box_data
return extract_box_data(box_data, box_sequence[1:])
data_reader.seek(box_size - 8, 1)
class IsmFD(FragmentFD):
"""
Download segments in a ISM manifest
"""
FD_NAME = 'ism'
def real_download(self, filename, info_dict):
segments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segments),
}
self._prepare_and_start_frag_download(ctx)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
track_written = False
for i, segment in enumerate(segments):
segment_url = segment['url']
segment_name = 'Frag%d' % i
target_filename = '%s-%s' % (ctx['tmpfilename'], segment_name)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
down_data = down.read()
if not track_written:
tfhd_data = extract_box_data(down_data, [b'moof', b'traf', b'tfhd'])
info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0]
write_piff_header(ctx['dest_stream'], info_dict['_download_params'])
track_written = True
ctx['dest_stream'].write(down_data)
down.close()
segments_filenames.append(target_sanitized)
break
except compat_urllib_error.HTTPError as err:
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
self.report_skip_fragment(segment_name)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True

View File

@ -7,12 +7,13 @@ from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
int_or_none, int_or_none,
parse_iso8601,
) )
class ABCIE(InfoExtractor): class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au' IE_NAME = 'abc.net.au'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334', 'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@ -93,3 +94,59 @@ class ABCIE(InfoExtractor):
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
} }
class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'ZX9735A001S00',
'ext': 'mp4',
'title': 'Diaries Of A Broken Mind',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
'upload_date': '20161010',
'uploader_id': 'abc2',
'timestamp': 1476064920,
},
'skip': 'Video gone',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
self._sort_formats(formats)
subtitles = {}
src_vtt = stream.get('captions', {}).get('src-vtt')
if src_vtt:
subtitles['en'] = [{
'url': src_vtt,
'ext': 'vtt',
}]
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,
}

View File

@ -12,7 +12,7 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE): class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video' IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)' _VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932', 'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@ -49,7 +49,7 @@ class AbcNewsVideoIE(AMPIE):
class AbcNewsIE(InfoExtractor): class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews' IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)' _VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY', 'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',

View File

@ -1,13 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_iso8601 from ..utils import (
int_or_none,
parse_iso8601,
)
class Abc7NewsIE(InfoExtractor): class ABCOTVSIE(InfoExtractor):
_VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)' IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/', 'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
@ -15,7 +21,7 @@ class Abc7NewsIE(InfoExtractor):
'id': '472581', 'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers', 'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4', 'ext': 'mp4',
'title': 'East Bay museum celebrates history of synthesized music', 'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10', 'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075, 'timestamp': 1421123075,
@ -41,7 +47,7 @@ class Abc7NewsIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta( m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True) 'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4') formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
@ -66,3 +72,41 @@ class Abc7NewsIE(InfoExtractor):
'uploader': uploader, 'uploader': uploader,
'formats': formats, 'formats': formats,
} }
class ABCOTVSClipsIE(InfoExtractor):
IE_NAME = 'abcotvs:clips'
_VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'https://clips.abcotvs.com/kabc/video/214814',
'info_dict': {
'id': '214814',
'ext': 'mp4',
'title': 'SpaceX launch pad explosion destroys rocket, satellite',
'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
'upload_date': '20160901',
'timestamp': 1472756695,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['videoURL'].split('?')[0], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailURL'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('pubDate')),
'formats': formats,
}

File diff suppressed because it is too large Load Diff

View File

@ -3,16 +3,14 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .turner import TurnerBaseIE
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
float_or_none, int_or_none,
xpath_text,
) )
class AdultSwimIE(InfoExtractor): class AdultSwimIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?' _VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?'
_TESTS = [{ _TESTS = [{
@ -96,7 +94,29 @@ class AdultSwimIE(InfoExtractor):
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} },
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.adultswim.com/videos/toonami/friday-october-14th-2016/',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
'playlist': [{
'md5': '',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'ext': 'mp4',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}] }]
@staticmethod @staticmethod
@ -148,7 +168,10 @@ class AdultSwimIE(InfoExtractor):
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path: if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video'] video_info = bootstrapped_data['slugged_video']
if not video_info: if not video_info:
video_info = bootstrapped_data.get('heroMetadata', {}).get('trailer').get('video') video_info = bootstrapped_data.get(
'heroMetadata', {}).get('trailer', {}).get('video')
if not video_info:
video_info = bootstrapped_data.get('onlineOriginals', [None])[0]
if not video_info: if not video_info:
raise ExtractorError('Unable to find video info') raise ExtractorError('Unable to find video info')
@ -161,71 +184,41 @@ class AdultSwimIE(InfoExtractor):
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']] segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
elif video_info.get('videoPlaybackID'): elif video_info.get('videoPlaybackID'):
segment_ids = [video_info['videoPlaybackID']] segment_ids = [video_info['videoPlaybackID']]
elif video_info.get('id'):
segment_ids = [video_info['id']]
else: else:
if video_info.get('auth') is True:
raise ExtractorError( raise ExtractorError(
'This video is only available via cable service provider subscription that' 'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.' ' is not currently supported. You may want to use --cookies.', expected=True)
if video_info.get('auth') is True else 'Unable to find stream or clips', else:
expected=True) raise ExtractorError('Unable to find stream or clips')
episode_id = video_info['id'] episode_id = video_info['id']
episode_title = video_info['title'] episode_title = video_info['title']
episode_description = video_info['description'] episode_description = video_info.get('description')
episode_duration = video_info.get('duration') episode_duration = int_or_none(video_info.get('duration'))
view_count = int_or_none(video_info.get('views'))
entries = [] entries = []
for part_num, segment_id in enumerate(segment_ids): for part_num, segment_id in enumerate(segment_ids):
segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id segement_info = self._extract_cvp_info(
'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id,
segment_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/adultswim/big',
'tokenizer_src': 'http://www.adultswim.com/astv/mvpd/processors/services/token_ipadAdobe.do',
},
})
segment_title = '%s - %s' % (show_title, episode_title) segment_title = '%s - %s' % (show_title, episode_title)
if len(segment_ids) > 1: if len(segment_ids) > 1:
segment_title += ' Part %d' % (part_num + 1) segment_title += ' Part %d' % (part_num + 1)
segement_info.update({
idoc = self._download_xml(
segment_url, segment_title,
'Downloading segment information', 'Unable to download segment information')
segment_duration = float_or_none(
xpath_text(idoc, './/trt', 'segment duration').strip())
formats = []
file_els = idoc.findall('.//files/file') or idoc.findall('./files/file')
unique_urls = []
unique_file_els = []
for file_el in file_els:
media_url = file_el.text
if not media_url or determine_ext(media_url) == 'f4m':
continue
if file_el.text not in unique_urls:
unique_urls.append(file_el.text)
unique_file_els.append(file_el)
for file_el in unique_file_els:
bitrate = file_el.attrib.get('bitrate')
ftype = file_el.attrib.get('type')
media_url = file_el.text
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, segment_title, 'mp4', preference=0,
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '%s_%s' % (bitrate, ftype),
'url': file_el.text.strip(),
# The bitrate may not be a number (for example: 'iphone')
'tbr': int(bitrate) if bitrate.isdigit() else None,
})
self._sort_formats(formats)
entries.append({
'id': segment_id, 'id': segment_id,
'title': segment_title, 'title': segment_title,
'formats': formats, 'description': episode_description,
'duration': segment_duration,
'description': episode_description
}) })
entries.append(segement_info)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -234,5 +227,6 @@ class AdultSwimIE(InfoExtractor):
'entries': entries, 'entries': entries,
'title': '%s - %s' % (show_title, episode_title), 'title': '%s - %s' % (show_title, episode_title),
'description': episode_description, 'description': episode_description,
'duration': episode_duration 'duration': episode_duration,
'view_count': view_count,
} }

View File

@ -109,7 +109,10 @@ class AENetworksIE(AENetworksBaseIE):
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'): if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain] requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating']) resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource) url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False)) info.update(self._search_json_ld(webpage, video_id, fatal=False))

View File

@ -1,64 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {
'id': '36015',
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['aptomaVideoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
formats = []
for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
p = fmt['paths'][0]
formats.append({
'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
'ext': 'mp4',
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'tbr': int_or_none(fmt.get('bitrate')),
'protocol': 'http',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json.get('imageUrl'),
'description': internal_meta_json.get('shortPreamble'),
'timestamp': int_or_none(internal_meta_json.get('timePublished')),
'duration': int_or_none(internal_meta_json.get('duration')),
'view_count': int_or_none(internal_meta_json.get('views')),
}

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor): class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html', 'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@ -1,29 +1,26 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
remove_end,
qualities, qualities,
unescapeHTML, url_basename,
xpath_element,
) )
class AllocineIE(InfoExtractor): class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?' _VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html', 'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269', 'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': { 'info_dict': {
'id': '19546517', 'id': '19546517',
'display_id': '18635087',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF', 'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:abcd09ce503c6560512c14ebfdb720d2', 'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
@ -31,64 +28,82 @@ class AllocineIE(InfoExtractor):
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0', 'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': { 'info_dict': {
'id': '19540403', 'id': '19540403',
'display_id': '19540403',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF', 'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway', 'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html', 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce', 'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': { 'info_dict': {
'id': '19544709', 'id': '19544709',
'display_id': '19544709',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF', 'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:601d15393ac40f249648ef000720e7e3', 'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/video-19550147/', 'url': 'http://www.allocine.fr/video/video-19550147/',
'only_matching': True, 'md5': '3566c0668c0235e2d224fd8edb389f67',
'info_dict': {
'id': '19550147',
'ext': 'mp4',
'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
'thumbnail': 're:http://.*\.jpg',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
typ = mobj.group('typ')
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
if typ == 'film': formats = []
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xpath_element(xml, './/AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd']) quality = qualities(['ld', 'md', 'hd'])
formats = [] model = self._html_search_regex(
for k, v in video.items(): r'data-model="([^"]+)"', webpage, 'data model', default=None)
if re.match(r'.+_path', k): if model:
format_id = k.split('_')[0] model_data = self._parse_json(model, display_id)
for video_url in model_data['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
'url': v, 'url': video_url,
}) })
title = model_data['title']
else:
video_id = display_id
media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
for key, value in media_data['video'].items():
if not key.endswith('Path'):
continue
format_id = key[:-len('Path')]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': value,
})
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': video['videoTitle'], 'display_id': display_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),

View File

@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from ..utils import (
update_url_query,
parse_age_limit,
int_or_none,
)
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'info_dict': {
'id': 's3MX01Nl4vPH',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
}, {
'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
'only_matching': True,
}, {
'url': 'http://www.ifc.com/movies/chaos',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'subtitles': subtitles,
'formats': formats,
'age_limit': parse_age_limit(parse_age_limit(rating)),
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:
title = '%s - %s' % (series, title)
info.update({
'title': title,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
})
return info

View File

@ -174,11 +174,17 @@ class ARDMediathekIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if '>Der gewünschte Beitrag ist nicht mehr verfügbar.<' in webpage: ERRORS = (
raise ExtractorError('Video %s is no longer available' % video_id, expected=True) ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
'Video %s is no longer available'),
('Diese Sendung ist für Jugendliche unter 12 Jahren nicht geeignet. Der Clip ist deshalb nur von 20 bis 6 Uhr verfügbar.',
'This program is only suitable for those aged 12 and older. Video %s is therefore only available between 8 pm and 6 am.'),
)
if 'Diese Sendung ist für Jugendliche unter 12 Jahren nicht geeignet. Der Clip ist deshalb nur von 20 bis 6 Uhr verfügbar.' in webpage: for pattern, message in ERRORS:
raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True) if pattern in webpage:
raise ExtractorError(message % video_id, expected=True)
if re.search(r'[\?&]rss($|[=&])', url): if re.search(r'[\?&]rss($|[=&])', url):
doc = compat_etree_fromstring(webpage.encode('utf-8')) doc = compat_etree_fromstring(webpage.encode('utf-8'))
@ -238,7 +244,7 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = { _TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html', 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35', 'md5': 'd216c3a86493f9322545e045ddc3eb35',

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -410,6 +410,22 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
return self._extract_from_json_url(json_url, video_id, lang) return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE): class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist' IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)' _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'

View File

@ -12,74 +12,51 @@ from ..compat import (
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
sanitized_Request,
smuggle_url, smuggle_url,
unsmuggle_url, unsmuggle_url,
urlencode_postdata, urlencode_postdata,
) )
class DCNIE(InfoExtractor): class AWAANIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url): def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups() show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
if video_id and int(video_id) > 0: if video_id and int(video_id) > 0:
return self.url_result( return self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo') 'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
elif season_id and int(season_id) > 0: elif season_id and int(season_id) > 0:
return self.url_result(smuggle_url( return self.url_result(smuggle_url(
'http://www.dcndigital.ae/program/season/%s' % season_id, 'http://awaan.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'DCNSeason') {'show_id': show_id}), 'AWAANSeason')
else: else:
return self.url_result( return self.url_result(
'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason') 'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')
class DCNBaseIE(InfoExtractor): class AWAANBaseIE(InfoExtractor):
def _extract_video_info(self, video_data, video_id, is_live): def _parse_video_data(self, video_data, video_id, is_live):
title = video_data.get('title_en') or video_data['title_ar'] title = video_data.get('title_en') or video_data['title_ar']
img = video_data.get('img') img = video_data.get('img')
thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
duration = int_or_none(video_data.get('duration'))
description = video_data.get('description_en') or video_data.get('description_ar')
timestamp = parse_iso8601(video_data.get('create_time'), ' ')
return { return {
'id': video_id, 'id': video_id,
'title': self._live_title(title) if is_live else title, 'title': self._live_title(title) if is_live else title,
'description': description, 'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': thumbnail, 'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
'duration': duration, 'duration': int_or_none(video_data.get('duration')),
'timestamp': timestamp, 'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live, 'is_live': is_live,
} }
def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
formats = []
format_url_base = 'http' + self._html_search_regex(
[
r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
r'<a[^>]+href="rtsp(://[^"]+)"'
], webpage, 'format url')
formats.extend(self._extract_mpd_formats(
format_url_base + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_m3u8_formats(
format_url_base + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
format_url_base + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return formats
class AWAANVideoIE(AWAANBaseIE):
class DCNVideoIE(DCNBaseIE): IE_NAME = 'awaan:video'
IE_NAME = 'dcn:video'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375', 'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'md5': '5f61c33bfc7794315c671a62d43116aa',
'info_dict': 'info_dict':
{ {
'id': '17375', 'id': '17375',
@ -89,10 +66,7 @@ class DCNVideoIE(DCNBaseIE):
'duration': 2041, 'duration': 2041,
'timestamp': 1227504126, 'timestamp': 1227504126,
'upload_date': '20081124', 'upload_date': '20081124',
}, 'uploader_id': '71',
'params': {
# m3u8 download
'skip_download': True,
}, },
}, { }, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1', 'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
@ -102,54 +76,69 @@ class DCNVideoIE(DCNBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
request = sanitized_Request( video_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id, 'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'}) video_id, headers={'Origin': 'http://awaan.ae'})
video_data = self._download_json(request, video_id) info = self._parse_video_data(video_data, video_id, False)
info = self._extract_video_info(video_data, video_id, False)
webpage = self._download_webpage( embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse_urlencode({
'id': video_data['id'], 'id': video_data['id'],
'user_id': video_data['user_id'], 'user_id': video_data['user_id'],
'signature': video_data['signature'], 'signature': video_data['signature'],
'countries': 'Q0M=', 'countries': 'Q0M=',
'filter': 'DENY', 'filter': 'DENY',
}), video_id) })
info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native') info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloVideo',
})
return info return info
class DCNLiveIE(DCNBaseIE): class AWAANLiveIE(AWAANBaseIE):
IE_NAME = 'dcn:live' IE_NAME = 'awaan:live'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
_TEST = {
'url': 'http://awaan.ae/live/6/dubai-tv',
'info_dict': {
'id': '6',
'ext': 'mp4',
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url): def _real_extract(self, url):
channel_id = self._match_id(url) channel_id = self._match_id(url)
request = sanitized_Request( channel_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id, 'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
headers={'Origin': 'http://www.dcndigital.ae'}) channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
channel_data = self._download_json(request, channel_id) embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
info = self._extract_video_info(channel_data, channel_id, True)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(), 'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(), 'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'], 'signature': channel_data['signature'],
'countries': 'Q0M=', 'countries': 'Q0M=',
'filter': 'DENY', 'filter': 'DENY',
}), channel_id) })
info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8') info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info return info
class DCNSeasonIE(InfoExtractor): class AWAANSeasonIE(InfoExtractor):
IE_NAME = 'dcn:season' IE_NAME = 'awaan:season'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = { _TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A', 'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
@ -170,21 +159,17 @@ class DCNSeasonIE(InfoExtractor):
data['season'] = season_id data['season'] = season_id
show_id = smuggled_data.get('show_id') show_id = smuggled_data.get('show_id')
if show_id is None: if show_id is None:
request = sanitized_Request( season = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id, 'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
headers={'Origin': 'http://www.dcndigital.ae'}) season_id, headers={'Origin': 'http://awaan.ae'})
season = self._download_json(request, season_id)
show_id = season['id'] show_id = season['id']
data['show_id'] = show_id data['show_id'] = show_id
request = sanitized_Request( show = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/show', 'http://admin.mangomolo.com/analytics/index.php/plus/show',
urlencode_postdata(data), show_id, data=urlencode_postdata(data), headers={
{ 'Origin': 'http://awaan.ae',
'Origin': 'http://www.dcndigital.ae',
'Content-Type': 'application/x-www-form-urlencoded' 'Content-Type': 'application/x-www-form-urlencoded'
}) })
show = self._download_json(request, show_id)
if not season_id: if not season_id:
season_id = show['default_season'] season_id = show['default_season']
for season in show['seasons']: for season in show['seasons']:
@ -195,6 +180,6 @@ class DCNSeasonIE(InfoExtractor):
for video in show['videos']: for video in show['videos']:
video_id = compat_str(video['id']) video_id = compat_str(video['id'])
entries.append(self.url_result( entries.append(self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id)) 'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))
return self.playlist_result(entries, season_id, title) return self.playlist_result(entries, season_id, title)

View File

@ -103,7 +103,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor): class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$' _VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
_TEST = { _TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen', 'url': 'http://www.azubu.tv/MarsTVMDLen',

View File

@ -162,6 +162,15 @@ class BandcampAlbumIE(InfoExtractor):
'uploader_id': 'dotscale', 'uploader_id': 'dotscale',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, {
# with escaped quote in title
'url': 'https://jstrecords.bandcamp.com/album/entropy-ep',
'info_dict': {
'title': '"Entropy" EP',
'uploader_id': 'jstrecords',
'id': 'entropy-ep',
},
'playlist_mincount': 3,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -176,8 +185,11 @@ class BandcampAlbumIE(InfoExtractor):
entries = [ entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key()) self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths] for t_path in tracks_paths]
title = self._search_regex( title = self._html_search_regex(
r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False) r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False)
if title:
title = title.replace(r'\"', '"')
return { return {
'_type': 'playlist', '_type': 'playlist',
'uploader_id': uploader_id, 'uploader_id': uploader_id,

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import itertools
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
@ -17,6 +18,7 @@ from ..utils import (
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_HTTPError, compat_HTTPError,
compat_urlparse,
) )
@ -1026,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor): class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article' IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles' IE_DESC = 'BBC articles'
@ -1056,19 +1058,35 @@ class BBCCoUkArticleIE(InfoExtractor):
class BBCCoUkPlaylistBaseIE(InfoExtractor): class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _entries(self, webpage, url, playlist_id):
single_page = 'page' in compat_urlparse.parse_qs(
compat_urlparse.urlparse(url).query)
for page_num in itertools.count(2):
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
yield self.url_result(
self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
if single_page:
return
next_page = self._search_regex(
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage, 'next page url', default=None, group='url')
if not next_page:
break
webpage = self._download_webpage(
compat_urlparse.urljoin(url, next_page), playlist_id,
'Downloading page %d' % page_num, page_num)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
title, description = self._extract_title_and_description(webpage) title, description = self._extract_title_and_description(webpage)
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(
self._entries(webpage, url, playlist_id),
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE): class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
@ -1117,6 +1135,24 @@ class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'French thriller serial about a missing teenager.', 'description': 'French thriller serial about a missing teenager.',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, {
# multipage playlist, explicit page
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 24,
}, {
# multipage playlist, all pages
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 142,
}, { }, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06', 'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True, 'only_matching': True,

View File

@ -8,10 +8,10 @@ from ..compat import compat_str
from ..utils import int_or_none from ..utils import int_or_none
class BeatportProIE(InfoExtractor): class BeatportIE(InfoExtractor):
_VALID_URL = r'https?://pro\.beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.|pro\.)?beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://pro.beatport.com/track/synesthesia-original-mix/5379371', 'url': 'https://beatport.com/track/synesthesia-original-mix/5379371',
'md5': 'b3c34d8639a2f6a7f734382358478887', 'md5': 'b3c34d8639a2f6a7f734382358478887',
'info_dict': { 'info_dict': {
'id': '5379371', 'id': '5379371',
@ -20,7 +20,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Froxic - Synesthesia (Original Mix)', 'title': 'Froxic - Synesthesia (Original Mix)',
}, },
}, { }, {
'url': 'https://pro.beatport.com/track/love-and-war-original-mix/3756896', 'url': 'https://beatport.com/track/love-and-war-original-mix/3756896',
'md5': 'e44c3025dfa38c6577fbaeb43da43514', 'md5': 'e44c3025dfa38c6577fbaeb43da43514',
'info_dict': { 'info_dict': {
'id': '3756896', 'id': '3756896',
@ -29,7 +29,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Wolfgang Gartner - Love & War (Original Mix)', 'title': 'Wolfgang Gartner - Love & War (Original Mix)',
}, },
}, { }, {
'url': 'https://pro.beatport.com/track/birds-original-mix/4991738', 'url': 'https://beatport.com/track/birds-original-mix/4991738',
'md5': 'a1fd8e8046de3950fd039304c186c05f', 'md5': 'a1fd8e8046de3950fd039304c186c05f',
'info_dict': { 'info_dict': {
'id': '4991738', 'id': '4991738',

View File

@ -46,19 +46,19 @@ class BeegIE(InfoExtractor):
self._proto_relative_url(cpl_url), video_id, self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False) 'Downloading cpl JS', fatal=False)
if cpl: if cpl:
beeg_version = self._search_regex( beeg_version = int_or_none(self._search_regex(
r'beeg_version\s*=\s*(\d+)', cpl, r'beeg_version\s*=\s*([^\b]+)', cpl,
'beeg version', default=None) or self._search_regex( 'beeg version', default=None)) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None) r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex( beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt', r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt') default=None, group='beeg_salt')
beeg_version = beeg_version or '1750' beeg_version = beeg_version or '2000'
beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr' beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
video = self._download_json( video = self._download_json(
'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id), 'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id) video_id)
def split(o, e): def split(o, e):

View File

@ -0,0 +1,75 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
},
'expected_warnings': ['HTTP Error 404'],
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
}, {
'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
'only_matching': True,
}, {
'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
'ie_key': 'NineCNineMedia',
}

View File

@ -2,7 +2,6 @@ from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor from .mtv import MTVServicesInfoExtractor
from ..utils import unified_strdate from ..utils import unified_strdate
from ..compat import compat_urllib_parse_urlencode
class BetIE(MTVServicesInfoExtractor): class BetIE(MTVServicesInfoExtractor):
@ -53,9 +52,9 @@ class BetIE(MTVServicesInfoExtractor):
_FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player" _FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
def _get_feed_query(self, uri): def _get_feed_query(self, uri):
return compat_urllib_parse_urlencode({ return {
'uuid': uri, 'uuid': uri,
}) }
def _extract_mgid(self, webpage): def _extract_mgid(self, webpage):
return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid') return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')

View File

@ -1,33 +1,27 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar import hashlib
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_parse_qs
compat_etree_fromstring,
compat_str,
compat_parse_qs,
compat_xml_parse_error,
)
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
float_or_none, float_or_none,
xpath_text, unified_timestamp,
urlencode_postdata,
) )
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
_TESTS = [{ _TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e', 'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': { 'info_dict': {
'id': '1554319', 'id': '1074402',
'ext': 'mp4', 'ext': 'mp4',
'title': '【金坷垃】金泡沫', 'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923', 'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
@ -38,128 +32,70 @@ class BiliBiliIE(InfoExtractor):
'uploader': '菊子桑', 'uploader': '菊子桑',
'uploader_id': '156160', 'uploader_id': '156160',
}, },
}, { }
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1507019',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'timestamp': 1396530060,
'upload_date': '20140403',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '7802182',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',
'uploader_id': '610729',
},
'params': {
# Just to test metadata extraction
'skip_download': True,
},
'expected_warnings': ['upload time'],
}]
# BiliBili blocks keys from time to time. The current key is extracted from _APP_KEY = '6f90a59ac58a4123'
# the Android client _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
# TODO: find the sign algorithm used in the flash player
_APP_KEY = '86385cdc024c0f6c'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
params = compat_parse_qs(self._search_regex( if 'anime/v' not in url:
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)', [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'], r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters')) webpage, 'player parameters'))['cid'][0]
cid = params['cid'][0]
info_xml_str = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play',
cid, query={'appkey': self._APP_KEY, 'cid': cid},
note='Downloading video info page')
err_msg = None
durls = None
info_xml = None
try:
info_xml = compat_etree_fromstring(info_xml_str.encode('utf-8'))
except compat_xml_parse_error:
info_json = self._parse_json(info_xml_str, video_id, fatal=False)
err_msg = (info_json or {}).get('error_text')
else: else:
err_msg = xpath_text(info_xml, './message') js = self._download_json(
'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
cid = js['result']['cid']
if info_xml is not None: payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
durls = info_xml.findall('./durl') sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
if not durls:
if err_msg: video_info = self._download_json(
raise ExtractorError('%s said: %s' % (self.IE_NAME, err_msg), expected=True) 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
else: video_id, note='Downloading video info page')
raise ExtractorError('No videos found!')
entries = [] entries = []
for durl in durls: for idx, durl in enumerate(video_info['durl']):
size = xpath_text(durl, ['./filesize', './size'])
formats = [{ formats = [{
'url': durl.find('./url').text, 'url': durl['url'],
'filesize': int_or_none(size), 'filesize': int_or_none(durl['size']),
}] }]
for backup_url in durl.findall('./backup_url/url'): for backup_url in durl.get('backup_url', []):
formats.append({ formats.append({
'url': backup_url.text, 'url': backup_url,
# backup URLs have lower priorities # backup URLs have lower priorities
'preference': -2 if 'hd.mp4' in backup_url.text else -3, 'preference': -2 if 'hd.mp4' in backup_url else -3,
}) })
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
'id': '%s_part%s' % (cid, xpath_text(durl, './order')), 'id': '%s_part%s' % (video_id, idx),
'duration': int_or_none(xpath_text(durl, './length'), 1000), 'duration': float_or_none(durl.get('length'), 1000),
'formats': formats, 'formats': formats,
}) })
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title') title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
datetime_str = self._html_search_regex( timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False) r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
timestamp = None thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
if datetime_str:
timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
# TODO 'view_count' requires deobfuscating Javascript # TODO 'view_count' requires deobfuscating Javascript
info = { info = {
'id': compat_str(cid), 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'timestamp': timestamp, 'timestamp': timestamp,
'thumbnail': self._html_search_meta('thumbnailUrl', webpage), 'thumbnail': thumbnail,
'duration': float_or_none(xpath_text(info_xml, './timelength'), scale=1000), 'duration': float_or_none(video_info.get('timelength'), scale=1000),
} }
uploader_mobj = re.search( uploader_mobj = re.search(

View File

@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor): class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung' IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/' _VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = { _TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr', 'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@ -1,31 +1,74 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .adobepass import AdobePassIE
from ..utils import smuggle_url from ..utils import (
smuggle_url,
update_url_query,
int_or_none,
)
class BravoTVIE(InfoExtractor): class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)' _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale', 'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': 'd60cdf68904e854fac669bd26cccf801', 'md5': '9086d0b7ef0ea2aabc4781d75f4e5863',
'info_dict': { 'info_dict': {
'id': 'LitrBdX64qLn', 'id': 'zHyk1_HU_mPy',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Last Chance Kitchen Returns', 'title': 'LCK Ep 12: Fishy Finale',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13', 'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV', 'uploader': 'NBCU-BRAV',
'upload_date': '20160302',
'timestamp': 1456945320,
} }
} }, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid') settings = self._parse_json(self._search_regex(
release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid') r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'),
return self.url_result(smuggle_url( display_id)
'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid), info = {}
{'force_smil_url': True}), 'ThePlatform', release_pid) query = {
'mbr': 'true',
}
account_pid, release_pid = [None] * 2
tve = settings.get('sharedTVE')
if tve:
query['manifest'] = 'm3u'
account_pid = 'HNK2IC'
release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('adobePass', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
else:
shared_playlist = settings['shared_playlist']
account_pid = shared_playlist['account_pid']
metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
release_pid = metadata['release_pid']
info.update({
'title': metadata['title'],
'description': metadata.get('description'),
'season_number': int_or_none(metadata.get('season_num')),
'episode_number': int_or_none(metadata.get('episode_num')),
})
query['switch'] = 'progressive'
info.update({
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})
return info

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'], 'url': text_track['src'],
}) })
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._live_title(title) if is_live else title,
'description': clean_html(json_data.get('description')), 'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'), 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000), 'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')), 'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id, 'uploader_id': account_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'tags': json_data.get('tags', []), 'tags': json_data.get('tags', []),
'is_live': is_live,
} }

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -8,15 +7,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor): class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
_TEST = { _TESTS = [{
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5', 'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': { 'info_dict': {
'id': 'studio-c-season-5-episode-5', 'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5', 'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486, 'duration': 1486.486,
}, },
@ -24,28 +23,71 @@ class BYUtvIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala': ooyala_id = self._search_regex(
r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'ooyala id', group='id')
title = self._search_regex(
r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
'title').strip()
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'Ooyala', 'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'], 'url': 'ooyala:%s' % ooyala_id,
'id': video_id, 'id': video_id,
'title': ep['title'], 'title': title,
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
} }
else:
raise ExtractorError('Unsupported provider %s' % ep['provider'])

View File

@ -112,7 +112,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor): class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# links with trailing slash # links with trailing slash
'url': 'http://www.camdemy.com/folder/450', 'url': 'http://www.camdemy.com/folder/450',

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -6,11 +6,13 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
dict_get,
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
unified_strdate,
qualities,
int_or_none, int_or_none,
qualities,
remove_end,
unified_strdate,
) )
@ -23,6 +25,7 @@ class CanalplusIE(InfoExtractor):
(?:(?:www|m)\.)?canalplus\.fr| (?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr| (?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv| (?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv| (?:www\.)?d17\.tv|
(?:www\.)?itele\.fr (?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?| )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
@ -35,53 +38,53 @@ class CanalplusIE(InfoExtractor):
'canalplus': 'cplus', 'canalplus': 'cplus',
'piwiplus': 'teletoon', 'piwiplus': 'teletoon',
'd8': 'd8', 'd8': 'd8',
'c8': 'd8',
'd17': 'd17', 'd17': 'd17',
'itele': 'itele', 'itele': 'itele',
} }
_TESTS = [{ _TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814', 'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': { 'info_dict': {
'id': '1192814', 'id': '1405510',
'display_id': 'pid1830-c-zapping',
'ext': 'mp4', 'ext': 'mp4',
'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014", 'title': 'Zapping - 02/07/2016',
'description': "Toute l'année 2014 dans un Zapping exceptionnel !", 'description': 'Le meilleur de toutes les chaînes, tous les jours',
'upload_date': '20150105', 'upload_date': '20160702',
}, },
}, { }, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190', 'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
'info_dict': { 'info_dict': {
'id': '1108190', 'id': '1108190',
'ext': 'flv', 'display_id': 'pid1405-le-labyrinthe-boing-super-ranger',
'title': 'Le labyrinthe - Boing super ranger', 'ext': 'mp4',
'title': 'BOING SUPER RANGER - Ep : Le labyrinthe',
'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff', 'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
'upload_date': '20140724', 'upload_date': '20140724',
}, },
'skip': 'Only works from France', 'skip': 'Only works from France',
}, { }, {
'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231', 'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html',
'md5': '4b47b12b4ee43002626b97fad8fb1de5',
'info_dict': { 'info_dict': {
'id': '1390231', 'id': '1420213',
'display_id': 'pid6318-videos-integrales',
'ext': 'mp4', 'ext': 'mp4',
'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité", 'title': 'TPMP ! Même le matin - Les 35H de Baba - 14/10/2016',
'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6', 'description': 'md5:f96736c1b0ffaa96fd5b9e60ad871799',
'upload_date': '20160512', 'upload_date': '20161014',
},
'params': {
'skip_download': True,
}, },
'skip': 'Only works from France',
}, { }, {
'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224', 'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'info_dict': { 'info_dict': {
'id': '1398334', 'id': '1420176',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4', 'ext': 'mp4',
'title': "L'invité de Bruce Toussaint du 07/06/2016 - ", 'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324', 'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20160607', 'upload_date': '20161014',
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'http://m.canalplus.fr/?vid=1398231', 'url': 'http://m.canalplus.fr/?vid=1398231',
@ -93,17 +96,16 @@ class CanalplusIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]] site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group # Beware, some subclasses do not define an id group
display_id = mobj.group('display_id') or video_id display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
if video_id is None:
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( video_id = self._search_regex(
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'], [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
r'id=["\']canal_video_player(?P<id>\d+)'],
webpage, 'video id', group='id') webpage, 'video id', group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id) info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)

View File

@ -1,11 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import float_or_none from ..utils import float_or_none
class CanvasIE(InfoExtractor): class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week', 'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c', 'md5': 'ea838375a547ac787d4064d8c7860a6c',
@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 3788.06,
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
site_id, display_id = mobj.group('site_id'), mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = self._search_regex( title = (self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>', r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title(webpage) webpage, 'title', default=None) or self._og_search_title(
webpage)).strip()
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id') r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
data = self._download_json( data = self._download_json(
'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id) 'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), display_id)
formats = [] formats = []
for target in data['targetUrls']: for target in data['targetUrls']:

View File

@ -9,6 +9,8 @@ from ..utils import (
try_get, try_get,
) )
from .videomore import VideomoreIE
class CarambaTVIE(InfoExtractor): class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)' _VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
@ -62,14 +64,16 @@ class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)' _VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = { _TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/', 'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': '', 'md5': 'a49fb0ec2ad66503eeb46aac237d3c86',
'info_dict': { 'info_dict': {
'id': '191910501', 'id': '475222',
'ext': 'mp4', 'ext': 'flv',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)', 'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg',
'duration': 2678.31, # duration reported by videomore is incorrect
'duration': int,
}, },
'add_ie': [VideomoreIE.ie_key()],
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -77,6 +81,16 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage)
if videomore_url:
title = self._og_search_title(webpage)
return {
'_type': 'url_transparent',
'url': videomore_url,
'ie_key': VideomoreIE.ie_key(),
'title': title,
}
video_url = self._og_search_property('video:iframe', webpage, default=None) video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url: if not video_url:

View File

@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html',
'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c',
'ext': 'mp4',
'title': 'Starfire the Cat Lady',
'description': 'Robin decides to become a cat so that Starfire will finally love him.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

View File

@ -9,10 +9,19 @@ from ..utils import (
js_to_json, js_to_json,
smuggle_url, smuggle_url,
try_get, try_get,
xpath_text,
xpath_element,
xpath_with_ns,
find_xpath_attr,
parse_iso8601,
parse_age_limit,
int_or_none,
ExtractorError,
) )
class CBCIE(InfoExtractor): class CBCIE(InfoExtractor):
IE_NAME = 'cbc.ca'
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# with mediaId # with mediaId
@ -114,6 +123,7 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor): class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)' _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193', 'url': 'http://www.cbc.ca/player/play/2683190193',
@ -167,3 +177,165 @@ class CBCPlayerIE(InfoExtractor):
}), }),
'id': video_id, 'id': video_id,
} }
class CBCWatchBaseIE(InfoExtractor):
_device_id = None
_device_token = None
_API_BASE_URL = 'https://api-cbc.cloud.clearleap.com/cloffice/client/'
_NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
result = self._download_xml(url, video_id, headers={
'X-Clearleap-DeviceId': self._device_id,
'X-Clearleap-DeviceToken': self._device_token,
})
error_message = xpath_text(result, 'userMessage') or xpath_text(result, 'systemMessage')
if error_message:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message))
return result
def _real_initialize(self):
if not self._device_id or not self._device_token:
device = self._downloader.cache.load('cbcwatch', 'device') or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if not self._device_id or not self._device_token:
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'id': self._device_id,
'token': self._device_token,
})
def _parse_rss_feed(self, rss):
channel = xpath_element(rss, 'channel', fatal=True)
def _add_ns(path):
return xpath_with_ns(path, self._NS_MAP)
entries = []
for item in channel.findall('item'):
guid = xpath_text(item, 'guid', fatal=True)
title = xpath_text(item, 'title', fatal=True)
media_group = xpath_element(item, _add_ns('media:group'), fatal=True)
content = xpath_element(media_group, _add_ns('media:content'), fatal=True)
content_url = content.attrib['url']
thumbnails = []
for thumbnail in media_group.findall(_add_ns('media:thumbnail')):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail.get('profile'),
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
timestamp = None
release_date = find_xpath_attr(
item, _add_ns('media:credit'), 'role', 'releaseDate')
if release_date is not None:
timestamp = parse_iso8601(release_date.text)
entries.append({
'_type': 'url_transparent',
'url': content_url,
'id': guid,
'title': title,
'description': xpath_text(item, 'description'),
'timestamp': timestamp,
'duration': int_or_none(content.get('duration')),
'age_limit': parse_age_limit(xpath_text(item, _add_ns('media:rating'))),
'episode': xpath_text(item, _add_ns('clearleap:episode')),
'episode_number': int_or_none(xpath_text(item, _add_ns('clearleap:episodeInSeason'))),
'series': xpath_text(item, _add_ns('clearleap:series')),
'season_number': int_or_none(xpath_text(item, _add_ns('clearleap:season'))),
'thumbnails': thumbnails,
'ie_key': 'CBCWatchVideo',
})
return self.playlist_result(
entries, xpath_text(channel, 'guid'),
xpath_text(channel, 'title'),
xpath_text(channel, 'description'))
class CBCWatchVideoIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch:video'
_VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
def _real_extract(self, url):
video_id = self._match_id(url)
result = self._call_api(url, video_id)
m3u8_url = xpath_text(result, 'url', fatal=True)
formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
if len(formats) < 2:
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
# Despite metadata in m3u8 all video+audio formats are
# actually video-only (no audio)
for f in formats:
if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
f['acodec'] = 'none'
self._sort_formats(formats)
info = {
'id': video_id,
'title': video_id,
'formats': formats,
}
rss = xpath_element(result, 'rss')
if rss:
info.update(self._parse_rss_feed(rss)['entries'][0])
del info['url']
del info['_type']
del info['ie_key']
return info
class CBCWatchIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch'
_VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
_TESTS = [{
'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
'info_dict': {
'id': '38e815a-009e3ab12e4',
'ext': 'mp4',
'title': 'Customer (Dis)Service',
'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
'upload_date': '20160219',
'timestamp': 1455840000,
},
'params': {
# m3u8 download
'skip_download': True,
'format': 'bestvideo',
},
'skip': 'Geo-restricted to Canada',
}, {
'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
'info_dict': {
'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
'title': 'Arthur',
'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
},
'playlist_mincount': 30,
'skip': 'Geo-restricted to Canada',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)

View File

@ -4,6 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
find_xpath_attr, find_xpath_attr,
xpath_element,
xpath_text,
update_url_query,
) )
@ -17,19 +20,6 @@ class CBSBaseIE(ThePlatformFeedIE):
}] }]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else [] } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info(
'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
'series': entry.get('cbs$SeriesTitle'),
'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
'episode': entry.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
}, {
'StreamPack': {
'manifest': 'm3u',
}
})
class CBSIE(CBSBaseIE): class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)' _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
@ -38,7 +28,6 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': { 'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_', 'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks', 'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!', 'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
@ -47,7 +36,10 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127', 'upload_date': '20131127',
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
}, },
'expected_warnings': ['Failed to download m3u8 information'], 'params': {
# m3u8 download
'skip_download': True,
},
'_skip': 'Blocked outside the US', '_skip': 'Blocked outside the US',
}, { }, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/', 'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@ -56,8 +48,53 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info
def _real_extract(self, url): def _real_extract(self, url):
content_id = self._match_id(url) content_id = self._match_id(url)
return self._extract_video_info('byGuid=%s' % content_id, content_id) return self._extract_video_info(content_id)

View File

@ -63,7 +63,7 @@ class CBSInteractiveIE(ThePlatformIE):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex( data_json = self._html_search_regex(
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'", r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
webpage, 'data json') webpage, 'data json')
data = self._parse_json(data_json, display_id) data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0] vdata = data.get('video') or data['videos'][0]

View File

@ -41,13 +41,8 @@ class CBSLocalIE(AnvatoIE):
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/', 'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': { 'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588', 'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
}, },
'playlist_count': 9,
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
@ -60,11 +55,10 @@ class CBSLocalIE(AnvatoIE):
sendtonews_url = SendtoNewsIE._extract_url(webpage) sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url: if sendtonews_url:
info_dict = { return self.url_result(
'_type': 'url_transparent', compat_urlparse.urljoin(url, sendtonews_url),
'url': compat_urlparse.urljoin(url, sendtonews_url), ie=SendtoNewsIE.ie_key())
}
else:
info_dict = self._extract_anvato_videos(webpage, display_id) info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex( time_str = self._html_search_regex(

View File

@ -1,14 +1,15 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .cbs import CBSBaseIE from .cbs import CBSIE
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
) )
class CBSNewsIE(CBSBaseIE): class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News' IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@ -35,7 +36,8 @@ class CBSNewsIE(CBSBaseIE):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack', 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7', 'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '19700101', 'upload_date': '20140404',
'timestamp': 1396650660,
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205, 'duration': 205,
@ -63,51 +65,43 @@ class CBSNewsIE(CBSBaseIE):
item = video_info['item'] if 'item' in video_info else video_info item = video_info['item'] if 'item' in video_info else video_info
guid = item['mpxRefId'] guid = item['mpxRefId']
return self._extract_video_info('byGuid=%s' % guid, guid) return self._extract_video_info(guid)
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos' IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
_TESTS = [{ # Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/', 'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': { 'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh', 'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv', 'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH', 'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334, 'duration': 334,
}, },
'skip': 'Video gone, redirected to http://www.cbsnews.com/live/', 'skip': 'Video gone',
}, { }
'url': 'http://www.cbsnews.com/live/video/video-shows-intense-paragliding-accident/',
'info_dict': {
'id': 'video-shows-intense-paragliding-accident',
'ext': 'flv',
'title': 'Video Shows Intense Paragliding Accident',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex( formats = self._extract_akamai_formats(video_info['url'], display_id)
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story'] self._sort_formats(formats)
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return { return {
'id': video_id, 'id': display_id,
'display_id': display_id,
'title': video_info['headline'], 'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'), 'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')), 'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats, 'formats': formats,
} }

View File

@ -4,7 +4,7 @@ from .cbs import CBSBaseIE
class CBSSportsIE(CBSBaseIE): class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast', 'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
@ -23,6 +23,9 @@ class CBSSportsIE(CBSBaseIE):
} }
}] }]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self._extract_video_info('byId=%s' % video_id, video_id) return self._extract_video_info('byId=%s' % video_id, video_id)

View File

@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CCTVIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:.+?\.)?
(?:
cctv\.(?:com|cn)|
cntv\.cn
)/
(?:
video/[^/]+/(?P<id>[0-9a-f]{32})|
\d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
)'''
_TESTS = [{
'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
'md5': '819c7b49fc3927d529fb4cd555621823',
'info_dict': {
'id': '454368eb19ad44a1925bf1eb96140a61',
'ext': 'mp4',
'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
}
}, {
'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
'only_matching': True,
}, {
'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
webpage, 'video_id')
api_data = self._download_json(
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
return {
'id': video_id,
'title': api_data['title'],
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
}

View File

@ -1,4 +1,4 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -17,7 +17,7 @@ from ..utils import (
class CeskaTelevizeIE(InfoExtractor): class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$' _VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {

View File

@ -0,0 +1,51 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class CharlieRoseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://charlierose.com/videos/27996',
'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
'info_dict': {
'id': '27996',
'ext': 'mp4',
'title': 'Remembering Zaha Hadid',
'thumbnail': 're:^https?://.*\.jpg\?\d+',
'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
}, {
'url': 'https://charlierose.com/videos/27996',
'only_matching': True,
}]
_PLAYER_BASE = 'https://charlierose.com/video/player/%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)
title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')
info_dict = self._parse_html5_media_entries(
self._PLAYER_BASE % video_id, webpage, video_id,
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info_dict['formats'])
self._remove_duplicate_formats(info_dict['formats'])
info_dict.update({
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
})
return info_dict

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import base64
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_duration from ..utils import parse_duration
@ -65,12 +66,11 @@ class ChirbitIE(InfoExtractor):
class ChirbitProfileIE(InfoExtractor): class ChirbitProfileIE(InfoExtractor):
IE_NAME = 'chirbit:profile' IE_NAME = 'chirbit:profile'
_VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
_TEST = { _TEST = {
'url': 'http://chirbit.com/ScarletBeauty', 'url': 'http://chirbit.com/ScarletBeauty',
'info_dict': { 'info_dict': {
'id': 'ScarletBeauty', 'id': 'ScarletBeauty',
'title': 'Chirbits by ScarletBeauty',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
} }
@ -78,13 +78,10 @@ class ChirbitProfileIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
profile_id = self._match_id(url) profile_id = self._match_id(url)
rss = self._download_xml( webpage = self._download_webpage(url, profile_id)
'http://chirbit.com/rss/%s' % profile_id, profile_id)
entries = [ entries = [
self.url_result(audio_url.text, 'Chirbit') self.url_result(self._proto_relative_url('//chirb.it/' + video_id))
for audio_url in rss.findall('./channel/item/link')] for _, video_id in re.findall(r'<input[^>]+id=([\'"])copy-btn-(?P<id>[0-9a-zA-Z]+)\1', webpage)]
title = rss.find('./channel/title').text return self.playlist_result(entries, profile_id)
return self.playlist_result(entries, profile_id, title)

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
@ -10,15 +11,15 @@ from ..utils import (
class ClipfishIE(InfoExtractor): class ClipfishIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/', 'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
'md5': '79bc922f3e8a9097b3d68a93780fd475', 'md5': '720563e467b86374c194bdead08d207d',
'info_dict': { 'info_dict': {
'id': '3966754', 'id': '4343170',
'ext': 'mp4', 'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer', 'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
'description': 'Video zu FIFA 14: E3 2013 Trailer', 'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
'upload_date': '20130611', 'upload_date': '20161005',
'duration': 82, 'duration': 1291,
'view_count': int, 'view_count': int,
} }
} }
@ -50,10 +51,14 @@ class ClipfishIE(InfoExtractor):
'tbr': int_or_none(video_info.get('bitrate')), 'tbr': int_or_none(video_info.get('bitrate')),
}) })
descr = video_info.get('descr')
if descr:
descr = descr.strip()
return { return {
'id': video_id, 'id': video_id,
'title': video_info['title'], 'title': video_info['title'],
'description': video_info.get('descr'), 'description': descr,
'formats': formats, 'formats': formats,
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'), 'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
'duration': int_or_none(video_info.get('media_length')), 'duration': int_or_none(video_info.get('media_length')),

View File

@ -1,9 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id) player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex( config = self._parse_json(self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page, r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration') 'configuration'), video_id)
config = json.loads(config_json)
video_info = config['videoInfo'] video_info = config['videoInfo']
sources = config['sources'] sources = config['sources']

View File

@ -6,7 +6,7 @@ from ..utils import ExtractorError
class CMTIE(MTVIE): class CMTIE(MTVIE):
IE_NAME = 'cmt.com' IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)' _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/' _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [{ _TESTS = [{
@ -26,7 +26,7 @@ class CMTIE(MTVIE):
'id': '1504699', 'id': '1504699',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Still The King Ep. 109 in 3 Minutes', 'title': 'Still The King Ep. 109 in 3 Minutes',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9. New episodes Sundays 9/8c.', 'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9.',
'timestamp': 1469421000.0, 'timestamp': 1469421000.0,
'upload_date': '20160725', 'upload_date': '20160725',
}, },
@ -42,3 +42,8 @@ class CMTIE(MTVIE):
'%s said: video is not available' % cls.IE_NAME, expected=True) '%s said: video is not available' % cls.IE_NAME, expected=True)
return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url) return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
def _extract_mgid(self, webpage):
return self._search_regex(
r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
webpage, 'mgid', group='mgid')

View File

@ -3,15 +3,12 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from .turner import TurnerBaseIE
int_or_none, from ..utils import url_basename
parse_duration,
url_basename,
)
class CNNIE(InfoExtractor): class CNNIE(TurnerBaseIE):
_VALID_URL = r'''(?x)https?://(?:(?:edition|www)\.)?cnn\.com/video/(?:data/.+?|\?)/ _VALID_URL = r'''(?x)https?://(?:(?P<sub_domain>edition|www|money)\.)?cnn\.com/(?:video/(?:data/.+?|\?)/)?videos?/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))''' (?P<path>.+?/(?P<title>[^/]+?)(?:\.(?:[a-z\-]+)|(?=&)))'''
_TESTS = [{ _TESTS = [{
@ -25,6 +22,7 @@ class CNNIE(InfoExtractor):
'duration': 135, 'duration': 135,
'upload_date': '20130609', 'upload_date': '20130609',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29', 'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
'md5': 'b5cc60c60a3477d185af8f19a2a26f4e', 'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
@ -34,7 +32,8 @@ class CNNIE(InfoExtractor):
'title': "Student's epic speech stuns new freshmen", 'title': "Student's epic speech stuns new freshmen",
'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"", 'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
'upload_date': '20130821', 'upload_date': '20130821',
} },
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html', 'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',
'md5': 'f14d02ebd264df951feb2400e2c25a1b', 'md5': 'f14d02ebd264df951feb2400e2c25a1b',
@ -44,80 +43,61 @@ class CNNIE(InfoExtractor):
'title': 'Nashville Ep. 1: Hand crafted skateboards', 'title': 'Nashville Ep. 1: Hand crafted skateboards',
'description': 'md5:e7223a503315c9f150acac52e76de086', 'description': 'md5:e7223a503315c9f150acac52e76de086',
'upload_date': '20141222', 'upload_date': '20141222',
} },
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'http://money.cnn.com/video/news/2016/08/19/netflix-stunning-stats.cnnmoney/index.html',
'md5': '52a515dc1b0f001cd82e4ceda32be9d1',
'info_dict': {
'id': '/video/news/2016/08/19/netflix-stunning-stats.cnnmoney',
'ext': 'mp4',
'title': '5 stunning stats about Netflix',
'description': 'Did you know that Netflix has more than 80 million members? Here are five facts about the online video distributor that you probably didn\'t know.',
'upload_date': '20160819',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk', 'url': 'http://cnn.com/video/?/video/politics/2015/03/27/pkg-arizona-senator-church-attendance-mandatory.ktvk',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg', 'url': 'http://cnn.com/video/?/video/us/2015/04/06/dnt-baker-refuses-anti-gay-order.wkmg',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://edition.cnn.com/videos/arts/2016/04/21/olympic-games-cultural-a-z-brazil.cnn',
'only_matching': True,
}] }]
_CONFIG = {
# http://edition.cnn.com/.element/apps/cvp/3.0/cfg/spider/cnn/expansion/config.xml
'edition': {
'data_src': 'http://edition.cnn.com/video/data/3.0/video/%s/index.xml',
'media_src': 'http://pmd.cdn.turner.com/cnn/big',
},
# http://money.cnn.com/.element/apps/cvp2/cfg/config.xml
'money': {
'data_src': 'http://money.cnn.com/video/data/4.0/video/%s.xml',
'media_src': 'http://ht3.cdn.turner.com/money/big',
},
}
def _extract_timestamp(self, video_data):
# TODO: fix timestamp extraction
return None
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) sub_domain, path, page_title = re.match(self._VALID_URL, url).groups()
path = mobj.group('path') if sub_domain not in ('money', 'edition'):
page_title = mobj.group('title') sub_domain = 'edition'
info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path config = self._CONFIG[sub_domain]
info = self._download_xml(info_url, page_title) return self._extract_cvp_info(
config['data_src'] % path, page_title, {
formats = [] 'default': {
rex = re.compile(r'''(?x) 'media_src': config['media_src'],
(?P<width>[0-9]+)x(?P<height>[0-9]+)
(?:_(?P<bitrate>[0-9]+)k)?
''')
for f in info.findall('files/file'):
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % (f.text.strip())
fdct = {
'format_id': f.attrib['bitrate'],
'url': video_url,
}
mf = rex.match(f.attrib['bitrate'])
if mf:
fdct['width'] = int(mf.group('width'))
fdct['height'] = int(mf.group('height'))
fdct['tbr'] = int_or_none(mf.group('bitrate'))
else:
mf = rex.search(f.text)
if mf:
fdct['width'] = int(mf.group('width'))
fdct['height'] = int(mf.group('height'))
fdct['tbr'] = int_or_none(mf.group('bitrate'))
else:
mi = re.match(r'ios_(audio|[0-9]+)$', f.attrib['bitrate'])
if mi:
if mi.group(1) == 'audio':
fdct['vcodec'] = 'none'
fdct['ext'] = 'm4a'
else:
fdct['tbr'] = int(mi.group(1))
formats.append(fdct)
self._sort_formats(formats)
thumbnails = [{
'height': int(t.attrib['height']),
'width': int(t.attrib['width']),
'url': t.text,
} for t in info.findall('images/image')]
metas_el = info.find('metas')
upload_date = (
metas_el.attrib.get('version') if metas_el is not None else None)
duration_el = info.find('length')
duration = parse_duration(duration_el.text)
return {
'id': info.attrib['id'],
'title': info.find('headline').text,
'formats': formats,
'thumbnails': thumbnails,
'description': info.find('description').text,
'duration': duration,
'upload_date': upload_date,
} }
})
class CNNBlogsIE(InfoExtractor): class CNNBlogsIE(InfoExtractor):
@ -132,6 +112,7 @@ class CNNBlogsIE(InfoExtractor):
'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.', 'description': 'Glenn Greenwald responds to comments made this week on Capitol Hill that journalists could be criminal accessories.',
'upload_date': '20140209', 'upload_date': '20140209',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
'add_ie': ['CNN'], 'add_ie': ['CNN'],
} }
@ -146,7 +127,7 @@ class CNNBlogsIE(InfoExtractor):
class CNNArticleIE(InfoExtractor): class CNNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)' _VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!videos?/)'
_TEST = { _TEST = {
'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/', 'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
'md5': '689034c2a3d9c6dc4aa72d65a81efd01', 'md5': '689034c2a3d9c6dc4aa72d65a81efd01',
@ -154,9 +135,10 @@ class CNNArticleIE(InfoExtractor):
'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn', 'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Obama: Cyberattack not an act of war', 'title': 'Obama: Cyberattack not an act of war',
'description': 'md5:51ce6750450603795cad0cdfbd7d05c5', 'description': 'md5:0a802a40d2376f60e6b04c8d5bcebc4b',
'upload_date': '20141221', 'upload_date': '20141221',
}, },
'expected_warnings': ['Failed to download m3u8 information'],
'add_ie': ['CNN'], 'add_ie': ['CNN'],
} }

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor

View File

@ -21,6 +21,7 @@ from ..compat import (
compat_os_name, compat_os_name,
compat_str, compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -29,6 +30,7 @@ from ..downloader.f4m import remove_encrypted_media
from ..utils import ( from ..utils import (
NO_DEFAULT, NO_DEFAULT,
age_restricted, age_restricted,
base_url,
bug_reports_message, bug_reports_message,
clean_html, clean_html,
compiled_regex_type, compiled_regex_type,
@ -87,6 +89,9 @@ class InfoExtractor(object):
Potential fields: Potential fields:
* url Mandatory. The URL of the video file * url Mandatory. The URL of the video file
* manifest_url
The URL of the manifest file in case of
fragmented media (DASH, hls, hds)
* ext Will be calculated from URL if missing * ext Will be calculated from URL if missing
* format A human-readable description of the format * format A human-readable description of the format
("mp4 container with h264/opus"). ("mp4 container with h264/opus").
@ -115,6 +120,11 @@ class InfoExtractor(object):
download, lower-case. download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe", "http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments". "m3u8", "m3u8_native" or "http_dash_segments".
* fragments A list of fragments of the fragmented media,
with the following entries:
* "url" (mandatory) - fragment's URL
* "duration" (optional, int or float)
* "filesize" (optional, int)
* preference Order number of this format. If this field is * preference Order number of this format. If this field is
present and not None, the formats get sorted present and not None, the formats get sorted
by this field, regardless of all other values. by this field, regardless of all other values.
@ -226,7 +236,7 @@ class InfoExtractor(object):
chapter_id: Id of the chapter the video belongs to, as a unicode string. chapter_id: Id of the chapter the video belongs to, as a unicode string.
The following fields should only be used when the video is an episode of some The following fields should only be used when the video is an episode of some
series or programme: series, programme or podcast:
series: Title of the series or programme the video episode belongs to. series: Title of the series or programme the video episode belongs to.
season: Title of the season the video episode belongs to. season: Title of the season the video episode belongs to.
@ -662,35 +672,48 @@ class InfoExtractor(object):
else: else:
return res return res
def _get_login_info(self): def _get_netrc_login_info(self, netrc_machine=None):
username = None
password = None
netrc_machine = netrc_machine or self._NETRC_MACHINE
if self._downloader.params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(netrc_machine)
if info is not None:
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError(
'No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(
'parsing .netrc: %s' % error_to_compat_str(err))
return username, password
def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
""" """
Get the login info as (username, password) Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value First look for the manually specified credentials using username_option
and password_option as keys in params dictionary. If no such credentials
available look in the netrc file using the netrc_machine or _NETRC_MACHINE
value.
If there's no info available, return (None, None) If there's no info available, return (None, None)
""" """
if self._downloader is None: if self._downloader is None:
return (None, None) return (None, None)
username = None
password = None
downloader_params = self._downloader.params downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data # Attempt to use provided username and password or .netrc data
if downloader_params.get('username') is not None: if downloader_params.get(username_option) is not None:
username = downloader_params['username'] username = downloader_params[username_option]
password = downloader_params['password'] password = downloader_params[password_option]
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
username = info[0]
password = info[2]
else: else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE) username, password = self._get_netrc_login_info(netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
return (username, password) return username, password
def _get_tfa_info(self, note='two-factor verification code'): def _get_tfa_info(self, note='two-factor verification code'):
""" """
@ -878,16 +901,16 @@ class InfoExtractor(object):
def _hidden_inputs(html): def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html) html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
hidden_inputs = {} hidden_inputs = {}
for input in re.findall(r'(?i)<input([^>]+)>', html): for input in re.findall(r'(?i)(<input[^>]+>)', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input): attrs = extract_attributes(input)
if not input:
continue continue
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input) if attrs.get('type') not in ('hidden', 'submit'):
if not name:
continue continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input) name = attrs.get('name') or attrs.get('id')
if not value: value = attrs.get('value')
continue if name and value is not None:
hidden_inputs[name.group('value')] = value.group('value') hidden_inputs[name] = value
return hidden_inputs return hidden_inputs
def _form_hidden_inputs(self, form_id, html): def _form_hidden_inputs(self, form_id, html):
@ -1078,6 +1101,13 @@ class InfoExtractor(object):
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'], manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
'bootstrap info', default=None) 'bootstrap info', default=None)
vcodec = None
mime_type = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}mimeType', '{http://ns.adobe.com/f4m/2.0}mimeType'],
'base URL', default=None)
if mime_type and mime_type.startswith('audio/'):
vcodec = 'none'
for i, media_el in enumerate(media_nodes): for i, media_el in enumerate(media_nodes):
tbr = int_or_none(media_el.attrib.get('bitrate')) tbr = int_or_none(media_el.attrib.get('bitrate'))
width = int_or_none(media_el.attrib.get('width')) width = int_or_none(media_el.attrib.get('width'))
@ -1118,6 +1148,7 @@ class InfoExtractor(object):
'width': f.get('width') or width, 'width': f.get('width') or width,
'height': f.get('height') or height, 'height': f.get('height') or height,
'format_id': f.get('format_id') if not tbr else format_id, 'format_id': f.get('format_id') if not tbr else format_id,
'vcodec': vcodec,
}) })
formats.extend(f4m_formats) formats.extend(f4m_formats)
continue continue
@ -1129,10 +1160,12 @@ class InfoExtractor(object):
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'url': manifest_url, 'url': manifest_url,
'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None, 'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr, 'tbr': tbr,
'width': width, 'width': width,
'height': height, 'height': height,
'vcodec': vcodec,
'preference': preference, 'preference': preference,
}) })
return formats return formats
@ -1153,13 +1186,6 @@ class InfoExtractor(object):
m3u8_id=None, note=None, errnote=None, m3u8_id=None, note=None, errnote=None,
fatal=True, live=False): fatal=True, live=False):
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
res = self._download_webpage_handle( res = self._download_webpage_handle(
m3u8_url, video_id, m3u8_url, video_id,
note=note or 'Downloading m3u8 information', note=note or 'Downloading m3u8 information',
@ -1170,6 +1196,13 @@ class InfoExtractor(object):
m3u8_doc, urlh = res m3u8_doc, urlh = res
m3u8_url = urlh.geturl() m3u8_url = urlh.geturl()
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
# We should try extracting formats only from master playlists [1], i.e. # We should try extracting formats only from master playlists [1], i.e.
# playlists that describe available qualities. On the other hand media # playlists that describe available qualities. On the other hand media
# playlists [2] should be returned as is since they contain just the media # playlists [2] should be returned as is since they contain just the media
@ -1191,35 +1224,54 @@ class InfoExtractor(object):
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
}] }]
last_info = None last_info = {}
last_media = None last_media = {}
for line in m3u8_doc.splitlines(): for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'): if line.startswith('#EXT-X-STREAM-INF:'):
last_info = parse_m3u8_attributes(line) last_info = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'): elif line.startswith('#EXT-X-MEDIA:'):
last_media = parse_m3u8_attributes(line) media = parse_m3u8_attributes(line)
media_type = media.get('TYPE')
if media_type in ('VIDEO', 'AUDIO'):
media_url = media.get('URI')
if media_url:
format_id = []
for v in (media.get('GROUP-ID'), media.get('NAME')):
if v:
format_id.append(v)
formats.append({
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'language': media.get('LANGUAGE'),
'vcodec': 'none' if media_type == 'AUDIO' else None,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
})
else:
# When there is no URI in EXT-X-MEDIA let this tag's
# data be used by regular URI lines below
last_media = media
elif line.startswith('#') or not line.strip(): elif line.startswith('#') or not line.strip():
continue continue
else: else:
if last_info is None: tbr = int_or_none(last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH'), scale=1000)
formats.append({'url': format_url(line)})
continue
tbr = int_or_none(last_info.get('BANDWIDTH'), scale=1000)
format_id = [] format_id = []
if m3u8_id: if m3u8_id:
format_id.append(m3u8_id) format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') not in ('SUBTITLES', 'CLOSED-CAPTIONS') else None
# Despite specification does not mention NAME attribute for # Despite specification does not mention NAME attribute for
# EXT-X-STREAM-INF it still sometimes may be present # EXT-X-STREAM-INF it still sometimes may be present
stream_name = last_info.get('NAME') or last_media_name stream_name = last_info.get('NAME') or last_media.get('NAME')
# Bandwidth of live streams may differ over time thus making # Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided # format_id unpredictable. So it's better to keep provided
# format_id intact. # format_id intact.
if not live: if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats))) format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
manifest_url = format_url(line.strip())
f = { f = {
'format_id': '-'.join(format_id), 'format_id': '-'.join(format_id),
'url': format_url(line.strip()), 'url': manifest_url,
'manifest_url': manifest_url,
'tbr': tbr, 'tbr': tbr,
'ext': ext, 'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')), 'fps': float_or_none(last_info.get('FRAME-RATE')),
@ -1242,11 +1294,9 @@ class InfoExtractor(object):
'abr': abr, 'abr': abr,
}) })
f.update(parse_codecs(last_info.get('CODECS'))) f.update(parse_codecs(last_info.get('CODECS')))
if last_media is not None:
f['m3u8_media'] = last_media
last_media = None
formats.append(f) formats.append(f)
last_info = {} last_info = {}
last_media = {}
return formats return formats
@staticmethod @staticmethod
@ -1490,12 +1540,13 @@ class InfoExtractor(object):
if res is False: if res is False:
return [] return []
mpd, urlh = res mpd, urlh = res
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group() mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats( return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict) compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}): def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
""" """
Parse formats from MPD manifest. Parse formats from MPD manifest.
References: References:
@ -1516,21 +1567,12 @@ class InfoExtractor(object):
def extract_multisegment_info(element, ms_parent_info): def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy() ms_info = ms_parent_info.copy()
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None: # As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
segment_urls_e = segment_list.findall(_add_ns('SegmentURL')) # common attributes and elements. We will only extract relevant
if segment_urls_e: # for us.
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e] def extract_common(source):
initialization = segment_list.find(_add_ns('Initialization')) segment_timeline = source.find(_add_ns('SegmentTimeline'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None: if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S')) s_e = segment_timeline.findall(_add_ns('S'))
if s_e: if s_e:
@ -1545,13 +1587,32 @@ class InfoExtractor(object):
'd': int(s.attrib['d']), 'd': int(s.attrib['d']),
'r': r, 'r': r,
}) })
else: start_number = source.get('startNumber')
timescale = segment_template.get('timescale') if start_number:
ms_info['start_number'] = int(start_number)
timescale = source.get('timescale')
if timescale: if timescale:
ms_info['timescale'] = int(timescale) ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration') segment_duration = source.get('duration')
if segment_duration: if segment_duration:
ms_info['segment_duration'] = int(segment_duration) ms_info['segment_duration'] = int(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
extract_common(segment_template)
media_template = segment_template.get('media') media_template = segment_template.get('media')
if media_template: if media_template:
ms_info['media_template'] = media_template ms_info['media_template'] = media_template
@ -1559,11 +1620,14 @@ class InfoExtractor(object):
if initialization: if initialization:
ms_info['initialization_url'] = initialization ms_info['initialization_url'] = initialization
else: else:
initialization = segment_template.find(_add_ns('Initialization')) extract_Initialization(segment_template)
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
return ms_info return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration')) mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = [] formats = []
for period in mpd_doc.findall(_add_ns('Period')): for period in mpd_doc.findall(_add_ns('Period')):
@ -1606,6 +1670,7 @@ class InfoExtractor(object):
f = { f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id, 'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url, 'url': base_url,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type), 'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')), 'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')), 'height': int_or_none(representation_attrib.get('height')),
@ -1620,9 +1685,7 @@ class InfoExtractor(object):
} }
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info) representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info: if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template'] media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id) media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template) media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
@ -1631,46 +1694,79 @@ class InfoExtractor(object):
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$ # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time # can't be used at the same time
if '%(Number' in media_template: if '%(Number' in media_template and 's' not in representation_ms_info:
representation_ms_info['segment_urls'] = [ segment_duration = None
media_template % { if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
'Number': segment_number, 'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'), 'Bandwidth': representation_attrib.get('bandwidth'),
} },
for segment_number in range( 'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'], representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])] representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else: else:
representation_ms_info['segment_urls'] = [] # $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0 segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url(): def add_segment_url():
representation_ms_info['segment_urls'].append( segment_url = media_template % {
media_template % {
'Time': segment_time, 'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'), 'Bandwidth': representation_attrib.get('bandwidth'),
'Number': segment_number,
} }
) representation_ms_info['fragments'].append({
'url': segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']): for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url() add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)): for r in range(s.get('r', 0)):
segment_time += s['d'] segment_time += segment_d
add_segment_url() add_segment_url()
segment_time += s['d'] segment_number += 1
if 'segment_urls' in representation_ms_info: segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
s_num = 0
for segment_url in representation_ms_info['segment_urls']:
s = representation_ms_info['s'][s_num]
for r in range(s.get('r', 0) + 1):
fragments.append({
'url': segment_url,
'duration': float_or_none(s['d'], representation_ms_info['timescale']),
})
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({ f.update({
'segment_urls': representation_ms_info['segment_urls'], 'fragments': [],
'protocol': 'http_dash_segments', 'protocol': 'http_dash_segments',
}) })
if 'initialization_url' in representation_ms_info: if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id) initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'): if not f.get('url'):
f['url'] = initialization_url f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
try: try:
existing_format = next( existing_format = next(
fo for fo in formats fo for fo in formats
@ -1685,7 +1781,106 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type) self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats return formats
def _parse_html5_media_entries(self, base_url, webpage): def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
res = self._download_webpage_handle(
ism_url, video_id,
note=note or 'Downloading ISM manifest',
errnote=errnote or 'Failed to download ISM manifest',
fatal=fatal)
if res is False:
return []
ism, urlh = res
return self._parse_ism_formats(
compat_etree_fromstring(ism.encode('utf-8')), urlh.geturl(), ism_id)
def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
if ism_doc.get('IsLive') == 'TRUE' or ism_doc.find('Protection') is not None:
return []
duration = int(ism_doc.attrib['Duration'])
timescale = int_or_none(ism_doc.get('TimeScale')) or 10000000
formats = []
for stream in ism_doc.findall('StreamIndex'):
stream_type = stream.get('Type')
if stream_type not in ('video', 'audio'):
continue
url_pattern = stream.attrib['Url']
stream_timescale = int_or_none(stream.get('TimeScale')) or timescale
stream_name = stream.get('Name')
for track in stream.findall('QualityLevel'):
fourcc = track.get('FourCC')
# TODO: add support for WVC1 and WMAP
if fourcc not in ('H264', 'AVC1', 'AACL'):
self.report_warning('%s is not a supported codec' % fourcc)
continue
tbr = int(track.attrib['Bitrate']) // 1000
width = int_or_none(track.get('MaxWidth'))
height = int_or_none(track.get('MaxHeight'))
sampling_rate = int_or_none(track.get('SamplingRate'))
track_url_pattern = re.sub(r'{[Bb]itrate}', track.attrib['Bitrate'], url_pattern)
track_url_pattern = compat_urlparse.urljoin(ism_url, track_url_pattern)
fragments = []
fragment_ctx = {
'time': 0,
}
stream_fragments = stream.findall('c')
for stream_fragment_index, stream_fragment in enumerate(stream_fragments):
fragment_ctx['time'] = int_or_none(stream_fragment.get('t')) or fragment_ctx['time']
fragment_repeat = int_or_none(stream_fragment.get('r')) or 1
fragment_ctx['duration'] = int_or_none(stream_fragment.get('d'))
if not fragment_ctx['duration']:
try:
next_fragment_time = int(stream_fragment[stream_fragment_index + 1].attrib['t'])
except IndexError:
next_fragment_time = duration
fragment_ctx['duration'] = (next_fragment_time - fragment_ctx['time']) / fragment_repeat
for _ in range(fragment_repeat):
fragments.append({
'url': re.sub(r'{start[ _]time}', compat_str(fragment_ctx['time']), track_url_pattern),
'duration': fragment_ctx['duration'] / stream_timescale,
})
fragment_ctx['time'] += fragment_ctx['duration']
format_id = []
if ism_id:
format_id.append(ism_id)
if stream_name:
format_id.append(stream_name)
format_id.append(compat_str(tbr))
formats.append({
'format_id': '-'.join(format_id),
'url': ism_url,
'manifest_url': ism_url,
'ext': 'ismv' if stream_type == 'video' else 'isma',
'width': width,
'height': height,
'tbr': tbr,
'asr': sampling_rate,
'vcodec': 'none' if stream_type == 'audio' else fourcc,
'acodec': 'none' if stream_type == 'video' else fourcc,
'protocol': 'ism',
'fragments': fragments,
'_download_params': {
'duration': duration,
'timescale': stream_timescale,
'width': width or 0,
'height': height or 0,
'fourcc': fourcc,
'codec_private_data': track.get('CodecPrivateData'),
'sampling_rate': sampling_rate,
'channels': int_or_none(track.get('Channels', 2)),
'bits_per_sample': int_or_none(track.get('BitsPerSample', 16)),
'nal_unit_length_field': int_or_none(track.get('NALUnitLengthField', 4)),
},
})
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def absolute_url(video_url): def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url) return compat_urlparse.urljoin(base_url, video_url)
@ -1700,8 +1895,27 @@ class InfoExtractor(object):
return f return f
return {} return {}
def _media_formats(src, cur_media_type):
full_url = absolute_url(src)
if determine_ext(full_url) == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
else:
is_plain_url = True
formats = [{
'url': full_url,
'vcodec': 'none' if cur_media_type == 'audio' else None,
}]
return is_plain_url, formats
entries = [] entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage): media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = { media_info = {
'formats': [], 'formats': [],
'subtitles': {}, 'subtitles': {},
@ -1709,10 +1923,8 @@ class InfoExtractor(object):
media_attributes = extract_attributes(media_tag) media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src') src = media_attributes.get('src')
if src: if src:
media_info['formats'].append({ _, formats = _media_formats(src, media_type)
'url': absolute_url(src), media_info['formats'].extend(formats)
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['thumbnail'] = media_attributes.get('poster') media_info['thumbnail'] = media_attributes.get('poster')
if media_content: if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content): for source_tag in re.findall(r'<source[^>]+>', media_content):
@ -1720,16 +1932,17 @@ class InfoExtractor(object):
src = source_attributes.get('src') src = source_attributes.get('src')
if not src: if not src:
continue continue
is_plain_url, formats = _media_formats(src, media_type)
if is_plain_url:
f = parse_content_type(source_attributes.get('type')) f = parse_content_type(source_attributes.get('type'))
f.update({ f.update(formats[0])
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f) media_info['formats'].append(f)
else:
media_info['formats'].extend(formats)
for track_tag in re.findall(r'<track[^>]+>', media_content): for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag) track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind') kind = track_attributes.get('kind')
if not kind or kind == 'subtitles': if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src') src = track_attributes.get('src')
if not src: if not src:
continue continue
@ -1737,10 +1950,70 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({ media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src), 'url': absolute_url(src),
}) })
if media_info['formats']: if media_info['formats'] or media_info['subtitles']:
entries.append(media_info) entries.append(media_info)
return entries return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
http_base_url = 'http' + url_base
formats = []
if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',
video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
else:
for protocol in ('rtmp', 'rtsp'):
if protocol not in skip_protocols:
formats.append({
'url': protocol + url_base,
'format_id': protocol,
'protocol': protocol,
})
return formats
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()
@ -1861,6 +2134,12 @@ class InfoExtractor(object):
headers['Ytdl-request-proxy'] = geo_verification_proxy headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers return headers
def _generic_id(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
def _generic_title(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """

View File

@ -1,13 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import url_basename
class RtmpIE(InfoExtractor): class RtmpIE(InfoExtractor):
@ -23,8 +19,8 @@ class RtmpIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0]) video_id = self._generic_id(url)
title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]) title = self._generic_title(url)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
@ -34,3 +30,31 @@ class RtmpIE(InfoExtractor):
'format_id': compat_urlparse.urlparse(url).scheme, 'format_id': compat_urlparse.urlparse(url).scheme,
}], }],
} }
class MmsIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'(?i)mms://.+'
_TEST = {
# Direct MMS link
'url': 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
'info_dict': {
'id': 'MilesReid(0709)',
'ext': 'wmv',
'title': 'MilesReid(0709)',
},
'params': {
'skip_download': True, # rtsp downloads, requiring mplayer or mpv
},
}
def _real_extract(self, url):
video_id = self._generic_id(url)
title = self._generic_title(url)
return {
'id': video_id,
'title': title,
'url': url,
}

View File

@ -1,5 +1,5 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals, division
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import int_or_none
@ -8,12 +8,22 @@ from ..utils import int_or_none
class CrackleIE(InfoExtractor): class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)' _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://www.crackle.com/the-art-of-more/2496419', 'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
'info_dict': { 'info_dict': {
'id': '2496419', 'id': '2498934',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Heavy Lies the Head', 'title': 'Everybody Respects A Bloody Nose',
'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca', 'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). Theyre headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 906,
'series': 'Comedians In Cars Getting Coffee',
'season_number': 8,
'episode_number': 4,
'subtitles': {
'en-US': [{
'ext': 'ttml',
}]
},
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -21,12 +31,8 @@ class CrackleIE(InfoExtractor):
} }
} }
# extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
_SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
_UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
_THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
# extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
_THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
_MEDIA_FILE_SLOTS = { _MEDIA_FILE_SLOTS = {
'c544.flv': { 'c544.flv': {
'width': 544, 'width': 544,
@ -48,16 +54,21 @@ class CrackleIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
config_doc = self._download_xml(
'http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx?site=16',
video_id, 'Downloading config')
item = self._download_xml( item = self._download_xml(
'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id, 'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
video_id).find('i') video_id).find('i')
title = item.attrib['t'] title = item.attrib['t']
thumbnail = None
subtitles = {} subtitles = {}
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id), 'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
video_id, 'mp4', m3u8_id='hls', fatal=None) video_id, 'mp4', m3u8_id='hls', fatal=None)
thumbnail = None
path = item.attrib.get('p') path = item.attrib.get('p')
if path: if path:
thumbnail = self._THUMBNAIL_TEMPLATE % path thumbnail = self._THUMBNAIL_TEMPLATE % path
@ -76,7 +87,7 @@ class CrackleIE(InfoExtractor):
if locale not in subtitles: if locale not in subtitles:
subtitles[locale] = [] subtitles[locale] = []
subtitles[locale] = [{ subtitles[locale] = [{
'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v), 'url': '%s/%s%s_%s.xml' % (config_doc.attrib['strSubtitleServer'], path, locale, v),
'ext': 'ttml', 'ext': 'ttml',
}] }]
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id')) self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
@ -85,7 +96,7 @@ class CrackleIE(InfoExtractor):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': item.attrib.get('d'), 'description': item.attrib.get('d'),
'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None, 'duration': int(item.attrib.get('r'), 16) / 1000 if item.attrib.get('r') else None,
'series': item.attrib.get('sn'), 'series': item.attrib.get('sn'),
'season_number': int_or_none(item.attrib.get('se')), 'season_number': int_or_none(item.attrib.get('se')),
'episode_number': int_or_none(item.attrib.get('ep')), 'episode_number': int_or_none(item.attrib.get('ep')),

View File

@ -1,13 +1,11 @@
# -*- coding: utf-8 -*- # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
class CriterionIE(InfoExtractor): class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+' _VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = { _TEST = {
'url': 'http://www.criterion.com/films/184-le-samourai', 'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3', 'md5': 'bc51beba55685509883a9a7830919ec3',
@ -16,20 +14,20 @@ class CriterionIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Le Samouraï', 'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f', 'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': 're:^https?://.*\.jpg$',
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
final_url = self._search_regex( final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url') r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;', r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url') webpage, 'thumbnail url')
return { return {

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -34,22 +34,58 @@ from ..aes import (
class CrunchyrollBaseIE(InfoExtractor): class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.crunchyroll.com/login'
_LOGIN_FORM = 'login_form'
_NETRC_MACHINE = 'crunchyroll' _NETRC_MACHINE = 'crunchyroll'
def _login(self): def _login(self):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
if username is None: if username is None:
return return
self.report_login()
login_url = 'https://www.crunchyroll.com/?a=formhandler' login_page = self._download_webpage(
data = urlencode_postdata({ self._LOGIN_URL, None, 'Downloading login page')
'formname': 'RpcApiUser_Login',
'name': username, def is_logged(webpage):
'password': password, return '<title>Redirecting' in webpage
# Already logged in
if is_logged(login_page):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
post_url = extract_attributes(login_form_str).get('action')
if not post_url:
post_url = self._LOGIN_URL
elif not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
login_form.update({
'login_form[name]': username,
'login_form[password]': password,
}) })
login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded') response = self._download_webpage(
self._download_webpage(login_request, None, False, 'Wrong login info') post_url, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if is_logged(response):
return
error = self._html_search_regex(
'(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -114,6 +150,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# rtmp # rtmp
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video gone',
}, { }, {
'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409', 'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
'info_dict': { 'info_dict': {

View File

@ -1,30 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/video/player\?vid=(?P<id>[0-9.]+)'
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
},
'expected_warnings': ['HTTP Error 404'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:ctv_web:%s' % video_id,
'ie_key': 'NineCNineMedia',
}

View File

@ -1,9 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import (
int_or_none,
HEADRequest,
)
class CultureUnpluggedIE(InfoExtractor): class CultureUnpluggedIE(InfoExtractor):
@ -32,6 +36,9 @@ class CultureUnpluggedIE(InfoExtractor):
video_id = mobj.group('id') video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id display_id = mobj.group('display_id') or video_id
# request setClientTimezone.php to get PHPSESSID cookie which is need to get valid json data in the next request
self._request_webpage(HEADRequest(
'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
movie_data = self._download_json( movie_data = self._download_json(
'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id) 'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)

View File

@ -0,0 +1,120 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
urlencode_postdata,
compat_str,
ExtractorError,
)
class CuriosityStreamBaseIE(InfoExtractor):
_NETRC_MACHINE = 'curiositystream'
_auth_token = None
_API_BASE_URL = 'https://api.curiositystream.com/v1/'
def _handle_errors(self, result):
error = result.get('error', {}).get('message')
if error:
if isinstance(error, dict):
error = ', '.join(error.values())
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers)
self._handle_errors(result)
return result['data']
def _real_initialize(self):
(email, password) = self._get_login_info()
if email is None:
return
result = self._download_json(
self._API_BASE_URL + 'login', None, data=urlencode_postdata({
'email': email,
'password': password,
}))
self._handle_errors(result)
self._auth_token = result['message']['auth_token']
def _extract_media_info(self, media):
video_id = compat_str(media['id'])
limelight_media_id = media['limelight_media_id']
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
if not sub_url:
continue
lang = closed_caption.get('code') or closed_caption.get('language') or 'en'
subtitles.setdefault(lang, []).append({
'url': sub_url,
})
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'limelight:media:' + limelight_media_id,
'title': title,
'description': media.get('description'),
'thumbnail': media.get('image_large') or media.get('image_medium') or media.get('image_small'),
'duration': int_or_none(media.get('duration')),
'tags': media.get('tags'),
'subtitles': subtitles,
'ie_key': 'LimelightMedia',
}
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': 'a0074c190e6cddaf86900b28d3e9ee7a',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
'timestamp': 1448388615,
'upload_date': '20151124',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
return self._extract_media_info(media)
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
'id': '2',
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 17,
}
def _real_extract(self, url):
collection_id = self._match_id(url)
collection = self._call_api(
'collections/' + collection_id, collection_id)
entries = []
for media in collection.get('media', []):
entries.append(self._extract_media_info(media))
return self.playlist_result(
entries, collection_id,
collection.get('title'), collection.get('description'))

View File

@ -94,7 +94,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]', 'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
'uploader': 'HotWaves1012', 'uploader': 'HotWaves1012',
'age_limit': 18, 'age_limit': 18,
} },
'skip': 'video gone',
}, },
# geo-restricted, player v5 # geo-restricted, player v5
{ {
@ -144,7 +145,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
player_v5 = self._search_regex( player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826 [r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);', r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);'], r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});'],
webpage, 'player v5', default=None) webpage, 'player v5', default=None)
if player_v5: if player_v5:
player = self._parse_json(player_v5, video_id) player = self._parse_json(player_v5, video_id)
@ -394,7 +396,7 @@ class DailymotionUserIE(DailymotionPlaylistIE):
class DailymotionCloudIE(DailymotionBaseInfoExtractor): class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/' _VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX _VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX _VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals

View File

@ -38,6 +38,12 @@ class DBTVIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups() video_id, display_id = re.match(self._VALID_URL, url).groups()

View File

@ -1,61 +1,54 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..utils import unified_strdate
class DctpTvIE(InfoExtractor): class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$' _VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = { _TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/', 'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '174dd4a8a6225cf5655952f969cfbe24',
'info_dict': { 'info_dict': {
'id': '1324', 'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade', 'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv', 'ext': 'mp4',
'title': 'Videoinstallation für eine Kaufhausfassade' 'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'upload_date': '20110407',
'thumbnail': 're:^https?://.*\.jpg$',
}, },
'params': {
# rtmp download
'skip_download': True,
}
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
base_url = 'http://dctp-ivms2-restapi.s3.amazonaws.com/' webpage = self._download_webpage(url, video_id)
version_json = self._download_json(
base_url + 'version.json', object_id = self._html_search_meta('DC.identifier', webpage)
video_id, note='Determining file version')
version = version_json['version_name']
info_json = self._download_json(
'{0}{1}/restapi/slugs/{2}.json'.format(base_url, version, video_id),
video_id, note='Fetching object ID')
object_id = compat_str(info_json['object_id'])
meta_json = self._download_json(
'{0}{1}/restapi/media/{2}.json'.format(base_url, version, object_id),
video_id, note='Downloading metadata')
uuid = meta_json['uuid']
title = meta_json['title']
wide = meta_json['is_wide']
if wide:
ratio = '16x9'
else:
ratio = '4x3'
play_path = 'mp4:{0}_dctp_0500_{1}.m4v'.format(uuid, ratio)
servers_json = self._download_json( servers_json = self._download_json(
'http://www.dctp.tv/streaming_servers/', 'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/',
video_id, note='Downloading server list') video_id, note='Downloading server list')
url = servers_json[0]['endpoint'] server = servers_json[0]['server']
m3u8_path = self._search_regex(
r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
formats = self._extract_m3u8_formats(
'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
entry_protocol='m3u8_native')
title = self._og_search_title(webpage)
description = self._html_search_meta('DC.description', webpage)
upload_date = unified_strdate(
self._html_search_meta('DC.date.created', webpage))
thumbnail = self._og_search_thumbnail(webpage)
return { return {
'id': object_id, 'id': object_id,
'title': title, 'title': title,
'format': 'rtmp', 'formats': formats,
'url': url, 'display_id': video_id,
'play_path': play_path, 'description': description,
'rtmp_real_time': True, 'upload_date': upload_date,
'ext': 'flv', 'thumbnail': thumbnail,
'display_id': video_id
} }

View File

@ -13,7 +13,7 @@ from ..utils import (
class DemocracynowIE(InfoExtractor): class DemocracynowIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)' _VALID_URL = r'https?://(?:www\.)?democracynow\.org/(?P<id>[^\?]*)'
IE_NAME = 'democracynow' IE_NAME = 'democracynow'
_TESTS = [{ _TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3', 'url': 'http://www.democracynow.org/shows/2015/7/3',

View File

@ -7,11 +7,22 @@ from ..utils import (
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
unescapeHTML, unescapeHTML,
ExtractorError,
) )
class DiscoveryGoIE(InfoExtractor): class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocitychannel
)go\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'''
_TEST = { _TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/', 'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': { 'info_dict': {
@ -43,7 +54,14 @@ class DiscoveryGoIE(InfoExtractor):
title = video['name'] title = video['name']
stream = video['stream'] stream = video.get('stream')
if not stream:
if video.get('authenticated') is True:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
else:
raise ExtractorError('Unable to find stream')
STREAM_URL_SUFFIX = 'streamUrl' STREAM_URL_SUFFIX = 'streamUrl'
formats = [] formats = []
for stream_kind in ('', 'hds'): for stream_kind in ('', 'hds'):

View File

@ -9,22 +9,39 @@ from ..utils import (
class DotsubIE(InfoExtractor): class DotsubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = { _TESTS = [{
'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27', 'url': 'https://dotsub.com/view/9c63db2a-fa95-4838-8e6e-13deafe47f09',
'md5': '0914d4d69605090f623b7ac329fea66e', 'md5': '21c7ff600f545358134fea762a6d42b6',
'info_dict': { 'info_dict': {
'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27', 'id': '9c63db2a-fa95-4838-8e6e-13deafe47f09',
'ext': 'flv', 'ext': 'flv',
'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary', 'title': 'MOTIVATION - "It\'s Possible" Best Inspirational Video Ever',
'description': 'md5:699a0f7f50aeec6042cb3b1db2d0d074', 'description': 'md5:41af1e273edbbdfe4e216a78b9d34ac6',
'thumbnail': 're:^https?://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p', 'thumbnail': 're:^https?://dotsub.com/media/9c63db2a-fa95-4838-8e6e-13deafe47f09/p',
'duration': 3169, 'duration': 198,
'uploader': '4v4l0n42', 'uploader': 'liuxt',
'timestamp': 1292248482.625, 'timestamp': 1385778501.104,
'upload_date': '20101213', 'upload_date': '20131130',
'view_count': int, 'view_count': int,
} }
} }, {
'url': 'https://dotsub.com/view/747bcf58-bd59-45b7-8c8c-ac312d084ee6',
'md5': '2bb4a83896434d5c26be868c609429a3',
'info_dict': {
'id': '168006778',
'ext': 'mp4',
'title': 'Apartments and flats in Raipur the white symphony',
'description': 'md5:784d0639e6b7d1bc29530878508e38fe',
'thumbnail': 're:^https?://dotsub.com/media/747bcf58-bd59-45b7-8c8c-ac312d084ee6/p',
'duration': 290,
'timestamp': 1476767794.2809999,
'upload_date': '20160525',
'uploader': 'parthivi001',
'uploader_id': 'user52596202',
'view_count': int,
},
'add_ie': ['Vimeo'],
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -37,12 +54,23 @@ class DotsubIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = self._search_regex( video_url = self._search_regex(
[r'<source[^>]+src="([^"]+)"', r'"file"\s*:\s*\'([^\']+)'], [r'<source[^>]+src="([^"]+)"', r'"file"\s*:\s*\'([^\']+)'],
webpage, 'video url') webpage, 'video url', default=None)
info_dict = {
return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'ext': 'flv', 'ext': 'flv',
}
if not video_url:
setup_data = self._parse_json(self._html_search_regex(
r'(?s)data-setup=([\'"])(?P<content>(?!\1).+?)\1',
webpage, 'setup data', group='content'), video_id)
info_dict = {
'_type': 'url_transparent',
'url': setup_data['src'],
}
info_dict.update({
'title': info['title'], 'title': info['title'],
'description': info.get('description'), 'description': info.get('description'),
'thumbnail': info.get('screenshotURI'), 'thumbnail': info.get('screenshotURI'),
@ -50,4 +78,6 @@ class DotsubIE(InfoExtractor):
'uploader': info.get('user'), 'uploader': info.get('user'),
'timestamp': float_or_none(info.get('dateCreated'), 1000), 'timestamp': float_or_none(info.get('dateCreated'), 1000),
'view_count': int_or_none(info.get('numberOfViews')), 'view_count': int_or_none(info.get('numberOfViews')),
} })
return info_dict

View File

@ -3,9 +3,17 @@ from __future__ import unicode_literals
import hashlib import hashlib
import time import time
import uuid
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (ExtractorError, unescapeHTML) from ..compat import (
from ..compat import (compat_str, compat_basestring) compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
unescapeHTML,
)
class DouyuTVIE(InfoExtractor): class DouyuTVIE(InfoExtractor):
@ -21,7 +29,6 @@ class DouyuTVIE(InfoExtractor):
'description': 're:.*m7show@163\.com.*', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925',
'is_live': True, 'is_live': True,
}, },
'params': { 'params': {
@ -37,7 +44,6 @@ class DouyuTVIE(InfoExtractor):
'description': 'md5:746a2f7a253966a06755a912f0acc0d2', 'description': 'md5:746a2f7a253966a06755a912f0acc0d2',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'douyu小漠', 'uploader': 'douyu小漠',
'uploader_id': '3769985',
'is_live': True, 'is_live': True,
}, },
'params': { 'params': {
@ -54,7 +60,6 @@ class DouyuTVIE(InfoExtractor):
'description': 're:.*m7show@163\.com.*', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925',
'is_live': True, 'is_live': True,
}, },
'params': { 'params': {
@ -65,6 +70,10 @@ class DouyuTVIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
# Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf
# is encrypted originally, but ffdec can dump memory to get the decrypted one.
_API_KEY = 'A12Svb&%1UUmf@hC'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -75,74 +84,56 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex( room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id') r'"room_id"\s*:\s*(\d+),', page, 'room id')
config = None room = self._download_json(
# Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache" 'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id,
# Retry with different parameters - same parameters cause same errors note='Downloading room info')['data']
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
config_page = self._download_webpage(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')
data = config['data']
error_code = config.get('error', 0)
if error_code is not 0:
error_desc = 'Server reported error %i' % error_code
if isinstance(data, (compat_str, compat_basestring)):
error_desc += ': ' + data
raise ExtractorError(error_desc, expected=True)
show_status = data.get('show_status')
# 1 = live, 2 = offline # 1 = live, 2 = offline
if show_status == '2': if room.get('show_status') == '2':
raise ExtractorError('Live stream is offline', expected=True)
tt = compat_str(int(time.time() / 60))
did = uuid.uuid4().hex.upper()
sign_content = ''.join((room_id, did, self._API_KEY, tt))
sign = hashlib.md5((sign_content).encode('utf-8')).hexdigest()
flv_data = compat_urllib_parse_urlencode({
'cdn': 'ws',
'rate': '0',
'tt': tt,
'did': did,
'sign': sign,
})
video_info = self._download_json(
'http://www.douyu.com/lapi/live/getPlay/%s' % room_id, video_id,
data=flv_data, note='Downloading video info',
headers={'Content-Type': 'application/x-www-form-urlencoded'})
error_code = video_info.get('error', 0)
if error_code is not 0:
raise ExtractorError( raise ExtractorError(
'Live stream is offline', expected=True) '%s reported error %i' % (self.IE_NAME, error_code),
expected=True)
base_url = data['rtmp_url'] base_url = video_info['data']['rtmp_url']
live_path = data['rtmp_live'] live_path = video_info['data']['rtmp_live']
title = self._live_title(unescapeHTML(data['room_name'])) video_url = '%s/%s' % (base_url, live_path)
description = data.get('show_details')
thumbnail = data.get('room_src')
uploader = data.get('nickname') title = self._live_title(unescapeHTML(room['room_name']))
uploader_id = data.get('owner_uid') description = room.get('notice')
thumbnail = room.get('room_src')
multi_formats = data.get('rtmp_multi_bitrate') uploader = room.get('nickname')
if not isinstance(multi_formats, dict):
multi_formats = {}
multi_formats['live'] = live_path
formats = [{
'url': '%s/%s' % (base_url, format_path),
'format_id': format_id,
'preference': 1 if format_id == 'live' else 0,
} for format_id, format_path in multi_formats.items()]
self._sort_formats(formats)
return { return {
'id': room_id, 'id': room_id,
'display_id': video_id, 'display_id': video_id,
'url': video_url,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'is_live': True, 'is_live': True,
} }

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools

View File

@ -4,26 +4,45 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none,
float_or_none,
mimetype2ext,
parse_iso8601, parse_iso8601,
remove_end,
) )
class DRTVIE(InfoExtractor): class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/tv/se/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)' _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_TEST = { _TESTS = [{
'url': 'https://www.dr.dk/tv/se/boern/ultra/panisk-paske/panisk-paske-5', 'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
'md5': 'dc515a9ab50577fa14cc4e4b0265168f', 'md5': '25e659cccc9a2ed956110a299fdf5983',
'info_dict': { 'info_dict': {
'id': 'panisk-paske-5', 'id': 'klassen-darlig-taber-10',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Panisk Påske (5)', 'title': 'Klassen - Dårlig taber (10)',
'description': 'md5:ca14173c5ab24cd26b0fcc074dff391c', 'description': 'md5:815fe1b7fa656ed80580f31e8b3c79aa',
'timestamp': 1426984612, 'timestamp': 1471991907,
'upload_date': '20150322', 'upload_date': '20160823',
'duration': 1455, 'duration': 606.84,
}, },
} 'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/nyheder/indland/live-christianias-rydning-af-pusher-street-er-i-gang',
'md5': '2c37175c718155930f939ef59952474a',
'info_dict': {
'id': 'christiania-pusher-street-ryddes-drdkrjpo',
'ext': 'mp4',
'title': 'LIVE Christianias rydning af Pusher Street er i gang',
'description': '- Det er det fedeste, der er sket i 20 år, fortæller christianit til DR Nyheder.',
'timestamp': 1472800279,
'upload_date': '20160902',
'duration': 131.4,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -35,7 +54,8 @@ class DRTVIE(InfoExtractor):
'Video %s is not available' % video_id, expected=True) 'Video %s is not available' % video_id, expected=True)
video_id = self._search_regex( video_id = self._search_regex(
r'data-(?:material-identifier|episode-slug)="([^"]+)"', (r'data-(?:material-identifier|episode-slug)="([^"]+)"',
r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
webpage, 'video id') webpage, 'video id')
programcard = self._download_json( programcard = self._download_json(
@ -43,9 +63,12 @@ class DRTVIE(InfoExtractor):
video_id, 'Downloading video JSON') video_id, 'Downloading video JSON')
data = programcard['Data'][0] data = programcard['Data'][0]
title = data['Title'] title = remove_end(self._og_search_title(
description = data['Description'] webpage, default=None), ' | TV | DR') or data['Title']
timestamp = parse_iso8601(data['CreatedTime']) description = self._og_search_description(
webpage, default=None) or data.get('Description')
timestamp = parse_iso8601(data.get('CreatedTime'))
thumbnail = None thumbnail = None
duration = None duration = None
@ -56,16 +79,18 @@ class DRTVIE(InfoExtractor):
subtitles = {} subtitles = {}
for asset in data['Assets']: for asset in data['Assets']:
if asset['Kind'] == 'Image': if asset.get('Kind') == 'Image':
thumbnail = asset['Uri'] thumbnail = asset.get('Uri')
elif asset['Kind'] == 'VideoResource': elif asset.get('Kind') == 'VideoResource':
duration = asset['DurationInMilliseconds'] / 1000.0 duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
restricted_to_denmark = asset['RestrictedToDenmark'] restricted_to_denmark = asset.get('RestrictedToDenmark')
spoken_subtitles = asset['Target'] == 'SpokenSubtitles' spoken_subtitles = asset.get('Target') == 'SpokenSubtitles'
for link in asset['Links']: for link in asset.get('Links', []):
uri = link['Uri'] uri = link.get('Uri')
target = link['Target'] if not uri:
format_id = target continue
target = link.get('Target')
format_id = target or ''
preference = None preference = None
if spoken_subtitles: if spoken_subtitles:
preference = -1 preference = -1
@ -76,8 +101,8 @@ class DRTVIE(InfoExtractor):
video_id, preference, f4m_id=format_id)) video_id, preference, f4m_id=format_id))
elif target == 'HLS': elif target == 'HLS':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
uri, video_id, 'mp4', preference=preference, uri, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id)) preference=preference, m3u8_id=format_id))
else: else:
bitrate = link.get('Bitrate') bitrate = link.get('Bitrate')
if bitrate: if bitrate:
@ -85,7 +110,7 @@ class DRTVIE(InfoExtractor):
formats.append({ formats.append({
'url': uri, 'url': uri,
'format_id': format_id, 'format_id': format_id,
'tbr': bitrate, 'tbr': int_or_none(bitrate),
'ext': link.get('FileFormat'), 'ext': link.get('FileFormat'),
}) })
subtitles_list = asset.get('SubtitlesList') subtitles_list = asset.get('SubtitlesList')
@ -94,12 +119,18 @@ class DRTVIE(InfoExtractor):
'Danish': 'da', 'Danish': 'da',
} }
for subs in subtitles_list: for subs in subtitles_list:
lang = subs['Language'] if not subs.get('Uri'):
subtitles[LANGS.get(lang, lang)] = [{'url': subs['Uri'], 'ext': 'vtt'}] continue
lang = subs.get('Language') or 'da'
subtitles.setdefault(LANGS.get(lang, lang), []).append({
'url': subs['Uri'],
'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
})
if not formats and restricted_to_denmark: if not formats and restricted_to_denmark:
raise ExtractorError( self.raise_geo_restricted(
'Unfortunately, DR is not allowed to show this program outside Denmark.', expected=True) 'Unfortunately, DR is not allowed to show this program outside Denmark.',
expected=True)
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -52,11 +52,24 @@ class EaglePlatformIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
# Regular iframe embedding
mobj = re.search( mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1', r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
webpage) webpage)
if mobj is not None: if mobj is not None:
return mobj.group('url') return mobj.group('url')
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
<script[^>]+
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
.+?
<div[^>]+
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
data-id=["\'](?P<id>\d+)
''', webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
@staticmethod @staticmethod
def _handle_error(response): def _handle_error(response):

View File

@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.einthusan.com/movies/watch.php?id=2447', 'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
'md5': 'af244f4458cd667205e513d75da5b8b1', 'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': { 'info_dict': {
'id': '2447', 'id': '2447',
'ext': 'mp4', 'ext': 'mp4',
@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
}, },
{ {
'url': 'http://www.einthusan.com/movies/watch.php?id=1671', 'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'ef63c7a803e22315880ed182c10d1c5c', 'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': { 'info_dict': {
'id': '1671', 'id': '1671',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Soodhu Kavvuum', 'title': 'Soodhu Kavvuum',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51', 'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
} }
}, },
] ]
@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id) r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
video_url = self._download_webpage( m3u8_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/' 'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id) % video_id, video_id, headers={'Referer': url})
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': video_url, 'formats': formats,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'description': description, 'description': description,
} }

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor

View File

@ -1,4 +1,4 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor): class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# video with 5min ID # video with 5min ID

View File

@ -5,7 +5,7 @@ from ..utils import remove_end
class ESPNIE(InfoExtractor): class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)' _VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079', 'url': 'http://espn.go.com/video/clip?id=10365079',
'md5': '60e5d097a523e767d06479335d1bdc58', 'md5': '60e5d097a523e767d06479335d1bdc58',
@ -47,6 +47,9 @@ class ESPNIE(InfoExtractor):
}, { }, {
'url': 'http://espn.go.com/nba/playoffs/2015/story/_/id/12887571/john-wall-washington-wizards-no-swelling-left-hand-wrist-game-5-return', 'url': 'http://espn.go.com/nba/playoffs/2015/story/_/id/12887571/john-wall-washington-wizards-no-swelling-left-hand-wrist-game-5-return',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,58 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = 'exfm'
IE_DESC = 'ex.fm'
_VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{
'url': 'http://ex.fm/song/eh359',
'md5': 'e45513df5631e6d760970b14cc0c11e7',
'info_dict': {
'id': '44216187',
'ext': 'mp3',
'title': 'Test House "Love Is Not Enough" (Extended Mix) DeadJournalist Exclusive',
'uploader': 'deadjournalist',
'upload_date': '20120424',
'description': 'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
},
'note': 'Soundcloud song',
'skip': 'The site is down too often',
},
{
'url': 'http://ex.fm/song/wddt8',
'md5': '966bd70741ac5b8570d8e45bfaed3643',
'info_dict': {
'id': 'wddt8',
'ext': 'mp3',
'title': 'Safe and Sound',
'uploader': 'Capital Cities',
},
'skip': 'The site is down too often',
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
info_url = 'http://ex.fm/api/v3/song/%s' % song_id
info = self._download_json(info_url, song_id)['song']
song_url = info['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream', ''), 'Soundcloud')
return {
'id': song_id,
'url': song_url,
'ext': 'mp3',
'title': info['title'],
'thumbnail': info['image']['large'],
'uploader': info['artist'],
'view_count': info['loved_count'],
}

View File

@ -8,7 +8,7 @@ from ..utils import (
class ExpoTVIE(InfoExtractor): class ExpoTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])' _VALID_URL = r'https?://(?:www\.)?expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_TEST = { _TEST = {
'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916', 'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
'md5': 'fe1d728c3a813ff78f595bc8b7a707a8', 'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',

View File

@ -1,12 +1,18 @@
# flake8: noqa # flake8: noqa
from __future__ import unicode_literals from __future__ import unicode_literals
from .abc import ABCIE from .abc import (
from .abc7news import Abc7NewsIE ABCIE,
ABCIViewIE,
)
from .abcnews import ( from .abcnews import (
AbcNewsIE, AbcNewsIE,
AbcNewsVideoIE, AbcNewsVideoIE,
) )
from .abcotvs import (
ABCOTVSIE,
ABCOTVSClipsIE,
)
from .academicearth import AcademicEarthCourseIE from .academicearth import AcademicEarthCourseIE
from .acast import ( from .acast import (
ACastIE, ACastIE,
@ -25,10 +31,10 @@ from .aenetworks import (
HistoryTopicIE, HistoryTopicIE,
) )
from .afreecatv import AfreecaTVIE from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE
from .animeondemand import AnimeOnDemandIE from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE from .anitube import AnitubeIE
from .anysex import AnySexIE from .anysex import AnySexIE
@ -60,6 +66,7 @@ from .arte import (
ArteTVDDCIE, ArteTVDDCIE,
ArteTVMagazineIE, ArteTVMagazineIE,
ArteTVEmbedIE, ArteTVEmbedIE,
TheOperaPlatformIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
from .atresplayer import AtresPlayerIE from .atresplayer import AtresPlayerIE
@ -67,6 +74,12 @@ from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE from .audiomack import AudiomackIE, AudiomackAlbumIE
from .awaan import (
AWAANIE,
AWAANVideoIE,
AWAANLiveIE,
AWAANSeasonIE,
)
from .azubu import AzubuIE, AzubuLiveIE from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE from .bambuser import BambuserIE, BambuserChannelIE
@ -80,7 +93,8 @@ from .bbc import (
) )
from .beeg import BeegIE from .beeg import BeegIE
from .behindkink import BehindKinkIE from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE from .bellmedia import BellMediaIE
from .beatport import BeatportIE
from .bet import BetIE from .bet import BetIE
from .bigflix import BigflixIE from .bigflix import BigflixIE
from .bild import BildIE from .bild import BildIE
@ -103,7 +117,10 @@ from .brightcove import (
BrightcoveNewIE, BrightcoveNewIE,
) )
from .buzzfeed import BuzzFeedIE from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE from .byutv import (
BYUtvIE,
BYUtvEventIE,
)
from .c56 import C56IE from .c56 import C56IE
from .camdemy import ( from .camdemy import (
CamdemyIE, CamdemyIE,
@ -117,9 +134,12 @@ from .carambatv import (
CarambaTVIE, CarambaTVIE,
CarambaTVPageIE, CarambaTVPageIE,
) )
from .cartoonnetwork import CartoonNetworkIE
from .cbc import ( from .cbc import (
CBCIE, CBCIE,
CBCPlayerIE, CBCPlayerIE,
CBCWatchVideoIE,
CBCWatchIE,
) )
from .cbs import CBSIE from .cbs import CBSIE
from .cbslocal import CBSLocalIE from .cbslocal import CBSLocalIE
@ -130,9 +150,11 @@ from .cbsnews import (
) )
from .cbssports import CBSSportsIE from .cbssports import CBSSportsIE
from .ccc import CCCIE from .ccc import CCCIE
from .cctv import CCTVIE
from .cda import CDAIE from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE from .channel9 import Channel9IE
from .charlierose import CharlieRoseIE
from .chaturbate import ChaturbateIE from .chaturbate import ChaturbateIE
from .chilloutzone import ChilloutzoneIE from .chilloutzone import ChilloutzoneIE
from .chirbit import ( from .chirbit import (
@ -165,7 +187,10 @@ from .comedycentral import (
) )
from .comcarcoff import ComCarCoffIE from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE from .commonprotocols import (
MmsIE,
RtmpIE,
)
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .cracked import CrackedIE from .cracked import CrackedIE
from .crackle import CrackleIE from .crackle import CrackleIE
@ -177,9 +202,12 @@ from .crunchyroll import (
) )
from .cspan import CSpanIE from .cspan import CSpanIE
from .ctsnews import CtsNewsIE from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
CuriosityStreamIE,
CuriosityStreamCollectionIE,
)
from .cwtv import CWTVIE from .cwtv import CWTVIE
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .dailymotion import ( from .dailymotion import (
@ -195,12 +223,6 @@ from .daum import (
DaumUserIE, DaumUserIE,
) )
from .dbtv import DBTVIE from .dbtv import DBTVIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE from .democracynow import DemocracynowIE
@ -249,13 +271,18 @@ from .espn import ESPNIE
from .esri import EsriVideoIE from .esri import EsriVideoIE
from .europa import EuropaIE from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE from .extremetube import ExtremeTubeIE
from .eyedotv import EyedoTVIE from .eyedotv import EyedoTVIE
from .facebook import FacebookIE from .facebook import (
FacebookIE,
FacebookPluginsVideoIE,
)
from .faz import FazIE from .faz import FazIE
from .fc2 import FC2IE from .fc2 import (
FC2IE,
FC2EmbedIE,
)
from .fczenit import FczenitIE from .fczenit import FczenitIE
from .firstpost import FirstpostIE from .firstpost import FirstpostIE
from .firsttv import FirstTVIE from .firsttv import FirstTVIE
@ -270,7 +297,11 @@ from .formula1 import Formula1IE
from .fourtube import FourTubeIE from .fourtube import FourTubeIE
from .fox import FOXIE from .fox import FOXIE
from .foxgay import FoxgayIE from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE from .foxnews import (
FoxNewsIE,
FoxNewsArticleIE,
FoxNewsInsiderIE,
)
from .foxsports import FoxSportsIE from .foxsports import FoxSportsIE
from .franceculture import FranceCultureIE from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE from .franceinter import FranceInterIE
@ -287,6 +318,7 @@ from .freevideo import FreeVideoIE
from .funimation import FunimationIE from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gameone import ( from .gameone import (
GameOneIE, GameOneIE,
@ -306,6 +338,7 @@ from .globo import (
GloboIE, GloboIE,
GloboArticleIE, GloboArticleIE,
) )
from .go import GoIE
from .godtube import GodTubeIE from .godtube import GodTubeIE
from .godtv import GodTVIE from .godtv import GodTVIE
from .golem import GolemIE from .golem import GolemIE
@ -316,13 +349,19 @@ from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE from .groupon import GrouponIE
from .hark import HarkIE from .hark import HarkIE
from .hbo import HBOIE from .hbo import (
HBOIE,
HBOEpisodeIE,
)
from .hearthisat import HearThisAtIE from .hearthisat import HearThisAtIE
from .heise import HeiseIE from .heise import HeiseIE
from .hellporno import HellPornoIE from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVIE from .hgtv import (
HGTVIE,
HGTVComShowIE,
)
from .historicfilms import HistoricFilmsIE from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE from .hornbunny import HornBunnyIE
@ -334,6 +373,7 @@ from .hrti import (
HRTiIE, HRTiIE,
HRTiPlaylistIE, HRTiPlaylistIE,
) )
from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE from .huffpost import HuffPostIE
from .hypem import HypemIE from .hypem import HypemIE
from .iconosquare import IconosquareIE from .iconosquare import IconosquareIE
@ -366,7 +406,12 @@ from .ivi import (
IviCompilationIE IviCompilationIE
) )
from .ivideon import IvideonIE from .ivideon import IvideonIE
from .iwara import IwaraIE
from .izlesene import IzleseneIE from .izlesene import IzleseneIE
from .jamendo import (
JamendoIE,
JamendoAlbumIE,
)
from .jeuxvideo import JeuxVideoIE from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE from .jove import JoveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
@ -378,6 +423,7 @@ from .kankan import KankanIE
from .karaoketv import KaraoketvIE from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE from .kickstarter import KickStarterIE
from .keek import KeekIE from .keek import KeekIE
@ -396,12 +442,14 @@ from .kuwo import (
) )
from .la7 import LA7IE from .la7 import LA7IE
from .laola1tv import Laola1TvIE from .laola1tv import Laola1TvIE
from .lci import LCIIE
from .lcp import ( from .lcp import (
LcpPlayIE, LcpPlayIE,
LcpIE, LcpIE,
) )
from .learnr import LearnrIE from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE from .lecture2go import Lecture2GoIE
from .lego import LEGOIE
from .lemonde import LemondeIE from .lemonde import LemondeIE
from .leeco import ( from .leeco import (
LeIE, LeIE,
@ -439,6 +487,10 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE from .mailru import MailRuIE
from .makerschannel import MakersChannelIE from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE from .makertv import MakerTVIE
from .mangomolo import (
MangomoloVideoIE,
MangomoloLiveIE,
)
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .meta import METAIE from .meta import METAIE
@ -446,6 +498,7 @@ from .metacafe import MetacafeIE
from .metacritic import MetacriticIE from .metacritic import MetacriticIE
from .mgoon import MgoonIE from .mgoon import MgoonIE
from .mgtv import MGTVIE from .mgtv import MGTVIE
from .miaopai import MiaoPaiIE
from .microsoftvirtualacademy import ( from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE, MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE, MicrosoftVirtualAcademyCourseIE,
@ -474,9 +527,11 @@ from .motherless import MotherlessIE
from .motorsport import MotorsportIE from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE from .moviezine import MoviezineIE
from .movingimage import MovingImageIE
from .msn import MSNIE from .msn import MSNIE
from .mtv import ( from .mtv import (
MTVIE, MTVIE,
MTVVideoIE,
MTVServicesEmbeddedIE, MTVServicesEmbeddedIE,
MTVDEIE, MTVDEIE,
) )
@ -499,6 +554,7 @@ from .nbc import (
CSNNEIE, CSNNEIE,
NBCIE, NBCIE,
NBCNewsIE, NBCNewsIE,
NBCOlympicsIE,
NBCSportsIE, NBCSportsIE,
NBCSportsVPlayerIE, NBCSportsVPlayerIE,
) )
@ -530,6 +586,7 @@ from .nextmedia import (
) )
from .nfb import NFBIE from .nfb import NFBIE
from .nfl import NFLIE from .nfl import NFLIE
from .nhk import NhkVodIE
from .nhl import ( from .nhl import (
NHLVideocenterIE, NHLVideocenterIE,
NHLNewsIE, NHLNewsIE,
@ -539,12 +596,17 @@ from .nhl import (
from .nick import ( from .nick import (
NickIE, NickIE,
NickDeIE, NickDeIE,
NickNightIE,
) )
from .niconico import NiconicoIE, NiconicoPlaylistIE from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import NineCNineMediaIE from .ninecninemedia import (
NineCNineMediaStackIE,
NineCNineMediaIE,
)
from .ninegag import NineGagIE from .ninegag import NineGagIE
from .ninenow import NineNowIE from .ninenow import NineNowIE
from .nintendo import NintendoIE from .nintendo import NintendoIE
from .nobelprize import NobelPrizeIE
from .noco import NocoIE from .noco import NocoIE
from .normalboots import NormalbootsIE from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE from .nosvideo import NosVideoIE
@ -567,13 +629,14 @@ from .nowtv import (
) )
from .noz import NozIE from .noz import NozIE
from .npo import ( from .npo import (
AndereTijdenIE,
NPOIE, NPOIE,
NPOLiveIE, NPOLiveIE,
NPORadioIE, NPORadioIE,
NPORadioFragmentIE, NPORadioFragmentIE,
SchoolTVIE, SchoolTVIE,
VPROIE, VPROIE,
WNLIE WNLIE,
) )
from .npr import NprIE from .npr import NprIE
from .nrk import ( from .nrk import (
@ -589,6 +652,7 @@ from .nytimes import (
NYTimesArticleIE, NYTimesArticleIE,
) )
from .nuvid import NuvidIE from .nuvid import NuvidIE
from .nzz import NZZIE
from .odatv import OdaTVIE from .odatv import OdaTVIE
from .odnoklassniki import OdnoklassnikiIE from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE from .oktoberfesttv import OktoberfestTVIE
@ -609,6 +673,7 @@ from .orf import (
ORFFM4IE, ORFFM4IE,
ORFIPTVIE, ORFIPTVIE,
) )
from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
@ -623,7 +688,6 @@ from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .pladform import PladformIE from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE from .playfm import PlayFMIE
from .plays import PlaysTVIE from .plays import PlaysTVIE
from .playtvak import PlaytvakIE from .playtvak import PlaytvakIE
@ -635,8 +699,12 @@ from .pluralsight import (
) )
from .podomatic import PodomaticIE from .podomatic import PodomaticIE
from .pokemon import PokemonIE from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
from .porn91 import Porn91IE from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornhd import PornHdIE from .pornhd import PornHdIE
from .pornhub import ( from .pornhub import (
PornHubIE, PornHubIE,
@ -679,6 +747,10 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE from .rds import RDSIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .regiotv import RegioTVIE from .regiotv import RegioTVIE
from .rentv import (
RENTVIE,
RENTVArticleIE,
)
from .restudy import RestudyIE from .restudy import RestudyIE
from .reuters import ReutersIE from .reuters import ReutersIE
from .reverbnation import ReverbNationIE from .reverbnation import ReverbNationIE
@ -688,6 +760,7 @@ from .revision3 import (
) )
from .rice import RICEIE from .rice import RICEIE
from .ringtv import RingTVIE from .ringtv import RingTVIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE from .roosterteeth import RoosterTeethIE
@ -734,7 +807,10 @@ from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE from .servingsys import ServingSysIE
from .sexu import SexuIE from .sexu import SexuIE
from .shahid import ShahidIE from .shahid import ShahidIE
from .shared import SharedIE from .shared import (
SharedIE,
VivoIE,
)
from .sharesix import ShareSixIE from .sharesix import ShareSixIE
from .sina import SinaIE from .sina import SinaIE
from .sixplay import SixPlayIE from .sixplay import SixPlayIE
@ -790,7 +866,6 @@ from .srgssr import (
SRGSSRPlayIE, SRGSSRPlayIE,
) )
from .srmediathek import SRMediathekIE from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE from .steam import SteamIE
from .streamable import StreamableIE from .streamable import StreamableIE
@ -810,6 +885,7 @@ from .tagesschau import (
TagesschauIE, TagesschauIE,
) )
from .tass import TassIE from .tass import TassIE
from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE from .tdslifeway import TDSLifewayIE
from .teachertube import ( from .teachertube import (
TeacherTubeIE, TeacherTubeIE,
@ -824,10 +900,12 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE from .telegraaf import TelegraafIE
from .telemb import TeleMBIE from .telemb import TeleMBIE
from .telequebec import TeleQuebecIE
from .teletask import TeleTaskIE from .teletask import TeleTaskIE
from .telewebion import TelewebionIE from .telewebion import TelewebionIE
from .testurl import TestURLIE from .testurl import TestURLIE
from .tf1 import TF1IE from .tf1 import TF1IE
from .tfo import TFOIE
from .theintercept import TheInterceptIE from .theintercept import TheInterceptIE
from .theplatform import ( from .theplatform import (
ThePlatformIE, ThePlatformIE,
@ -836,8 +914,10 @@ from .theplatform import (
from .thescene import TheSceneIE from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE from .thestar import TheStarIE
from .theweatherchannel import TheWeatherChannelIE
from .thisamericanlife import ThisAmericanLifeIE from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE from .threeqsdn import ThreeQSDNIE
from .tinypic import TinyPicIE from .tinypic import TinyPicIE
from .tlc import TlcDeIE from .tlc import TlcDeIE
@ -852,16 +932,12 @@ from .tnaflix import (
MovieFapIE, MovieFapIE,
) )
from .toggle import ToggleIE from .toggle import ToggleIE
from .thvideo import ( from .tonline import TOnlineIE
THVideoIE,
THVideoPlaylistIE
)
from .toutv import TouTvIE from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE from .trutv import TruTVIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE from .tube8 import Tube8IE
from .tubitv import TubiTvIE from .tubitv import TubiTvIE
from .tudou import ( from .tudou import (
@ -891,12 +967,16 @@ from .tvc import (
) )
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvnoe import TVNoeIE
from .tvp import ( from .tvp import (
TVPEmbedIE, TVPEmbedIE,
TVPIE, TVPIE,
TVPSeriesIE, TVPSeriesIE,
) )
from .tvplay import TVPlayIE from .tvplay import (
TVPlayIE,
ViafreeIE,
)
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
@ -926,8 +1006,13 @@ from .udn import UDNEmbedIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .uol import UOLIE from .uol import UOLIE
from .uplynk import (
UplynkIE,
UplynkPreplayIE,
)
from .urort import UrortIE from .urort import UrortIE
from .urplay import URPlayIE from .urplay import URPlayIE
from .usanetwork import USANetworkIE
from .usatoday import USATodayIE from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import ( from .ustudio import (
@ -954,6 +1039,7 @@ from .vice import (
ViceIE, ViceIE,
ViceShowIE, ViceShowIE,
) )
from .viceland import VicelandIE
from .vidbit import VidbitIE from .vidbit import VidbitIE
from .viddler import ViddlerIE from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE from .videodetective import VideoDetectiveIE
@ -1014,6 +1100,7 @@ from .vporn import VpornIE
from .vrt import VRTIE from .vrt import VRTIE
from .vube import VubeIE from .vube import VubeIE
from .vuclip import VuClipIE from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import ( from .washingtonpost import (
WashingtonPostIE, WashingtonPostIE,
@ -1100,8 +1187,4 @@ from .youtube import (
) )
from .zapiks import ZapiksIE from .zapiks import ZapiksIE
from .zdf import ZDFIE, ZDFChannelIE from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ( from .zingmp3 import ZingMp3IE
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE

View File

@ -1,20 +1,14 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re from ..utils import str_to_int
from .keezmovies import KeezMoviesIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
sanitized_Request,
str_to_int,
)
class ExtremeTubeIE(InfoExtractor): class ExtremeTubeIE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)' _VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431', 'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '344d0c6d50e2f16b06e49ca011d8ac69', 'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
'info_dict': { 'info_dict': {
'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431', 'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'ext': 'mp4', 'ext': 'mp4',
@ -35,58 +29,22 @@ class ExtremeTubeIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) webpage, info = self._extract_info(url)
req = sanitized_Request(url) if not info['title']:
req.add_header('Cookie', 'age_verified=1') info['title'] = self._search_regex(
webpage = self._download_webpage(req, video_id) r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
video_title = self._html_search_regex(
r'<h1 [^>]*?title="([^"]+)"[^>]*>', webpage, 'title')
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>', r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
view_count = str_to_int(self._html_search_regex( view_count = str_to_int(self._search_regex(
r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>', r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
flash_vars = self._parse_json( info.update({
self._search_regex(
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
video_id)
formats = []
for quality_key, video_url in flash_vars.items():
height = int_or_none(self._search_regex(
r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
if not height:
continue
f = {
'url': video_url,
}
mobj = re.search(
r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
if mobj:
height = int(mobj.group('height'))
bitrate = int(mobj.group('bitrate'))
f.update({
'format_id': '%dp-%dk' % (height, bitrate),
'height': height,
'tbr': bitrate,
})
else:
f.update({
'format_id': '%dp' % height,
'height': height,
})
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'formats': formats,
'uploader': uploader, 'uploader': uploader,
'view_count': view_count, 'view_count': view_count,
'age_limit': 18, })
}
return info

Some files were not shown because too many files have changed in this diff Show More