Compare commits

...

180 Commits

Author SHA1 Message Date
Sergey M․
c58e07a7aa release 2016.11.08 2016-11-08 22:11:21 +07:00
Sergey M․
f700afa24c [ChangeLog] Actualize 2016-11-08 22:09:03 +07:00
Yen Chi Hsuan
5d47b38cf5 [tmz:article] Fix extraction (closes #11052) 2016-11-08 21:53:41 +08:00
Sergey M․
ebc7ab1e23 [espn] Fix extraction (closes #11041) 2016-11-08 00:29:12 +07:00
Sergey M․
97726317ac [README.md] Mention HTTP headers and alternative way to obtain cookies and headers in -g FAQ 2016-11-07 23:53:22 +07:00
DarkZeros
cb882540e8 [mitele] Fix extraction after website redesign (fixes #10824) 2016-11-07 11:13:59 +01:00
Sergey M․
98708e6cbd [ard] Remove age restriction check (closes #11129) 2016-11-06 23:20:15 +07:00
Sergey M․
b52c9ef165 [extractor/generic] Improve support for pornhub embeds (closes #11100) 2016-11-06 21:52:00 +07:00
Sergey M․
e28ed498e6 [extractor/generic] Add support for redtube embds (closes #11099) 2016-11-06 21:42:41 +07:00
Sergey M․
5021ca6c13 [redtube] Add support for embed URLs 2016-11-06 21:39:29 +07:00
Sergey M․
37e7a71c6c [extractor/generic] Add support for drtuber embds (closes #11098) 2016-11-06 21:33:51 +07:00
Sergey M․
f5c4b06f17 [drtuber] Fix title extraction 2016-11-06 21:29:15 +07:00
Sergey M․
519d897049 [drtuber] Add support for embed URLs 2016-11-06 21:28:51 +07:00
Sergey M․
b61cd51869 [yahoo] Add test and improve some content id regex 2016-11-06 21:16:33 +07:00
Sergey M․
f420902a3b [yahoo] Add another content id regex (closes #11088) 2016-11-06 21:14:15 +07:00
Sergey M․
de328af362 [toutv] Relax _VALID_URL (closes #11121) 2016-11-05 03:24:42 +07:00
Sergey M․
b30e4c2754 release 2016.11.04 2016-11-04 22:07:54 +07:00
Sergey M․
09ffe34b00 [ChangeLog] Actualize 2016-11-04 21:59:42 +07:00
Sergey M․
640aff1d0c [anvato] Improve formats extraction 2016-11-04 21:45:24 +07:00
Sergey M․
c897af8aac [cbslocal] Update test 2016-11-04 21:33:08 +07:00
Sergey M․
f3c705f8ec [fox9] Add extractor (closes #11110) 2016-11-04 21:32:30 +07:00
Sergey M․
f93ac1d175 [anvato] Extract more metadata 2016-11-04 21:17:56 +07:00
Sergey M․
c4c9b8440c [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8 manifests (closes #11113) 2016-11-04 05:02:31 +07:00
Sergey M․
32f2627aed [vodlocker] Add another removed file pattern (closes #11106) 2016-11-03 22:22:40 +07:00
Sergey M․
9d64e1dcdc [downloader/ism] Fix typo 2016-11-03 22:15:09 +07:00
Remita Amine
10380e55de [downloader/ism] fix AVC Decoder Configuration Record creation in python 3 2016-11-03 16:08:57 +01:00
Remita Amine
22979993e7 [vice] add coding cookie 2016-11-03 16:07:22 +01:00
Remita Amine
b47ecd0b74 [vzaar] Add new extractor(closes #11093) 2016-11-03 12:50:41 +01:00
Yen Chi Hsuan
3a86b2c51e Ignore and clean .wav files 2016-11-03 18:55:55 +08:00
Remita Amine
b811b4c93b [vice] add support for uplynk preplay videos(#11101) 2016-11-03 10:37:07 +01:00
Remita Amine
f4dfa9a5ed [tubitv] fix extraction(closes #11061) 2016-11-03 09:04:20 +01:00
Remita Amine
3b4b66b50c [shahid] add support for authentication(closes #11091) 2016-11-03 00:44:12 +01:00
Sergey M․
4119a96ce5 [extractor/generic] Skip URLs we came from when delegating ISM extraction 2016-11-02 23:43:41 +07:00
Sergey M․
26aae56690 [extractor/generic] Improve ISM extraction 2016-11-02 23:34:37 +07:00
Remita Amine
4f9cd4d36f [radiocanada] extract subtitle(closes #11096) 2016-11-02 13:55:40 +01:00
Sergey M․
cc99a77ac1 [extractor/generic] Add support for ISM manifests 2016-11-02 03:01:13 +07:00
Sergey M․
8956d6608a release 2016.11.02 2016-11-02 02:39:36 +07:00
Sergey M․
3365ea8929 [extractor/common] Remove unused code 2016-11-02 02:34:23 +07:00
Sergey M․
a18aeee803 [ChangeLog] Actualize 2016-11-02 02:33:17 +07:00
Sergey M․
1616f9b452 [extractor/common] Fix typo 2016-11-02 02:30:25 +07:00
Sergey M․
02dc0a36b7 [utils] Introduce base_url 2016-11-02 02:30:18 +07:00
Remita Amine
639e3b5c99 extract ISM formats in some of the extractors 2016-11-02 01:54:45 +07:00
Remita Amine
b2758123c5 add Basic support for Smooth Streaming protocol(#8118) 2016-11-02 01:54:45 +07:00
Sergey M․
f449c061d0 [nicknight] Improve extraction (closes #10769) 2016-11-02 01:35:53 +07:00
Sergey M․
9c82bba05d [nickde] Improve extraction 2016-11-02 01:29:05 +07:00
NeroBurner
e3577722b0 [nicknight] Add extractor 2016-11-02 01:25:59 +07:00
Sergey M․
b82c33dd67 [extractor/common] Improve mpd base URL extraction (closes #10909, closes #11079) 2016-11-01 01:15:46 +07:00
Sergey M․
e5a088dc4b [utils] Fix --match-filter for int-like strings (closes #11082) 2016-10-31 23:32:08 +07:00
Sergey M․
2c6da7df4a release 2016.10.31 2016-10-31 01:36:53 +07:00
Mel Shafer
7e7a028aa4 [README.md] Fix a typo 2016-10-31 01:12:36 +07:00
Sergey M․
e70a5e6566 release 2016.10.30 2016-10-30 18:24:49 +07:00
Sergey M․
3bf55be466 [ChangeLog] Actualize 2016-10-30 18:19:29 +07:00
Sergey M․
a901fc5fc2 [vessel] Add tests for #11068 2016-10-30 18:17:15 +07:00
dundua
cae6bc0118 [vessel] Improve video id extraction 2016-10-30 18:14:51 +07:00
Yen Chi Hsuan
d9ee2e5cf6 [facebook] Remove SWF params so that 1080P are detected
Closes #11073

In the provided link, SWF params give up to 720P, and VideoConfig
gives 1080P for both best and bestvideo. I guess all Facebook videos
supports HTML5 now, so I remove the old detection for SWF params
2016-10-30 18:20:55 +08:00
Yen Chi Hsuan
e1a0b3b81c [imgur] Recognize /r/ URLs (closes #11071) 2016-10-30 17:02:03 +08:00
Sergey M․
2a048f9878 [beeg] Fix extraction (closes #11069) 2016-10-30 05:27:50 +07:00
Sergey M․
ea331f40e6 Credit @Thor77 for jamendo (#10934) 2016-10-30 05:10:31 +07:00
Yen Chi Hsuan
f02700a1fa [openload] Fix extraction (#10408)
Thanks @TwelveCharzz again for studying openload codes
2016-10-29 18:00:30 +08:00
Sergey M․
f3517569f6 [gvsearch] Modernize and fix page result request (closes #11051) 2016-10-28 23:19:59 +07:00
neutric
c725333d41 [ard] Fix typo 2016-10-27 03:59:00 +07:00
Yen Chi Hsuan
a5a8877f9c [adultswim] Fix extraction (closes #10979) 2016-10-27 02:16:48 +08:00
Remita Amine
43c53a1700 [nobelprize] Add new extractor(closes #9999) 2016-10-26 18:15:23 +01:00
Yen Chi Hsuan
ec8705117a [hornbunny] Fix extraction (#10981) 2016-10-27 00:10:51 +08:00
Remita Amine
3d8d44c7b1 [tvp] improve video id extraction(closes #10585) 2016-10-26 16:47:22 +01:00
Sergey M․
88839f4380 release 2016.10.26 2016-10-26 19:55:09 +07:00
Sergey M․
83e9374464 [ChangeLog] Actualize 2016-10-26 19:53:44 +07:00
Sergey M․
773017c648 [rentv] Move rentv test from generic extractor and add only matching tests 2016-10-26 19:52:43 +07:00
Remita Amine
777d90dc28 [rentv] Add new extractor(closes #10620) 2016-10-26 10:07:35 +01:00
Sergey M․
3791d84acc [ard] Detect unavailable videos (closes #11018) 2016-10-25 21:21:47 +07:00
Sergey M․
9305a0dc60 [vk] Fix extraction (closes #11022) 2016-10-25 21:05:29 +07:00
Sergey M․
94e08950e3 release 2016.10.25 2016-10-25 03:19:36 +07:00
Sergey M․
ee824a8d06 [ChangeLog] Actualize
[ci skip]
2016-10-25 02:51:07 +07:00
Sergey M․
d3b6b3b95b [jamendo] Improve 2016-10-25 02:46:48 +07:00
Thor77
b17422753f [jamendo] Add extractor 2016-10-25 02:43:03 +07:00
Sergey M․
b0b28b8241 [ChangeLog] Actualize 2016-10-25 01:53:41 +07:00
Sergey M․
81cb7a5978 Credit @azuwis for pandatv (#10736) 2016-10-25 01:51:46 +07:00
Sergey M․
d2e96a8ed4 [pandatv] Extract m3u8, document reverse source and PEP 8 2016-10-25 01:51:37 +07:00
Zhong Jianxin
2e7c8cab55 [pandatv] Add new extractor 2016-10-25 01:46:02 +07:00
Sergey M․
d7d4481c6a [movieclips] Fix _VALID_URL 2016-10-24 23:54:42 +07:00
Yen Chi Hsuan
5ace137bf4 [dotsub] Support vimeo embed (closes #10964) 2016-10-24 15:13:33 +08:00
Yen Chi Hsuan
9dde0e04e6 [litv] Fix extraction (#11006) 2016-10-23 23:23:40 +08:00
Sergey M․
f16f8505b1 [vimeo] Delegate ondemand redirects to ondemand extractor (closes #10994) 2016-10-23 18:48:50 +07:00
Sergey M․
9dc13a6780 [vivo] Fix extraction (closes #11003) 2016-10-23 18:07:56 +07:00
Sergey M․
9aa929d337 [twitch:stream] Add support for rebroadcasts (closes #10995) 2016-10-23 17:20:45 +07:00
Sergey M․
425f3fdfcb [pluralsight] Fix subtitles conversion (closes #10990) 2016-10-22 21:15:39 +07:00
Yen Chi Hsuan
e034cbc581 Merge branch 'johnhawkinson-stdin2' 2016-10-22 13:10:27 +08:00
Yen Chi Hsuan
5378f8ce0d [ChangeLog] Update for #10996 2016-10-22 13:08:56 +08:00
Yen Chi Hsuan
b64d04c119 [utils] Clarify for redirecting STDIN in get_exe_version() 2016-10-22 13:04:05 +08:00
John Hawkinson
00ca755231 [get_exe_version] Do version probes with <&-
When doing version probes for ffmpeg, do the
equivalent of calling it as:

    ffmpeg -version <&-

Where <&- is shell syntax for closing stdin before calling the
program. This is roughly equivalent to </dev/null without actually
opening /dev/null.

This prevents ffmpeg -version from hanging when run in the background.
Fixes #955.

The reason is that ffmpeg tries to manipulate stdin to set up terminal
characteristic, and that causes the kernel to suspend the parent
process (youtube-dl).

Note that closing stdin is achieved by calling subprocess.Popen() with
stdin set to subprocess.PIPE and without passing any input to
Popen.communicate(). This is somewhat subtle.
2016-10-22 00:34:08 -04:00
Sergey M․
69c2d42bd7 release 2016.10.21.1 2016-10-21 04:57:28 +07:00
Sergey M․
062e2769a3 [ChangeLog] Actualize 2016-10-21 04:53:26 +07:00
Sergey M․
859447a28d [adobepass] PEP 8 2016-10-21 04:38:14 +07:00
Sergey M․
f8ae2c7f30 [pluralsight] Process all clip URLs (closes #10984) 2016-10-21 04:35:32 +07:00
Sergey M․
9ce0077485 release 2016.10.21 2016-10-21 03:08:42 +07:00
Sergey M․
0ebb86bd18 [ChangeLog] Actualize 2016-10-21 03:07:03 +07:00
Sergey M․
9df6b03caf [pluralsight] Adapt to new API (closes #10972) 2016-10-21 03:00:03 +07:00
Yen Chi Hsuan
8e2915d70b Revert "[postprocessor/embedthumbnail] Allow mkv to embed thumbnails"
This reverts commit 7360db05b4.

This commit was added as an attempt to fix #6046. Unfortunately, the fix
is completely wrong. As reported on #10359, embedded thumbnails are not
displayed in VLC, and Se7en on IRC reports that the embedded thumbnail
misleads mpv as well.

The correct way is using -attachment of ffmpeg, while the current
run_ffmpeg_multiple_files API can't handle it cleanly.
2016-10-20 15:07:19 +08:00
Yen Chi Hsuan
19e447150d [ChangeLog] Update for #10971 2016-10-20 04:19:33 +08:00
Yen Chi Hsuan
ad9fd84004 Merge pull request #10971 from kasper93/openload
[openload] Fix extraction.
2016-10-20 04:18:27 +08:00
Kacper Michajłow
60633ae9a0 [openload] Fix extraction.
Fixes #10408
2016-10-19 22:00:29 +02:00
Remita Amine
a81dc82151 Credit @raleeper for the Comcast support (#10819) 2016-10-19 20:39:52 +01:00
remitamine
9218a6b4f5 Merge pull request #10819 from raleeper/adobepass
[adobepass] Add Comcast
2016-10-19 20:16:24 +01:00
Remita Amine
02af6ec707 [natgeo] extract m3u8 formats(closes #10959) 2016-10-19 19:39:41 +01:00
Sergey M․
05b7996cab [ChangeLog] Fix typos and add unreleased version header 2016-10-20 00:00:53 +07:00
raleeper
46f6052950 [adobepass] Add Comcast with fixed _download_webpage calls 2016-10-19 09:56:26 -07:00
Sergey M․
c8802041dd release 2016.10.19 2016-10-19 23:55:16 +07:00
Sergey M․
c7911009a0 [ChangeLog] Actualize 2016-10-19 23:53:37 +07:00
Sergey M․
2b96b06bf0 [vidzi] Fix extraction (closes #10908, closes #10952) 2016-10-19 23:31:58 +07:00
Sergey M․
06b3fe2926 [utils] Expose PACKED_CODES_RE 2016-10-19 23:28:49 +07:00
Jack Danger Canty
2c6743bf0f [README.md] Change 'guys' to 'people'
Folks at Ubuntu aren't all male so calling them 'guys' is a little odd.
2016-10-19 22:32:17 +07:00
Remita Amine
efb6242916 [urplay] add supprt for urskola.se and fix subtitle extraction(closes #10915) 2016-10-19 15:05:39 +01:00
Remita Amine
0384932e3d [extractor/common] try to extract non smil wowza mpd manifests 2016-10-19 14:57:12 +01:00
Remita Amine
edd6074cea [extractor/common] detect f4m audio only formats 2016-10-19 14:42:48 +01:00
Remita Amine
791d29dbf8 [orf] add subtitles support(closes #10939) 2016-10-19 11:34:46 +01:00
Sergey M․
481cc7335c [youtube] Fix --no-playlist behavior for youtu.be/id URLs (closes #10896) 2016-10-19 03:27:18 +07:00
Sergey M․
853a71b628 [nrk] Improve _VALID_URL 2016-10-19 03:02:14 +07:00
Sergey M․
e2628fb6a0 [nrk] Relax _VALID_URL (closes #10928) 2016-10-19 02:59:44 +07:00
Sergey M․
df4939b1cd [nytimes] Fix typo 2016-10-17 22:16:23 +07:00
Sergey M․
0b94dbb115 [postprocessor/ffmpeg] PEP 8 2016-10-16 19:16:47 +07:00
Sergey M․
8d76bdf12b [extractor/common] Mention podcast in series fields section 2016-10-16 18:37:17 +07:00
Sergey M․
8204bacf1d Credit @johnhawkinson for nytimes podcasts (#10926) 2016-10-16 18:21:42 +07:00
Sergey M․
47da782337 [nytimes] Improve (closes #10926) 2016-10-16 18:21:02 +07:00
John Hawkinson
74324a7ac2 [nytimes] Add support for podcasts 2016-10-16 17:31:55 +07:00
Sergey M․
b0dfcab60a [pluralsight] Relax _VALID_URL (closes #10941) 2016-10-16 17:20:32 +07:00
Sergey M․
bbd7706898 release 2016.10.16 2016-10-16 03:23:05 +07:00
Sergey M․
112740e79f [ChangeLog] Actualize 2016-10-16 03:09:27 +07:00
Sergey M․
c0b1e88895 [huajiao] Improve feed regex 2016-10-16 03:02:41 +07:00
Sergey M․
7cdfbbf9b8 [extractors] Change import for theoperaplatform extractor 2016-10-16 02:55:38 +07:00
Déstin Reed
ac943d48d3 [Beatport] Update extractor name and tests 2016-10-16 02:33:43 +07:00
arza
73498a8921 [ruutu] Add support for supla.fi 2016-10-16 02:31:56 +07:00
Simon Morgan
9187ee4f19 [README.md] Improve grammar 2016-10-16 02:26:06 +07:00
Pierre Mdawar
2273e2c530 [postprocessor/ffmpeg] Return correct filepath and ext in updated information in FFmpegExtractAudioPP
Return correct audio's filepath and ext instead of the video's when extracting audio and audio file already exists.
2016-10-16 02:12:03 +07:00
Sergey M․
4b492e3579 [theoperaplatform] Rename, fix _VALID_URL and fix test 2016-10-16 00:24:06 +07:00
Juanjo Benages
9c4258bcec [theoperaplatform] Add extractor 2016-10-16 00:21:15 +07:00
Sergey M․
ea8aefd1d7 [lynda] Fix height for prioritized streams 2016-10-16 00:08:46 +07:00
Sergey M․
6edfc40a0e [lynda] Add fallback extraction scenario 2016-10-16 00:07:40 +07:00
Sergey M․
68d9561ca1 [lynda] Switch to https (closes #10916) 2016-10-15 23:56:09 +07:00
Yen Chi Hsuan
cfc0e7c82b Credit @pyx for the Huajiao extractor (#10917) 2016-10-15 16:48:35 +08:00
Yen Chi Hsuan
4102e64051 Merge branch 'pyx-huajiao' 2016-10-15 14:56:00 +08:00
Yen Chi Hsuan
f605242bfc [ChangeLog] Update for #10917 2016-10-15 14:55:36 +08:00
Yen Chi Hsuan
d32fa0f12c [huajiao] Coding style 2016-10-15 14:53:53 +08:00
Yen Chi Hsuan
a347a0d088 Merge branch 'huajiao' of https://github.com/pyx/youtube-dl into pyx-huajiao 2016-10-15 14:53:05 +08:00
Yen Chi Hsuan
77c5b98dcd [crunchyroll] Skip an invalid _TEST 2016-10-15 14:36:07 +08:00
Yen Chi Hsuan
88ebefc054 [cmt] Fix mgid extraction (closes #10813)
The example in #10813 requires TV provider authentication in Firefox,
while youtube-dl can download it directly with an US proxy.

I'm not sure whether the mgid fix is cmt-specific or it applies to all
mtv-based sites. I keep it in cmt.py until similar patterns are found in
other websites.
2016-10-15 14:27:15 +08:00
Philip Xu
2e638d7bca Made optional fields optional 2016-10-14 14:12:06 -04:00
Sergey M․
a26b174c61 [safari:course] Add support for techbus.safaribooksonline.com 2016-10-15 00:29:33 +07:00
Sergey M․
73c801d660 [orf:tvthek] Fix extraction and modernize (closes #10898) 2016-10-14 23:43:09 +07:00
Vítor Galvão
dff5107b68 README.md: fix alrady typo 2016-10-14 23:21:08 +07:00
Yen Chi Hsuan
8c3e448e80 [clipfish] Update _TEST; the old one is gone 2016-10-15 00:12:21 +08:00
Yen Chi Hsuan
2ecbd2ad6f [chirbit:profile] Fix extraction 2016-10-15 00:01:46 +08:00
Yen Chi Hsuan
62a0b86e4f [carambatv] Fix extraction
The video requested in #9815 now has videomore embeds.
2016-10-14 23:43:18 +08:00
Yen Chi Hsuan
146969e05b [videomore] Support <iframe> embed videos
Seen in CarambaTVPage
2016-10-14 23:42:11 +08:00
Yen Chi Hsuan
e2004ccaf7 [canalplus] Fix video_id and update _TESTS
Some tests are gone, and some redirect to different videos
2016-10-14 20:26:12 +08:00
Yen Chi Hsuan
a5f8473145 [cbsinteractive] Fix extraction for cnet.com 2016-10-14 18:20:01 +08:00
Philip Xu
b7f59a3bf6 [huajiao] Add new extractor 2016-10-13 21:51:26 -04:00
Yen Chi Hsuan
580d411931 [parliamentliveuk] Recognize lower case URLs
Closes #10912

Seems parliamentliveuk matches URLs case-insentive. For example this URL
also works:
http://parliamentlive.tv/EvEnt/Index/3F24936f-130f-40bf-9a5d-b3d6479da6a4
2016-10-14 00:44:28 +08:00
Sergey M․
5c4bfd4da5 release 2016.10.12 2016-10-12 21:30:05 +07:00
Sergey M․
7104ae799c [ChangeLog] Actualize 2016-10-12 21:25:04 +07:00
Sergey M․
bcd6276520 [downloader/common] Remove debug output 2016-10-12 21:22:33 +07:00
Sergey M․
591e384552 [streamable] Remove debug output 2016-10-12 21:22:12 +07:00
Yen Chi Hsuan
9feb1c9731 [dailymotion] Fix extraction and update _TESTS
Closes #10901

Seems all videos use player V5 syntax now
2016-10-12 21:45:49 +08:00
Yen Chi Hsuan
a093cfc78b [vimeo:review] Fix extraction (#10900)
Now Vimeo Review videos uses React. Thanks @davekaro for analyzing the
problem!
2016-10-12 01:48:06 +08:00
Yen Chi Hsuan
6f20b65e72 [test/test_http] Update tests
After switching to HTML5 extraction helpers in generic.py, the result
info_dict is always a playlist.
2016-10-12 01:41:41 +08:00
Yen Chi Hsuan
cea364f70c [extractor/common] Support HTML media elements without child nodes 2016-10-12 01:40:28 +08:00
Yen Chi Hsuan
55642487f0 [nhl] Skip invalid m3u8 formats (closes #10713) 2016-10-11 20:50:52 +08:00
Yen Chi Hsuan
3d643f4cec [hbo] Add HBOEpisodeIE (#10892) 2016-10-11 17:46:52 +08:00
Yen Chi Hsuan
c452e69d3d [footyroom] Fix extraction and update _TESTS (closes #10810) 2016-10-11 17:46:13 +08:00
Yen Chi Hsuan
555787d717 [streamable] Add helper for extracting embedded videos 2016-10-11 17:44:35 +08:00
Yen Chi Hsuan
f165ca70eb [abc.net.au:iview] Fix for non-series videos (closes #10895) 2016-10-11 12:53:27 +08:00
Yen Chi Hsuan
27b8d2ee95 [hbo] Add display_id and another test (#10892) 2016-10-11 12:41:44 +08:00
Yen Chi Hsuan
71cdcb2331 [hbo] Support episode pages (closes #10892) 2016-10-11 12:30:35 +08:00
Yen Chi Hsuan
176006a120 [allocine] Fix for /video/ videos (closes #10860) 2016-10-09 19:42:42 +08:00
Yen Chi Hsuan
65f4c1de3d [allocine] Fix extraction (closes #10860)
I change the URL of the third test case, because now the original URL
does not contain a video anymore, and there's no easy to get the real
URL from the /film/ one.
2016-10-09 18:58:15 +08:00
Yen Chi Hsuan
b0082629a9 [nextmedia] Support action news (動新聞) on Apple Daily 2016-10-09 18:42:15 +08:00
Yen Chi Hsuan
8204c73352 [Makefile] Fix for GNU make < 4 (closes #9387)
Shell assignment operator in BSD make != is ported to GNU make in
version 4.0, so 3.x doesn't work. I choose to drop BSD make support as
installing GNU make on *BSD systems is easier than installing newer GNU
make.
2016-10-09 18:24:45 +08:00
Déstin Reed
2b51dac1f9 [slutload] Fix test and simplify 2016-10-09 01:17:38 +07:00
Sergey M․
f68901e50a [reverbnation] Eliminate code duplication in thumbnails extraction 2016-10-09 01:02:35 +07:00
Déstin Reed
3adb9d119e [reverbnation] Modernize 2016-10-09 01:00:38 +07:00
Remita Amine
1dd58e14d8 [lego] improve info extraction and bypass geo restriction(closes #10872) 2016-10-08 08:33:18 +01:00
102 changed files with 2766 additions and 839 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.07**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.08*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.08**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.10.07
[debug] youtube-dl version 2016.11.08
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

1
.gitignore vendored
View File

@@ -30,6 +30,7 @@ updates_key.pem
*.m4v
*.mp3
*.3gp
*.wav
*.part
*.swp
test/testdata

View File

@@ -185,3 +185,8 @@ Aleksander Nitecki
Sebastian Blunt
Matěj Cepl
Xie Yanbo
Philip Xu
John Hawkinson
Rich Leeper
Zhong Jianxin
Thor77

View File

@@ -12,7 +12,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {}
...
```
**Do not post screenshots of verbose log only plain text is acceptable.**
**Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@@ -66,7 +66,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# DEVELOPER INSTRUCTIONS
@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make (both GNU make and BSD make are supported)
* make (only GNU make is supported)
* pandoc
* zip
* nosetests
@@ -167,19 +167,19 @@ In any case, thank you very much for your contributions!
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
@@ -199,7 +199,7 @@ Assume at this point `meta`'s layout is:
}
```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
@@ -211,7 +211,7 @@ and not like:
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
@@ -231,21 +231,21 @@ description = self._search_regex(
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
@@ -282,7 +282,7 @@ title = self._search_regex(
webpage, 'title', group='title')
```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:

165
ChangeLog
View File

@@ -1,3 +1,168 @@
version 2016.11.08
Extractors
* [tmz:article] Fix extraction (#11052)
* [espn] Fix extraction (#11041)
* [mitele] Fix extraction after website redesign (#10824)
- [ard] Remove age restriction check (#11129)
* [generic] Improve support for pornhub.com embeds (#11100)
+ [generic] Add support for redtube.com embeds (#11099)
+ [generic] Add support for drtuber.com embeds (#11098)
+ [redtube] Add support for embed URLs
+ [drtuber] Add support for embed URLs
+ [yahoo] Improve content id extraction (#11088)
* [toutv] Relax URL regular expression (#11121)
version 2016.11.04
Core
* [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8
manifests (#11113)
* [downloader/ism] Fix AVC Decoder Configuration Record
Extractors
+ [fox9] Add support for fox9.com (#11110)
+ [anvato] Extract more metadata and improve formats extraction
* [vodlocker] Improve removed videos detection (#11106)
+ [vzaar] Add support for vzaar.com (#11093)
+ [vice] Add support for uplynk preplay videos (#11101)
* [tubitv] Fix extraction (#11061)
+ [shahid] Add support for authentication (#11091)
+ [radiocanada] Add subtitles support (#11096)
+ [generic] Add support for ISM manifests
version 2016.11.02
Core
+ Add basic support for Smooth Streaming protocol (#8118, #10969)
* Improve MPD manifest base URL extraction (#10909, #11079)
* Fix --match-filter for int-like strings (#11082)
Extractors
+ [mva] Add support for ISM formats
+ [msn] Add support for ISM formats
+ [onet] Add support for ISM formats
+ [tvp] Add support for ISM formats
+ [nicknight] Add support for nicknight sites (#10769)
version 2016.10.30
Extractors
* [facebook] Improve 1080P video detection (#11073)
* [imgur] Recognize /r/ URLs (#11071)
* [beeg] Fix extraction (#11069)
* [openload] Fix extraction (#10408)
* [gvsearch] Modernize and fix search request (#11051)
* [adultswim] Fix extraction (#10979)
+ [nobelprize] Add support for nobelprize.org (#9999)
* [hornbunny] Fix extraction (#10981)
* [tvp] Improve video id extraction (#10585)
version 2016.10.26
Extractors
+ [rentv] Add support for ren.tv (#10620)
+ [ard] Detect unavailable videos (#11018)
* [vk] Fix extraction (#11022)
version 2016.10.25
Core
* Running youtube-dl in the background is fixed (#10996, #10706, #955)
Extractors
+ [jamendo] Add support for jamendo.com (#10132, #10736)
+ [pandatv] Add support for panda.tv (#10736)
+ [dotsub] Support Vimeo embed (#10964)
* [litv] Fix extraction
+ [vimeo] Delegate ondemand redirects to ondemand extractor (#10994)
* [vivo] Fix extraction (#11003)
+ [twitch:stream] Add support for rebroadcasts (#10995)
* [pluralsight] Fix subtitles conversion (#10990)
version 2016.10.21.1
Extractors
+ [pluralsight] Process all clip URLs (#10984)
version 2016.10.21
Core
- Disable thumbnails embedding in mkv
+ Add support for Comcast multiple-system operator (#10819)
Extractors
* [pluralsight] Adapt to new API (#10972)
* [openload] Fix extraction (#10408, #10971)
+ [natgeo] Extract m3u8 formats (#10959)
version 2016.10.19
Core
+ [utils] Expose PACKED_CODES_RE
+ [extractor/common] Extract non smil wowza mpd manifests
+ [extractor/common] Detect f4m audio-only formats
Extractors
* [vidzi] Fix extraction (#10908, #10952)
* [urplay] Fix subtitles extraction
+ [urplay] Add support for urskola.se (#10915)
+ [orf] Add subtitles support (#10939)
* [youtube] Fix --no-playlist behavior for youtu.be/id URLs (#10896)
* [nrk] Relax URL regular expression (#10928)
+ [nytimes] Add support for podcasts (#10926)
* [pluralsight] Relax URL regular expression (#10941)
version 2016.10.16
Core
* [postprocessor/ffmpeg] Return correct filepath and ext in updated information
in FFmpegExtractAudioPP (#10879)
Extractors
+ [ruutu] Add support for supla.fi (#10849)
+ [theoperaplatform] Add support for theoperaplatform.eu (#10914)
* [lynda] Fix height for prioritized streams
+ [lynda] Add fallback extraction scenario
* [lynda] Switch to https (#10916)
+ [huajiao] New extractor (#10917)
* [cmt] Fix mgid extraction (#10813)
+ [safari:course] Add support for techbus.safaribooksonline.com
* [orf:tvthek] Fix extraction and modernize (#10898)
* [chirbit] Fix extraction of user profile pages
* [carambatv] Fix extraction
* [canalplus] Fix extraction for some videos
* [cbsinteractive] Fix extraction for cnet.com
* [parliamentliveuk] Lower case URLs are now recognized (#10912)
version 2016.10.12
Core
+ Support HTML media elements without child nodes
* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
Extractors
* [dailymotion] Fix extraction (#10901)
* [vimeo:review] Fix extraction (#10900)
* [nhl] Correctly handle invalid formats (#10713)
* [footyroom] Fix extraction (#10810)
* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
+ [hbo] Add support for episode pages (#10892)
* [allocine] Fix extraction (#10860)
+ [nextmedia] Recognize action news on AppleDaily
* [lego] Improve info extraction and bypass geo restriction (#10872)
version 2016.10.07
Extractors

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR)
@@ -90,7 +90,7 @@ fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@

View File

@@ -449,12 +449,12 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
```
After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase:
After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
```
machine <extractor> login <login> password <password>
```
@@ -550,13 +550,13 @@ Available for the media that is a track or a part of a music album:
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
Output templates can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
To specify percent literal in output template use `%%`. To output to stdout use `-o -`.
To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`.
@@ -564,7 +564,7 @@ In some cases, you don't want special characters such as 中, spaces, or &, such
#### Output template and Windows batch files
If you are using output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
#### Output template examples
@@ -597,7 +597,7 @@ $ youtube-dl -o - BaW_jenozKc
By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
@@ -605,21 +605,21 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
You can also use special names to select particular edge case format:
- `best`: Select best quality format represented by single file with video and audio
- `worst`: Select worst quality format represented by single file with video and audio
- `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available
- `worstvideo`: Select worst quality video only format, may not be available
- `bestaudio`: Select best quality audio only format, may not be available
- `worstaudio`: Select worst quality audio only format, may not be available
You can also use special names to select particular edge case formats:
- `best`: Select the best quality format represented by a single file with video and audio.
- `worst`: Select the worst quality format represented by a single file with video and audio.
- `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available.
- `worstvideo`: Select the worst quality video-only format. May not be available.
- `bestaudio`: Select the best quality audio only-format. May not be available.
- `worstaudio`: Select the worst quality audio only-format. May not be available.
For example, to download worst quality video only format you can use `-f worstvideo`.
For example, to download the worst quality video-only format you can use `-f worstvideo`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
@@ -641,15 +641,15 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv.
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
@@ -728,7 +728,7 @@ Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging people](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
@@ -758,7 +758,7 @@ Once the video is fully downloaded, use any video player, such as [mpv](https://
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl.
It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl. You can also get necessary cookies and HTTP headers from JSON output obtained with `--dump-json`.
It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
@@ -902,7 +902,7 @@ If you want to find out whether a given URL is supported, simply call youtube-dl
# Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was alrady reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl.
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of whom were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
@@ -923,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make (both GNU make and BSD make are supported)
* make (only GNU make is supported)
* pandoc
* zip
* nosetests
@@ -1005,19 +1005,19 @@ In any case, thank you very much for your contributions!
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
@@ -1037,7 +1037,7 @@ Assume at this point `meta`'s layout is:
}
```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
@@ -1049,7 +1049,7 @@ and not like:
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
@@ -1069,21 +1069,21 @@ description = self._search_regex(
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
@@ -1120,7 +1120,7 @@ title = self._search_regex(
webpage, 'title', group='title')
```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
@@ -1190,7 +1190,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
@@ -1206,7 +1206,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {}
...
```
**Do not post screenshots of verbose log only plain text is acceptable.**
**Do not post screenshots of verbose logs; only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@@ -1260,7 +1260,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT

View File

@@ -86,7 +86,7 @@
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BeatportPro**
- **Beatport**
- **Beeg**
- **BehindKink**
- **BellMedia**
@@ -247,6 +247,7 @@
- **FootyRoom**
- **Formula1**
- **FOX**
- **FOX9**
- **Foxgay**
- **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
@@ -289,6 +290,7 @@
- **Groupon**
- **Hark**
- **HBO**
- **HBOEpisode**
- **HearThisAt**
- **Heise**
- **HellPorno**
@@ -307,6 +309,7 @@
- **HowStuffWorks**
- **HRTi**
- **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post
- **Hypem**
- **Iconosquare**
@@ -330,6 +333,8 @@
- **ivideon**: Ivideon TV
- **Iwara**
- **Izlesene**
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -479,11 +484,13 @@
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **nick.de**
- **nicknight**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
- **Nintendo**
- **njoy**: N-JOY
- **njoy:embed**
- **NobelPrize**
- **Noco**
- **Normalboots**
- **NosVideo**
@@ -525,6 +532,7 @@
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV
- **parliamentlive.tv**: UK parliament videos
- **Patreon**
@@ -584,6 +592,8 @@
- **RDS**: RDS.ca
- **RedTube**
- **RegioTV**
- **RENTV**
- **RENTVArticle**
- **Restudy**
- **Reuters**
- **ReverbNation**
@@ -639,7 +649,7 @@
- **ServingSys**
- **Sexu**
- **Shahid**
- **Shared**: shared.sx and vivo.sx
- **Shared**: shared.sx
- **ShareSix**
- **Sina**
- **SixPlay**
@@ -719,6 +729,7 @@
- **TF1**
- **TFO**
- **TheIntercept**
- **theoperaplatform**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -845,6 +856,7 @@
- **Vimple**: Vimple - one-click video hosting
- **Vine**
- **vine:user**
- **Vivo**: vivo.sx
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
@@ -859,6 +871,7 @@
- **vube**: Vube.com
- **VuClip**
- **VyboryMos**
- **Vzaar**
- **Walla**
- **washingtonpost**
- **washingtonpost:article**

View File

@@ -605,6 +605,7 @@ class TestYoutubeDL(unittest.TestCase):
'extractor': 'TEST',
'duration': 30,
'filesize': 10 * 1024,
'playlist_id': '42',
}
second = {
'id': '2',
@@ -614,6 +615,7 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 10,
'description': 'foo',
'filesize': 5 * 1024,
'playlist_id': '43',
}
videos = [first, second]
@@ -650,6 +652,10 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('playlist_id = 42')
res = get_videos(f)
self.assertEqual(res, ['1'])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),

View File

@@ -87,7 +87,7 @@ class TestHTTP(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase):
@@ -111,7 +111,7 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name):

View File

@@ -69,6 +69,7 @@ from youtube_dl.utils import (
uppercase_escape,
lowercase_escape,
url_basename,
base_url,
urlencode_postdata,
urshift,
update_url_query,
@@ -437,6 +438,13 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4')
def test_base_url(self):
self.assertEqual(base_url('http://foo.de/'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar/'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)

View File

@@ -1658,7 +1658,7 @@ class YoutubeDL(object):
video_ext, audio_ext = audio.get('ext'), video.get('ext')
if video_ext and audio_ext:
COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v'),
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
('webm')
)
for exts in COMPATIBLE_EXTS:

View File

@@ -7,6 +7,7 @@ from .http import HttpFD
from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .external import (
get_external_downloader,
FFmpegFD,
@@ -24,6 +25,7 @@ PROTOCOL_MAP = {
'rtsp': RtspFD,
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
}

View File

@@ -346,7 +346,6 @@ class FileDownloader(object):
min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval)

View File

@@ -0,0 +1,271 @@
from __future__ import unicode_literals
import os
import time
import struct
import binascii
import io
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
)
u8 = struct.Struct(b'>B')
u88 = struct.Struct(b'>Bx')
u16 = struct.Struct(b'>H')
u1616 = struct.Struct(b'>Hxx')
u32 = struct.Struct(b'>I')
u64 = struct.Struct(b'>Q')
s88 = struct.Struct(b'>bx')
s16 = struct.Struct(b'>h')
s1616 = struct.Struct(b'>hxx')
s32 = struct.Struct(b'>i')
unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
TRACK_ENABLED = 0x1
TRACK_IN_MOVIE = 0x2
TRACK_IN_PREVIEW = 0x4
SELF_CONTAINED = 0x1
def box(box_type, payload):
return u32.pack(8 + len(payload)) + box_type + payload
def full_box(box_type, version, flags, payload):
return box(box_type, u8.pack(version) + u32.pack(flags)[1:] + payload)
def write_piff_header(stream, params):
track_id = params['track_id']
fourcc = params['fourcc']
duration = params['duration']
timescale = params.get('timescale', 10000000)
language = params.get('language', 'und')
height = params.get('height', 0)
width = params.get('width', 0)
is_audio = width == 0 and height == 0
creation_time = modification_time = int(time.time())
ftyp_payload = b'isml' # major brand
ftyp_payload += u32.pack(1) # minor version
ftyp_payload += b'piff' + b'iso2' # compatible brands
stream.write(box(b'ftyp', ftyp_payload)) # File Type Box
mvhd_payload = u64.pack(creation_time)
mvhd_payload += u64.pack(modification_time)
mvhd_payload += u32.pack(timescale)
mvhd_payload += u64.pack(duration)
mvhd_payload += s1616.pack(1) # rate
mvhd_payload += s88.pack(1) # volume
mvhd_payload += u16.pack(0) # reserved
mvhd_payload += u32.pack(0) * 2 # reserved
mvhd_payload += unity_matrix
mvhd_payload += u32.pack(0) * 6 # pre defined
mvhd_payload += u32.pack(0xffffffff) # next track id
moov_payload = full_box(b'mvhd', 1, 0, mvhd_payload) # Movie Header Box
tkhd_payload = u64.pack(creation_time)
tkhd_payload += u64.pack(modification_time)
tkhd_payload += u32.pack(track_id) # track id
tkhd_payload += u32.pack(0) # reserved
tkhd_payload += u64.pack(duration)
tkhd_payload += u32.pack(0) * 2 # reserved
tkhd_payload += s16.pack(0) # layer
tkhd_payload += s16.pack(0) # alternate group
tkhd_payload += s88.pack(1 if is_audio else 0) # volume
tkhd_payload += u16.pack(0) # reserved
tkhd_payload += unity_matrix
tkhd_payload += u1616.pack(width)
tkhd_payload += u1616.pack(height)
trak_payload = full_box(b'tkhd', 1, TRACK_ENABLED | TRACK_IN_MOVIE | TRACK_IN_PREVIEW, tkhd_payload) # Track Header Box
mdhd_payload = u64.pack(creation_time)
mdhd_payload += u64.pack(modification_time)
mdhd_payload += u32.pack(timescale)
mdhd_payload += u64.pack(duration)
mdhd_payload += u16.pack(((ord(language[0]) - 0x60) << 10) | ((ord(language[1]) - 0x60) << 5) | (ord(language[2]) - 0x60))
mdhd_payload += u16.pack(0) # pre defined
mdia_payload = full_box(b'mdhd', 1, 0, mdhd_payload) # Media Header Box
hdlr_payload = u32.pack(0) # pre defined
hdlr_payload += b'soun' if is_audio else b'vide' # handler type
hdlr_payload += u32.pack(0) * 3 # reserved
hdlr_payload += (b'Sound' if is_audio else b'Video') + b'Handler\0' # name
mdia_payload += full_box(b'hdlr', 0, 0, hdlr_payload) # Handler Reference Box
if is_audio:
smhd_payload = s88.pack(0) # balance
smhd_payload = u16.pack(0) # reserved
media_header_box = full_box(b'smhd', 0, 0, smhd_payload) # Sound Media Header
else:
vmhd_payload = u16.pack(0) # graphics mode
vmhd_payload += u16.pack(0) * 3 # opcolor
media_header_box = full_box(b'vmhd', 0, 1, vmhd_payload) # Video Media Header
minf_payload = media_header_box
dref_payload = u32.pack(1) # entry count
dref_payload += full_box(b'url ', 0, SELF_CONTAINED, b'') # Data Entry URL Box
dinf_payload = full_box(b'dref', 0, 0, dref_payload) # Data Reference Box
minf_payload += box(b'dinf', dinf_payload) # Data Information Box
stsd_payload = u32.pack(1) # entry count
sample_entry_payload = u8.pack(0) * 6 # reserved
sample_entry_payload += u16.pack(1) # data reference index
if is_audio:
sample_entry_payload += u32.pack(0) * 2 # reserved
sample_entry_payload += u16.pack(params.get('channels', 2))
sample_entry_payload += u16.pack(params.get('bits_per_sample', 16))
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u1616.pack(params['sampling_rate'])
if fourcc == 'AACL':
sample_entry_box = box(b'mp4a', sample_entry_payload)
else:
sample_entry_payload = sample_entry_payload
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u32.pack(0) * 3 # pre defined
sample_entry_payload += u16.pack(width)
sample_entry_payload += u16.pack(height)
sample_entry_payload += u1616.pack(0x48) # horiz resolution 72 dpi
sample_entry_payload += u1616.pack(0x48) # vert resolution 72 dpi
sample_entry_payload += u32.pack(0) # reserved
sample_entry_payload += u16.pack(1) # frame count
sample_entry_payload += u8.pack(0) * 32 # compressor name
sample_entry_payload += u16.pack(0x18) # depth
sample_entry_payload += s16.pack(-1) # pre defined
codec_private_data = binascii.unhexlify(params['codec_private_data'])
if fourcc in ('H264', 'AVC1'):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps
avcc_payload += u8.pack(1) # number of pps
avcc_payload += u16.pack(len(pps))
avcc_payload += pps
sample_entry_payload += box(b'avcC', avcc_payload) # AVC Decoder Configuration Record
sample_entry_box = box(b'avc1', sample_entry_payload) # AVC Simple Entry
stsd_payload += sample_entry_box
stbl_payload = full_box(b'stsd', 0, 0, stsd_payload) # Sample Description Box
stts_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stts', 0, 0, stts_payload) # Decoding Time to Sample Box
stsc_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stsc', 0, 0, stsc_payload) # Sample To Chunk Box
stco_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stco', 0, 0, stco_payload) # Chunk Offset Box
minf_payload += box(b'stbl', stbl_payload) # Sample Table Box
mdia_payload += box(b'minf', minf_payload) # Media Information Box
trak_payload += box(b'mdia', mdia_payload) # Media Box
moov_payload += box(b'trak', trak_payload) # Track Box
mehd_payload = u64.pack(duration)
mvex_payload = full_box(b'mehd', 1, 0, mehd_payload) # Movie Extends Header Box
trex_payload = u32.pack(track_id) # track id
trex_payload += u32.pack(1) # default sample description index
trex_payload += u32.pack(0) # default sample duration
trex_payload += u32.pack(0) # default sample size
trex_payload += u32.pack(0) # default sample flags
mvex_payload += full_box(b'trex', 0, 0, trex_payload) # Track Extends Box
moov_payload += box(b'mvex', mvex_payload) # Movie Extends Box
stream.write(box(b'moov', moov_payload)) # Movie Box
def extract_box_data(data, box_sequence):
data_reader = io.BytesIO(data)
while True:
box_size = u32.unpack(data_reader.read(4))[0]
box_type = data_reader.read(4)
if box_type == box_sequence[0]:
box_data = data_reader.read(box_size - 8)
if len(box_sequence) == 1:
return box_data
return extract_box_data(box_data, box_sequence[1:])
data_reader.seek(box_size - 8, 1)
class IsmFD(FragmentFD):
"""
Download segments in a ISM manifest
"""
FD_NAME = 'ism'
def real_download(self, filename, info_dict):
segments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segments),
}
self._prepare_and_start_frag_download(ctx)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
track_written = False
for i, segment in enumerate(segments):
segment_url = segment['url']
segment_name = 'Frag%d' % i
target_filename = '%s-%s' % (ctx['tmpfilename'], segment_name)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
down_data = down.read()
if not track_written:
tfhd_data = extract_box_data(down_data, [b'moof', b'traf', b'tfhd'])
info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0]
write_piff_header(ctx['dest_stream'], info_dict['_download_params'])
track_written = True
ctx['dest_stream'].write(down_data)
down.close()
segments_filenames.append(target_sanitized)
break
except compat_urllib_error.HTTPError as err:
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
self.report_skip_fragment(segment_name)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True

View File

@@ -102,16 +102,16 @@ class ABCIViewIE(InfoExtractor):
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
'md5': '979d10b2939101f0d27a06b79edad536',
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'FA1505V024S00',
'id': 'ZX9735A001S00',
'ext': 'mp4',
'title': 'Series 27 Ep 24',
'description': 'md5:b28baeae7504d1148e1d2f0e3ed3c15d',
'upload_date': '20160820',
'uploader_id': 'abc1',
'timestamp': 1471719600,
'title': 'Diaries Of A Broken Mind',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
'upload_date': '20161010',
'uploader_id': 'abc2',
'timestamp': 1476064920,
},
'skip': 'Video gone',
}]
@@ -121,7 +121,7 @@ class ABCIViewIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params['title']
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
@@ -144,8 +144,8 @@ class ABCIViewIE(InfoExtractor):
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage)),
'episode': self._html_search_meta('episode_title', webpage),
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,

View File

@@ -26,6 +26,11 @@ MSO_INFO = {
'username_field': 'UserName',
'password_field': 'UserPassword',
},
'Comcast_SSO': {
'name': 'Comcast XFINITY',
'username_field': 'user',
'password_field': 'passwd',
},
'thr030': {
'name': '3 Rivers Communications'
},
@@ -1364,14 +1369,53 @@ class AdobePassIE(InfoExtractor):
'domain_name': 'adobe.com',
'redirect_url': url,
})
provider_login_page_res = post_form(
provider_redirect_page_res, 'Downloading Provider Login Page')
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,
})
if mso_id != 'Rogers':
post_form(mvpd_confirm_page_res, 'Confirming Login')
if mso_id == 'Comcast_SSO':
# Comcast page flow varies by video site and whether you
# are on Comcast's network.
provider_redirect_page, urlh = provider_redirect_page_res
# Check for Comcast auto login
if 'automatically signing you in' in provider_redirect_page:
oauth_redirect_url = self._html_search_regex(
r'window\.location\s*=\s*[\'"]([^\'"]+)',
provider_redirect_page, 'oauth redirect')
# Just need to process the request. No useful data comes back
self._download_webpage(
oauth_redirect_url, video_id, 'Confirming auto login')
else:
if '<form name="signin"' in provider_redirect_page:
# already have the form, just fill it
provider_login_page_res = provider_redirect_page_res
elif 'http-equiv="refresh"' in provider_redirect_page:
# redirects to the login page
oauth_redirect_url = self._html_search_regex(
r'content="0;\s*url=([^\'"]+)',
provider_redirect_page, 'meta refresh redirect')
provider_login_page_res = self._download_webpage_handle(
oauth_redirect_url,
video_id, 'Downloading Provider Login Page')
else:
provider_login_page_res = post_form(
provider_redirect_page_res, 'Downloading Provider Login Page')
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,
})
mvpd_confirm_page, urlh = mvpd_confirm_page_res
if '<button class="submit" value="Resume">Resume</button>' in mvpd_confirm_page:
post_form(mvpd_confirm_page_res, 'Confirming Login')
else:
# Normal, non-Comcast flow
provider_login_page_res = post_form(
provider_redirect_page_res, 'Downloading Provider Login Page')
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,
})
if mso_id != 'Rogers':
post_form(mvpd_confirm_page_res, 'Confirming Login')
session = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,

View File

@@ -96,6 +96,27 @@ class AdultSwimIE(TurnerBaseIE):
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.adultswim.com/videos/toonami/friday-october-14th-2016/',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
'playlist': [{
'md5': '',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'ext': 'mp4',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}]
@staticmethod
@@ -163,6 +184,8 @@ class AdultSwimIE(TurnerBaseIE):
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
elif video_info.get('videoPlaybackID'):
segment_ids = [video_info['videoPlaybackID']]
elif video_info.get('id'):
segment_ids = [video_info['id']]
else:
if video_info.get('auth') is True:
raise ExtractorError(

View File

@@ -1,29 +1,26 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
remove_end,
qualities,
unescapeHTML,
xpath_element,
url_basename,
)
class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': {
'id': '19546517',
'display_id': '18635087',
'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:abcd09ce503c6560512c14ebfdb720d2',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -31,64 +28,82 @@ class AllocineIE(InfoExtractor):
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': {
'id': '19540403',
'display_id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': {
'id': '19544709',
'display_id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:601d15393ac40f249648ef000720e7e3',
'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
'thumbnail': 're:http://.*\.jpg',
},
}, {
'url': 'http://www.allocine.fr/video/video-19550147/',
'only_matching': True,
'md5': '3566c0668c0235e2d224fd8edb389f67',
'info_dict': {
'id': '19550147',
'ext': 'mp4',
'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
'thumbnail': 're:http://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
typ = mobj.group('typ')
display_id = mobj.group('id')
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xpath_element(xml, './/AcVisionVideo').attrib
formats = []
quality = qualities(['ld', 'md', 'hd'])
formats = []
for k, v in video.items():
if re.match(r'.+_path', k):
format_id = k.split('_')[0]
model = self._html_search_regex(
r'data-model="([^"]+)"', webpage, 'data model', default=None)
if model:
model_data = self._parse_json(model, display_id)
for video_url in model_data['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': v,
'url': video_url,
})
title = model_data['title']
else:
video_id = display_id
media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
for key, value in media_data['video'].items():
if not key.endswith('Path'):
continue
format_id = key[:-len('Path')]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': value,
})
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats)
return {
'id': video_id,
'title': video['videoTitle'],
'display_id': display_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage),

View File

@@ -157,22 +157,16 @@ class AnvatoIE(InfoExtractor):
video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8'))
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
video_id = anvplayer_data['video']
access_key = anvplayer_data['accessKey']
def _get_anvato_videos(self, access_key, video_id):
video_data = self._get_video_json(access_key, video_id)
formats = []
for published_url in video_data['published_urls']:
video_url = published_url['embed_url']
media_format = published_url.get('format')
ext = determine_ext(video_url)
if ext == 'smil':
if ext == 'smil' or media_format == 'smil':
formats.extend(self._extract_smil_formats(video_url, video_id))
continue
@@ -183,7 +177,7 @@ class AnvatoIE(InfoExtractor):
'tbr': tbr if tbr != 0 else None,
}
if ext == 'm3u8':
if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'):
# Not using _extract_m3u8_formats here as individual media
# playlists are also included in published_urls.
if tbr is None:
@@ -194,7 +188,7 @@ class AnvatoIE(InfoExtractor):
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4',
})
elif ext == 'mp3':
elif ext == 'mp3' or media_format == 'mp3':
a_format['vcodec'] = 'none'
else:
a_format.update({
@@ -218,7 +212,19 @@ class AnvatoIE(InfoExtractor):
'formats': formats,
'title': video_data.get('def_title'),
'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'),
'duration': int_or_none(video_data.get('duration')),
'subtitles': subtitles,
}
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
return self._get_anvato_videos(
anvplayer_data['accessKey'], anvplayer_data['video'])

View File

@@ -174,11 +174,15 @@ class ARDMediathekIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if '>Der gewünschte Beitrag ist nicht mehr verfügbar.<' in webpage:
raise ExtractorError('Video %s is no longer available' % video_id, expected=True)
ERRORS = (
('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
'Video %s is no longer available'),
)
if 'Diese Sendung ist für Jugendliche unter 12 Jahren nicht geeignet. Der Clip ist deshalb nur von 20 bis 6 Uhr verfügbar.' in webpage:
raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True)
for pattern, message in ERRORS:
if pattern in webpage:
raise ExtractorError(message % video_id, expected=True)
if re.search(r'[\?&]rss($|[=&])', url):
doc = compat_etree_fromstring(webpage.encode('utf-8'))

View File

@@ -410,6 +410,22 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'

View File

@@ -8,10 +8,10 @@ from ..compat import compat_str
from ..utils import int_or_none
class BeatportProIE(InfoExtractor):
_VALID_URL = r'https?://pro\.beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
class BeatportIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|pro\.)?beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://pro.beatport.com/track/synesthesia-original-mix/5379371',
'url': 'https://beatport.com/track/synesthesia-original-mix/5379371',
'md5': 'b3c34d8639a2f6a7f734382358478887',
'info_dict': {
'id': '5379371',
@@ -20,7 +20,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Froxic - Synesthesia (Original Mix)',
},
}, {
'url': 'https://pro.beatport.com/track/love-and-war-original-mix/3756896',
'url': 'https://beatport.com/track/love-and-war-original-mix/3756896',
'md5': 'e44c3025dfa38c6577fbaeb43da43514',
'info_dict': {
'id': '3756896',
@@ -29,7 +29,7 @@ class BeatportProIE(InfoExtractor):
'title': 'Wolfgang Gartner - Love & War (Original Mix)',
},
}, {
'url': 'https://pro.beatport.com/track/birds-original-mix/4991738',
'url': 'https://beatport.com/track/birds-original-mix/4991738',
'md5': 'a1fd8e8046de3950fd039304c186c05f',
'info_dict': {
'id': '4991738',

View File

@@ -46,19 +46,19 @@ class BeegIE(InfoExtractor):
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = self._search_regex(
r'beeg_version\s*=\s*(\d+)', cpl,
'beeg version', default=None) or self._search_regex(
beeg_version = int_or_none(self._search_regex(
r'beeg_version\s*=\s*([^\b]+)', cpl,
'beeg version', default=None)) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt',
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '1750'
beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
beeg_version = beeg_version or '2000'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
video = self._download_json(
'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id)
def split(o, e):

View File

@@ -6,11 +6,13 @@ import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import (
dict_get,
ExtractorError,
HEADRequest,
unified_strdate,
qualities,
int_or_none,
qualities,
remove_end,
unified_strdate,
)
@@ -43,47 +45,46 @@ class CanalplusIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': {
'id': '1192814',
'id': '1405510',
'display_id': 'pid1830-c-zapping',
'ext': 'mp4',
'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014",
'description': "Toute l'année 2014 dans un Zapping exceptionnel !",
'upload_date': '20150105',
'title': 'Zapping - 02/07/2016',
'description': 'Le meilleur de toutes les chaînes, tous les jours',
'upload_date': '20160702',
},
}, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
'info_dict': {
'id': '1108190',
'ext': 'flv',
'title': 'Le labyrinthe - Boing super ranger',
'display_id': 'pid1405-le-labyrinthe-boing-super-ranger',
'ext': 'mp4',
'title': 'BOING SUPER RANGER - Ep : Le labyrinthe',
'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
'upload_date': '20140724',
},
'skip': 'Only works from France',
}, {
'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231',
'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html',
'md5': '4b47b12b4ee43002626b97fad8fb1de5',
'info_dict': {
'id': '1390231',
'id': '1420213',
'display_id': 'pid6318-videos-integrales',
'ext': 'mp4',
'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité",
'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6',
'upload_date': '20160512',
},
'params': {
'skip_download': True,
'title': 'TPMP ! Même le matin - Les 35H de Baba - 14/10/2016',
'description': 'md5:f96736c1b0ffaa96fd5b9e60ad871799',
'upload_date': '20161014',
},
'skip': 'Only works from France',
}, {
'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224',
'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'info_dict': {
'id': '1398334',
'id': '1420176',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4',
'title': "L'invité de Bruce Toussaint du 07/06/2016 - ",
'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324',
'upload_date': '20160607',
},
'params': {
'skip_download': True,
'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20161014',
},
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
@@ -95,18 +96,17 @@ class CanalplusIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group
display_id = mobj.group('display_id') or video_id
display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
if video_id is None:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'],
webpage, 'video id', group='id')
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
r'id=["\']canal_video_player(?P<id>\d+)'],
webpage, 'video id', group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON')

View File

@@ -9,6 +9,8 @@ from ..utils import (
try_get,
)
from .videomore import VideomoreIE
class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
@@ -62,14 +64,16 @@ class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': '',
'md5': 'a49fb0ec2ad66503eeb46aac237d3c86',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'id': '475222',
'ext': 'flv',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2678.31,
'thumbnail': 're:^https?://.*\.jpg',
# duration reported by videomore is incorrect
'duration': int,
},
'add_ie': [VideomoreIE.ie_key()],
}
def _real_extract(self, url):
@@ -77,6 +81,16 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage)
if videomore_url:
title = self._og_search_title(webpage)
return {
'_type': 'url_transparent',
'url': videomore_url,
'ie_key': VideomoreIE.ie_key(),
'title': title,
}
video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url:

View File

@@ -63,7 +63,7 @@ class CBSInteractiveIE(ThePlatformIE):
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
webpage, 'data json')
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0]

View File

@@ -22,6 +22,7 @@ class CBSLocalIE(AnvatoIE):
'thumbnail': 're:^https?://.*',
'timestamp': 1463440500,
'upload_date': '20160516',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
@@ -35,6 +36,7 @@ class CBSLocalIE(AnvatoIE):
'Syndication\\Curb.tv',
'Content\\News'
],
'tags': ['CBS 2 News Evening'],
},
}, {
# SendtoNews embed

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..utils import parse_duration
@@ -70,7 +71,6 @@ class ChirbitProfileIE(InfoExtractor):
'url': 'http://chirbit.com/ScarletBeauty',
'info_dict': {
'id': 'ScarletBeauty',
'title': 'Chirbits by ScarletBeauty',
},
'playlist_mincount': 3,
}
@@ -78,13 +78,10 @@ class ChirbitProfileIE(InfoExtractor):
def _real_extract(self, url):
profile_id = self._match_id(url)
rss = self._download_xml(
'http://chirbit.com/rss/%s' % profile_id, profile_id)
webpage = self._download_webpage(url, profile_id)
entries = [
self.url_result(audio_url.text, 'Chirbit')
for audio_url in rss.findall('./channel/item/link')]
self.url_result(self._proto_relative_url('//chirb.it/' + video_id))
for _, video_id in re.findall(r'<input[^>]+id=([\'"])copy-btn-(?P<id>[0-9a-zA-Z]+)\1', webpage)]
title = rss.find('./channel/title').text
return self.playlist_result(entries, profile_id, title)
return self.playlist_result(entries, profile_id)

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
@@ -10,15 +11,15 @@ from ..utils import (
class ClipfishIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
'md5': '79bc922f3e8a9097b3d68a93780fd475',
'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
'md5': '720563e467b86374c194bdead08d207d',
'info_dict': {
'id': '3966754',
'id': '4343170',
'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer',
'description': 'Video zu FIFA 14: E3 2013 Trailer',
'upload_date': '20130611',
'duration': 82,
'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
'upload_date': '20161005',
'duration': 1291,
'view_count': int,
}
}
@@ -50,10 +51,14 @@ class ClipfishIE(InfoExtractor):
'tbr': int_or_none(video_info.get('bitrate')),
})
descr = video_info.get('descr')
if descr:
descr = descr.strip()
return {
'id': video_id,
'title': video_info['title'],
'description': video_info.get('descr'),
'description': descr,
'formats': formats,
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
'duration': int_or_none(video_info.get('media_length')),

View File

@@ -26,7 +26,7 @@ class CMTIE(MTVIE):
'id': '1504699',
'ext': 'mp4',
'title': 'Still The King Ep. 109 in 3 Minutes',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9. New episodes Sundays 9/8c.',
'description': 'Relive or catch up with Still The King by watching this recap of season 1, episode 9.',
'timestamp': 1469421000.0,
'upload_date': '20160725',
},
@@ -42,3 +42,8 @@ class CMTIE(MTVIE):
'%s said: video is not available' % cls.IE_NAME, expected=True)
return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
def _extract_mgid(self, webpage):
return self._search_regex(
r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
webpage, 'mgid', group='mgid')

View File

@@ -30,6 +30,7 @@ from ..downloader.f4m import remove_encrypted_media
from ..utils import (
NO_DEFAULT,
age_restricted,
base_url,
bug_reports_message,
clean_html,
compiled_regex_type,
@@ -235,7 +236,7 @@ class InfoExtractor(object):
chapter_id: Id of the chapter the video belongs to, as a unicode string.
The following fields should only be used when the video is an episode of some
series or programme:
series, programme or podcast:
series: Title of the series or programme the video episode belongs to.
season: Title of the season the video episode belongs to.
@@ -1100,6 +1101,13 @@ class InfoExtractor(object):
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
'bootstrap info', default=None)
vcodec = None
mime_type = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}mimeType', '{http://ns.adobe.com/f4m/2.0}mimeType'],
'base URL', default=None)
if mime_type and mime_type.startswith('audio/'):
vcodec = 'none'
for i, media_el in enumerate(media_nodes):
tbr = int_or_none(media_el.attrib.get('bitrate'))
width = int_or_none(media_el.attrib.get('width'))
@@ -1140,6 +1148,7 @@ class InfoExtractor(object):
'width': f.get('width') or width,
'height': f.get('height') or height,
'format_id': f.get('format_id') if not tbr else format_id,
'vcodec': vcodec,
})
formats.extend(f4m_formats)
continue
@@ -1156,6 +1165,7 @@ class InfoExtractor(object):
'tbr': tbr,
'width': width,
'height': height,
'vcodec': vcodec,
'preference': preference,
})
return formats
@@ -1270,9 +1280,10 @@ class InfoExtractor(object):
}
resolution = last_info.get('RESOLUTION')
if resolution:
width_str, height_str = resolution.split('x')
f['width'] = int(width_str)
f['height'] = int(height_str)
mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', resolution)
if mobj:
f['width'] = int(mobj.group('width'))
f['height'] = int(mobj.group('height'))
# Unified Streaming Platform
mobj = re.search(
r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
@@ -1530,7 +1541,7 @@ class InfoExtractor(object):
if res is False:
return []
mpd, urlh = res
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
@@ -1771,6 +1782,105 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
res = self._download_webpage_handle(
ism_url, video_id,
note=note or 'Downloading ISM manifest',
errnote=errnote or 'Failed to download ISM manifest',
fatal=fatal)
if res is False:
return []
ism, urlh = res
return self._parse_ism_formats(
compat_etree_fromstring(ism.encode('utf-8')), urlh.geturl(), ism_id)
def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
if ism_doc.get('IsLive') == 'TRUE' or ism_doc.find('Protection') is not None:
return []
duration = int(ism_doc.attrib['Duration'])
timescale = int_or_none(ism_doc.get('TimeScale')) or 10000000
formats = []
for stream in ism_doc.findall('StreamIndex'):
stream_type = stream.get('Type')
if stream_type not in ('video', 'audio'):
continue
url_pattern = stream.attrib['Url']
stream_timescale = int_or_none(stream.get('TimeScale')) or timescale
stream_name = stream.get('Name')
for track in stream.findall('QualityLevel'):
fourcc = track.get('FourCC')
# TODO: add support for WVC1 and WMAP
if fourcc not in ('H264', 'AVC1', 'AACL'):
self.report_warning('%s is not a supported codec' % fourcc)
continue
tbr = int(track.attrib['Bitrate']) // 1000
width = int_or_none(track.get('MaxWidth'))
height = int_or_none(track.get('MaxHeight'))
sampling_rate = int_or_none(track.get('SamplingRate'))
track_url_pattern = re.sub(r'{[Bb]itrate}', track.attrib['Bitrate'], url_pattern)
track_url_pattern = compat_urlparse.urljoin(ism_url, track_url_pattern)
fragments = []
fragment_ctx = {
'time': 0,
}
stream_fragments = stream.findall('c')
for stream_fragment_index, stream_fragment in enumerate(stream_fragments):
fragment_ctx['time'] = int_or_none(stream_fragment.get('t')) or fragment_ctx['time']
fragment_repeat = int_or_none(stream_fragment.get('r')) or 1
fragment_ctx['duration'] = int_or_none(stream_fragment.get('d'))
if not fragment_ctx['duration']:
try:
next_fragment_time = int(stream_fragment[stream_fragment_index + 1].attrib['t'])
except IndexError:
next_fragment_time = duration
fragment_ctx['duration'] = (next_fragment_time - fragment_ctx['time']) / fragment_repeat
for _ in range(fragment_repeat):
fragments.append({
'url': re.sub(r'{start[ _]time}', compat_str(fragment_ctx['time']), track_url_pattern),
'duration': fragment_ctx['duration'] / stream_timescale,
})
fragment_ctx['time'] += fragment_ctx['duration']
format_id = []
if ism_id:
format_id.append(ism_id)
if stream_name:
format_id.append(stream_name)
format_id.append(compat_str(tbr))
formats.append({
'format_id': '-'.join(format_id),
'url': ism_url,
'manifest_url': ism_url,
'ext': 'ismv' if stream_type == 'video' else 'isma',
'width': width,
'height': height,
'tbr': tbr,
'asr': sampling_rate,
'vcodec': 'none' if stream_type == 'audio' else fourcc,
'acodec': 'none' if stream_type == 'video' else fourcc,
'protocol': 'ism',
'fragments': fragments,
'_download_params': {
'duration': duration,
'timescale': stream_timescale,
'width': width or 0,
'height': height or 0,
'fourcc': fourcc,
'codec_private_data': track.get('CodecPrivateData'),
'sampling_rate': sampling_rate,
'channels': int_or_none(track.get('Channels', 2)),
'bits_per_sample': int_or_none(track.get('BitsPerSample', 16)),
'nal_unit_length_field': int_or_none(track.get('NALUnitLengthField', 4)),
},
})
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
@@ -1802,7 +1912,11 @@ class InfoExtractor(object):
return is_plain_url, formats
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = {
'formats': [],
'subtitles': {},
@@ -1871,11 +1985,11 @@ class InfoExtractor(object):
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',

View File

@@ -150,6 +150,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# rtmp
'skip_download': True,
},
'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/rezero-starting-life-in-another-world-/episode-5-the-morning-of-our-promise-is-still-distant-702409',
'info_dict': {

View File

@@ -94,7 +94,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
'uploader': 'HotWaves1012',
'age_limit': 18,
}
},
'skip': 'video gone',
},
# geo-restricted, player v5
{
@@ -144,7 +145,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);'],
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)

View File

@@ -9,7 +9,7 @@ from ..utils import (
class DotsubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = {
_TESTS = [{
'url': 'https://dotsub.com/view/9c63db2a-fa95-4838-8e6e-13deafe47f09',
'md5': '21c7ff600f545358134fea762a6d42b6',
'info_dict': {
@@ -24,7 +24,24 @@ class DotsubIE(InfoExtractor):
'upload_date': '20131130',
'view_count': int,
}
}
}, {
'url': 'https://dotsub.com/view/747bcf58-bd59-45b7-8c8c-ac312d084ee6',
'md5': '2bb4a83896434d5c26be868c609429a3',
'info_dict': {
'id': '168006778',
'ext': 'mp4',
'title': 'Apartments and flats in Raipur the white symphony',
'description': 'md5:784d0639e6b7d1bc29530878508e38fe',
'thumbnail': 're:^https?://dotsub.com/media/747bcf58-bd59-45b7-8c8c-ac312d084ee6/p',
'duration': 290,
'timestamp': 1476767794.2809999,
'upload_date': '20160525',
'uploader': 'parthivi001',
'uploader_id': 'user52596202',
'view_count': int,
},
'add_ie': ['Vimeo'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -37,12 +54,23 @@ class DotsubIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'<source[^>]+src="([^"]+)"', r'"file"\s*:\s*\'([^\']+)'],
webpage, 'video url')
webpage, 'video url', default=None)
info_dict = {
'id': video_id,
'url': video_url,
'ext': 'flv',
}
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
if not video_url:
setup_data = self._parse_json(self._html_search_regex(
r'(?s)data-setup=([\'"])(?P<content>(?!\1).+?)\1',
webpage, 'setup data', group='content'), video_id)
info_dict = {
'_type': 'url_transparent',
'url': setup_data['src'],
}
info_dict.update({
'title': info['title'],
'description': info.get('description'),
'thumbnail': info.get('screenshotURI'),
@@ -50,4 +78,6 @@ class DotsubIE(InfoExtractor):
'uploader': info.get('user'),
'timestamp': float_or_none(info.get('dateCreated'), 1000),
'view_count': int_or_none(info.get('numberOfViews')),
}
})
return info_dict

View File

@@ -10,8 +10,8 @@ from ..utils import (
class DrTuberIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?drtuber\.com/video/(?P<id>\d+)/(?P<display_id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?drtuber\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{
'url': 'http://www.drtuber.com/video/1740434/hot-perky-blonde-naked-golf',
'md5': '93e680cf2536ad0dfb7e74d94a89facd',
'info_dict': {
@@ -25,20 +25,30 @@ class DrTuberIE(InfoExtractor):
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
}
}
}, {
'url': 'http://www.drtuber.com/embed/489939',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?drtuber\.com/embed/\d+)',
webpage)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(
'http://www.drtuber.com/video/%s' % video_id, display_id)
video_url = self._html_search_regex(
r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
(r'class="title_watch"[^>]*><p>([^<]+)<',
(r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
r'<p[^>]+class="title_substrate">([^<]+)</p>',
r'<title>([^<]+) - \d+'),
webpage, 'title')

View File

@@ -1,38 +1,117 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
from ..compat import compat_str
from ..utils import (
determine_ext,
int_or_none,
unified_timestamp,
)
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/video/clip(?:\?.*?\bid=|/_/id/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'md5': '60e5d097a523e767d06479335d1bdc58',
'info_dict': {
'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
'id': '10365079',
'ext': 'mp4',
'title': '30 for 30 Shorts: Judging Jewell',
'description': None,
'description': 'md5:39370c2e016cb4ecf498ffe75bef7f0f',
'timestamp': 1390936111,
'upload_date': '20140128',
},
'params': {
'skip_download': True,
},
'add_ie': ['OoyalaExternal'],
}, {
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
'url': 'http://espn.go.com/video/clip?id=2743663',
'md5': 'f4ac89b59afc7e2d7dbb049523df6768',
'info_dict': {
'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
'id': '2743663',
'ext': 'mp4',
'title': 'Must-See Moments: Best of the MLS season',
'description': 'md5:4c2d7232beaea572632bec41004f0aeb',
'timestamp': 1449446454,
'upload_date': '20151207',
},
'params': {
'skip_download': True,
},
'add_ie': ['OoyalaExternal'],
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip/_/id/17989860',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
clip = self._download_json(
'http://api-app.espn.com/v1/video/clips/%s' % video_id,
video_id)['videos'][0]
title = clip['headline']
format_urls = set()
formats = []
def traverse_source(source, base_source_id=None):
for source_id, source in source.items():
if isinstance(source, compat_str):
extract_source(source, base_source_id)
elif isinstance(source, dict):
traverse_source(
source,
'%s-%s' % (base_source_id, source_id)
if base_source_id else source_id)
def extract_source(source_url, source_id=None):
if source_url in format_urls:
return
format_urls.add(source_url)
ext = determine_ext(source_url)
if ext == 'smil':
formats.extend(self._extract_smil_formats(
source_url, video_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, video_id, f4m_id=source_id, fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=source_id, fatal=False))
else:
formats.append({
'url': source_url,
'format_id': source_id,
})
traverse_source(clip['links']['source'])
self._sort_formats(formats)
description = clip.get('caption') or clip.get('description')
thumbnail = clip.get('thumbnail')
duration = int_or_none(clip.get('duration'))
timestamp = unified_timestamp(clip.get('originalPublishDate'))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
class ESPNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
'only_matching': True,
}, {
@@ -47,11 +126,12 @@ class ESPNIE(InfoExtractor):
}, {
'url': 'http://espn.go.com/nba/playoffs/2015/story/_/id/12887571/john-wall-washington-wizards-no-swelling-left-hand-wrist-game-5-return',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if ESPNIE.suitable(url) else super(ESPNArticleIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -61,23 +141,5 @@ class ESPNIE(InfoExtractor):
r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
cms = 'espn'
if 'data-source="intl"' in webpage:
cms = 'intl'
player_url = 'https://espn.go.com/video/iframe/twitter/?id=%s&cms=%s' % (video_id, cms)
player = self._download_webpage(
player_url, video_id)
pcode = self._search_regex(
r'["\']pcode=([^"\']+)["\']', player, 'pcode')
title = remove_end(
self._og_search_title(webpage),
'- ESPN Video').strip()
return {
'_type': 'url_transparent',
'url': 'ooyalaexternal:%s:%s:%s' % (cms, video_id, pcode),
'ie_key': 'OoyalaExternal',
'title': title,
}
return self.url_result(
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())

View File

@@ -66,6 +66,7 @@ from .arte import (
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .atresplayer import AtresPlayerIE
@@ -93,7 +94,7 @@ from .bbc import (
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
from .beatportpro import BeatportProIE
from .beatport import BeatportIE
from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
@@ -295,6 +296,7 @@ from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .fox9 import FOX9IE
from .foxgay import FoxgayIE
from .foxnews import (
FoxNewsIE,
@@ -348,7 +350,10 @@ from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hbo import (
HBOIE,
HBOEpisodeIE,
)
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
@@ -369,6 +374,7 @@ from .hrti import (
HRTiIE,
HRTiPlaylistIE,
)
from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
@@ -403,6 +409,10 @@ from .ivi import (
from .ivideon import IvideonIE
from .iwara import IwaraIE
from .izlesene import IzleseneIE
from .jamendo import (
JamendoIE,
JamendoAlbumIE,
)
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jwplatform import JWPlatformIE
@@ -587,6 +597,7 @@ from .nhl import (
from .nick import (
NickIE,
NickDeIE,
NickNightIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import (
@@ -596,6 +607,7 @@ from .ninecninemedia import (
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
@@ -662,6 +674,7 @@ from .orf import (
ORFFM4IE,
ORFIPTVIE,
)
from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@@ -735,6 +748,10 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .rentv import (
RENTVIE,
RENTVArticleIE,
)
from .restudy import RestudyIE
from .reuters import ReutersIE
from .reverbnation import ReverbNationIE
@@ -791,7 +808,10 @@ from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
from .shahid import ShahidIE
from .shared import SharedIE
from .shared import (
SharedIE,
VivoIE,
)
from .sharesix import ShareSixIE
from .sina import SinaIE
from .sixplay import SixPlayIE
@@ -1082,6 +1102,7 @@ from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,

View File

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import re
import socket
@@ -100,7 +99,8 @@ class FacebookIE(InfoExtractor):
'ext': 'mp4',
'title': '"What are you doing running in the snow?"',
'uploader': 'FailArmy',
}
},
'skip': 'Video gone',
}, {
'url': 'https://m.facebook.com/story.php?story_fbid=1035862816472149&id=116132035111903',
'md5': '1deb90b6ac27f7efcf6d747c8a27f5e3',
@@ -110,6 +110,7 @@ class FacebookIE(InfoExtractor):
'title': 'What the Flock Is Going On In New Zealand Credit: ViralHog',
'uploader': 'S. Saint',
},
'skip': 'Video gone',
}, {
'note': 'swf params escaped',
'url': 'https://www.facebook.com/barackobama/posts/10153664894881749',
@@ -119,6 +120,18 @@ class FacebookIE(InfoExtractor):
'ext': 'mp4',
'title': 'Facebook video #10153664894881749',
},
}, {
# have 1080P, but only up to 720p in swf params
'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
'md5': '0d9813160b146b3bc8744e006027fcc6',
'info_dict': {
'id': '10155529876156509',
'ext': 'mp4',
'title': 'Holocaust survivor becomes US citizen',
'timestamp': 1477818095,
'upload_date': '20161030',
'uploader': 'CNN',
},
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,
@@ -227,43 +240,13 @@ class FacebookIE(InfoExtractor):
video_data = None
BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
for m in re.findall(PATTERN, webpage):
swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params'])
video_data_candidate = json.loads(params_raw)['video_data']
for _, f in video_data_candidate.items():
if not f:
continue
if isinstance(f, dict):
f = [f]
if not isinstance(f, list):
continue
if f[0].get('video_id') == video_id:
video_data = video_data_candidate
break
if video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig':
video_data = item[2][0]['videoData']
break
def video_data_list2dict(video_data):
ret = {}
for item in video_data:
format_id = item['stream_type']
ret.setdefault(format_id, []).append(item)
return ret
if not video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData'])
break
if not video_data:
if not fatal_if_no_video:
return webpage, False
@@ -276,7 +259,8 @@ class FacebookIE(InfoExtractor):
raise ExtractorError('Cannot parse data')
formats = []
for format_id, f in video_data.items():
for f in video_data:
format_id = f['stream_type']
if f and isinstance(f, dict):
f = [f]
if not f or not isinstance(f, list):

View File

@@ -2,25 +2,27 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .streamable import StreamableIE
class FootyRoomIE(InfoExtractor):
_VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
_VALID_URL = r'https?://footyroom\.com/matches/(?P<id>\d+)'
_TESTS = [{
'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
'url': 'http://footyroom.com/matches/79922154/hull-city-vs-chelsea/review',
'info_dict': {
'id': 'schalke-04-0-2-real-madrid-2015-02',
'title': 'Schalke 04 0 2 Real Madrid',
'id': '79922154',
'title': 'VIDEO Hull City 0 - 2 Chelsea',
},
'playlist_count': 3,
'skip': 'Video for this match is not available',
'playlist_count': 2,
'add_ie': [StreamableIE.ie_key()],
}, {
'url': 'http://footyroom.com/georgia-0-2-germany-2015-03/',
'url': 'http://footyroom.com/matches/75817984/georgia-vs-germany/review',
'info_dict': {
'id': 'georgia-0-2-germany-2015-03',
'title': 'Georgia 0 2 Germany',
'id': '75817984',
'title': 'VIDEO Georgia 0 - 2 Germany',
},
'playlist_count': 1,
'add_ie': ['Playwire']
}]
def _real_extract(self, url):
@@ -28,9 +30,8 @@ class FootyRoomIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
playlist = self._parse_json(
self._search_regex(
r'VideoSelector\.load\((\[.+?\])\);', webpage, 'video selector'),
playlist = self._parse_json(self._search_regex(
r'DataStore\.media\s*=\s*([^;]+)', webpage, 'media data'),
playlist_id)
playlist_title = self._og_search_title(webpage)
@@ -40,11 +41,16 @@ class FootyRoomIE(InfoExtractor):
payload = video.get('payload')
if not payload:
continue
playwire_url = self._search_regex(
playwire_url = self._html_search_regex(
r'data-config="([^"]+)"', payload,
'playwire url', default=None)
if playwire_url:
entries.append(self.url_result(self._proto_relative_url(
playwire_url, 'http:'), 'Playwire'))
streamable_url = StreamableIE._extract_url(payload)
if streamable_url:
entries.append(self.url_result(
streamable_url, StreamableIE.ie_key()))
return self.playlist_result(entries, playlist_id, playlist_title)

View File

@@ -0,0 +1,43 @@
# coding: utf-8
from __future__ import unicode_literals
from .anvato import AnvatoIE
from ..utils import js_to_json
class FOX9IE(AnvatoIE):
_VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
_TESTS = [{
'url': 'http://www.fox9.com/news/215123287-story',
'md5': 'd6e1b2572c3bab8a849c9103615dd243',
'info_dict': {
'id': '314473',
'ext': 'mp4',
'title': 'Bear climbs tree in downtown Duluth',
'description': 'md5:6a36bfb5073a411758a752455408ac90',
'duration': 51,
'timestamp': 1478123580,
'upload_date': '20161102',
'uploader': 'EPFOX',
'categories': ['News', 'Sports'],
'tags': ['news', 'video'],
},
}, {
'url': 'http://www.fox9.com/news/investigators/214070684-story',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._parse_json(
self._search_regex(
r'AnvatoPlaylist\s*\(\s*(\[.+?\])\s*\)\s*;',
webpage, 'anvato playlist'),
video_id, transform_source=js_to_json)[0]['video']
return self._get_anvato_videos(
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
video_id)

View File

@@ -47,6 +47,8 @@ from .svt import SVTIE
from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE
from .redtube import RedTubeIE
from .vimeo import VimeoIE
from .dailymotion import (
DailymotionIE,
@@ -1208,20 +1210,6 @@ class GenericIE(InfoExtractor):
'duration': 51690,
},
},
# JWPlayer with M3U8
{
'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
'info_dict': {
'id': 'playlist',
'ext': 'mp4',
'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
'uploader': 'ren.tv',
},
'params': {
# m3u8 downloads
'skip_download': True,
}
},
# Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
# This video can't be played in browsers if Flash disabled and UA set to iPhone, which is actually a false alarm
{
@@ -1648,6 +1636,10 @@ class GenericIE(InfoExtractor):
doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
elif doc.tag == 'SmoothStreamingMedia':
info_dict['formats'] = self._parse_ism_formats(doc, url)
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
smil = self._parse_smil(doc, url, video_id)
self._sort_formats(smil['formats'])
@@ -1991,11 +1983,6 @@ class GenericIE(InfoExtractor):
if sportbox_urls:
return _playlist_from_matches(sportbox_urls, ie='SportBoxEmbed')
# Look for embedded PornHub player
pornhub_url = PornHubIE._extract_url(webpage)
if pornhub_url:
return self.url_result(pornhub_url, 'PornHub')
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
if xhamster_urls:
@@ -2006,6 +1993,21 @@ class GenericIE(InfoExtractor):
if tnaflix_urls:
return _playlist_from_matches(tnaflix_urls, ie=TNAFlixNetworkEmbedIE.ie_key())
# Look for embedded PornHub player
pornhub_urls = PornHubIE._extract_urls(webpage)
if pornhub_urls:
return _playlist_from_matches(pornhub_urls, ie=PornHubIE.ie_key())
# Look for embedded DrTuber player
drtuber_urls = DrTuberIE._extract_urls(webpage)
if drtuber_urls:
return _playlist_from_matches(drtuber_urls, ie=DrTuberIE.ie_key())
# Look for embedded RedTube player
redtube_urls = RedTubeIE._extract_urls(webpage)
if redtube_urls:
return _playlist_from_matches(redtube_urls, ie=RedTubeIE.ie_key())
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -2463,6 +2465,21 @@ class GenericIE(InfoExtractor):
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
elif ext == 'f4m':
entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
elif re.search(r'(?i)\.(?:ism|smil)/manifest', video_url) and video_url != url:
# Just matching .ism/manifest is not enough to be reliably sure
# whether it's actually an ISM manifest or some other streaming
# manifest since there are various streaming URL formats
# possible (see [1]) as well as some other shenanigans like
# .smil/manifest URLs that actually serve an ISM (see [2]) and
# so on.
# Thus the most reasonable way to solve this is to delegate
# to generic extractor in order to look into the contents of
# the manifest itself.
# 1. https://azure.microsoft.com/en-us/documentation/articles/media-services-deliver-content-overview/#streaming-url-formats
# 2. https://svs.itworkscdn.net/lbcivod/smil:itwfcdn/lbci/170976.smil/Manifest
entry_info_dict = self.url_result(
smuggle_url(video_url, {'to_generic': True}),
GenericIE.ie_key())
else:
entry_info_dict['url'] = video_url

View File

@@ -4,9 +4,6 @@ import itertools
import re
from .common import SearchInfoExtractor
from ..compat import (
compat_urllib_parse,
)
class GoogleSearchIE(SearchInfoExtractor):
@@ -34,13 +31,16 @@ class GoogleSearchIE(SearchInfoExtractor):
}
for pagenum in itertools.count():
result_url = (
'http://www.google.com/search?tbm=vid&q=%s&start=%s&hl=en'
% (compat_urllib_parse.quote_plus(query), pagenum * 10))
webpage = self._download_webpage(
result_url, 'gvsearch:' + query,
note='Downloading result page ' + str(pagenum + 1))
'http://www.google.com/search',
'gvsearch:' + query,
note='Downloading result page %s' % (pagenum + 1),
query={
'tbm': 'vid',
'q': query,
'start': pagenum * 10,
'hl': 'en',
})
for hit_idx, mobj in enumerate(re.finditer(
r'<h3 class="r"><a href="([^"]+)"', webpage)):

View File

@@ -12,17 +12,7 @@ from ..utils import (
)
class HBOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
'title': 'Ep. 64 Clip: Encryption',
}
}
class HBOBaseIE(InfoExtractor):
_FORMATS_INFO = {
'1920': {
'width': 1280,
@@ -50,8 +40,7 @@ class HBOIE(InfoExtractor):
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
def _extract_from_id(self, video_id):
video_data = self._download_xml(
'http://render.lv3.hbo.com/data/content/global/videos/data/%s.xml' % video_id, video_id)
title = xpath_text(video_data, 'title', 'title', True)
@@ -116,7 +105,60 @@ class HBOIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'duration': parse_duration(xpath_element(video_data, 'duration/tv14')),
'duration': parse_duration(xpath_text(video_data, 'duration/tv14')),
'formats': formats,
'thumbnails': thumbnails,
}
class HBOIE(HBOBaseIE):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
'title': 'Ep. 64 Clip: Encryption',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 1072,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_from_id(video_id)
class HBOEpisodeIE(HBOBaseIE):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/(?!video)([^/]+/)+video/(?P<id>[0-9a-z-]+)\.html'
_TESTS = [{
'url': 'http://www.hbo.com/girls/episodes/5/52-i-love-you-baby/video/ep-52-inside-the-episode.html?autoplay=true',
'md5': '689132b253cc0ab7434237fc3a293210',
'info_dict': {
'id': '1439518',
'display_id': 'ep-52-inside-the-episode',
'ext': 'mp4',
'title': 'Ep. 52: Inside the Episode',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 240,
},
}, {
'url': 'http://www.hbo.com/game-of-thrones/about/video/season-5-invitation-to-the-set.html?autoplay=true',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?P<q1>[\'"])videoId(?P=q1)\s*:\s*(?P<q2>[\'"])(?P<video_id>\d+)(?P=q2)',
webpage, 'video ID', group='video_id')
info_dict = self._extract_from_id(video_id)
info_dict['display_id'] = display_id
return info_dict

View File

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
@@ -14,29 +12,24 @@ class HornBunnyIE(InfoExtractor):
_VALID_URL = r'http?://(?:www\.)?hornbunny\.com/videos/(?P<title_dash>[a-z-]+)-(?P<id>\d+)\.html'
_TEST = {
'url': 'http://hornbunny.com/videos/panty-slut-jerk-off-instruction-5227.html',
'md5': '95e40865aedd08eff60272b704852ad7',
'md5': 'e20fd862d1894b67564c96f180f43924',
'info_dict': {
'id': '5227',
'ext': 'flv',
'ext': 'mp4',
'title': 'panty slut jerk off instruction',
'duration': 550,
'age_limit': 18,
'view_count': int,
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, note='Downloading initial webpage')
title = self._html_search_regex(
r'class="title">(.*?)</h2>', webpage, 'title')
redirect_url = self._html_search_regex(
r'pg&settings=(.*?)\|0"\);', webpage, 'title')
webpage2 = self._download_webpage(redirect_url, video_id)
video_url = self._html_search_regex(
r'flvMask:(.*?);', webpage2, 'video_url')
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
duration = parse_duration(self._search_regex(
r'<strong>Runtime:</strong>\s*([0-9:]+)</div>',
@@ -45,12 +38,12 @@ class HornBunnyIE(InfoExtractor):
r'<strong>Views:</strong>\s*(\d+)</div>',
webpage, 'view count', fatal=False))
return {
info_dict.update({
'id': video_id,
'url': video_url,
'title': title,
'ext': 'flv',
'duration': duration,
'view_count': view_count,
'age_limit': 18,
}
})
return info_dict

View File

@@ -0,0 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class HuajiaoIE(InfoExtractor):
IE_DESC = '花椒直播'
_VALID_URL = r'https?://(?:www\.)?huajiao\.com/l/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.huajiao.com/l/38941232',
'md5': 'd08bf9ac98787d24d1e4c0283f2d372d',
'info_dict': {
'id': '38941232',
'ext': 'mp4',
'title': '#新人求关注#',
'description': 're:.*',
'duration': 2424.0,
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1475866459,
'upload_date': '20161007',
'uploader': 'Penny_余姿昀',
'uploader_id': '75206005',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
feed_json = self._search_regex(
r'var\s+feed\s*=\s*({.+})', webpage, 'feed json')
feed = self._parse_json(feed_json, video_id)
description = self._html_search_meta(
'description', webpage, 'description', fatal=False)
def get(section, field):
return feed.get(section, {}).get(field)
return {
'id': video_id,
'title': feed['feed']['formated_title'],
'description': description,
'duration': parse_duration(get('feed', 'duration')),
'thumbnail': get('feed', 'image'),
'timestamp': parse_iso8601(feed.get('creatime'), ' '),
'uploader': get('author', 'nickname'),
'uploader_id': get('author', 'uid'),
'formats': self._extract_m3u8_formats(
feed['feed']['m3u8'], video_id, 'mp4', 'm3u8_native'),
}

View File

@@ -13,7 +13,7 @@ from ..utils import (
class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -43,6 +43,9 @@ class ImgurIE(InfoExtractor):
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -0,0 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from ..compat import compat_urlparse
from .common import InfoExtractor
class JamendoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
'info_dict': {
'id': '196219',
'display_id': 'stories-from-emona-i',
'ext': 'flac',
'title': 'Stories from Emona I',
'thumbnail': 're:^https?://.*\.jpg'
}
}
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta('name', webpage, 'title')
formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
% (sub_domain, track_id, format_id),
'format_id': format_id,
'ext': ext,
'quality': quality,
} for quality, (format_id, sub_domain, ext) in enumerate((
('mp31', 'mp3l', 'mp3'),
('mp32', 'mp3d', 'mp3'),
('ogg1', 'ogg', 'ogg'),
('flac', 'flac', 'flac'),
))]
self._sort_formats(formats)
thumbnail = self._html_search_meta(
'image', webpage, 'thumbnail', fatal=False)
return {
'id': track_id,
'display_id': display_id,
'thumbnail': thumbnail,
'title': title,
'formats': formats
}
class JamendoAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
_TEST = {
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': {
'id': '121486',
'title': 'Duck On Cover'
},
'playlist': [{
'md5': 'e1a2fcb42bda30dfac990212924149a8',
'info_dict': {
'id': '1032333',
'ext': 'flac',
'title': 'Warmachine'
}
}, {
'md5': '1f358d7b2f98edfe90fd55dac0799d50',
'info_dict': {
'id': '1032330',
'ext': 'flac',
'title': 'Without Your Ghost'
}
}],
'params': {
'playlistend': 2
}
}
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
album_id = mobj.group('id')
webpage = self._download_webpage(url, mobj.group('display_id'))
title = self._html_search_meta('name', webpage, 'title')
entries = [
self.url_result(
compat_urlparse.urljoin(url, m.group('path')),
ie=JamendoIE.ie_key(),
video_id=self._search_regex(
r'/track/(\d+)', m.group('path'),
'track id', default=None))
for m in re.finditer(
r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
webpage)
]
return self.playlist_result(entries, album_id, title)

View File

@@ -1,45 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
unescapeHTML,
int_or_none,
parse_duration,
get_element_by_class,
)
class LEGOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
_TESTS = [{
'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
'md5': 'f34468f176cfd76488767fc162c405fa',
'info_dict': {
'id': '55492d823b1b4d5e985787fa8c2973b1',
'ext': 'mp4',
'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
}
}
'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
},
}, {
# geo-restricted but the contentUrl contain a valid url
'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
'info_dict': {
'id': '13bdc2299ab24d9685701a915b3d71e7',
'ext': 'mp4',
'title': 'Aflevering 20 - Helden van het koninkrijk',
'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
},
}, {
# special characters in title
'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
'info_dict': {
'id': '9685ee9d12e84ff38e84b4e3d0db533d',
'ext': 'mp4',
'title': 'Force Surprise LEGO® Star Wars™ Microfighters',
'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
},
'params': {
'skip_download': True,
},
}]
_BITRATES = [256, 512, 1024, 1536, 2560]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.lego.com/en-US/mediaplayer/video/' + video_id, video_id)
title = self._search_regex(r'<title>(.+?)</title>', webpage, 'title')
video_data = self._parse_json(unescapeHTML(self._search_regex(
r"video='([^']+)'", webpage, 'video data')), video_id)
progressive_base = self._search_regex(
r'data-video-progressive-url="([^"]+)"',
webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
streaming_base = self._search_regex(
r'data-video-streaming-url="([^"]+)"',
webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
item_id = video_data['ItemId']
locale, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
title = get_element_by_class('video-header', webpage).strip()
progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
streaming_base = 'http://legoprod-f.akamaihd.net/'
content_url = self._html_search_meta('contentUrl', webpage)
path = self._search_regex(
r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
content_url, 'video path', default=None)
if not path:
player_url = self._proto_relative_url(self._search_regex(
r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
webpage, 'player url', default=None))
if not player_url:
base_url = self._proto_relative_url(self._search_regex(
r'data-baseurl="([^"]+)"', webpage, 'base url',
default='http://www.lego.com/%s/mediaplayer/video/' % locale))
player_url = base_url + video_id
player_webpage = self._download_webpage(player_url, video_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r"video='([^']+)'", player_webpage, 'video data')), video_id)
progressive_base = self._search_regex(
r'data-video-progressive-url="([^"]+)"',
player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
streaming_base = self._search_regex(
r'data-video-streaming-url="([^"]+)"',
player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
item_id = video_data['ItemId']
net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
path = '/'.join([net_storage_path, base_path])
net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
path = '/'.join([net_storage_path, base_path])
streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
formats = self._extract_akamai_formats(
@@ -80,7 +121,8 @@ class LEGOIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'thumbnail': video_data.get('CoverImageUrl'),
'duration': int_or_none(video_data.get('Length')),
'description': self._html_search_meta('description', webpage),
'thumbnail': self._html_search_meta('thumbnail', webpage),
'duration': parse_duration(self._html_search_meta('duration', webpage)),
'formats': formats,
}

View File

@@ -2,7 +2,6 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
@@ -52,8 +51,8 @@ class LiTVIE(InfoExtractor):
'skip': 'Georestricted to Taiwan',
}]
def _extract_playlist(self, season_list, video_id, vod_data, view_data, prompt=True):
episode_title = view_data['title']
def _extract_playlist(self, season_list, video_id, program_info, prompt=True):
episode_title = program_info['title']
content_id = season_list['contentId']
if prompt:
@@ -61,7 +60,7 @@ class LiTVIE(InfoExtractor):
all_episodes = [
self.url_result(smuggle_url(
self._URL_TEMPLATE % (view_data['contentType'], episode['contentId']),
self._URL_TEMPLATE % (program_info['contentType'], episode['contentId']),
{'force_noplaylist': True})) # To prevent infinite recursion
for episode in season_list['episode']]
@@ -80,19 +79,15 @@ class LiTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
view_data = dict(map(lambda t: (t[0], t[2]), re.findall(
r'viewData\.([a-zA-Z]+)\s*=\s*(["\'])([^"\']+)\2',
webpage)))
vod_data = self._parse_json(self._search_regex(
'var\s+vod\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
program_info = self._parse_json(self._search_regex(
'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
video_id)
season_list = list(vod_data.get('seasonList', {}).values())
season_list = list(program_info.get('seasonList', {}).values())
if season_list:
if not noplaylist:
return self._extract_playlist(
season_list[0], video_id, vod_data, view_data,
season_list[0], video_id, program_info,
prompt=noplaylist_prompt)
if noplaylist_prompt:
@@ -102,8 +97,8 @@ class LiTVIE(InfoExtractor):
# endpoint gives the same result as the data embedded in the webpage.
# If georestricted, there are no embedded data, so an extra request is
# necessary to get the error code
if 'assetId' not in view_data:
view_data = self._download_json(
if 'assetId' not in program_info:
program_info = self._download_json(
'https://www.litv.tv/vod/ajax/getProgramInfo', video_id,
query={'contentId': video_id},
headers={'Accept': 'application/json'})
@@ -112,9 +107,9 @@ class LiTVIE(InfoExtractor):
webpage, 'video data', default='{}'), video_id)
if not video_data:
payload = {
'assetId': view_data['assetId'],
'watchDevices': view_data['watchDevices'],
'contentType': view_data['contentType'],
'assetId': program_info['assetId'],
'watchDevices': program_info['watchDevices'],
'contentType': program_info['contentType'],
}
video_data = self._download_json(
'https://www.litv.tv/vod/getMainUrl', video_id,
@@ -136,11 +131,11 @@ class LiTVIE(InfoExtractor):
# LiTV HLS segments doesn't like compressions
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True
title = view_data['title'] + view_data.get('secondaryMark', '')
description = view_data.get('description')
thumbnail = view_data.get('imageFile')
categories = [item['name'] for item in vod_data.get('category', [])]
episode = int_or_none(view_data.get('episode'))
title = program_info['title'] + program_info.get('secondaryMark', '')
description = program_info.get('description')
thumbnail = program_info.get('imageFile')
categories = [item['name'] for item in program_info.get('category', [])]
episode = int_or_none(program_info.get('episode'))
return {
'id': video_id,

View File

@@ -94,12 +94,12 @@ class LyndaBaseIE(InfoExtractor):
class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)'
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
_TESTS = [{
'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'url': 'https://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
# md5 is unstable
'info_dict': {
'id': '114408',
@@ -112,19 +112,71 @@ class LyndaIE(LyndaBaseIE):
'only_matching': True,
}]
def _raise_unavailable(self, video_id):
self.raise_login_required(
'Video %s is only available for members' % video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
course_id = mobj.group('course_id')
query = {
'videoId': video_id,
'type': 'video',
}
video = self._download_json(
'http://www.lynda.com/ajax/player?videoId=%s&type=video' % video_id,
video_id, 'Downloading video JSON')
'https://www.lynda.com/ajax/player', video_id,
'Downloading video JSON', fatal=False, query=query)
# Fallback scenario
if not video:
query['courseId'] = course_id
play = self._download_json(
'https://www.lynda.com/ajax/course/%s/%s/play'
% (course_id, video_id), video_id, 'Downloading play JSON')
if not play:
self._raise_unavailable(video_id)
formats = []
for formats_dict in play:
urls = formats_dict.get('urls')
if not isinstance(urls, dict):
continue
cdn = formats_dict.get('name')
for format_id, format_url in urls.items():
if not format_url:
continue
formats.append({
'url': format_url,
'format_id': '%s-%s' % (cdn, format_id) if cdn else format_id,
'height': int_or_none(format_id),
})
self._sort_formats(formats)
conviva = self._download_json(
'https://www.lynda.com/ajax/player/conviva', video_id,
'Downloading conviva JSON', query=query)
return {
'id': video_id,
'title': conviva['VideoTitle'],
'description': conviva.get('VideoDescription'),
'release_year': int_or_none(conviva.get('ReleaseYear')),
'duration': int_or_none(conviva.get('Duration')),
'creator': conviva.get('Author'),
'formats': formats,
}
if 'Status' in video:
raise ExtractorError(
'lynda returned error: %s' % video['Message'], expected=True)
if video.get('HasAccess') is False:
self.raise_login_required('Video %s is only available for members' % video_id)
self._raise_unavailable(video_id)
video_id = compat_str(video.get('ID') or video_id)
duration = int_or_none(video.get('DurationInSeconds'))
@@ -148,7 +200,7 @@ class LyndaIE(LyndaBaseIE):
for prioritized_stream_id, prioritized_stream in prioritized_streams.items():
formats.extend([{
'url': video_url,
'width': int_or_none(format_id),
'height': int_or_none(format_id),
'format_id': '%s-%s' % (prioritized_stream_id, format_id),
} for format_id, video_url in prioritized_stream.items()])
@@ -187,7 +239,7 @@ class LyndaIE(LyndaBaseIE):
return srt
def _get_subtitles(self, video_id):
url = 'http://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False)
if subs:
return {'en': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]}
@@ -209,7 +261,7 @@ class LyndaCourseIE(LyndaBaseIE):
course_id = mobj.group('courseid')
course = self._download_json(
'http://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
'https://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
course_id, 'Downloading course JSON')
if course.get('Status') == 'NotFound':
@@ -231,7 +283,7 @@ class LyndaCourseIE(LyndaBaseIE):
if video_id:
entries.append({
'_type': 'url_transparent',
'url': 'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'url': 'https://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'ie_key': LyndaIE.ie_key(),
'chapter': chapter.get('Title'),
'chapter_number': int_or_none(chapter.get('ChapterIndex')),

View File

@@ -71,12 +71,15 @@ class MicrosoftVirtualAcademyIE(MicrosoftVirtualAcademyBaseIE):
formats = []
for sources in settings.findall(compat_xpath('.//MediaSources')):
if sources.get('videoType') == 'smoothstreaming':
continue
sources_type = sources.get('videoType')
for source in sources.findall(compat_xpath('./MediaSource')):
video_url = source.text
if not video_url or not video_url.startswith('http'):
continue
if sources_type == 'smoothstreaming':
formats.extend(self._extract_ism_formats(
video_url, video_id, 'mss', fatal=False))
continue
video_mode = source.get('videoMode')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', video_mode or '', 'height', default=None))

View File

@@ -1,19 +1,20 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import uuid
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
get_element_by_attribute,
int_or_none,
remove_start,
extract_attributes,
determine_ext,
smuggle_url,
parse_duration,
)
@@ -72,16 +73,14 @@ class MiTeleBaseIE(InfoExtractor):
}
class MiTeleIE(MiTeleBaseIE):
class MiTeleIE(InfoExtractor):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/programas-tv/(?:[^/]+/)(?P<id>[^/]+)/player'
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
# MD5 is unstable
'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144',
'id': '57b0dfb9c715da65618b4afa',
'ext': 'mp4',
'title': 'Tor, la web invisible',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
@@ -91,57 +90,71 @@ class MiTeleIE(MiTeleBaseIE):
'thumbnail': 're:(?i)^https?://.*\.jpg$',
'duration': 2913,
},
'add_ie': ['Ooyala'],
}, {
# no explicit title
'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/temporada-6/programa-226/',
'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
'info_dict': {
'id': 'eLZSwoEd1S3pVyUm8lc6F',
'display_id': 'programa-226',
'id': '57b0de3dc915da14058b4876',
'ext': 'mp4',
'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
'title': 'Cuarto Milenio Temporada 6 Programa 226',
'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
'series': 'Cuarto Milenio',
'season': 'Temporada 6',
'episode': 'Programa 226',
'thumbnail': 're:(?i)^https?://.*\.jpg$',
'duration': 7312,
'duration': 7313,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}]
def _real_extract(self, url):
display_id = self._match_id(url)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
gigya_url = self._search_regex(r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s*src="([^"]*)">[^>]*</script>', webpage, 'gigya', default=None)
gigya_sc = self._download_webpage(compat_urlparse.urljoin(r'http://www.mitele.es/', gigya_url), video_id, 'Downloading gigya script')
# Get a appKey/uuid for getting the session key
appKey_var = self._search_regex(r'value\("appGridApplicationKey",([0-9a-f]+)\)', gigya_sc, 'appKey variable')
appKey = self._search_regex(r'var %s="([0-9a-f]+)"' % appKey_var, gigya_sc, 'appKey')
uid = compat_str(uuid.uuid4())
session_url = 'https://appgrid-api.cloud.accedo.tv/session?appKey=%s&uuid=%s' % (appKey, uid)
session_json = self._download_json(session_url, video_id, 'Downloading session keys')
sessionKey = compat_str(session_json['sessionKey'])
info = self._get_player_info(url, webpage)
paths_url = 'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration?sessionKey=' + sessionKey
paths = self._download_json(paths_url, video_id, 'Downloading paths JSON')
ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
data_p = (
'http://' + ooyala_s['base_url'] + ooyala_s['full_path'] + ooyala_s['provider_id'] +
'/docs/' + video_id + '?include_titles=Series,Season&product_name=test&format=full')
data = self._download_json(data_p, video_id, 'Downloading data JSON')
source = data['hits']['hits'][0]['_source']
embedCode = source['offers'][0]['embed_codes'][0]
title = self._search_regex(
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
webpage, 'title', default=None)
titles = source['localizable_titles'][0]
title = titles.get('title_medium') or titles['title_long']
episode = titles['title_sort_name']
description = titles['summary_long']
titles_series = source['localizable_titles_series'][0]
series = titles_series['title_long']
titles_season = source['localizable_titles_season'][0]
season = titles_season['title_medium']
duration = parse_duration(source['videos'][0]['duration'])
mobj = re.search(r'''(?sx)
class="Destacado-text"[^>]*>.*?<h1>\s*
<span>(?P<series>[^<]+)</span>\s*
<span>(?P<season>[^<]+)</span>\s*
<span>(?P<episode>[^<]+)</span>''', webpage)
series, season, episode = mobj.groups() if mobj else [None] * 3
if not title:
if mobj:
title = '%s - %s - %s' % (series, season, episode)
else:
title = remove_start(self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')
info.update({
'display_id': display_id,
return {
'_type': 'url_transparent',
# for some reason only HLS is supported
'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8'}),
'id': video_id,
'title': title,
'description': get_element_by_attribute('class', 'text', webpage),
'description': description,
'series': series,
'season': season,
'episode': episode,
})
return info
'duration': duration,
'thumbnail': source['images'][0]['url'],
}

View File

@@ -11,7 +11,7 @@ from ..utils import (
class MovieClipsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
_VALID_URL = r'https?://(?:www\.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
_TEST = {
'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597',
'md5': '42b5a0352d4933a7bd54f2104f481244',

View File

@@ -69,10 +69,9 @@ class MSNIE(InfoExtractor):
if not format_url:
continue
ext = determine_ext(format_url)
# .ism is not yet supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
if ext == 'ism':
continue
formats.extend(self._extract_ism_formats(
format_url + '/Manifest', display_id, 'mss', fatal=False))
if 'm3u8' in format_url:
# m3u8_native should not be used here until
# https://github.com/rg3/youtube-dl/issues/9913 is fixed

View File

@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from .adobepass import AdobePassIE
from .theplatform import ThePlatformIE
from ..utils import (
smuggle_url,
url_basename,
@@ -65,7 +66,7 @@ class NationalGeographicVideoIE(InfoExtractor):
}
class NationalGeographicIE(AdobePassIE):
class NationalGeographicIE(ThePlatformIE, AdobePassIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
@@ -110,25 +111,39 @@ class NationalGeographicIE(AdobePassIE):
release_url = self._search_regex(
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
webpage, 'release url')
theplatform_path = self._search_regex(r'https?://link.theplatform.com/s/([^?]+)', release_url, 'theplatform path')
video_id = theplatform_path.split('/')[-1]
query = {
'mbr': 'true',
'switch': 'http',
}
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
if is_auth == 'auth':
auth_resource_id = self._search_regex(
r"video_auth_resourceId\s*=\s*'([^']+)'",
webpage, 'auth resource id')
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
update_url_query(release_url, query),
{'force_smil_url': True}),
formats = []
subtitles = {}
for key, value in (('switch', 'http'), ('manifest', 'm3u')):
tp_query = query.copy()
tp_query.update({
key: value,
})
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(theplatform_path, display_id)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
}
})
return info
class NationalGeographicEpisodeGuideIE(InfoExtractor):

View File

@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -154,6 +154,9 @@ class AppleDailyIE(NextMediaIE):
'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
'upload_date': '20140417',
},
}, {
'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
'only_matching': True,
}]
_URL_PATTERN = r'\{url: \'(.+)\'\}'

View File

@@ -274,6 +274,18 @@ class NHLIE(InfoExtractor):
'upload_date': '20160204',
'timestamp': 1454544904,
},
}, {
# Some m3u8 URLs are invalid (https://github.com/rg3/youtube-dl/issues/10713)
'url': 'https://www.nhl.com/predators/video/poile-laviolette-on-subban-trade/t-277437416/c-44315003',
'md5': '50b2bb47f405121484dda3ccbea25459',
'info_dict': {
'id': '44315003',
'ext': 'mp4',
'title': 'Poile, Laviolette on Subban trade',
'description': 'General manager David Poile and head coach Peter Laviolette share their thoughts on acquiring P.K. Subban from Montreal (06/29/16)',
'timestamp': 1467242866,
'upload_date': '20160629',
},
}, {
'url': 'https://www.wch2016.com/video/caneur-best-of-game-2-micd-up/t-281230378/c-44983703',
'only_matching': True,
@@ -301,9 +313,11 @@ class NHLIE(InfoExtractor):
continue
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
playback_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=playback.get('name', 'hls'), fatal=False))
m3u8_id=playback.get('name', 'hls'), fatal=False)
self._check_formats(m3u8_formats, video_id)
formats.extend(m3u8_formats)
else:
height = int_or_none(playback.get('height'))
formats.append({

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .mtv import MTVServicesInfoExtractor
from ..utils import update_url_query
@@ -69,7 +71,7 @@ class NickIE(MTVServicesInfoExtractor):
class NickDeIE(MTVServicesInfoExtractor):
IE_NAME = 'nick.de'
_VALID_URL = r'https?://(?:www\.)?(?:nick\.de|nickelodeon\.nl)/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.de|nickelodeon\.(?:nl|at))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
'only_matching': True,
@@ -79,15 +81,43 @@ class NickDeIE(MTVServicesInfoExtractor):
}, {
'url': 'http://www.nickelodeon.nl/shows/474-spongebob/videos/17403-een-kijkje-in-de-keuken-met-sandy-van-binnenuit',
'only_matching': True,
}, {
'url': 'http://www.nickelodeon.at/playlist/3773-top-videos/videos/episode/77993-das-letzte-gefecht',
'only_matching': True,
}]
def _extract_mrss_url(self, webpage, host):
return update_url_query(self._search_regex(
r'data-mrss=(["\'])(?P<url>http.+?)\1', webpage, 'mrss url', group='url'),
{'siteKey': host})
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
host = mobj.group('host')
webpage = self._download_webpage(url, video_id)
mrss_url = update_url_query(self._search_regex(
r'data-mrss=(["\'])(?P<url>http.+?)\1', webpage, 'mrss url', group='url'),
{'siteKey': 'nick.de'})
mrss_url = self._extract_mrss_url(webpage, host)
return self._get_videos_info_from_url(mrss_url, video_id)
class NickNightIE(NickDeIE):
IE_NAME = 'nicknight'
_VALID_URL = r'https?://(?:www\.)(?P<host>nicknight\.(?:de|at|tv))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nicknight.at/shows/977-awkward/videos/85987-nimmer-beste-freunde',
'only_matching': True,
}, {
'url': 'http://www.nicknight.at/shows/977-awkward',
'only_matching': True,
}, {
'url': 'http://www.nicknight.at/shows/1900-faking-it',
'only_matching': True,
}]
def _extract_mrss_url(self, webpage, *args):
return self._search_regex(
r'mrss\s*:\s*(["\'])(?P<url>http.+?)\1', webpage,
'mrss url', group='url')

View File

@@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
js_to_json,
mimetype2ext,
determine_ext,
update_url_query,
get_element_by_attribute,
int_or_none,
)
class NobelPrizeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nobelprize\.org/mediaplayer.*?\bid=(?P<id>\d+)'
_TEST = {
'url': 'http://www.nobelprize.org/mediaplayer/?id=2636',
'md5': '04c81e5714bb36cc4e2232fee1d8157f',
'info_dict': {
'id': '2636',
'ext': 'mp4',
'title': 'Announcement of the 2016 Nobel Prize in Physics',
'description': 'md5:05beba57f4f5a4bbd4cf2ef28fcff739',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media = self._parse_json(self._search_regex(
r'(?s)var\s*config\s*=\s*({.+?});', webpage,
'config'), video_id, js_to_json)['media']
title = media['title']
formats = []
for source in media.get('source', []):
source_src = source.get('src')
if not source_src:
continue
ext = mimetype2ext(source.get('type')) or determine_ext(source_src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_src, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(source_src, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'url': source_src,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': get_element_by_attribute('itemprop', 'description', webpage),
'duration': int_or_none(media.get('duration')),
'formats': formats,
}

View File

@@ -113,7 +113,17 @@ class NRKBaseIE(InfoExtractor):
class NRKIE(NRKBaseIE):
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_VALID_URL = r'''(?x)
(?:
nrk:|
https?://
(?:
(?:www\.)?nrk\.no/video/PS\*|
v8-psapi\.nrk\.no/mediaelement/
)
)
(?P<id>[^/?#&]+)
'''
_API_HOST = 'v8.psapi.nrk.no'
_TESTS = [{
# video
@@ -137,6 +147,12 @@ class NRKIE(NRKBaseIE):
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20,
}
}, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,
}, {
'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,
}]

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import hmac
@@ -6,11 +7,13 @@ import base64
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
parse_iso8601,
js_to_json,
mimetype2ext,
determine_ext,
parse_iso8601,
remove_start,
)
@@ -138,16 +141,83 @@ class NYTimesArticleIE(NYTimesBaseIE):
'upload_date': '20150414',
'uploader': 'Matthew Williams',
}
}, {
'url': 'http://www.nytimes.com/2016/10/14/podcasts/revelations-from-the-final-weeks.html',
'md5': 'e0d52040cafb07662acf3c9132db3575',
'info_dict': {
'id': '100000004709062',
'title': 'The Run-Up: He Was Like an Octopus',
'ext': 'mp3',
'description': 'md5:fb5c6b93b12efc51649b4847fe066ee4',
'series': 'The Run-Up',
'episode': 'He Was Like an Octopus',
'episode_number': 20,
'duration': 2130,
}
}, {
'url': 'http://www.nytimes.com/2016/10/16/books/review/inside-the-new-york-times-book-review-the-rise-of-hitler.html',
'info_dict': {
'id': '100000004709479',
'title': 'The Rise of Hitler',
'ext': 'mp3',
'description': 'md5:bce877fd9e3444990cb141875fab0028',
'creator': 'Pamela Paul',
'duration': 3475,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.nytimes.com/news/minute/2014/03/17/times-minute-whats-next-in-crimea/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1',
'only_matching': True,
}]
def _extract_podcast_from_json(self, json, page_id, webpage):
podcast_audio = self._parse_json(
json, page_id, transform_source=js_to_json)
audio_data = podcast_audio['data']
track = audio_data['track']
episode_title = track['title']
video_url = track['source']
description = track.get('description') or self._html_search_meta(
['og:description', 'twitter:description'], webpage)
podcast_title = audio_data.get('podcast', {}).get('title')
title = ('%s: %s' % (podcast_title, episode_title)
if podcast_title else episode_title)
episode = audio_data.get('podcast', {}).get('episode') or ''
episode_number = int_or_none(self._search_regex(
r'[Ee]pisode\s+(\d+)', episode, 'episode number', default=None))
return {
'id': remove_start(podcast_audio.get('target'), 'FT') or page_id,
'url': video_url,
'title': title,
'description': description,
'creator': track.get('credit'),
'series': podcast_title,
'episode': episode_title,
'episode_number': episode_number,
'duration': int_or_none(track.get('duration')),
}
def _real_extract(self, url):
video_id = self._match_id(url)
page_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, page_id)
video_id = self._html_search_regex(r'data-videoid="(\d+)"', webpage, 'video id')
video_id = self._search_regex(
r'data-videoid=["\'](\d+)', webpage, 'video id',
default=None, fatal=False)
if video_id is not None:
return self._extract_video_from_id(video_id)
return self._extract_video_from_id(video_id)
podcast_data = self._search_regex(
(r'NYTD\.FlexTypes\.push\s*\(\s*({.+?})\s*\)\s*;\s*</script',
r'NYTD\.FlexTypes\.push\s*\(\s*({.+})\s*\)\s*;'),
webpage, 'podcast data')
return self._extract_podcast_from_json(podcast_data, page_id, webpage)

View File

@@ -56,8 +56,8 @@ class OnetBaseIE(InfoExtractor):
continue
ext = determine_ext(video_url)
if format_id == 'ism':
# TODO: Support Microsoft Smooth Streaming
continue
formats.extend(self._extract_ism_formats(
video_url, video_id, 'mss', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))

View File

@@ -18,7 +18,7 @@ class OoyalaBaseIE(InfoExtractor):
_CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
_AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'
def _extract(self, content_tree_url, video_id, domain='example.org'):
def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None):
content_tree = self._download_json(content_tree_url, video_id)['content_tree']
metadata = content_tree[list(content_tree)[0]]
embed_code = metadata['embed_code']
@@ -29,7 +29,7 @@ class OoyalaBaseIE(InfoExtractor):
self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
compat_urllib_parse_urlencode({
'domain': domain,
'supportedFormats': 'mp4,rtmp,m3u8,hds',
'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds',
}), video_id)
cur_auth_data = auth_data['authorization_data'][embed_code]
@@ -145,8 +145,9 @@ class OoyalaIE(OoyalaBaseIE):
url, smuggled_data = unsmuggle_url(url, {})
embed_code = self._match_id(url)
domain = smuggled_data.get('domain')
supportedformats = smuggled_data.get('supportedformats')
content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
return self._extract(content_tree_url, embed_code, domain)
return self._extract(content_tree_url, embed_code, domain, supportedformats)
class OoyalaExternalIE(OoyalaBaseIE):

View File

@@ -70,14 +70,19 @@ class OpenloadIE(InfoExtractor):
r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
webpage, 'encrypted data')
magic = compat_ord(enc_data[-1])
video_url_chars = []
for idx, c in enumerate(enc_data):
j = compat_ord(c)
if j == magic:
j -= 1
elif j == magic - 1:
j += 1
if j >= 33 and j <= 126:
j = ((j + 14) % 94) + 33
if idx == len(enc_data) - 1:
j += 2
j += 3
video_url_chars += compat_chr(j)
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)

View File

@@ -1,28 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
import calendar
import datetime
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
HEADRequest,
unified_strdate,
ExtractorError,
strip_jsonp,
int_or_none,
float_or_none,
determine_ext,
remove_end,
unescapeHTML,
)
class ORFTVthekIE(InfoExtractor):
IE_NAME = 'orf:tvthek'
IE_DESC = 'ORF TVthek'
_VALID_URL = r'https?://tvthek\.orf\.at/(?:programs/.+?/episodes|topics?/.+?|program/[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://tvthek\.orf\.at/(?:[^/]+/)+(?P<id>\d+)'
_TESTS = [{
'url': 'http://tvthek.orf.at/program/Aufgetischt/2745173/Aufgetischt-Mit-der-Steirischen-Tafelrunde/8891389',
@@ -51,26 +51,23 @@ class ORFTVthekIE(InfoExtractor):
'skip_download': True, # rtsp downloads
},
'_skip': 'Blocked outside of Austria / Germany',
}, {
'url': 'http://tvthek.orf.at/topic/Fluechtlingskrise/10463081/Heimat-Fremde-Heimat/13879132/Senioren-betreuen-Migrantenkinder/13879141',
'skip_download': True,
}, {
'url': 'http://tvthek.orf.at/profile/Universum/35429',
'skip_download': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
data_json = self._search_regex(
r'initializeAdworx\((.+?)\);\n', webpage, 'video info')
all_data = json.loads(data_json)
def get_segments(all_data):
for data in all_data:
if data['name'] in (
'Tracker::EPISODE_DETAIL_PAGE_OVER_PROGRAM',
'Tracker::EPISODE_DETAIL_PAGE_OVER_TOPIC'):
return data['values']['segments']
sdata = get_segments(all_data)
if not sdata:
raise ExtractorError('Unable to extract segments')
data_jsb = self._parse_json(
self._search_regex(
r'<div[^>]+class=(["\']).*?VideoPlaylist.*?\1[^>]+data-jsb=(["\'])(?P<json>.+?)\2',
webpage, 'playlist', group='json'),
playlist_id, transform_source=unescapeHTML)['playlist']['videos']
def quality_to_int(s):
m = re.search('([0-9]+)', s)
@@ -79,8 +76,11 @@ class ORFTVthekIE(InfoExtractor):
return int(m.group(1))
entries = []
for sd in sdata:
video_id = sd['id']
for sd in data_jsb:
video_id, title = sd.get('id'), sd.get('title')
if not video_id or not title:
continue
video_id = compat_str(video_id)
formats = [{
'preference': -10 if fd['delivery'] == 'hls' else None,
'format_id': '%s-%s-%s' % (
@@ -88,7 +88,7 @@ class ORFTVthekIE(InfoExtractor):
'url': fd['src'],
'protocol': fd['protocol'],
'quality': quality_to_int(fd['quality']),
} for fd in sd['playlist_item_array']['sources']]
} for fd in sd['sources']]
# Check for geoblocking.
# There is a property is_geoprotection, but that's always false
@@ -115,14 +115,24 @@ class ORFTVthekIE(InfoExtractor):
self._check_formats(formats, video_id)
self._sort_formats(formats)
upload_date = unified_strdate(sd['created_date'])
subtitles = {}
for sub in sd.get('subtitles', []):
sub_src = sub.get('src')
if not sub_src:
continue
subtitles.setdefault(sub.get('lang', 'de-AT'), []).append({
'url': sub_src,
})
upload_date = unified_strdate(sd.get('created_date'))
entries.append({
'_type': 'video',
'id': video_id,
'title': sd['header'],
'title': title,
'formats': formats,
'subtitles': subtitles,
'description': sd.get('description'),
'duration': int(sd['duration_in_seconds']),
'duration': int_or_none(sd.get('duration_in_seconds')),
'upload_date': upload_date,
'thumbnail': sd.get('image_full_url'),
})

View File

@@ -0,0 +1,91 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
qualities,
)
class PandaTVIE(InfoExtractor):
IE_DESC = '熊猫TV'
_VALID_URL = r'http://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.panda.tv/10091',
'info_dict': {
'id': '10091',
'title': 're:.+',
'uploader': '囚徒',
'ext': 'flv',
'is_live': True,
},
'params': {
'skip_download': True,
},
'skip': 'Live stream is offline',
}
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_json(
'http://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
error_code = config.get('errno', 0)
if error_code is not 0:
raise ExtractorError(
'%s returned error %s: %s'
% (self.IE_NAME, error_code, config['errmsg']),
expected=True)
data = config['data']
video_info = data['videoinfo']
# 2 = live, 3 = offline
if video_info.get('status') != '2':
raise ExtractorError(
'Live stream is offline', expected=True)
title = data['roominfo']['name']
uploader = data.get('hostinfo', {}).get('name')
room_key = video_info['room_key']
stream_addr = video_info.get(
'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
# Reverse engineered from web player swf
# (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
# writing).
plflag0, plflag1 = video_info['plflag'].split('_')
plflag0 = int(plflag0) - 1
if plflag1 == '21':
plflag0 = 10
plflag1 = '4'
live_panda = 'live_panda' if plflag0 < 1 else ''
quality_key = qualities(['OD', 'HD', 'SD'])
suffix = ['_small', '_mid', '']
formats = []
for k, v in stream_addr.items():
if v != '1':
continue
quality = quality_key(k)
if quality <= 0:
continue
for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
formats.append({
'url': 'http://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
% (pl, plflag1, room_key, live_panda, suffix[quality], ext),
'format_id': '%s-%s' % (k, ext),
'quality': quality,
'source_preference': pref,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title(title),
'uploader': uploader,
'formats': formats,
'is_live': True,
}

View File

@@ -6,9 +6,9 @@ from .common import InfoExtractor
class ParliamentLiveUKIE(InfoExtractor):
IE_NAME = 'parliamentlive.tv'
IE_DESC = 'UK parliament videos'
_VALID_URL = r'https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'(?i)https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TEST = {
_TESTS = [{
'url': 'http://parliamentlive.tv/Event/Index/c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
'info_dict': {
'id': 'c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
@@ -18,7 +18,10 @@ class ParliamentLiveUKIE(InfoExtractor):
'timestamp': 1422696664,
'upload_date': '20150131',
},
}
}, {
'url': 'http://parliamentlive.tv/event/index/3f24936f-130f-40bf-9a5d-b3d6479da6a4',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -4,7 +4,6 @@ import collections
import json
import os
import random
import re
from .common import InfoExtractor
from ..compat import (
@@ -12,6 +11,7 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
@@ -23,12 +23,12 @@ from ..utils import (
class PluralsightBaseIE(InfoExtractor):
_API_BASE = 'http://app.pluralsight.com'
_API_BASE = 'https://app.pluralsight.com'
class PluralsightIE(PluralsightBaseIE):
IE_NAME = 'pluralsight'
_VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?'
_VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:training/)?player\?'
_LOGIN_URL = 'https://app.pluralsight.com/id/'
_NETRC_MACHINE = 'pluralsight'
@@ -50,6 +50,9 @@ class PluralsightIE(PluralsightBaseIE):
# available without pluralsight account
'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started',
'only_matching': True,
}, {
'url': 'https://app.pluralsight.com/player?course=ccna-intro-networking&author=ross-bagurdes&name=ccna-intro-networking-m06&clip=0',
'only_matching': True,
}]
def _real_initialize(self):
@@ -99,7 +102,7 @@ class PluralsightIE(PluralsightBaseIE):
'm': name,
}
captions = self._download_json(
'%s/training/Player/Captions' % self._API_BASE, video_id,
'%s/player/retrieve-captions' % self._API_BASE, video_id,
'Downloading captions JSON', 'Unable to download captions JSON',
fatal=False, data=json.dumps(captions_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
@@ -117,14 +120,17 @@ class PluralsightIE(PluralsightBaseIE):
@staticmethod
def _convert_subtitles(duration, subs):
srt = ''
TIME_OFFSET_KEYS = ('displayTimeOffset', 'DisplayTimeOffset')
TEXT_KEYS = ('text', 'Text')
for num, current in enumerate(subs):
current = subs[num]
start, text = float_or_none(
current.get('DisplayTimeOffset')), current.get('Text')
start, text = (
float_or_none(dict_get(current, TIME_OFFSET_KEYS)),
dict_get(current, TEXT_KEYS))
if start is None or text is None:
continue
end = duration if num == len(subs) - 1 else float_or_none(
subs[num + 1].get('DisplayTimeOffset'))
dict_get(subs[num + 1], TIME_OFFSET_KEYS))
if end is None:
continue
srt += os.linesep.join(
@@ -144,28 +150,22 @@ class PluralsightIE(PluralsightBaseIE):
author = qs.get('author', [None])[0]
name = qs.get('name', [None])[0]
clip_id = qs.get('clip', [None])[0]
course = qs.get('course', [None])[0]
course_name = qs.get('course', [None])[0]
if any(not f for f in (author, name, clip_id, course,)):
if any(not f for f in (author, name, clip_id, course_name,)):
raise ExtractorError('Invalid URL', expected=True)
display_id = '%s-%s' % (name, clip_id)
webpage = self._download_webpage(url, display_id)
parsed_url = compat_urlparse.urlparse(url)
modules = self._search_regex(
r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
webpage, 'modules', default=None)
payload_url = compat_urlparse.urlunparse(parsed_url._replace(
netloc='app.pluralsight.com', path='player/api/v1/payload'))
if modules:
collection = self._parse_json(modules, display_id)
else:
# Webpage may be served in different layout (see
# https://github.com/rg3/youtube-dl/issues/7607)
collection = self._parse_json(
self._search_regex(
r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'),
display_id)['course']['modules']
course = self._download_json(
payload_url, display_id, headers={'Referer': url})['payload']['course']
collection = course['modules']
module, clip = None, None
@@ -206,8 +206,7 @@ class PluralsightIE(PluralsightBaseIE):
# Some courses also offer widescreen resolution for high quality (see
# https://github.com/rg3/youtube-dl/issues/7766)
widescreen = True if re.search(
r'courseSupportsWidescreenVideoFormats\s*:\s*true', webpage) else False
widescreen = course.get('supportsWideScreenVideoFormats') is True
best_quality = 'high-widescreen' if widescreen else 'high'
if widescreen:
for allowed_quality in ALLOWED_QUALITIES:
@@ -236,19 +235,19 @@ class PluralsightIE(PluralsightBaseIE):
for quality in qualities_:
f = QUALITIES[quality].copy()
clip_post = {
'a': author,
'cap': 'false',
'cn': clip_id,
'course': course,
'lc': 'en',
'm': name,
'mt': ext,
'q': '%dx%d' % (f['width'], f['height']),
'author': author,
'includeCaptions': False,
'clipIndex': int(clip_id),
'courseName': course_name,
'locale': 'en',
'moduleName': name,
'mediaType': ext,
'quality': '%dx%d' % (f['width'], f['height']),
}
format_id = '%s-%s' % (ext, quality)
clip_url = self._download_webpage(
'%s/training/Player/ViewClip' % self._API_BASE, display_id,
'Downloading %s URL' % format_id, fatal=False,
viewclip = self._download_json(
'%s/video/clips/viewclip' % self._API_BASE, display_id,
'Downloading %s viewclip JSON' % format_id, fatal=False,
data=json.dumps(clip_post).encode('utf-8'),
headers={'Content-Type': 'application/json;charset=utf-8'})
@@ -262,15 +261,28 @@ class PluralsightIE(PluralsightBaseIE):
random.randint(2, 5), display_id,
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
if not clip_url:
if not viewclip:
continue
f.update({
'url': clip_url,
'ext': ext,
'format_id': format_id,
'quality': quality_key(quality),
})
formats.append(f)
clip_urls = viewclip.get('urls')
if not isinstance(clip_urls, list):
continue
for clip_url_data in clip_urls:
clip_url = clip_url_data.get('url')
if not clip_url:
continue
cdn = clip_url_data.get('cdn')
clip_f = f.copy()
clip_f.update({
'url': clip_url,
'ext': ext,
'format_id': '%s-%s' % (format_id, cdn) if cdn else format_id,
'quality': quality_key(quality),
'source_preference': int_or_none(clip_url_data.get('rank')),
})
formats.append(clip_f)
self._sort_formats(formats)
duration = int_or_none(

View File

@@ -33,7 +33,7 @@ class PornHubIE(InfoExtractor):
(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/
)
(?P<id>[0-9a-z]+)
(?P<id>[\da-z]+)
'''
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
@@ -96,12 +96,11 @@ class PornHubIE(InfoExtractor):
'only_matching': True,
}]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/\d+)\1', webpage)
if mobj:
return mobj.group('url')
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.com/embed/[\da-z]+)',
webpage)
def _extract_count(self, pattern, webpage, name):
return str_to_int(self._search_regex(

View File

@@ -125,6 +125,14 @@ class RadioCanadaIE(InfoExtractor):
f4m_id='hds', fatal=False))
self._sort_formats(formats)
subtitles = {}
closed_caption_url = get_meta('closedCaption') or get_meta('closedCaptionHTML5')
if closed_caption_url:
subtitles['fr'] = [{
'url': closed_caption_url,
'ext': determine_ext(closed_caption_url, 'vtt'),
}]
return {
'id': video_id,
'title': get_meta('Title'),
@@ -135,6 +143,7 @@ class RadioCanadaIE(InfoExtractor):
'season_number': int_or_none('SrcSaison'),
'episode_number': int_or_none('SrcEpisode'),
'upload_date': unified_strdate(get_meta('Date')),
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -1,5 +1,7 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -10,8 +12,8 @@ from ..utils import (
class RedTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redtube\.com/(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.redtube.com/66418',
'md5': '7b8c22b5e7098a3e1c09709df1126d2d',
'info_dict': {
@@ -23,11 +25,21 @@ class RedTubeIE(InfoExtractor):
'view_count': int,
'age_limit': 18,
}
}
}, {
'url': 'http://embed.redtube.com/?bgcolor=000000&id=1443286',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//embed\.redtube\.com/\?.*?\bid=\d+)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://www.redtube.com/%s' % video_id, video_id)
if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
raise ExtractorError('Video %s has been removed' % video_id, expected=True)

View File

@@ -0,0 +1,76 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://ren.tv/video/epizod/118577',
'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
'info_dict': {
'id': '118577',
'ext': 'mp4',
'title': 'Документальный спецпроект: "Промывка мозгов. Технологии XXI века"'
}
}, {
'url': 'http://ren.tv/player/118577',
'only_matching': True,
}, {
'url': 'rentv:118577',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('http://ren.tv/player/' + video_id, video_id)
jw_config = self._parse_json(self._search_regex(
r'config\s*=\s*({.+});', webpage, 'jw config'), video_id)
return self._parse_jwplayer_data(jw_config, video_id, m3u8_id='hls')
class RENTVArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ren\.tv/novosti/\d{4}-\d{2}-\d{2}/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://ren.tv/novosti/2016-10-26/video-mikroavtobus-popavshiy-v-dtp-s-gruzovikami-v-podmoskove-prevratilsya-v',
'md5': 'ebd63c4680b167693745ab91343df1d6',
'info_dict': {
'id': '136472',
'ext': 'mp4',
'title': 'Видео: микроавтобус, попавший в ДТП с грузовиками в Подмосковье, превратился в груду металла',
'description': 'Жертвами столкновения двух фур и микроавтобуса, по последним данным, стали семь человек.',
}
}, {
# TODO: invalid m3u8
'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
'info_dict': {
'id': 'playlist',
'ext': 'mp4',
'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
'uploader': 'ren.tv',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
'skip': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
entries = []
for config_profile in drupal_settings.get('ren_jwplayer', {}).values():
media_id = config_profile.get('mediaid')
if not media_id:
continue
media_id = compat_str(media_id)
entries.append(self.url_result('rentv:' + media_id, 'RENTV', media_id))
return self.playlist_result(entries, display_id)

View File

@@ -1,29 +1,29 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import str_or_none
from ..utils import (
qualities,
str_or_none,
)
class ReverbNationIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?reverbnation\.com/.*?/song/(?P<id>\d+).*?$'
_TESTS = [{
'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
'md5': 'c0aaf339bcee189495fdf5a8c8ba8645',
'info_dict': {
'id': '16965047',
'ext': 'mp3',
'title': 'MONA LISA',
'uploader': 'ALKILADOS',
'uploader_id': '216429',
'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
'thumbnail': 're:^https?://.*\.jpg',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
song_id = self._match_id(url)
api_res = self._download_json(
'https://api.reverbnation.com/song/%s' % song_id,
@@ -31,14 +31,23 @@ class ReverbNationIE(InfoExtractor):
note='Downloading information of song %s' % song_id
)
THUMBNAILS = ('thumbnail', 'image')
quality = qualities(THUMBNAILS)
thumbnails = []
for thumb_key in THUMBNAILS:
if api_res.get(thumb_key):
thumbnails.append({
'url': api_res[thumb_key],
'preference': quality(thumb_key)
})
return {
'id': song_id,
'title': api_res.get('name'),
'url': api_res.get('url'),
'title': api_res['name'],
'url': api_res['url'],
'uploader': api_res.get('artist', {}).get('name'),
'uploader_id': str_or_none(api_res.get('artist', {}).get('id')),
'thumbnail': self._proto_relative_url(
api_res.get('image', api_res.get('thumbnail'))),
'thumbnails': thumbnails,
'ext': 'mp3',
'vcodec': 'none',
}

View File

@@ -12,7 +12,7 @@ from ..utils import (
class RuutuIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ruutu\.fi/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?:ruutu|supla)\.fi/(?:video|supla)/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://www.ruutu.fi/video/2058907',
@@ -34,12 +34,24 @@ class RuutuIE(InfoExtractor):
'id': '2057306',
'ext': 'mp4',
'title': 'Superpesis: katso koko kausi Ruudussa',
'description': 'md5:da2736052fef3b2bd5e0005e63c25eac',
'description': 'md5:bfb7336df2a12dc21d18fa696c9f8f23',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 40,
'age_limit': 0,
},
},
{
'url': 'http://www.supla.fi/supla/2231370',
'md5': 'df14e782d49a2c0df03d3be2a54ef949',
'info_dict': {
'id': '2231370',
'ext': 'mp4',
'title': 'Osa 1: Mikael Jungner',
'description': 'md5:7d90f358c47542e3072ff65d7b1bcffe',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 0,
},
},
]
def _real_extract(self, url):

View File

@@ -157,7 +157,14 @@ class SafariCourseIE(SafariBaseIE):
IE_NAME = 'safari:course'
IE_DESC = 'safaribooksonline.com online courses'
_VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)/(?P<id>[^/]+)/?(?:[#?]|$)'
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?safaribooksonline\.com/(?:library/view/[^/]+|api/v1/book)|
techbus\.safaribooksonline\.com
)
/(?P<id>[^/]+)/?(?:[#?]|$)
'''
_TESTS = [{
'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/',
@@ -170,6 +177,9 @@ class SafariCourseIE(SafariBaseIE):
}, {
'url': 'https://www.safaribooksonline.com/api/v1/book/9781449396459/?override_format=json',
'only_matching': True,
}, {
'url': 'http://techbus.safaribooksonline.com/9780134426365',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -1,17 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
str_or_none,
urlencode_postdata,
clean_html,
)
class ShahidIE(InfoExtractor):
_VALID_URL = r'https?://shahid\.mbc\.net/ar/episode/(?P<id>\d+)/?'
_NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?P<type>episode|movie)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html',
'info_dict': {
@@ -27,18 +34,54 @@ class ShahidIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'https://shahid.mbc.net/ar/movie/151746/%D8%A7%D9%84%D9%82%D9%86%D8%A7%D8%B5%D8%A9.html',
'only_matching': True
}, {
# shahid plus subscriber only
'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html',
'only_matching': True
}]
def _call_api(self, path, video_id, note):
data = self._download_json(
'http://api.shahid.net/api/v1_1/' + path, video_id, note, query={
'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}).get('data', {})
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
try:
user_data = self._download_json(
'https://shahid.mbc.net/wd/service/users/login',
None, 'Logging in', data=json.dumps({
'email': email,
'password': password,
'basic': 'false',
}).encode('utf-8'), headers={
'Content-Type': 'application/json; charset=UTF-8',
})['user']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
fail_data = self._parse_json(
e.cause.read().decode('utf-8'), None, fatal=False)
if fail_data:
faults = fail_data.get('faults', [])
faults_message = ', '.join([clean_html(fault['userMessage']) for fault in faults if fault.get('userMessage')])
if faults_message:
raise ExtractorError(faults_message, expected=True)
raise
self._download_webpage(
'https://shahid.mbc.net/populateContext',
None, 'Populate Context', data=urlencode_postdata({
'firstName': user_data['firstName'],
'lastName': user_data['lastName'],
'userName': user_data['email'],
'csg_user_name': user_data['email'],
'subscriberId': user_data['id'],
'sessionId': user_data['sessionId'],
}))
def _get_api_data(self, response):
data = response.get('data', {})
error = data.get('error')
if error:
@@ -49,11 +92,11 @@ class ShahidIE(InfoExtractor):
return data
def _real_extract(self, url):
video_id = self._match_id(url)
page_type, video_id = re.match(self._VALID_URL, url).groups()
player = self._call_api(
'Content/Episode/%s' % video_id,
video_id, 'Downloading player JSON')
player = self._get_api_data(self._download_json(
'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-player.html' % video_id,
video_id, 'Downloading player JSON'))
if player.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
@@ -61,9 +104,12 @@ class ShahidIE(InfoExtractor):
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
self._sort_formats(formats)
video = self._call_api(
'episode/%s' % video_id, video_id,
'Downloading video JSON')['episode']
video = self._get_api_data(self._download_json(
'http://api.shahid.net/api/v1_1/%s/%s' % (page_type, video_id),
video_id, 'Downloading video JSON', query={
'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}))[page_type]
title = video['title']
categories = [

View File

@@ -10,11 +10,38 @@ from ..utils import (
)
class SharedIE(InfoExtractor):
IE_DESC = 'shared.sx and vivo.sx'
_VALID_URL = r'https?://(?:shared|vivo)\.sx/(?P<id>[\da-z]{10})'
class SharedBaseIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
_TESTS = [{
webpage, urlh = self._download_webpage_handle(url, video_id)
if self._FILE_NOT_FOUND in webpage:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
video_url = self._extract_video_url(webpage, video_id, url)
title = base64.b64decode(self._html_search_meta(
'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'filesize': filesize,
'title': title,
}
class SharedIE(SharedBaseIE):
IE_DESC = 'shared.sx'
_VALID_URL = r'https?://shared\.sx/(?P<id>[\da-z]{10})'
_FILE_NOT_FOUND = '>File does not exist<'
_TEST = {
'url': 'http://shared.sx/0060718775',
'md5': '106fefed92a8a2adb8c98e6a0652f49b',
'info_dict': {
@@ -23,7 +50,32 @@ class SharedIE(InfoExtractor):
'title': 'Bmp4',
'filesize': 1720110,
},
}, {
}
def _extract_video_url(self, webpage, video_id, url):
download_form = self._hidden_inputs(webpage)
video_page = self._download_webpage(
url, video_id, 'Downloading video page',
data=urlencode_postdata(download_form),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': url,
})
video_url = self._html_search_regex(
r'data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
video_page, 'video URL', group='url')
return video_url
class VivoIE(SharedBaseIE):
IE_DESC = 'vivo.sx'
_VALID_URL = r'https?://vivo\.sx/(?P<id>[\da-z]{10})'
_FILE_NOT_FOUND = '>The file you have requested does not exists or has been removed'
_TEST = {
'url': 'http://vivo.sx/d7ddda0e78',
'md5': '15b3af41be0b4fe01f4df075c2678b2c',
'info_dict': {
@@ -32,43 +84,13 @@ class SharedIE(InfoExtractor):
'title': 'Chicken',
'filesize': 528031,
},
}]
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, video_id)
if '>File does not exist<' in webpage:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
download_form = self._hidden_inputs(webpage)
video_page = self._download_webpage(
urlh.geturl(), video_id, 'Downloading video page',
data=urlencode_postdata(download_form),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': urlh.geturl(),
})
video_url = self._html_search_regex(
r'data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
video_page, 'video URL', group='url')
title = base64.b64decode(self._html_search_meta(
'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
thumbnail = self._html_search_regex(
r'data-poster=(["\'])(?P<url>(?:(?!\1).)+)\1',
video_page, 'thumbnail', default=None, group='url')
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'filesize': filesize,
'title': title,
'thumbnail': thumbnail,
}
def _extract_video_url(self, webpage, video_id, *args):
return self._parse_json(
self._search_regex(
r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'stream', group='url'),
video_id,
transform_source=lambda x: base64.b64decode(
x.encode('ascii')).decode('utf-8'))[0]

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -9,7 +7,7 @@ class SlutloadIE(InfoExtractor):
_VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
_TEST = {
'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
'md5': '0cf531ae8006b530bd9df947a6a0df77',
'md5': '868309628ba00fd488cf516a113fd717',
'info_dict': {
'id': 'TD73btpBqSxc',
'ext': 'mp4',
@@ -20,9 +18,7 @@ class SlutloadIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -48,6 +50,14 @@ class StreamableIE(InfoExtractor):
}
]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<src>(?:https?:)?//streamable\.com/(?:(?!\1).+))(?P=q1)',
webpage)
if mobj:
return mobj.group('src')
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -32,12 +32,15 @@ class TMZArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/]+)/?'
_TEST = {
'url': 'http://www.tmz.com/2015/04/19/bobby-brown-bobbi-kristina-awake-video-concert',
'md5': 'e482a414a38db73087450e3a6ce69d00',
'md5': '3316ff838ae5bb7f642537825e1e90d2',
'info_dict': {
'id': '0_6snoelag',
'ext': 'mp4',
'ext': 'mov',
'title': 'Bobby Brown Tells Crowd ... Bobbi Kristina is Awake',
'description': 'Bobby Brown stunned his audience during a concert Saturday night, when he told the crowd, "Bobbi is awake. She\'s watching me."',
'timestamp': 1429467813,
'upload_date': '20150419',
'uploader_id': 'batchUser',
}
}
@@ -45,12 +48,9 @@ class TMZArticleIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embedded_video_info_str = self._html_search_regex(
r'tmzVideoEmbedV2\("([^)]+)"\);', webpage, 'embedded video info')
embedded_video_info = self._parse_json(
embedded_video_info_str, video_id,
transform_source=lambda s: s.replace('\\', ''))
embedded_video_info = self._parse_json(self._html_search_regex(
r'tmzVideoEmbed\(({.+?})\);', webpage, 'embedded video info'),
video_id)
return self.url_result(
'http://www.tmz.com/videos/%s/' % embedded_video_info['id'])

View File

@@ -15,11 +15,11 @@ from ..utils import (
class TouTvIE(InfoExtractor):
_NETRC_MACHINE = 'toutv'
IE_NAME = 'tou.tv'
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+/S[0-9]+E[0-9]+)'
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/S[0-9]+E[0-9]+)?)'
_access_token = None
_claims = None
_TEST = {
_TESTS = [{
'url': 'http://ici.tou.tv/garfield-tout-court/S2015E17',
'info_dict': {
'id': '122017',
@@ -33,7 +33,10 @@ class TouTvIE(InfoExtractor):
'skip_download': True,
},
'skip': '404 Not Found',
}
}, {
'url': 'http://ici.tou.tv/hackers',
'only_matching': True,
}]
def _real_initialize(self):
email, password = self._get_login_info()

View File

@@ -9,7 +9,6 @@ from ..utils import (
int_or_none,
sanitized_Request,
urlencode_postdata,
parse_iso8601,
)
@@ -19,17 +18,13 @@ class TubiTvIE(InfoExtractor):
_NETRC_MACHINE = 'tubitv'
_TEST = {
'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
'md5': '43ac06be9326f41912dc64ccf7a80320',
'info_dict': {
'id': '283829',
'ext': 'mp4',
'title': 'The Comedian at The Friday',
'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
'uploader': 'Indie Rights Films',
'upload_date': '20160111',
'timestamp': 1452555979,
},
'params': {
'skip_download': 'HLS download',
'uploader_id': 'bc168bee0d18dd1cb3b86c68706ab434',
},
}
@@ -58,19 +53,28 @@ class TubiTvIE(InfoExtractor):
video_id = self._match_id(url)
video_data = self._download_json(
'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
title = video_data['n']
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mh'], video_id, 'mp4', 'm3u8_native')
self._proto_relative_url(video_data['url']),
video_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
thumbnails = []
for thumbnail_url in video_data.get('thumbnails', []):
if not thumbnail_url:
continue
thumbnails.append({
'url': self._proto_relative_url(thumbnail_url),
})
subtitles = {}
for sub in video_data.get('sb', []):
sub_url = sub.get('u')
for sub in video_data.get('subtitles', []):
sub_url = sub.get('url')
if not sub_url:
continue
subtitles.setdefault(sub.get('l', 'en'), []).append({
'url': sub_url,
subtitles.setdefault(sub.get('lang', 'English'), []).append({
'url': self._proto_relative_url(sub_url),
})
return {
@@ -78,9 +82,8 @@ class TubiTvIE(InfoExtractor):
'title': title,
'formats': formats,
'subtitles': subtitles,
'thumbnail': video_data.get('ph'),
'description': video_data.get('d'),
'duration': int_or_none(video_data.get('s')),
'timestamp': parse_iso8601(video_data.get('u')),
'uploader': video_data.get('on'),
'thumbnails': thumbnails,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'uploader_id': video_data.get('publisher_id'),
}

View File

@@ -69,7 +69,8 @@ class TVPIE(InfoExtractor):
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex([
r'<iframe[^>]+src="[^"]*?object_id=(\d+)',
"object_id\s*:\s*'(\d+)'"], webpage, 'video id')
r"object_id\s*:\s*'(\d+)'",
r'data-video-id="(\d+)"'], webpage, 'video id', default=page_id)
return {
'_type': 'url_transparent',
'url': 'tvp:' + video_id,
@@ -138,6 +139,9 @@ class TVPEmbedIE(InfoExtractor):
# formats.extend(self._extract_mpd_formats(
# video_url_base + '.ism/video.mpd',
# video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_ism_formats(
video_url_base + '.ism/Manifest',
video_id, 'mss', fatal=False))
formats.extend(self._extract_f4m_formats(
video_url_base + '.ism/video.f4m',
video_id, f4m_id='hds', fatal=False))

View File

@@ -398,7 +398,7 @@ class TwitchStreamIE(TwitchBaseIE):
channel_id = self._match_id(url)
stream = self._call_api(
'kraken/streams/%s' % channel_id, channel_id,
'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
'Downloading stream JSON').get('stream')
if not stream:
@@ -417,6 +417,7 @@ class TwitchStreamIE(TwitchBaseIE):
query = {
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'p': random.randint(1000000, 10000000),
'player': 'twitchweb',
'segment_preference': '4',

View File

@@ -5,17 +5,20 @@ from .common import InfoExtractor
class URPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?urplay\.se/program/(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?ur(?:play|skola)\.se/(?:program|Produkter)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://urplay.se/program/190031-tripp-trapp-trad-sovkudde',
'md5': '15ca67b63fd8fb320ac2bcd854bad7b6',
'md5': 'ad5f0de86f16ca4c8062cd103959a9eb',
'info_dict': {
'id': '190031',
'ext': 'mp4',
'title': 'Tripp, Trapp, Träd : Sovkudde',
'description': 'md5:b86bffdae04a7e9379d1d7e5947df1d1',
}
}
},
}, {
'url': 'http://urskola.se/Produkter/155794-Smasagor-meankieli-Grodan-i-vida-varlden',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -27,30 +30,17 @@ class URPlayIE(InfoExtractor):
formats = []
for quality_attr, quality, preference in (('', 'sd', 0), ('_hd', 'hd', 1)):
file_rtmp = urplayer_data.get('file_rtmp' + quality_attr)
if file_rtmp:
formats.append({
'url': 'rtmp://%s/urplay/mp4:%s' % (host, file_rtmp),
'format_id': quality + '-rtmp',
'ext': 'flv',
'preference': preference,
})
file_http = urplayer_data.get('file_http' + quality_attr) or urplayer_data.get('file_http_sub' + quality_attr)
if file_http:
file_http_base_url = 'http://%s/%s' % (host, file_http)
formats.extend(self._extract_f4m_formats(
file_http_base_url + 'manifest.f4m', video_id,
preference, '%s-hds' % quality, fatal=False))
formats.extend(self._extract_m3u8_formats(
file_http_base_url + 'playlist.m3u8', video_id, 'mp4',
'm3u8_native', preference, '%s-hls' % quality, fatal=False))
formats.extend(self._extract_wowza_formats(
'http://%s/%splaylist.m3u8' % (host, file_http), video_id, skip_protocols=['rtmp', 'rtsp']))
self._sort_formats(formats)
subtitles = {}
for subtitle in urplayer_data.get('subtitles', []):
subtitle_url = subtitle.get('file')
kind = subtitle.get('kind')
if subtitle_url or kind and kind != 'captions':
if not subtitle_url or (kind and kind != 'captions'):
continue
subtitles.setdefault(subtitle.get('label', 'Svenska'), []).append({
'url': subtitle_url,

View File

@@ -13,7 +13,7 @@ from ..utils import (
class VesselIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vessel\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z]+)'
_VALID_URL = r'https?://(?:www\.)?vessel\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z-_]+)'
_API_URL_TEMPLATE = 'https://www.vessel.com/api/view/items/%s'
_LOGIN_URL = 'https://www.vessel.com/api/account/login'
_NETRC_MACHINE = 'vessel'
@@ -32,12 +32,18 @@ class VesselIE(InfoExtractor):
}, {
'url': 'https://www.vessel.com/embed/G4U7gUJ6a?w=615&h=346',
'only_matching': True,
}, {
'url': 'https://www.vessel.com/videos/F01_dsLj1',
'only_matching': True,
}, {
'url': 'https://www.vessel.com/videos/RRX-sir-J',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?vessel\.com/embed/[0-9a-zA-Z]+.*?)\1',
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?vessel\.com/embed/[0-9a-zA-Z-_]+.*?)\1',
webpage)]
@staticmethod

View File

@@ -1,12 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import time
import hashlib
import json
from .adobepass import AdobePassIE
from .common import InfoExtractor
from ..utils import ExtractorError
from ..compat import compat_HTTPError
from ..utils import (
int_or_none,
parse_age_limit,
str_or_none,
parse_duration,
ExtractorError,
extract_attributes,
)
class ViceIE(InfoExtractor):
class ViceBaseIE(AdobePassIE):
def _extract_preplay_video(self, url, webpage):
watch_hub_data = extract_attributes(self._search_regex(
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
video_id = watch_hub_data['vms-id']
title = watch_hub_data['video-title']
query = {}
is_locked = watch_hub_data.get('video-locked') == '1'
if is_locked:
resource = self._get_mvpd_resource(
'VICELAND', title, video_id,
watch_hub_data.get('video-rating'))
query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
exp = int(time.time()) + 14400
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
})
try:
host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json('https://%s.com/en_us/preplay/%s' % (host, video_id), video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
error = json.loads(e.cause.read().decode())
raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(watch_hub_data.get('season')),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
class ViceIE(ViceBaseIE):
_VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)'
_TESTS = [{
@@ -21,7 +102,7 @@ class ViceIE(InfoExtractor):
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.vice.com/video/how-to-hack-a-car',
'md5': '6fb2989a3fed069fb8eab3401fc2d3c9',
'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'info_dict': {
'id': '3jstaBeXgAs',
'ext': 'mp4',
@@ -32,6 +113,22 @@ class ViceIE(InfoExtractor):
'upload_date': '20140529',
},
'add_ie': ['Youtube'],
}, {
'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
'md5': '',
'info_dict': {
'id': '5816510690b70e6c5fd39a56',
'ext': 'mp4',
'uploader': 'Waypoint',
'title': 'The Signal From Tölva',
'uploader_id': '57f7d621e05ca860fa9ccaf9',
'timestamp': 1477941983938,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
'only_matching': True,
@@ -42,21 +139,21 @@ class ViceIE(InfoExtractor):
'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show',
'only_matching': True,
}]
_PREPLAY_HOST = 'video.vice'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
try:
embed_code = self._search_regex(
r'embedCode=([^&\'"]+)', webpage,
'ooyala embed code', default=None)
if embed_code:
return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
youtube_id = self._search_regex(
r'data-youtube-id="([^"]+)"', webpage, 'youtube id')
webpage, urlh = self._download_webpage_handle(url, video_id)
embed_code = self._search_regex(
r'embedCode=([^&\'"]+)', webpage,
'ooyala embed code', default=None)
if embed_code:
return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
youtube_id = self._search_regex(
r'data-youtube-id="([^"]+)"', webpage, 'youtube id', default=None)
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
except ExtractorError:
raise ExtractorError('The page doesn\'t contain a video', expected=True)
return self._extract_preplay_video(urlh.geturl(), webpage)
class ViceShowIE(InfoExtractor):

View File

@@ -1,23 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import time
import hashlib
import json
from .adobepass import AdobePassIE
from ..compat import compat_HTTPError
from ..utils import (
int_or_none,
parse_age_limit,
str_or_none,
parse_duration,
ExtractorError,
extract_attributes,
)
from .vice import ViceBaseIE
class VicelandIE(AdobePassIE):
class VicelandIE(ViceBaseIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = {
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
@@ -38,70 +25,9 @@ class VicelandIE(AdobePassIE):
},
'add_ie': ['UplynkPreplay'],
}
_PREPLAY_HOST = 'www.viceland'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
watch_hub_data = extract_attributes(self._search_regex(
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
video_id = watch_hub_data['vms-id']
title = watch_hub_data['video-title']
query = {}
if watch_hub_data.get('video-locked') == '1':
resource = self._get_mvpd_resource(
'VICELAND', title, video_id,
watch_hub_data.get('video-rating'))
query['tvetoken'] = self._extract_mvpd_auth(url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
exp = int(time.time()) + 14400
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
})
try:
preplay = self._download_json('https://www.viceland.com/en_us/preplay/%s' % video_id, video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
error = json.loads(e.cause.read().decode())
raise ExtractorError('%s said: %s' % (self.IE_NAME, error['details']), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(watch_hub_data.get('season')),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
return self._extract_preplay_video(url, webpage)

View File

@@ -86,6 +86,11 @@ class VideomoreIE(InfoExtractor):
mobj = re.search(
r'<object[^>]+data=(["\'])https?://videomore\.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
webpage)
if not mobj:
mobj = re.search(
r'<iframe[^>]+src=([\'"])(?P<url>https?://videomore\.ru/embed/\d+)',
webpage)
if mobj:
return mobj.group('url')

View File

@@ -1,10 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from ..utils import (
decode_packed_codes,
js_to_json,
NO_DEFAULT,
PACKED_CODES_RE,
)
@@ -35,10 +39,17 @@ class VidziIE(JWPlatformBaseIE):
title = self._html_search_regex(
r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
code = decode_packed_codes(webpage).replace('\\\'', '\'')
jwplayer_data = self._parse_json(
self._search_regex(r'setup\(([^)]+)\)', code, 'jwplayer data'),
video_id, transform_source=js_to_json)
packed_codes = [mobj.group(0) for mobj in re.finditer(
PACKED_CODES_RE, webpage)]
for num, pc in enumerate(packed_codes, 1):
code = decode_packed_codes(pc).replace('\\\'', '\'')
jwplayer_data = self._parse_json(
self._search_regex(
r'setup\(([^)]+)\)', code, 'jwplayer data',
default=NO_DEFAULT if num == len(packed_codes) else '{}'),
video_id, transform_source=js_to_json)
if jwplayer_data:
break
info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
info_dict['title'] = title

View File

@@ -49,7 +49,7 @@ class VierIE(InfoExtractor):
webpage, 'filename')
playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
formats = self._extract_wowza_formats(playlist_url, display_id)
formats = self._extract_wowza_formats(playlist_url, display_id, skip_protocols=['dash'])
self._sort_formats(formats)
title = self._og_search_title(webpage, default=display_id)

View File

@@ -322,6 +322,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
},
'expected_warnings': ['Unable to download JSON metadata'],
},
{
# redirects to ondemand extractor and should be passed throught it
# for successful extraction
'url': 'https://vimeo.com/73445910',
'info_dict': {
'id': '73445910',
'ext': 'mp4',
'title': 'The Reluctant Revolutionary',
'uploader': '10Ft Films',
'uploader_url': 're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
'uploader_id': 'tenfootfilms',
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
'only_matching': True,
@@ -414,7 +430,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
# Retrieve video webpage to extract further information
request = sanitized_Request(url, headers=headers)
try:
webpage = self._download_webpage(request, video_id)
webpage, urlh = self._download_webpage_handle(request, video_id)
# Some URLs redirect to ondemand can't be extracted with
# this extractor right away thus should be passed through
# ondemand extractor (e.g. https://vimeo.com/73445910)
if VimeoOndemandIE.suitable(urlh.geturl()):
return self.url_result(urlh.geturl(), VimeoOndemandIE.ie_key())
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
errmsg = ee.cause.read()
@@ -837,6 +858,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
'params': {
'videopassword': 'holygrail',
},
'skip': 'video gone',
}]
def _real_initialize(self):
@@ -844,9 +866,10 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
webpage = self._download_webpage(webpage_url, video_id)
config_url = self._html_search_regex(
r'data-config-url="([^"]+)"', webpage, 'config URL',
default=NO_DEFAULT if video_password_verified else None)
data = self._parse_json(self._search_regex(
r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
default=NO_DEFAULT if video_password_verified else '{}'), video_id)
config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
if config_url is None:
self._verify_video_password(webpage_url, video_id, webpage)
config_url = self._get_config_url(

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import collections
import re
import json
import sys
from .common import InfoExtractor
@@ -369,8 +368,18 @@ class VKIE(VKBaseIE):
opts_url = 'http:' + opts_url
return self.url_result(opts_url)
data_json = self._search_regex(r'var\s+vars\s*=\s*({.+?});', info_page, 'vars')
data = json.loads(data_json)
# vars does not look to be served anymore since 24.10.2016
data = self._parse_json(
self._search_regex(
r'var\s+vars\s*=\s*({.+?});', info_page, 'vars', default='{}'),
video_id, fatal=False)
# <!json> is served instead
if not data:
data = self._parse_json(
self._search_regex(
r'<!json>\s*({.+?})\s*<!>', info_page, 'json'),
video_id)['player']['params'][0]
title = unescapeHTML(data['md_title'])

View File

@@ -31,7 +31,8 @@ class VodlockerIE(InfoExtractor):
if any(p in webpage for p in (
'>THIS FILE WAS DELETED<',
'>File Not Found<',
'The file you were looking for could not be found, sorry for any inconvenience.<')):
'The file you were looking for could not be found, sorry for any inconvenience.<',
'>The file was removed')):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = self._hidden_inputs(webpage)

View File

@@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
float_or_none,
)
class VzaarIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://vzaar.com/videos/1152805',
'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
'info_dict': {
'id': '1152805',
'ext': 'mp4',
'title': 'sample video (public)',
},
}, {
'url': 'https://view.vzaar.com/27272/player',
'md5': '3b50012ac9bbce7f445550d54e0508f2',
'info_dict': {
'id': '27272',
'ext': 'mp3',
'title': 'MP3',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
source_url = video_data['sourceUrl']
info = {
'id': video_id,
'title': video_data['videoTitle'],
'url': source_url,
'thumbnail': self._proto_relative_url(video_data.get('poster')),
'duration': float_or_none(video_data.get('videoDuration')),
}
if 'audio' in source_url:
info.update({
'vcodec': 'none',
'ext': 'mp3',
})
else:
info.update({
'width': int_or_none(video_data.get('width')),
'height': int_or_none(video_data.get('height')),
'ext': 'mp4',
})
return info

View File

@@ -201,6 +201,32 @@ class YahooIE(InfoExtractor):
},
'skip': 'redirect to https://www.yahoo.com/music',
},
{
# yahoo://article/
'url': 'https://www.yahoo.com/movies/video/true-story-trailer-173000497.html',
'info_dict': {
'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
'ext': 'mp4',
'title': "'True Story' Trailer",
'description': 'True Story',
},
'params': {
'skip_download': True,
},
},
{
# ytwnews://cavideo/
'url': 'https://tw.video.yahoo.com/movie-tw/單車天使-中文版預-092316541.html',
'info_dict': {
'id': 'ba133ff2-0793-3510-b636-59dfe9ff6cff',
'ext': 'mp4',
'title': '單車天使 - 中文版預',
'description': '中文版預',
},
'params': {
'skip_download': True,
},
},
]
def _real_extract(self, url):
@@ -269,7 +295,8 @@ class YahooIE(InfoExtractor):
r'"first_videoid"\s*:\s*"([^"]+)"',
r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
r'<article[^>]data-uuid=["\']([^"\']+)',
r'yahoo://article/view\?.*\buuid=([^&"\']+)',
r'<meta[^<>]+yahoo://article/view\?.*\buuid=([^&"\']+)',
r'<meta[^<>]+["\']ytwnews://cavideo/(?:[^/]+/)+([\da-fA-F-]+)[&"\']',
]
video_id = self._search_regex(
CONTENT_ID_REGEXES, webpage, 'content ID')

View File

@@ -1867,7 +1867,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'title': 'Uploads from Interstellar Movie',
'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
},
'playlist_mincout': 21,
'playlist_mincount': 21,
}, {
# Playlist URL that does not actually serve a playlist
'url': 'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4',
@@ -1890,6 +1890,27 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'skip_download': True,
},
'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
'info_dict': {
'id': 'yeWKywCrFtk',
'ext': 'mp4',
'title': 'Small Scale Baler and Braiding Rugs',
'uploader': 'Backus-Page House Museum',
'uploader_id': 'backuspagemuseum',
'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
'upload_date': '20161008',
'license': 'Standard YouTube License',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'],
'tags': list,
'like_count': int,
'dislike_count': int,
},
'params': {
'noplaylist': True,
'skip_download': True,
},
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
@@ -1971,8 +1992,10 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
def _check_download_just_video(self, url, playlist_id):
# Check if it's a video-specific URL
query_dict = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
if 'v' in query_dict:
video_id = query_dict['v'][0]
video_id = query_dict.get('v', [None])[0] or self._search_regex(
r'(?:^|//)youtu\.be/([0-9A-Za-z_-]{11})', url,
'video id', default=None)
if video_id:
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
return video_id, self.url_result(video_id, 'Youtube', video_id=video_id)

View File

@@ -40,7 +40,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
'Skipping embedding the thumbnail because the file is missing.')
return [], info
if info['ext'] in ('mp3', 'mkv'):
if info['ext'] == 'mp3':
options = [
'-c', 'copy', '-map', '0', '-map', '1',
'-metadata:s:v', 'title="Album cover"', '-metadata:s:v', 'comment="Cover (Front)"']

View File

@@ -279,6 +279,9 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
prefix, sep, ext = path.rpartition('.') # not os.path.splitext, since the latter does not work on unicode in all setups
new_path = prefix + sep + extension
information['filepath'] = new_path
information['ext'] = extension
# If we download foo.mp3 and convert it to... foo.mp3, then don't delete foo.mp3, silly.
if (new_path == path or
(self._nopostoverwrites and os.path.exists(encodeFilename(new_path)))):
@@ -300,9 +303,6 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
new_path, time.time(), information['filetime'],
errnote='Cannot update utime of audio file')
information['filepath'] = new_path
information['ext'] = extension
return [path], information

Some files were not shown because too many files have changed in this diff Show More