Compare commits

...

906 Commits

Author SHA1 Message Date
Philipp Hagemeister
f1ed3acae5 release 2016.02.04 2016-02-04 13:39:26 +01:00
remitamine
920d21b9d3 [test_subtitles] update youtube subtitles tests 2016-02-04 08:50:55 +01:00
remitamine
2fb35d1c28 [youtube] fix subtitle order 2016-02-04 08:39:01 +01:00
remitamine
09be85b8dd [youtube] fix subtitle extraction(fixes #8415) 2016-02-04 08:28:37 +01:00
remitamine
eadc3ccd50 [generic] extract m3u8 formats when mpegurl content type detected 2016-02-04 01:25:36 +01:00
Yen Chi Hsuan
58be922079 [kuwo] Check for georestriction 2016-02-04 01:26:25 +08:00
Sergey M
c84d3a557d [README.md] Clarify unavailable sequences in output format 2016-02-03 19:18:25 +05:00
remitamine
6ad2b01e14 [srgssr] use flv as ext for rtmp formats 2016-02-02 23:09:50 +01:00
remitamine
fd3a1f3d60 [cbsnews] add support for live videos(fixes #7010) 2016-02-02 23:02:18 +01:00
Jaime Marquínez Ferrándiz
87de7069b9 [utils] dfxp2srt: make TTMLPElementParser inherit from object
For consistency between python 2 and 3.
2016-02-02 22:30:13 +01:00
remitamine
6fba62c87a [ffmpeg] fix adding metadata when using --hls-prefer-native(#8350) 2016-02-02 22:14:23 +01:00
Yen Chi Hsuan
1df4141196 [test_YoutubeDL] Fix test_youtube_format_selection
Broken since a6c2c24479. Thanks to
@jaimeMF and @anisse for pointing that out
2016-02-03 03:42:37 +08:00
remitamine
fae45ede08 Merge pull request #8354 from remitamine/m3u8_metadata
[ffmpeg] fix adding metadata when using m3u8_native(fixes #8350)
2016-02-02 19:13:58 +01:00
remitamine
4e0cff2a50 Merge pull request #8348 from remitamine/dfxp2srt-text
[utils] fix dfxp2srt text extraction(fixes #8055)
2016-02-02 18:36:26 +01:00
Sergey M․
0436157b95 [vk:uservideos] Improve _VALID_URL (Closes #8389) 2016-02-02 00:52:37 +06:00
Philipp Hagemeister
ae0db349c1 release 2016.02.01 2016-02-01 12:00:32 +01:00
Yen Chi Hsuan
08411970d5 Merge pull request #8374 from yan12125/facebook-dash
Facebook DASH formats
2016-02-01 18:31:49 +08:00
Yen Chi Hsuan
dc724e0c8b [daum.net:user] Match more URLs (#1952) 2016-02-01 18:26:23 +08:00
Yen Chi Hsuan
0a5d1ec706 Merge branch 'ping-daum-playlist-user' 2016-02-01 18:19:32 +08:00
Yen Chi Hsuan
58250eff2b [daum] Update test_daum_1 2016-02-01 18:19:02 +08:00
Yen Chi Hsuan
11a4efc505 [daum] Do not match a single URL with multiple info extractors 2016-02-01 18:15:53 +08:00
Yen Chi Hsuan
7537b35fb8 [daum] PEP8 2016-02-01 17:40:35 +08:00
Yen Chi Hsuan
33cc74eeeb Merge branch 'daum-playlist-user' of https://github.com/ping/youtube-dl into ping-daum-playlist-user 2016-02-01 17:29:12 +08:00
Yen Chi Hsuan
f021acee49 [kickstarter] Fix title and test_kickstarter
It's the description page that contains a video. The original URL is now
the calendar.
2016-02-01 17:21:40 +08:00
Yen Chi Hsuan
abe694ca95 [kickstarter] Eliminate the warning message and add_ie 2016-02-01 17:10:11 +08:00
Yen Chi Hsuan
b286f201a8 [YoutubeDL] Do not override ie_key in url_transparent 2016-02-01 17:05:48 +08:00
Yen Chi Hsuan
bd93a12e85 [vidzi] Fix _TESTS 2016-02-01 17:03:31 +08:00
Yen Chi Hsuan
92769650fa [vidzi] Fix extraction
Closes #8386.

Vidzi.tv now uses jwplayer, which can be handled by GenericIE
2016-02-01 15:40:42 +08:00
Yen Chi Hsuan
dc4fe5c6d7 [allocine] Use xpath_element 2016-02-01 05:32:28 +08:00
Yen Chi Hsuan
566bda51f2 [bpb] Fix extraction and update tests 2016-02-01 05:00:09 +08:00
Yen Chi Hsuan
f63757ec35 [allocine] Fix for Python 2.6
Python 2.6 does not support .// syntax in find(). Fortunately, the
interested node is at the top level
2016-02-01 03:34:02 +08:00
Yen Chi Hsuan
7a0ed06909 [allocine] Fix extraction of test_allocine_1 and update tests 2016-02-01 03:31:58 +08:00
Yen Chi Hsuan
9934fe76be [acast] Remove ACastBaseIE
No longer necessary as _API_BASE_URL is used by ACastChannelIE only
2016-02-01 03:08:46 +08:00
Yen Chi Hsuan
a8aad21001 [acast] Fix extraction 2016-02-01 03:07:45 +08:00
Yen Chi Hsuan
d055bf91cc Merge branch 'rrooij-gamekings_fix' 2016-02-01 02:21:02 +08:00
Yen Chi Hsuan
0e1b1a011d [gamekings] Stricter checks 2016-02-01 02:19:03 +08:00
Yen Chi Hsuan
eab3c2895c [gamekings] add_ie 2016-02-01 02:15:25 +08:00
Yen Chi Hsuan
163da6a484 [gamekings] Add MD5 back
The test is now a YouTube video, whose MD5 should be stable
2016-02-01 02:13:11 +08:00
Yen Chi Hsuan
324916d11a Merge branch 'gamekings_fix' of https://github.com/rrooij/youtube-dl into rrooij-gamekings_fix 2016-02-01 02:11:25 +08:00
Jaime Marquínez Ferrándiz
3ccb0655c1 [youtube] Use 'orderedSet' instead of 'set' to preserve the order 2016-01-31 15:11:00 +01:00
Jaime Marquínez Ferrándiz
e04398e397 [FFmpegSubtitlesConvertorPP] delete old subtitle files (fixes #8382) 2016-01-31 14:22:36 +01:00
Yen Chi Hsuan
231ea2a3bb [xuite] Replace the test case with my uploaded one 2016-01-31 20:21:57 +08:00
Yen Chi Hsuan
b99d88c6a1 [youporn] Fix uploader and description 2016-01-31 20:12:43 +08:00
Yen Chi Hsuan
189d72d5fd [test_subtitles] Fix TestRaiSubtitles
RaiIE is renamed to RaiTVIE in 06d5556dfa
2016-01-31 20:12:43 +08:00
Yen Chi Hsuan
a7aab0c23e [test_youtube_lists] Fix TestYoutubeLists.test_youtube_course
Youtube entries are now generators
2016-01-31 20:12:43 +08:00
Philipp Hagemeister
a69bee4762 release 2016.01.31 2016-01-31 12:57:18 +01:00
Sergey M․
9acd33094d [youtube] Filter duplicates in playlists base extractor 2016-01-31 17:52:02 +06:00
Sergey M․
8e7aad2075 [youtube] Use authentication for entry list base extractor (Closes #8380) 2016-01-31 17:49:59 +06:00
rrooij
ce5879fa14 [Gamekings] Fix viewing of old videos
Some old videos that aren't on Vimeo are being uploaded to YouTube under the
'Gamekings Vault' channel. They use YouTube now for some videos as video
hosting instead of Vimeo or their own hosting. The first test failed to
succeed under the existing code, but works now by using the YouTube
extractor.

The Regex is changed to find the new gogoVideo JavaScript line with the
YouTube embed. Checking if there is a YouTube embed is done by a String
find, which is probably not the best method of checking this.
2016-01-31 00:20:46 +01:00
Yen Chi Hsuan
7b7507d6e1 [letv] Fix LetvCloud extraction 2016-01-31 07:15:43 +08:00
rrooij
14823decf3 [Gamekings] Fix url from .tv to .nl
Gamekings doesn't use the .tv top level domain anymore, but the regular
domain for Dutch sites.
2016-01-31 00:03:23 +01:00
Sergey M․
673fb82e65 [schooltv] Improve video id regex 2016-01-31 04:41:18 +06:00
Sergey M
181cf24bc0 Merge pull request #8376 from rrooij/schooltv
[schooltv] Add extractor for SchoolTV playlists
2016-01-31 04:36:33 +06:00
rrooij
89f2602880 [schooltv] Add extractor for SchoolTV playlists
This closes #8163
2016-01-30 23:21:42 +01:00
Yen Chi Hsuan
db9b1dbcd9 [nba] Add ext for hls formats and fix test_NBA 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
e881c4bcab [nbc] Use NBC's id and fix _TESTS
ThePlatform URL gives the same ID for all _TESTS
2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
670ad51ade [nrktv] Fix _TESTS 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
eb6fc7d32a [senateisvp] Fix test_SenateISVP and test_SenateISVP_1 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
ed1a390583 [tv2] Fix test_TV2 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
809e1857c5 [screenwavemedia] Fix HLS extension and test_TeamFour 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
7c38af48b9 [vgtv] Fix test_VGTV_2 2016-01-31 04:58:10 +08:00
Yen Chi Hsuan
60ad3eb970 [viidea] Skip download for the test case requiring ffmpeg 2016-01-31 04:58:10 +08:00
Sergey M․
a7685b3a6b [npo] Add extension for m3u8 2016-01-31 02:38:28 +06:00
remitamine
8f1fddc816 [limelight] fix format sorting and make m3u8 and f4m extraction non fatal 2016-01-30 20:51:47 +01:00
remitamine
1bf996fa5c [generic] Add support for Limelight API 2016-01-30 20:45:56 +01:00
Yen Chi Hsuan
248ae880b6 [facebook] Add md5 for the test case with DASH 2016-01-30 23:01:19 +08:00
Yen Chi Hsuan
2d2fa82d17 [common] Add _extract_dash_manifest_formats 2016-01-30 22:52:23 +08:00
Yen Chi Hsuan
c94678957f [common] Remove unused arguments 2016-01-30 22:45:16 +08:00
Yen Chi Hsuan
16f38a699f [common] Rename to namespace
For consistency with _parse_smil_*
2016-01-30 22:40:56 +08:00
Yen Chi Hsuan
a6c2c24479 [youtube] Remove '(v|a)codec': 'none' entries
Not used anymore
2016-01-30 22:28:53 +08:00
Sergey M․
b8c9926c0a [downloader/f4m] Do not update fragment list while test 2016-01-30 19:43:25 +06:00
Yen Chi Hsuan
df374b5222 [common] Prefer the manifest than formats_dict in determining codecs 2016-01-30 21:42:27 +08:00
Yen Chi Hsuan
5ea1eb78f5 [common] Fix for youtube 2016-01-30 21:36:01 +08:00
Yen Chi Hsuan
5d2c0fd9ba [youtube] Pass self._formats to _parse_dash_manifest 2016-01-30 21:32:15 +08:00
Yen Chi Hsuan
0803753fea [facebook] Add support for DASH manifests 2016-01-30 21:31:53 +08:00
Sergey M․
2c2f1efdcd [downloader/fragment] Remove superfluous whitespace 2016-01-30 19:30:31 +06:00
Yen Chi Hsuan
b323e1707d [common] Modify _parse_dash_manifest for use in Facebook 2016-01-30 21:27:43 +08:00
Sergey M․
09104e9930 [downloader/f4m] Add live stream flag to context
Now download progress for f4m livestreams is reported correctly
2016-01-30 19:22:15 +06:00
Sergey M․
5fa1702ca6 [downloader/fragment] Do not report total bytes estimation and eta for live streams 2016-01-30 19:20:52 +06:00
Yen Chi Hsuan
17b598d30c [common] _parse_dash_manifest() from youtube.py 2016-01-30 21:05:55 +08:00
Sergey M․
53be8894e4 [options] Add missing closing parenthesis 2016-01-30 18:44:22 +06:00
Sergey M․
c3deacd562 [matchtv] Add extractor (Closes #8313) 2016-01-30 18:30:27 +06:00
Sergey M․
8ab3fe81d8 [downloader/f4m] Prefer bootstrap url attribute over inline bootstrap info 2016-01-30 18:28:38 +06:00
ping
2f0a33d8a3 [daum.net] Support for playlists, user channels 2016-01-30 20:10:36 +08:00
Yen Chi Hsuan
05d0d131a7 [youtube] Move decrypt_sig out of _parse_dash_manifest 2016-01-30 20:05:56 +08:00
Yen Chi Hsuan
c140629995 [facebook] Support alternative webpage form
Fixes #8371
2016-01-30 19:33:22 +08:00
Jaime Marquínez Ferrándiz
7d106a65ca Add --hls-use-mpegts option
When using the mpegts container hls vidoes can be played while being downloaded (useful if you are recording a live stream).
VLC and mpv play them file, but QuickTime doesn't.
2016-01-30 12:26:40 +01:00
Yen Chi Hsuan
0179f6a830 [daum] Add 'thumbnail' to all _TESTS 2016-01-30 16:54:14 +08:00
Yen Chi Hsuan
830afe85dc [daum.net] Support VodPlayer.swf URLs (closes #8173) 2016-01-30 16:50:13 +08:00
Yen Chi Hsuan
8bf39420b4 Merge remote-tracking branch 'upstream/master' 2016-01-30 16:25:55 +08:00
Yen Chi Hsuan
71d08b3e29 Merge branch 'ping-daum-fix-clip' 2016-01-30 16:25:06 +08:00
Yen Chi Hsuan
06ffa33485 [daum.net] Move the request to ClipInfoXml.do
To reduce the number of wasted requests
2016-01-30 16:23:37 +08:00
Yen Chi Hsuan
874e05975b Merge branch 'daum-fix-clip' of https://github.com/ping/youtube-dl into ping-daum-fix-clip 2016-01-30 16:22:37 +08:00
ping
f5d30d521c [daum] Fix add view_count, comment_count to test 2016-01-30 11:09:30 +08:00
ping
e047922be0 [daum] Fix copy-paste mistake 2016-01-30 11:04:11 +08:00
Sergey M․
83ab8a79cc [espn] Improve video id extraction (Closes #8368) 2016-01-30 01:48:54 +06:00
Sergey M․
350cf045d8 [extractor/common] Restrict checks when auto calculating tbr 2016-01-30 01:47:46 +06:00
Sergey M․
68a0ea15b4 [cspan] Unescape path (Closes #8365) 2016-01-30 00:26:33 +06:00
Jaime Marquínez Ferrándiz
2b4f5e68d1 [azubu] Add extractor for live streams (closes #8343) 2016-01-29 15:36:33 +01:00
Philipp Hagemeister
055f417278 release 2016.01.29 2016-01-29 12:20:08 +01:00
Jaime Marquínez Ferrándiz
70029bc348 [youtube:user] Require 'https?://' in the url (fixes #8356)
It was matching www.youtube.com/embed/WpfukLMe1TM.
The generic extractor automatically adds http:// if it's missing.
2016-01-29 11:27:11 +01:00
remitamine
cf57433bbd [ffmpeg] fix adding metadata when using m3u8_native(fixes #8350) 2016-01-28 18:57:32 +01:00
Sergey M․
1ac6e794cb [bbc] Add test for #8147 2016-01-28 23:27:48 +06:00
Sergey M․
a853427427 [bbc] Add another description regex 2016-01-28 23:23:13 +06:00
Sergey M․
50e989e263 [bbc] Add another title regex (Closes #8340) 2016-01-28 23:19:53 +06:00
Sergey M․
10e6ed9341 [ok] Add support for mobile URLs (Closes #8345) 2016-01-28 22:56:49 +06:00
Sergey M․
38c84acae5 [ndr:embed:base] Add missing ext for m3u8 2016-01-28 22:50:18 +06:00
Yen Chi Hsuan
29f46c2bee Credit @dyn888 for improving format selection
[ci skip]
2016-01-28 22:56:59 +08:00
Yen Chi Hsuan
39c10a2b6e Merge pull request #8346 from dyn888/dyn888-regex-1
Regex pattern update to match more codecs (fixes #6858)
2016-01-28 22:22:43 +08:00
dyn888
b913348d5f Test codec with a dot '.' in name selection. 2016-01-28 15:07:33 +01:00
remitamine
2b14cb566f [utils] fix dfxp2srt text extraction(fixes #8055) 2016-01-28 12:38:34 +01:00
dyn888
b0df5223be Update YoutubeDL.py 2016-01-28 12:07:15 +01:00
Sergey M․
ed7cd1e859 [cbsnews] Remove unused import 2016-01-28 00:42:04 +06:00
remitamine
f125d9115b [cbsnews] extract all formats 2016-01-27 19:11:21 +01:00
remitamine
a9d5f12fec Merge pull request #8328 from remitamine/hls-master-detect
[extractor/common] detect media playlist in _extract_m3u8_formats
2016-01-27 18:07:30 +01:00
remitamine
7f32e5dc35 [extractor/common] detect media playlist in _extract_m3u8_formats 2016-01-27 17:53:42 +01:00
Sergey M․
c3111ab34f [spankbang] Fix title extraction (Closes #8329) 2016-01-27 21:49:56 +06:00
Sergey M․
9339774af2 [spankbang] Fix formats extraction 2016-01-27 21:49:39 +06:00
Sergey M․
b0d21deda9 [extractor/common] Auto calculate tbr when missing 2016-01-27 21:11:17 +06:00
Philipp Hagemeister
fab6f0e65b release 2016.01.27 2016-01-27 08:32:03 +01:00
ping
b6c33fd544 [daum.net] Fixes #8331 2016-01-27 12:48:00 +08:00
Sergey M․
fb4b345800 [instagram] Make description optional (Closes #8326) 2016-01-26 21:46:51 +06:00
Sergey M․
af9c2a07ae [cspan] Extract from path when no qualities (Closes #8317) 2016-01-26 21:29:42 +06:00
remitamine
ab180fc648 Merge branch 'master' of github.com:rg3/youtube-dl 2016-01-26 15:55:38 +01:00
remitamine
682f8c43b5 [vevo] fallback to youtube video only if vevo video is geo restricted(fixes 8263)(fixes 2874) 2016-01-26 15:54:32 +01:00
Sergey M․
f693213567 [cspan] Fix clip/prog id extraction (#8317) 2016-01-26 20:42:20 +06:00
remitamine
9165d6bab9 [vevo] extract metadata and formats from api if videoinfo is empty
these was fixed by @yan12125 in ff51983e15
i only added some code to extract video metadata and more formats from
api
2016-01-26 13:46:58 +01:00
remitamine
2975fe1a7b [vevo] extract all formats and bypass geo restriction 2016-01-25 22:35:06 +01:00
Sergey M․
de691a498d [facebook:post] Add extractor (Closes #8321) 2016-01-25 22:18:34 +06:00
Sergey M․
2e6e742c3c [facebook] Add shortcut and reformat _VALID_URL 2016-01-25 22:15:21 +06:00
Yen Chi Hsuan
e9bd0f772b Merge pull request #8130 from dyn888/master
[youtube] added vcodec/acodec/abr for multiple itags
2016-01-25 01:15:11 +08:00
Yen Chi Hsuan
77f785076f [common] Keep full codec name from m3u8 manifests
See #8293. This is for consistency between YouTube and HLS formats.
2016-01-25 01:03:46 +08:00
Yen Chi Hsuan
94278f7202 [youtube] Prefer info from YouTube than _formats (#8293) 2016-01-25 01:02:19 +08:00
Yen Chi Hsuan
a0d8d704df [utils] Reorder items in mimetype2ext alphabetically 2016-01-25 01:01:15 +08:00
Yen Chi Hsuan
f6861ec96f [utils] Add more items to mimetype2ext (#8293)
These are used in Youtube formats
2016-01-25 00:58:53 +08:00
Philipp Hagemeister
f733b05302 release 2016.01.23 2016-01-23 12:03:12 +01:00
Sergey M․
6fa73386cb [drtv] Use IETF language tag 2016-01-23 01:54:00 +06:00
Sergey M․
5ca01bb9e4 [kanalplay] Use IETF language tag 2016-01-23 01:51:18 +06:00
Sergey M․
1ca59daca9 [options] Clarify language tags 2016-01-23 01:50:06 +06:00
Sergey M․
594c4d79a5 [svt] Improve subtitles extraction and add test (Closes #8265) 2016-01-23 01:47:54 +06:00
Marian Sigler
1f16b958b1 [SVTPlay] Add subtitle support 2016-01-23 01:28:31 +06:00
Sergey M․
4c0d13df9b [lovehomeporn] Add extractor 2016-01-23 00:52:23 +06:00
Sergey M․
b2c6528baf [ruleporn] Rework in terms of nuevo (Closes #8206) 2016-01-23 00:40:11 +06:00
Sergey M․
ea17820432 [nuevo] Improve thumbnail extraction 2016-01-23 00:38:58 +06:00
Dankryn
1257b049bc [ruleporn] Add new extractor 2016-01-23 00:13:27 +06:00
Sergey M․
b969813548 Credit @nexAkari for trollvids and nuevo (#7728) 2016-01-23 00:10:49 +06:00
Sergey M․
10677ece81 [nuevo] Simplify nuevo extractors (Closes #7728) 2016-01-23 00:04:33 +06:00
Andrew "Akari" Alexeyew
d570746e45 [nuevo] Generalize nuevo extractor and add support for trollvids
Supports only the nuevo player for now (most common).

[trollvids] convert duration to an int

[trollvids] added a test

[trollvids] made flake8 shut up

Generalized the Nuevo extractor

Affects: anitube, trollvids, trutube

[nuevo] Complied with the code comments.
2016-01-22 23:29:24 +06:00
Sergey M․
4fcd9d147d [arte:cinema] Add extractor 2016-01-22 23:00:50 +06:00
Sergey M․
9c54ae3387 [arte:future] Make duplicated test matching only 2016-01-22 23:00:05 +06:00
François Charlier
24114fee74 [arte:future] Fix extraction
[arte] Add support for more "Arte Future" uri
2016-01-22 22:58:37 +06:00
Sergey M․
220ee33f2b [cbsnews] Simplify subtitles extraction and fix test (Closes #8295) 2016-01-22 22:23:21 +06:00
John Assael
4118cc02c1 [cbsnews] Extract subtitles
added test function for CBS News subtitles
2016-01-22 22:15:51 +06:00
Jaime Marquínez Ferrándiz
32d77eeb04 [downloader/common] report_retry: Don't crash when retries is infinite (fixes #8299) 2016-01-22 14:49:17 +01:00
Filippo Valsorda
032f232626 Merge pull request #8142 from FiloSottile/filippo/updates
[update] fix (unexploitable) BB'06 vulnerability in rsa_verify
2016-01-21 20:17:37 +00:00
Filippo Valsorda
4d318be195 [update] fix (unexploitable) BB'06 vulnerability in rsa_verify
The rsa_verify code was vulnerable to a BB'06 attack, allowing to forge
signatures for arbitrary messages if and only if the public key exponent is
3.  Since the updates key is hardcoded to 65537, there is no risk for
youtube-dl, but I don't want vulnerable code in the wild.

The new function adopts a way safer approach of encoding-and-comparing to
replace the dangerous parsing code.
2016-01-21 20:12:17 +00:00
Yen Chi Hsuan
6b45f9aba2 [iqiyi] Update key (closes #8292) 2016-01-22 02:14:47 +08:00
Sergey M․
1e10d02fec [hitbox] Skip subscribe only formats (Closes #8217) 2016-01-21 23:28:22 +06:00
Sergey M․
51290d8457 [youtube] Simplify automatic captions URL check (Closes #8287) 2016-01-21 22:58:03 +06:00
Dimitre Liotev
582f4f834e Fix issue #8109 (error when downloading automatic captions) 2016-01-21 22:55:36 +06:00
Sergey M․
e87d98b0dd [yahoo] Add improve content id regexes (Closes #8290) 2016-01-21 22:42:50 +06:00
igv
383496e65e Additional regex for yahoo extractor 2016-01-21 22:35:28 +06:00
Jaime Marquínez Ferrándiz
4519c1f43c [vimeo] 'ext' must be a string, not a tuple (fixes #8288)
There was an ',' at the end of the line.
2016-01-21 12:43:45 +01:00
Sergey M․
a616f65471 [tube8] PEP 8 2016-01-20 21:30:29 +06:00
CeruleanSky
1f78ed189a [OraTV] update extractor
"current" is now "video"
"hls_stream" is now hls_stream without quotes
video_id is now id
duration for current video is not present(for other videos it is)

modified regex to find hls_stream variable to work reguardless of whether it is quoted or not.

[ora] Improve (Closes #8273)
2016-01-20 21:28:42 +06:00
Sergey M․
7dde358adc [tube8] Extract duration and modernize 2016-01-20 20:07:32 +06:00
Sergey M․
27b83249c9 [tube8] Fix extraction and extract all formats (Closes #8281) 2016-01-20 20:00:51 +06:00
Yen Chi Hsuan
56aa074538 Credit @FounderSG for WeiqiTV and LetvCloud (#7994)
[ci skip]
2016-01-20 13:20:03 +08:00
Jaime Marquínez Ferrándiz
9d90e7de03 [downloader/hls] Ask ffmpeg to quit when interrupting youtube-dl with 'Ctrl+C' (#8252)
Otherwise the mp4 file can't be played.
2016-01-19 22:07:14 +01:00
Yen Chi Hsuan
7d4d9c526a Merge branch 'ping-patch-8239' 2016-01-20 04:22:25 +08:00
Yen Chi Hsuan
fe6856b059 [neteasemusic] Use float_or_none 2016-01-20 04:21:51 +08:00
Yen Chi Hsuan
a54fbf2ca6 Merge branch 'patch-8239' of https://github.com/ping/youtube-dl into ping-patch-8239 2016-01-20 04:15:46 +08:00
Yen Chi Hsuan
d8024aebe5 Merge branch 'FounderSG-Weiqitv' 2016-01-20 04:06:09 +08:00
Yen Chi Hsuan
8652bd22f1 [weiqitv] Use single quotes 2016-01-20 04:04:39 +08:00
Yen Chi Hsuan
f15a9ca301 [weiqitv] Rename the extractor - capitilize 'TV' 2016-01-20 04:03:57 +08:00
Yen Chi Hsuan
65ced034b8 [weiqitv] Make codes shorter 2016-01-20 04:02:30 +08:00
Yen Chi Hsuan
bec30224ff [letv] LetvCloud: Detect ext instead of the hardcoded one 2016-01-20 04:00:37 +08:00
Yen Chi Hsuan
0428106da3 [letv] LetvCloud: make title looks like a title 2016-01-20 03:53:17 +08:00
Yen Chi Hsuan
73e7442456 [letv] LetvCloud: simplify and improve _VALID_URL 2016-01-20 03:42:01 +08:00
Yen Chi Hsuan
26de1bba83 [letv] LetvCloud: check error messages from server 2016-01-20 03:31:34 +08:00
Yen Chi Hsuan
e0690782b8 [letv] LetvCloud: guard against invalid URLs 2016-01-20 03:25:12 +08:00
Yen Chi Hsuan
8fff4f61e5 [letv] Use single quotes 2016-01-20 03:18:54 +08:00
Yen Chi Hsuan
10defdd06a [letv] Reduce duplicated codes 2016-01-20 03:17:35 +08:00
Sergey M․
485139c15c [viewster] Tolerate missing synopsis (Closes #8274) 2016-01-20 00:02:46 +06:00
Sergey M․
b605ebb609 [lemonde] Add extractor 2016-01-19 22:09:55 +06:00
Sergey M․
aecfcd4e59 [ultimedia] Rename to digiteka 2016-01-19 21:51:46 +06:00
Sergey M․
942d46196f [ultimedia] Extend _VALID_URL to support digiteka 2016-01-19 21:47:06 +06:00
Yen Chi Hsuan
78be2eca7c Merge branch 'Weiqitv' of https://github.com/FounderSG/youtube-dl into FounderSG-Weiqitv 2016-01-19 23:39:32 +08:00
Sergey M․
1fa2b9841d [extractor/generic] Extend dailymotion embed regex 2016-01-19 21:20:45 +06:00
Sergey M․
9fbd0822aa [dailymotion] Extend _VALID_URL 2016-01-19 21:20:14 +06:00
Sergey M․
e323cf3ff3 [youtube] Skip test 2016-01-19 20:56:04 +06:00
Sergey M․
8ceabd4df3 [youtube] Capture and output unavailable message 2016-01-19 20:54:43 +06:00
Sergey M․
a8776b107b [youtube] Clarify test_Youtube_18 2016-01-18 23:19:38 +06:00
Sergey M․
096b533982 [youtube] Fix URL expansion in video description
Fixes test_Youtube_18
2016-01-18 23:17:45 +06:00
Sergey M․
dae503afaa [atresplayer] Skip HLS completely (Closes #8261) 2016-01-17 22:14:07 +06:00
Yen Chi Hsuan
b39eab7f94 Merge pull request #8262 from jwilk/https-everywhere
[ustream] Use HTTPS for GitHub URL
2016-01-17 22:10:03 +08:00
Jakub Wilk
e5a66240c0 [ustream] Use HTTPS for GitHub URL 2016-01-17 15:06:00 +01:00
ping
e0ef13ddeb [neteasemusic] Fallback to alt hosts if m5.music.126.net doesn't work 2016-01-17 07:48:46 +08:00
Sergey M․
855f90fa6f [ae] Rename to aenetworks and clarify extractor name and description 2016-01-17 03:02:45 +06:00
Yen Chi Hsuan
614db89ae3 [compat] Clarify the versions requiring compat_kwargs
It's supported since 2.7.0 alpha 1 and 2.6.5 rc 1. See
https://hg.python.org/cpython/file/v2.7a1/Misc/NEWS#l337
https://hg.python.org/cpython/file/v2.6.5rc1/Misc/NEWS#l28
2016-01-16 22:17:31 +08:00
Yen Chi Hsuan
1358b94163 [ae] Fix _TESTS 2016-01-16 20:56:53 +08:00
Yen Chi Hsuan
350e02d40d [bbc] Use _search_json_ld 2016-01-16 20:46:28 +08:00
Yen Chi Hsuan
0b26ba3fc8 [extractor/common] Allow passing more parameters to _search_json_ld 2016-01-16 20:45:36 +08:00
ping
3a0a78731b Fixes #8239 2016-01-16 12:17:07 +08:00
Sergey M
6be16ed24b [README.md] Add protocol usage example in format selection 2016-01-16 10:15:24 +06:00
Sergey M․
b555942428 [YoutubeDL] Ensure protocol is always present 2016-01-16 10:10:28 +06:00
Sergey M
b2dca40d81 [README.md] Improve format selection documentation 2016-01-16 09:55:52 +06:00
Sergey M
15870bbd01 [README.md] Mention new string operators for format selection 2016-01-16 09:53:31 +06:00
Yen Chi Hsuan
10d33b3473 [YoutubeDL] Introduce CSS3 like string operators 2016-01-16 09:53:12 +06:00
Sergey M
ac25992bc7 Merge pull request #8246 from dstftw/initial-json-ld-metadata-support
Initial JSON-LD metadata extraction support
2016-01-16 07:20:15 +06:00
Sergey M
30783c442d Merge pull request #8245 from dstftw/auto-generate-title-fields
[YoutubeDL] Auto generate title fields corresponding to the *_number fields
2016-01-16 07:20:03 +06:00
Sergey M․
a50a8003a0 [cultureunplugged] Improve (Closes #8060) 2016-01-16 07:10:51 +06:00
Sergey M․
315bdae00a [zippcast] Improve (Closes #8198) 2016-01-16 06:27:34 +06:00
ckuu
2ddfd26f1b '[ZippCast] Add new extractor'
Closes rg3/youtube-dl#6591
2016-01-16 06:25:06 +06:00
Philipp Hagemeister
f3ed5df611 release 2016.01.15 2016-01-15 19:43:04 +01:00
Sergey M․
b4e44234bc [ae] Use JSON-LD for TV series metadata 2016-01-16 00:36:49 +06:00
Sergey M․
4ca2a3cf3c [extractor/common] Add initial support for JSON-LD metadata extraction into info_dict 2016-01-16 00:36:02 +06:00
Sergey M․
33d2fc2f64 [YoutubeDL] Auto generate title fields corresponding to the *_number fields
Auto generate title fields corresponding to the *_number fields when missing in order to always have clean titles. This is very common for TV series.
2016-01-16 00:09:54 +06:00
remitamine
27a95f51aa [cwtv] Add new extractor 2016-01-15 17:45:51 +01:00
Sergey M․
a78d6a9bb1 [ae] Improve _VALID_URL 2016-01-15 22:13:48 +06:00
Sergey M․
567f9a5809 [ae] Add extractor import 2016-01-15 22:12:51 +06:00
Sergey M․
3a421c724f [history] Remove import (Closes #8243) 2016-01-15 22:10:07 +06:00
Sergey M․
34dd81c03a [xtube:user] Fix extraction (Closes #8224) 2016-01-15 21:35:20 +06:00
Sergey M․
b3f502cdb9 [xtube] Add shortcut 2016-01-15 21:28:36 +06:00
remitamine
587dfd44a4 [ae] Add support for fyi.tv, aetv.com and mylifetime.com(closes #3599) 2016-01-15 16:18:07 +01:00
remitamine
52767c1ba0 [history] add support for episode pages(fixes #8240) 2016-01-15 15:16:57 +01:00
remitamine
014b5c59d8 [theplatform] extend _VALID_URL regex 2016-01-15 15:12:35 +01:00
remitamine
fad7a336a1 Revert "[history] fix signature and media url extraction(fixes #8240)"
This reverts commit ffbc0baf72.
2016-01-15 14:54:39 +01:00
remitamine
ffbc0baf72 [history] fix signature and media url extraction(fixes #8240) 2016-01-15 12:35:31 +01:00
Sergey M
345f12196c Merge pull request #8228 from jaimeMF/disable-file-handler
[YoutubeDL] urlopen: disable the 'file:' protocol (#8227)
2016-01-14 22:20:02 +05:00
Sergey M․
5769b68bc0 Credit @TomGijselinck for canvas (#7145) 2016-01-14 23:15:26 +06:00
Sergey M․
4e2743abd9 [canvas] Improve (Closes #7145) 2016-01-14 23:15:12 +06:00
Tom Gijselinck
be2d40a58a [Canvas] Add new extractor 2016-01-14 23:14:41 +06:00
Sergey M․
81549898c0 [prosiebensat1] Fix some extraction and update tests 2016-01-14 22:45:09 +06:00
Lucas
0baedd1851 [prosiebensat1] add support for 7tv.de 2016-01-14 22:14:04 +06:00
Sergey M․
6b559c2fbc [ntvde] Improve regex 2016-01-14 22:12:24 +06:00
Sergey M․
986986064e [orf:fm4] Add test 2016-01-14 22:11:33 +06:00
Sergey M․
4654c1d016 [orf:fm4] Extend _VALID_URL (Closes #8234) 2016-01-14 22:07:42 +06:00
Sergey M․
163e8369b0 [ntvde] Fix extraction 2016-01-14 22:05:04 +06:00
Sergey M․
5cc9c5dfa8 [unistra] Fix extraction 2016-01-14 21:53:24 +06:00
Sergey M․
fbd90643cb [vodlocker] Fix extraction (Closes #8231) 2016-01-14 21:48:08 +06:00
Jaime Marquínez Ferrándiz
30e2f2d76f [YoutubeDL] use a more correct terminology in the error message for file:// URLs 2016-01-14 16:28:46 +01:00
Philipp Hagemeister
11c60089a8 release 2016.01.14 2016-01-14 15:43:21 +01:00
Sergey M․
abb893e6e4 [beeg] Update API URL 2016-01-14 19:57:56 +06:00
Sergey M․
4511c1976d [beeg] Fix extraction (Closes #8225) 2016-01-14 19:57:20 +06:00
Jaime Marquínez Ferrándiz
4240d50496 [YoutubeDL] improve error message for file:/// URLs 2016-01-14 14:07:54 +01:00
Jaime Marquínez Ferrándiz
6240b0a278 [YoutubeDL] urlopen: use build_opener again
Otherwise we would need to manually add handlers like HTTPRedirectHandler, instead we add a customized FileHandler instance that raises an error.
2016-01-14 08:16:39 +01:00
Jaime Marquínez Ferrándiz
e37afbe0b8 [YoutubeDL] urlopen: disable the 'file:' protocol (#8227)
If someone is running youtube-dl on a server to deliver files, the user could input 'file:///some/important/file' and youtube-dl would save that file as a video giving access to sensitive information to the user.
'file:' urls can be filtered, but the user can use an URL to a crafted m3u8 manifest like:

    #EXTM3U
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.0
    file:///etc/passwd
    #EXT-X-ENDLIST

With this patch 'file:' URLs raise URLError like for unknown protocols.
2016-01-14 00:24:04 +01:00
remitamine
40cf7fcbd2 [tudou] Add support for Albums and Playlists and extract more metadata 2016-01-13 13:29:00 +01:00
Yen Chi Hsuan
cc28492d31 [youtube] Fix acodec and vcodec order
In RFC6381, there's no rule stating that the first part of codecs should
be video and the second part should be audio, while it seems the case
for data reported by YouTube.
2016-01-13 17:05:38 +08:00
Sergey M․
bc0550c262 [pluralsight] Fix new player (Closes #8215) 2016-01-13 08:18:37 +06:00
Sergey M․
b83b782dc4 [downloader/fragment] Move helper data to context dict 2016-01-13 00:00:31 +06:00
Sergey M․
16a348475c [dailymotion] Prefer direct links (Closes #8156) 2016-01-12 23:23:39 +06:00
Sergey M․
709185a264 [downloader/fragment] More smooth calculations
`downloaded_bytes` is now updated on each fragment progress hook invocation
2016-01-12 23:18:38 +06:00
Sergey M․
9cb1a06b6c [downloader/fragment] Remove unused code and fix zero division error 2016-01-12 22:09:38 +06:00
Sergey M․
be27283ef6 [iprima] Mark broken 2016-01-11 22:00:17 +06:00
Sergey M․
b924bfad68 [videott] Mark broken 2016-01-11 21:58:32 +06:00
Sergey M․
192b9a571c [videomega] Mark broken 2016-01-11 21:56:19 +06:00
remitamine
6ec6cb4e95 Revert "fix typos"
This reverts commit 36a0e46c39.
2016-01-10 19:27:22 +01:00
remitamine
36a0e46c39 fix typos 2016-01-10 17:55:41 +01:00
Jakub Wilk
dfb1b1468c Fix typos
Closes #8200.
2016-01-10 17:24:28 +01:00
Jaime Marquínez Ferrándiz
3c91e41614 [downloader/fragment] Don't fail if the 'Content-Length' header is missing
In some dailymotion videos (like http://www.dailymotion.com/video/x3k0dtv from #8156) the segments URLs don't have the 'Content-Length' header and HttpFD sets the 'totat_bytes' field to None, so we also use '0' in that case (since we do different math operations with it).
2016-01-10 14:41:38 +01:00
Jaime Marquínez Ferrándiz
7e8a800f29 [bigflix] Use correct indentation to make flake8 happy 2016-01-10 14:26:27 +01:00
remitamine
2334762b03 [shahid] raise ExtractorError if the video is DRM protected 2016-01-10 07:55:58 +01:00
remitamine
3fc088f8c7 [dcn] extract video ids in season entries 2016-01-10 07:45:41 +01:00
Sergey M․
a9bbd26f1d [bigflix] Improve formats extraction 2016-01-10 10:49:27 +06:00
Sergey M․
6e99d5762a [bigflix] Extract all formats 2016-01-10 10:31:36 +06:00
Sergey M․
15b1c6656f Credit @vickyg3 for bigflix (#8194) 2016-01-10 10:03:56 +06:00
Sergey M
d412794205 Merge pull request #8194 from vickyg3/bigflix_ie
[Bigflix] Add new extractor for bigflix.com
2016-01-10 09:02:18 +05:00
Vignesh Venkat
0a899a1448 [Bigflix] Add new extractor for bigflix.com
Add an IE to support bigflix.com. It uses some sort of silverlight
plugin whose video url is being populated using base64 encoded
flashvars. So it is quite straightforward to extract.
2016-01-09 19:45:58 -08:00
Sergey M․
7a34302e95 [canalc2] Fix extraction (Closes #8191) 2016-01-10 01:37:10 +06:00
Jaime Marquínez Ferrándiz
27783821af [xhamster] Remove unused import 2016-01-09 11:16:23 +01:00
Philipp Hagemeister
b374af6ebd release 2016.01.09 2016-01-09 01:16:08 +01:00
Sergey M․
16f1131a4d [vimeo] Add test for #8187 2016-01-09 03:07:29 +06:00
Sergey M․
d5f071afb5 [vimeo] Check source file URL (Closes #8187) 2016-01-09 03:06:09 +06:00
Sergey M․
14b4f038c0 [xhamster] Update tests 2016-01-09 00:36:43 +06:00
Sergey M․
bcac2a0710 [xhamster] Fix uploader extraction 2016-01-09 00:36:19 +06:00
Sergey M․
1a6d92847f [xhamster] Change title regex precedence 2016-01-09 00:31:24 +06:00
Sergey M․
6a16fd4a1a [xhamster] Fix view count extraction 2016-01-09 00:29:10 +06:00
Sergey M․
44731e308c [xhamster] Fix duration extraction 2016-01-09 00:26:37 +06:00
Sergey M․
4763b624a6 [xhamster] Fix upload date extraction 2016-01-09 00:21:57 +06:00
Sergey M․
6609b3ce37 [xhamster] Improve title extraction 2016-01-09 00:19:36 +06:00
Sergey M
7e182627d9 Merge pull request #8182 from atomic83/patch-1
Extract xHamster title fix
2016-01-08 23:04:32 +05:00
atomic83
5777f5d386 Extract xHamster title fix 2016-01-08 12:58:05 +01:00
Sergey M․
5dbe81a1d3 [vimeo] Automatically pickup full movie when rented (Closes #8171) 2016-01-08 10:41:24 +06:00
Sergey M․
4cf096a4a9 [ivideon] Add support for map bound URLs 2016-01-08 05:11:23 +06:00
Sergey M․
18e6c97c48 [adultswim] Skip georestricted hls (Closes #8168) 2016-01-08 03:19:47 +06:00
Sergey M․
97afd99a18 [soundcloud:likes] Adapt to API changes (Closes #8166) 2016-01-08 01:54:31 +06:00
Sergey M․
23f13e9754 [youtube] Support expanding alternative format of links in description (Closes #8164) 2016-01-08 00:52:55 +06:00
Sergey M․
2e02ecbccc [ivideon] Add extractor 2016-01-07 12:24:32 +06:00
oittaa
e4f49a8753 check video_play_path and use xpath_text
"This check should take place earlier and should be more general if not video_url:. Same should be done for video_play_path. Also these fields better extracted with xpath_text."

Suggestions by @dstftw
2016-01-07 11:36:17 +06:00
Sergey M․
51d3045de2 [npr] Fix extractor (Closes #7218) 2016-01-07 01:57:36 +06:00
kaspi
76048b23e8 [npr] Add extractor
removed md5 from _TEST

moved from xml data to json

test

changed _TEST url to one that will not expire, so tests would not be failing
2016-01-07 01:55:55 +06:00
Sergey M․
f20756fb10 [udemy] Fix non free course message 2016-01-06 00:03:39 +06:00
Sergey M․
17b2d7ca77 [udemy] Detect non free courses (Closes #8138) 2016-01-06 00:02:21 +06:00
Sergey M
40f796288a [README.md] Clarify cookies usage 2016-01-05 02:17:12 +06:00
Sergey M․
2f546d0a3c [vrt] Prefix format ids 2016-01-05 01:59:45 +06:00
Sergey M․
18c782ab26 [vrt] Extend _VALUD_URL 2016-01-05 01:58:25 +06:00
Sergey M․
33cee6c7f6 [dramafever] Add test for custom episode title 2016-01-05 01:41:18 +06:00
Sergey M․
a2e51e7b49 [dramafever] Fix episode fallback 2016-01-05 01:36:38 +06:00
Sergey M․
bd19aa0ed3 [dramafever] Extract episode 2016-01-05 01:28:48 +06:00
Sergey M․
8f4c56f334 [dramafever] Extract episode number 2016-01-05 01:17:33 +06:00
Sergey M․
1dcc38b233 [dramafever] Improve subtitles extraction (Closes #8136) 2016-01-05 01:11:07 +06:00
Sergey M․
fff79f1867 [amp] Add missing subtitles to info dict 2016-01-05 01:05:37 +06:00
Jaime Marquínez Ferrándiz
3f17c357d9 [downloader/hls] Don't let ffmpeg read from stdin (#8139)
If you run 'while read aurl ; do youtube-dl "${aurl}"; done < path_to_batch_file'  (batch_file contains one url per line that uses the hls downloader) each call to youtube-dl consumed some characters and 'read' would assing to 'aurl' a non valid url

(This is the same problem that was fixed for the ffmpeg postprocessors in cffcbc02de)
2016-01-04 18:35:31 +01:00
Sergey M․
9938a17f92 [rte:radio] Extract timestamp 2016-01-04 05:04:48 +06:00
Sergey M․
9746f4314a [rte:radio] Simplify 2016-01-04 05:01:32 +06:00
Sergey M․
0238451fc0 [rte] PEP 8 2016-01-04 04:49:13 +06:00
Sergey M
2098aee7d6 Merge pull request #8063 from bpfoley/rteradio
[rte:radio] Add support for RTE radio player
2016-01-04 03:33:22 +05:00
Sergey M․
fb588f6a56 Credit @bpfoley for rte:radio (#8063) 2016-01-04 04:32:47 +06:00
bpfoley
896c7a23cd [extractor/rte.py] Add support for RTE radio player
While here, stop RteIE changing filename extensions to .mp4. The files
saved are .flv containers with h264 video.
2016-01-03 22:26:06 +00:00
Sergey M․
1463c5b9ac [ivi] Extract season info 2016-01-04 03:54:52 +06:00
Sergey M․
c6270b2ed5 [ivi:compilation] Fix extraction 2016-01-04 03:49:18 +06:00
Sergey M․
ab3176af34 [ivi] Fix extraction and modernize 2016-01-04 03:34:15 +06:00
Sergey M․
5aa535c329 [bbccouk] Update tests (Closes #8090) 2016-01-04 02:55:25 +06:00
Sergey M․
133b1886fc [20min] Improve (Closes #8110) 2016-01-04 02:33:08 +06:00
pingtux
66295fa4a6 [20min.ch] Added support for videoportal 2016-01-04 02:22:03 +06:00
pingtux
e54c44eeab [20min.ch] Add new extractor (closes #5977) 2016-01-04 02:21:56 +06:00
Sergey M․
a7aaa39863 [utils] Extract known extensions for reuse 2016-01-04 01:08:34 +06:00
Sergey M
ea6abd740f [nowtv] Mark broken 2016-01-03 10:12:13 +05:00
dyn888
e1a0bfdffe [youtube] added vcodec/acodec/abr for multiple itags
Should make downloading with filters more precise and easier, ie. bestvideo[vcodec=h264]. By default a lot of codecs are specified as avc1.xxxxxx and unique for each format, which makes them unusable for bestvideo selection.
2016-01-03 04:11:19 +01:00
Sergey M
3f3343cd3e Merge pull request #8061 from dstftw/introduce-chapter-and-series-fields
Introduce chapter and series fields
2016-01-03 03:54:22 +06:00
remitamine
4059eabd58 [dreisat] use extract_from_xml_url from ZDFIE for info extraction(fixes #7680)(fixes #8104)(closes #8121) 2016-01-02 21:29:10 +01:00
remitamine
6b46102661 [zdf] fix rtmpt format downloading handle errors 2016-01-02 21:24:57 +01:00
Yen Chi Hsuan
141a273a8b [qqmusic] Update tests 2016-01-02 22:39:09 +08:00
Yen Chi Hsuan
2fffb1dcd0 [qqmusic:playlist] Capture errors and update tests 2016-01-02 22:33:33 +08:00
Yen Chi Hsuan
e698e4e533 Merge branch 'remitamine-baidu' 2016-01-02 21:50:17 +08:00
Yen Chi Hsuan
b7546397f0 [baidu] Use list comprehension 2016-01-02 21:46:40 +08:00
Yen Chi Hsuan
0311677258 [baidu] Add notes for API calls 2016-01-02 21:44:49 +08:00
Sergey M․
88fb59d91b [bbccouk] Extend title extraction 2016-01-02 19:42:11 +06:00
Yen Chi Hsuan
a1d9f6c5dc [baidu] Improve playlist description 2016-01-02 21:36:35 +08:00
Yen Chi Hsuan
c579c5e967 [baidu] Cleanups 2016-01-02 21:31:02 +08:00
Yen Chi Hsuan
c9c194053d Merge branch 'baidu' of https://github.com/remitamine/youtube-dl into remitamine-baidu 2016-01-02 21:27:32 +08:00
Sergey M․
f20a11ed25 [bbccouk] Extend _VALID_URL (Closes #8116) 2016-01-02 19:22:39 +06:00
Sergey M․
76a353c9e5 [ruutu] Fix extraction (Closes #8107) 2016-01-02 07:44:30 +06:00
Sergey M
392f04d586 Merge pull request #8112 from pingtux/testtube
Remove testtube import
2016-01-02 07:11:32 +06:00
pingtux
94de6cf59c Remove testtube import
Extractor got deleted in remitamine/youtube-dl@8af2804
2016-01-02 01:35:09 +01:00
remitamine
8af2804a5d [testtube] Remove Extractor 2016-01-01 21:53:19 +01:00
remitamine
054479754c [revision3] Add new extractor(closes #6388)
- revision3.com
- testtube.com
- animalist.com
2016-01-01 21:03:16 +01:00
Sergey M․
5bafcf6525 [udemy] Use chapter_number 2016-01-01 20:34:29 +06:00
Sergey M․
306c51c669 [videomore] Use number fields for series 2016-01-01 20:30:08 +06:00
Sergey M․
27bfd4e526 [extractor/common] Introduce number fields for chapters and series 2016-01-01 20:26:56 +06:00
Jaime Marquínez Ferrándiz
ca227c8698 [yahoo] Support pages that use an alias (fixes #8084) 2016-01-01 14:32:00 +01:00
Philipp Hagemeister
32f9036447 [ccc] Add language information to formats 2016-01-01 13:28:45 +01:00
Philipp Hagemeister
190ef07981 release 2016.01.01 2016-01-01 12:17:10 +01:00
Sergey M․
82597f0ec0 [ccc] Extract duration 2016-01-01 15:41:52 +06:00
Sergey M․
8499d21158 [ccc] Fix description extraction and update test 2016-01-01 15:29:42 +06:00
Sergey M․
c9154514c4 [ccc] Fix upload date extraction 2016-01-01 15:22:22 +06:00
Sergey M․
0d5095fc65 [ccc] Update _VALID_URL (Closes #8097) 2016-01-01 15:14:41 +06:00
Yen Chi Hsuan
034caf70b2 [youku] Fix extraction (#8068) 2016-01-01 13:33:01 +08:00
remitamine
e565cf6048 [nextmovie] Add new extractor 2015-12-31 22:47:18 +01:00
remitamine
59f197aec1 Merge branch 'master' of github.com:rg3/youtube-dl 2015-12-31 22:15:14 +01:00
remitamine
a0e5beb0fb [nick] Add new extractor 2015-12-31 22:12:05 +01:00
remitamine
c1e90619bd [mtv] extract mgid extraction and query building into separate methods 2015-12-31 22:10:00 +01:00
Sergey M․
fec09bf15d [einthusan] Improve extraction (Closes #7877) 2016-01-01 02:39:00 +06:00
j
a0d7ede350 Fix einthusan parser 2016-01-01 02:38:50 +06:00
Sergey M․
b26afec81f [einthusan] Improve extraction (Closes #7877) 2016-01-01 02:23:03 +06:00
Sergey M․
8f7c4f7d2e Merge branch 'master' of github.com:rg3/youtube-dl 2016-01-01 02:22:26 +06:00
j
0416006a30 Fix einthusan parser 2016-01-01 01:58:49 +06:00
remitamine
7f9134fb2d [tvland] inherit from MTVServicesInfoExtractor 2015-12-31 20:52:47 +01:00
remitamine
91e274546c [tvland] Add new extractor 2015-12-31 20:23:48 +01:00
Jaime Marquínez Ferrándiz
69f8595256 [espn] Extract better titles 2015-12-31 20:06:21 +01:00
Jaime Marquínez Ferrándiz
930087f2f6 [espn] Support 'intl' videos (#7858) 2015-12-31 20:04:17 +01:00
Jaime Marquínez Ferrándiz
9f9f7664b5 [espn] Update test 2015-12-31 19:52:48 +01:00
Sergey M․
72528252e3 [pandoratv] Add IE names 2016-01-01 00:42:42 +06:00
Sergey M․
e4bd63f9c0 [pandoratv] Improve extraction (Closes #7921) 2016-01-01 00:40:27 +06:00
j
9accfed4e7 [pandoratv] Add new extractor (closes #6884) 2016-01-01 00:18:13 +06:00
remitamine
f1e21efe63 [tlc] remove TlcIE 2015-12-31 18:33:40 +01:00
remitamine
b05641ce40 [discovery] improve _VALID_URL regex 2015-12-31 18:24:49 +01:00
remitamine
fec040e754 [discovery] add support for discovery related sites
- investigationdiscovery.com
- discoverylife.com
- animalplanet.com
- ahctv.com
- destinationamerica.com
- sciencechannel.com
- tlc.com
- velocity.com
2015-12-31 17:29:37 +01:00
Sergey M․
34a9da136f [regiotv] Improve extraction (Closes #7915) 2015-12-31 22:12:47 +06:00
j
c43fda4c1a [regiotv] Add new extractor (closes #7797) 2015-12-31 22:11:13 +06:00
Philipp Hagemeister
7de81fcc53 release 2015.12.31 2015-12-31 16:50:53 +01:00
remitamine
9d46608efa [ora] Add new extractor(closes #7732) 2015-12-31 16:35:51 +01:00
remitamine
80b8b72cb8 [animalplanet] Add new extractor(closes #5303) 2015-12-31 13:36:07 +01:00
remitamine
9787c5f4c8 [fox] Add new extractor(closes #3063) 2015-12-31 12:02:33 +01:00
remitamine
d5f6429de8 [canalplus] improve extraction(fixes #6301)
- extract data from json instead of xml
- fix http format urls
- extract more metadata
- update tests
- make m3u8 and f4m format extraction non fatal
- use m3u8_native implementation
2015-12-31 04:02:08 +01:00
Sergey M․
df827a983a [discovery] Allow https (Closes #8065) 2015-12-31 06:06:36 +06:00
Sergey M․
29f3683901 [espn] Remove broken flag 2015-12-31 05:31:01 +06:00
Jaime Marquínez Ferrándiz
c7932289e7 [cbsnews] Fix extraction of the URL for the 'RtmpDesktop' format (fixes #8048) 2015-12-30 23:57:19 +01:00
Sergey M․
7a0b07c719 [videomore] Extract series info 2015-12-31 03:13:02 +06:00
Sergey M․
4d402db521 [udemy] Extract chapter info 2015-12-31 03:11:21 +06:00
Sergey M․
7109903e61 [extractor/common] Document chapter and series fields 2015-12-31 03:10:44 +06:00
Sergey M․
3092fc4035 [udemy] Fix typo 2015-12-31 01:09:21 +06:00
Sergey M․
f5bc4b5f95 [options] Prefer --convert-subs spelling 2015-12-30 23:12:35 +06:00
Sergey M․
69759a5990 [videomore] Set IE_NAME 2015-12-30 00:44:07 +06:00
Sergey M․
453fe2a345 [dramafever] Fix subtitles extraction (Closes #8049) 2015-12-30 00:13:00 +06:00
Sergey M․
ff18735cb2 [extractor/generic] Add support for videomore embeds 2015-12-29 23:58:23 +06:00
Sergey M․
030dfb04e0 [videomore] Add extractor (Closes #8040) 2015-12-29 23:57:46 +06:00
remitamine
06e4874c99 Merge branch 'jukebox' of https://github.com/remitamine/youtube-dl into remitamine-jukebox 2015-12-29 17:31:18 +01:00
remitamine
0d8a0fdc30 [srgssr] use SRFIE format ids 2015-12-29 16:38:06 +01:00
Jaime Marquínez Ferrándiz
53365f74a7 Credit @flatgreen for FranceCultureEmission (#8022) 2015-12-29 16:14:05 +01:00
remitamine
0368181998 [wdr] split long lines 2015-12-29 15:07:50 +01:00
remitamine
6101f45ef9 [ooyala] split long lines, fix test duration and add hdcode param to hds url 2015-12-29 15:05:21 +01:00
remitamine
bf96b45238 [rai] split long lines 2015-12-29 15:03:14 +01:00
remitamine
98d7c0f4f7 [tele13] split long lines 2015-12-29 15:02:18 +01:00
remitamine
f2017cb020 [srgssr] split long lines and use m3u8_native 2015-12-29 14:58:59 +01:00
remitamine
f889ac45b8 [ign] split long lines 2015-12-29 14:55:36 +01:00
remitamine
eccde5e9de [audimedia] split long lines 2015-12-29 14:53:43 +01:00
remitamine
ce7d243c7e [srgssr] fix IE_DESC 2015-12-29 12:01:22 +01:00
remitamine
6c4d6609ad [phoenix] fix IE_NAME 2015-12-29 12:00:52 +01:00
remitamine
db710571fd [daum] fix IE_NAME 2015-12-29 11:59:27 +01:00
remitamine
574dd17882 Merge branch 'remitamine-srgssr' 2015-12-29 11:38:19 +01:00
remitamine
422f7c112c [srgssr] update tests 2015-12-29 11:36:04 +01:00
Philipp Hagemeister
e974356f32 release 2015.12.29 2015-12-29 11:02:58 +01:00
remitamine
b19ad2fb53 Merge branch 'srgssr' of https://github.com/remitamine/youtube-dl into remitamine-srgssr 2015-12-29 10:57:45 +01:00
remitamine
126d7701b0 Merge branch 'daum' of https://github.com/remitamine/youtube-dl into remitamine-daum 2015-12-29 10:40:32 +01:00
flatgreen
ecf17d1653 [franceculture] Add extractor for '/emission-*' urls (closes #3777, closes #8022) 2015-12-28 20:07:35 +01:00
Jaime Marquínez Ferrándiz
7447661e9b [franceculture] Fix test 2015-12-28 20:05:47 +01:00
Sergey M․
7e5edcfd33 Simplify formats accumulation for f4m/m3u8/smil formats
Now all _extract_*_formats routines return a list
2015-12-29 00:58:24 +06:00
remitamine
39d60b715a Merge pull request #7769 from remitamine/sort
[common] lower (m3u8,rtmp,rtsp) format preference only if required program is not available
2015-12-28 19:15:14 +01:00
remitamine
d497a201ca [common] use specific variable for protocol preference in _sort_formats 2015-12-28 18:45:01 +01:00
remitamine
54537cdfb3 Merge pull request #8023 from remitamine/extract-formats
[common] simplify the use of _extract_m3u8_formats and _extract_f4m_formats
2015-12-28 18:17:12 +01:00
remitamine
bc737bd61a Merge branch 'zdf'(fixes #8024) 2015-12-28 17:54:04 +01:00
remitamine
a5c1d95500 [zdf] fix formats extraction 2015-12-28 17:52:05 +01:00
Sergey M․
c1f49e1684 [facebook] Fix authentication 2015-12-28 21:37:02 +06:00
Sergey M․
9f66931e16 [facebook] Extract login error 2015-12-28 21:20:09 +06:00
Jaime Marquínez Ferrándiz
6c6b8bd5cc [cspan] Fix extraction (fixes #8032) 2015-12-28 13:50:29 +01:00
Jaime Marquínez Ferrándiz
04e24906be [cspan] Initialize 'video_type' to avoid 'UnboundLocalError' exceptions (#8032) 2015-12-28 13:06:30 +01:00
remitamine
974c1b2d42 Merge branch 'dcn' of github.com:remitamine/youtube-dl into remitamine-dcn 2015-12-28 10:38:31 +01:00
remitamine
bca9bea1c1 [dcn] make m3u8 formats extraction non fatal 2015-12-28 10:27:17 +01:00
remitamine
bd3f9ecabe [tunein] add support for tunein topic,clip and program(fixes #7348) 2015-12-28 00:36:57 +01:00
Yen Chi Hsuan
c047270c02 [utils] Remove Content-encoding from headers after decompression
With cn_verification_proxy, our http_response() is called twice, one from
PerRequestProxyHandler.proxy_open() and another from normal
YoutubeDL.urlopen(). As a result, for proxies honoring Accept-Encoding, the
following bug occurs:

$ youtube-dl -vs --cn-verification-proxy https://secure.uku.im:993 "test:letv"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-vs', '--cn-verification-proxy', 'https://secure.uku.im:993', 'test:letv']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.12.23
[debug] Git HEAD: 97f18fa
[debug] Python version 3.5.1 - Linux-4.3.3-1-ARCH-x86_64-with-arch-Arch-Linux
[debug] exe versions: ffmpeg 2.8.4, ffprobe 2.8.4, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.letv.com/ptv/vplay/22005890.html
[Letv] 22005890: Downloading webpage
[Letv] 22005890: Downloading playJson data
ERROR: Unable to download JSON metadata: Not a gzipped file (b'{"') (caused by OSError('Not a gzipped file (b\'{"\')',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/common.py", line 330, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/YoutubeDL.py", line 1886, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 471, in open
    response = meth(req, response)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 773, in http_response
    raise original_ioerror
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/utils.py", line 761, in http_response
    uncompressed = io.BytesIO(gz.read())
  File "/usr/lib/python3.5/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.5/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
2015-12-28 01:09:18 +08:00
remitamine
97f18fac3a [vgtv] fix f4m downloading(fixes #7843) 2015-12-27 17:44:55 +01:00
remitamine
c71d2e2087 [livestream] change test url 2015-12-27 17:27:20 +01:00
Yen Chi Hsuan
59185202c6 [iqiyi] Add tests for #7894 2015-12-28 00:19:36 +08:00
forDream
bee83e84f6 [iqiyi]fix valid url
eg:
http://yule.iqiyi.com/zbj.html
2015-12-28 00:16:34 +08:00
gam2046
82e02ea5fc Update iqiyi.py
Fix part of the address can not be resolved.
eg:http://www.iqiyi.com/w_19rt6o8t9p.html
2015-12-28 00:16:21 +08:00
Sergey M․
a95c26a06a [jwplatform] Carry long line 2015-12-27 21:43:14 +06:00
Sergey M․
0b0a17ae9d [viki] Fix typo 2015-12-27 21:42:26 +06:00
Sergey M․
30f51acbc8 [rai] Fix typos 2015-12-27 21:42:12 +06:00
Sergey M․
e0898585a1 [jwplatform] Fix typo 2015-12-27 21:41:55 +06:00
Sergey M․
62bdc9fecc [esri] Fix typo 2015-12-27 21:41:36 +06:00
Sergey M․
e73277c7e8 [abc7news] Remove redundant formats sorting 2015-12-27 21:41:21 +06:00
remitamine
8d29e47f54 [common] simplify the use of _extract_m3u8_formats and _extract_f4m_formats 2015-12-27 15:33:39 +01:00
remitamine
2db772b9ea Merge branch 'master' of github.com:rg3/youtube-dl 2015-12-27 15:03:13 +01:00
remitamine
7b81316508 [livestream] skip m3u8 manifest in progressive_urls 2015-12-27 14:58:57 +01:00
Philipp Hagemeister
05358deeca Merge branch 'master' of github.com:rg3/youtube-dl 2015-12-27 14:34:51 +01:00
Philipp Hagemeister
9f610f3a9e [sportdeutschland] Do not abort if meta info is missing
This fixes http://sportdeutschland.tv/badminton/yonex-copenhagen-masters-2015 . No testcase though since the event will be over by 2016.
2015-12-27 13:11:55 +01:00
remitamine
dbfd06730c Merge pull request #7892 from remitamine/livestream
[livestream] improve extraction(fixes #2133)(fixes #2838)(fixes #4152)(fixes #6988)
2015-12-27 13:11:33 +01:00
remitamine
5b025168c7 [livestream] improve extraction
- split long lines
- use m3u8 entry protocol for live streams
- extend _VALID_URL regex for livestream original
- extract livestream original live streams
2015-12-27 12:57:39 +01:00
remitamine
46124a49b2 Merge pull request #7851 from remitamine/kaltura
[kaltura] extract more formats
2015-12-27 10:34:36 +01:00
remitamine
608cc3b85c [kaltura] add referrer to m3u8 url 2015-12-27 10:19:44 +01:00
remitamine
6afe044b51 [dcn] improve extraction 2015-12-27 09:56:15 +01:00
Sergey M․
15aad84dc5 [lrt] Extract counters 2015-12-27 12:26:48 +06:00
Sergey M․
f7e1d82d40 [lrt] Improve 2015-12-27 12:16:55 +06:00
Giedrius Statkevičius
339b1944e7 [lrt] fix the rest of extractor
Closes #7690.
2015-12-26 20:02:54 +01:00
Giedrius Statkevičius
85367c3a47 [lrt] fix duration parsing 2015-12-26 19:29:54 +01:00
Sergey M․
607d65fbce [ign] flake8 2015-12-26 03:17:56 +06:00
remitamine
9f0ee2a388 Merge pull request #7045 from remitamine/ign
[ign] add support for pcmag and extract all formats and more metadata(fixes #5335)(fixes #7006)
2015-12-25 20:06:27 +01:00
remitamine
1fc0b47fdf [srmediathek] improve extraction 2015-12-25 17:37:50 +01:00
Sergey M․
6418b2439b [rutv] Fix extraction (Closes #8004) 2015-12-25 21:14:00 +06:00
remitamine
06d5556dfa [rai] improve extraction 2015-12-25 15:38:12 +01:00
remitamine
fb8e402ad2 [hotstar] Add new extractor 2015-12-25 01:59:56 +01:00
Sergey M․
c24044635b [zdf:channel] Add more tests 2015-12-24 20:44:49 +06:00
Sergey M․
67ba388efb [zdf:channel] Relax _VALID_URL 2015-12-24 20:42:29 +06:00
Boris Wachtmeister
e41604227c [zdf] expand valid-url pattern for channels
The webpage also creates URLs which include additional text that defines
the sorting order on the page like "aktuellste" (most current) and
"meist-gesehen" (most seen), e.g.:

http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/332
http://www.zdf.de/ZDFmediathek/kanaluebersicht/meist-gesehen/332
2015-12-24 20:39:37 +06:00
Sergey M․
8a609c32fd [chaturbate] Improve error extraction (Closes #7989) 2015-12-24 20:09:48 +06:00
remitamine
96db61ffb8 [theintercept] improve extraction 2015-12-23 22:36:53 +01:00
remitamine
c153bd8b2f Merge branch 'theintercept' of https://github.com/bit/youtube-dl into bit-theintercept 2015-12-23 21:50:27 +01:00
Sergey M
383003653f Merge pull request #7969 from jwilk/spelling
Fix typos
2015-12-24 00:15:08 +06:00
Jakub Wilk
fc383f199e Fix typos 2015-12-23 19:12:16 +01:00
Sergey M․
2c566d02fe [pbs] Extend PBS station regex (Closes #7964) 2015-12-23 23:22:47 +06:00
Jaime Marquínez Ferrándiz
a8f1d167f6 [arte] Prefer json URLs that contain the video id from the 'vid' parameter in the URL (fixes #7920) 2015-12-23 18:00:46 +01:00
remitamine
261b4c23c7 [appletrailers] skip clips with empty url 2015-12-23 17:48:37 +01:00
Sergey M․
dcdc352371 [instagram:user] Improve _VALID_URL (Closes #7955) 2015-12-23 21:13:31 +06:00
Sergey M․
be514c856c [24video] Fix test 2015-12-23 20:49:52 +06:00
Sergey M․
128eb31d90 [24video] Fix extraction on python 2.6 2015-12-23 20:49:41 +06:00
Sergey M․
747b028412 [24video] Fix extraction (Closes #7956) 2015-12-23 20:42:36 +06:00
Jaime Marquínez Ferrándiz
7fe37d8a05 [appletrailers] Improve regex for fixing '<img>' tags (#7953) 2015-12-23 14:48:40 +01:00
Philipp Hagemeister
f10c27b8cb release 2015.12.23 2015-12-23 14:05:06 +01:00
remitamine
60427f63d1 [appletrailers] Add support for AppleTrailers Section 2015-12-23 10:40:45 +01:00
Sergey M․
178b47e6af [daum] Add test for #7949 2015-12-23 02:59:49 +06:00
Sergey M․
3a70ed9ebe [daum] Fix extraction (Closes #7949) 2015-12-23 02:54:32 +06:00
Sergey M․
89abf7bf4d [periscope] Fix token based extraction (Closes #7943) 2015-12-23 02:09:50 +06:00
Sergey M․
cfe9e5aa6c [comcarcoff] Extract duration 2015-12-23 01:18:14 +06:00
Sergey M․
4c24ed9464 [comcarcoff] Improve json data regex and modernize 2015-12-23 01:16:14 +06:00
Sergey M
11208ebbf1 Merge pull request #7942 from ausbin/comcarcoff-json-fix
[comcarcoff] adjust for json updates
2015-12-23 01:05:54 +06:00
Sergey M․
774ce35571 [imgur] Improve (Closes #7928) 2015-12-22 21:48:48 +06:00
Abhishek Kedia
dbee18b552 Improve extraction (Closes #7918)
remove outer parentheses in if

Conflicts:
	youtube_dl/extractor/imgur.py

checked code with flake8

not returning list in case of single images.

using the fact that id with length 5 are albums and more are single videos.
Also for single videos ie ImgurIE both urls - http://imgur.com/gallery/oWeAMW2 and http://imgur.com/oWeAMW2 are equally fine. Change regex to allow thuis.
For albums urls - http://imgur.com/gallery/Q95ko and http://imgur.com/Q95ko are ok. Change regex to allow this also.

update description in ImgurIE Tests.
Also move single video test 'https://imgur.com/gallery/YcAQlkx' from ImgurAlbumIE to ImgurIE.
2015-12-22 21:43:49 +06:00
remitamine
31d9ea4a3e Merge pull request #7322 from remitamine/vgtv
[vgtv] extract videos from FTV, Aftenposten, Aftonbladet using VGTVIE
2015-12-22 16:10:04 +01:00
remitamine
3b68efdc6a [vgtv] update tests and correct format sorting 2015-12-22 15:54:51 +01:00
j
2be689b7e2 [theintercept] Add new extractor 2015-12-22 13:08:16 +01:00
remitamine
2db5806991 [franceinter] use _match_id 2015-12-22 11:30:35 +01:00
remitamine
220bc3f0e3 [franceinter] fix title extraction 2015-12-22 11:27:18 +01:00
remitamine
48a6c984b8 [bleacherreport] update test 2015-12-22 10:14:57 +01:00
remitamine
dc016bf521 [viki] detect errors and fix formats extraction 2015-12-22 09:55:25 +01:00
remitamine
ff43d2365f [soompi] remove extractor
http://tv.soompi.com now redirect to viki.com because Viki has acquired
Soompi
http://www.soompi.com/2015/08/19/we-got-married-soompi-joins-viki/
2015-12-22 07:58:33 +01:00
Austin Adams
ed63cbd6ee [comcarcoff] adjust for json updates 2015-12-21 20:28:38 -05:00
Founder Fang
5f432ac8f5 [Weiqitv] Add new extractor 2015-12-22 06:21:56 +08:00
remitamine
c7224074d6 [audimedia] correct test case id 2015-12-21 23:02:55 +01:00
remitamine
eed30fea75 [flickr] fix format sorting 2015-12-21 22:10:16 +01:00
remitamine
5625bd0617 [br] add support for br-klassik.de and improve extraction
- extend _VALID_URL to match both br.de and br-klassik.de
- extract all formats(hls,hds and rtmp)
- use xpath_element and xpath_text for xml info extraction
2015-12-21 21:06:10 +01:00
Sergey M․
5ef5d25b15 [audiomack] Fix typo (Closes #7936) 2015-12-21 22:51:58 +06:00
remitamine
0f15ad7b9b [adultswim] update test 2015-12-21 17:07:19 +01:00
remitamine
f11d00fa41 [test_subtitles] remove BlipTV test 2015-12-21 16:52:47 +01:00
remitamine
61ebb401b7 [atresplayer] improve extraction
- select hashlib.md5 constructor as digestmod(in python 3.4+ MD5 as
implicit default digest for digestmod is deprecated.)
- extract hls formats
- update tests
- extract errors
2015-12-21 16:26:40 +01:00
remitamine
5c5a3ecf1b [abc] detect expired state and update tests 2015-12-21 13:07:52 +01:00
Philipp Hagemeister
0197004f78 release 2015.12.21 2015-12-21 11:42:25 +01:00
remitamine
2c28da8e05 Merge branch 'bleacherreport' of github.com:remitamine/youtube-dl into remitamine-bleacherreport 2015-12-21 11:18:32 +01:00
remitamine
c7fa5fa42c [bleacherreport] fix style issues and simplify 2015-12-21 11:12:58 +01:00
remitamine
7ba71e30fb Merge branch 'bliptv' of github.com:remitamine/youtube-dl into remitamine-bliptv 2015-12-21 04:31:17 +01:00
remitamine
7cb0952474 [makertv] improve extraction 2015-12-21 04:24:58 +01:00
remitamine
a8ae232fa9 Merge branch 'googledrive' of github.com:remitamine/youtube-dl into remitamine-googledrive 2015-12-21 03:15:19 +01:00
remitamine
5b251628e9 [googledrive] Modernize 2015-12-21 03:05:34 +01:00
remitamine
b9a324c0da Merge branch 'flickr' of github.com:remitamine/youtube-dl into remitamine-flickr 2015-12-21 00:37:51 +01:00
remitamine
5b95419ca5 [flickr] extract views_count and tags 2015-12-21 00:20:22 +01:00
remitamine
ecbccea703 [faz] extract duration and bitrate and use xpath_element and xpath_text for extraction 2015-12-20 21:38:30 +01:00
remitamine
c240ab6ecf Merge pull request #6790 from remitamine/tele13
[canal13cl] fix info extraction
2015-12-20 16:11:07 +01:00
remitamine
6882c0870e [tele13] improve extraction
- improve jwplayer setup regex
- sort formats
- remove duplicate formats
- update youtube test
2015-12-20 15:48:19 +01:00
remitamine
b0eeaf4f40 Merge pull request #6928 from remitamine/cnet
[cnet] fix extraction and extract more formats and metadata(closes #7003)
2015-12-20 12:59:35 +01:00
remitamine
c6ed6fadc2 [cnet] improve extraction
- relex data json regex
- extract the platform metadata once
- extract hds formats
- extract duration
- extract thumbnail
2015-12-20 12:43:00 +01:00
Sergey M․
e462474e1d [youtube] Generalize playlists extractor 2015-12-20 07:48:16 +06:00
Sergey M․
6b77d52b1f [test_utils] Add tests for encode_compat_str 2015-12-20 07:07:14 +06:00
Sergey M․
9b9c5355e4 Rename error_to_str to error_to_compat_str 2015-12-20 07:00:39 +06:00
Sergey M․
d890b4cc0a [nbc:news] Remove unnecessary compat_str 2015-12-20 06:43:42 +06:00
Sergey M․
2c74e6fa77 [YoutubeDL] Revert error_to_str for ExtractorError 2015-12-20 06:35:58 +06:00
Sergey M․
c0384f221e Use proper encoding on compat_str construction when necessary 2015-12-20 06:29:36 +06:00
Sergey M․
8e60dc7526 [utils] Add encode_compat_str 2015-12-20 06:26:26 +06:00
Sergey M․
8900ab4d9b [YoutubeDL] More error_to_str 2015-12-20 06:22:01 +06:00
Sergey M․
fb043a6e4e [YoutubeDL] Use error_to_str 2015-12-20 06:16:19 +06:00
Sergey M․
7f8b271465 Properly convert errors to strings 2015-12-20 05:27:38 +06:00
Sergey M․
fdae235858 [utils] Add error_to_str 2015-12-20 05:26:47 +06:00
remitamine
1deb710f26 [gputechconf] improve extraction 2015-12-19 23:59:00 +01:00
remitamine
ec6504b39c [gputechconf] Add new extractor(closes #5775) 2015-12-19 23:28:54 +01:00
Sergey M․
dd85e4d707 [extractor/common] Properly decode error string on python 2 (Closes #1354, closes #3957, closes #4037, closes #6449) 2015-12-20 02:43:50 +06:00
remitamine
fa64a84311 [faz] fix info extraction 2015-12-19 19:02:04 +01:00
remitamine
e0f06eae43 [fktv] fix info extraction 2015-12-19 18:26:28 +01:00
Sergey M․
0f206ee814 [toggle] Change IE_NAME 2015-12-19 23:11:23 +06:00
Sergey M․
cc0f378d54 [toggle] Rename to toggle 2015-12-19 19:59:00 +06:00
Sergey M․
e33c9cba7c [toggle] Improve _VALID_URL 2015-12-19 19:58:18 +06:00
Sergey M․
989e9f8ead [toggle] Improve formats extraction robustness 2015-12-19 19:52:37 +06:00
Sergey M․
8f097af4ec [toggle] Extract counters 2015-12-19 19:23:28 +06:00
Sergey M․
c40dbb19ab [toggle] Extract thumbnails 2015-12-19 19:19:26 +06:00
Sergey M․
ffaf6e66e3 [toggle] Improve 2015-12-19 19:16:49 +06:00
Sergey M․
74c730174f [toggle] Style 2015-12-19 19:06:05 +06:00
Sergey M․
c82a8dd14c [toggle] Remove unused imports 2015-12-19 19:04:38 +06:00
Sergey M․
f8253af561 [toggle] Use sanitized_Request 2015-12-19 19:03:55 +06:00
ping
ed370ff0e6 [togglesg] Fixes 2015-12-19 18:48:59 +06:00
ping
ee0f0393cf [togglesg] New extractor for toggle.sg 2015-12-19 18:48:46 +06:00
Yen Chi Hsuan
db2fe38b55 [utils] Support alternative timestamp format in TTML
Fixes #7608
2015-12-19 19:29:51 +08:00
Yen Chi Hsuan
d631d5f9f2 [utils] Fix TTML conversion
Tolerate invalid timestamps (closes #7909)
2015-12-19 18:21:42 +08:00
Sergey M․
4f29fa9906 [brightcove:new] Add test for ref: prefixed video id 2015-12-18 22:31:48 +06:00
Sergey M․
5b72fda140 [brightcove:new] Clarify ref: prefix 2015-12-18 22:22:41 +06:00
Sergey M․
f81ccbb3df [brightcove:new] Fix typo 2015-12-18 22:20:44 +06:00
Sergey M․
9fd0f67678 [brightcove:new] Add support for ref: preffixed video ids (Closes #7794) 2015-12-18 22:18:55 +06:00
Sergey M․
15d50aca64 [nowness] Add support for brightcove:new videos (Closes #7884) 2015-12-18 22:05:56 +06:00
Sergey M․
7234d1d9c7 [brightcove:new] Add _extract_url 2015-12-18 22:05:32 +06:00
Sergey M․
9796a9b20c [ndr] Fix description and upload date extraction (Closes #7893) 2015-12-18 21:34:17 +06:00
Philipp Hagemeister
016dd82050 release 2015.12.18 2015-12-18 14:21:30 +01:00
Sergey M․
b95779be21 [jsinterp] Extend function regex (Closes #7900, closes #7901) 2015-12-18 18:57:49 +06:00
Yen Chi Hsuan
10171468d9 [iqiyi] Update key (closes #7896) 2015-12-18 18:20:41 +08:00
Yen Chi Hsuan
bf597a6dd1 Merge pull request #7895 from Blue9/patch-1
Fix hyperlink to youtube-dl options
2015-12-18 18:10:41 +08:00
Gautam M
45dad8bab9 Fix hyperlink to youtube-dl options 2015-12-18 03:16:36 -05:00
remitamine
64ccbf18c0 [livestream] improve extraction, add support for live streams and extract more info and formats 2015-12-17 20:56:54 +01:00
Sergey M․
9dc1d94a0c [noco] Fix bitrates 2015-12-17 22:18:28 +06:00
Sergey M․
7824e1f6a6 [noco] Modernize 2015-12-17 22:16:58 +06:00
Sergey M․
2469a6aecb [noco] Adjust timestamp according to server time (Closes #7864) 2015-12-17 22:16:22 +06:00
Sergey M․
8f0afda028 [pbs] Extend _VALID_URL (Closes #7889) 2015-12-17 20:24:33 +06:00
remitamine
35e22b6b32 [youku] check for the correct variable 2015-12-17 12:51:50 +01:00
remitamine
323f82a7e0 [vimeo] add test for original format 2015-12-16 17:00:17 +01:00
remitamine
8534bf1f00 [vimeo] prefer original format 2015-12-16 16:36:25 +01:00
remitamine
eb4f27405b [vimeo] extract source file(closes #1072) 2015-12-16 09:43:53 +01:00
Sergey M․
2d3b70271c [rutube] Extend _VALID_URL 2015-12-16 04:44:17 +06:00
Sergey M․
ad1b6017cd [tf1] Fix tests 2015-12-15 21:36:59 +06:00
Sergey M․
05467d5a52 [tf1] Relax _VALID_URL 2015-12-15 21:31:58 +06:00
Sergey M․
ae5e94808e [tf1] Fix extraction (2) 2015-12-15 21:11:52 +06:00
Sergey M․
d7ffcfcf97 [tf1] Fix extraction (Closes #7873) 2015-12-15 21:09:14 +06:00
Sergey M․
0cb58b0259 [youtube] Extract alt_title and creator for music videos (Closes #7862) 2015-12-14 21:31:53 +06:00
Sergey M․
31b2051e21 [utils] Add remove_quotes 2015-12-14 21:30:58 +06:00
Yen Chi Hsuan
eb0bdc2c3e [novamov] Fix again 2015-12-14 02:50:59 +08:00
Yen Chi Hsuan
6583c741cd [novamov] Fix filekey extraction and reupload test video 2015-12-14 02:34:20 +08:00
Sergey M․
2d9295643e [footyroom] Skip test 2015-12-13 23:55:10 +06:00
Sergey M․
ee86e2c6d7 [novamov] Add support for mobile URLs 2015-12-13 19:16:01 +06:00
Yen Chi Hsuan
02a63fadc3 [infoq] Refactor and support the Chinese version
Closes #7576
2015-12-13 19:16:58 +08:00
Philipp Hagemeister
f3711edcf1 release 2015.12.13 2015-12-13 10:52:59 +01:00
Yen Chi Hsuan
22d07ba4e4 [infoq] Fix extraction for HTTP URLs (closes #7739) 2015-12-13 17:29:27 +08:00
Yen Chi Hsuan
f6abca506e [nowvideo] Skip deleted test case 2015-12-13 15:43:20 +08:00
Yen Chi Hsuan
b5424acdb9 [novamov] Improve existence checking 2015-12-13 15:43:20 +08:00
Yen Chi Hsuan
47c7f3d995 [novamov] Fix filekey extraction (closes #7764) 2015-12-13 15:43:20 +08:00
Sergey M․
0014ffa829 [funimation] Improve login 2015-12-13 07:17:42 +06:00
Sergey M․
c03943f394 Credit @Slyneth for funimation (#7775) 2015-12-12 15:19:23 +06:00
Yen Chi Hsuan
deb1e8d20e [youku] Put the missing item to get_hd 2015-12-12 15:49:19 +08:00
Yen Chi Hsuan
174964a7bc Credit @Celthi for fixing Youku extractor 2015-12-12 15:34:40 +08:00
Yen Chi Hsuan
9c568178fb Merge branch 'Celthi-youku_bugfix' 2015-12-12 15:26:43 +08:00
Yen Chi Hsuan
dbb7d7e26c [youku] Reorder format items 2015-12-12 15:24:58 +08:00
Yen Chi Hsuan
ade2340971 [youku] Simplify 2015-12-12 15:19:14 +08:00
Yen Chi Hsuan
4d77550cf0 [youku] Fix tests 2015-12-12 14:57:14 +08:00
Yen Chi Hsuan
c683454e7e [youku] MD5 is unstable 2015-12-12 14:48:46 +08:00
Yen Chi Hsuan
f133fd326b [youku] Cleanup and PEP8 2015-12-12 14:41:53 +08:00
Yen Chi Hsuan
1faa66f005 Merge branch 'youku_bugfix' of https://github.com/Celthi/youtube-dl into Celthi-youku_bugfix 2015-12-12 14:36:29 +08:00
Yen Chi Hsuan
8773f3158f [safari] Use postdata_urlencode (#7465) 2015-12-12 14:28:05 +08:00
Celthi
7e37c39485 merge data1 and data2 2015-12-12 11:26:15 +08:00
Celthi
14c17cafa1 add support to video protected by password 2015-12-12 11:21:44 +08:00
Celthi
8696a7fd13 fix the keyerror(mp4hd), todo support download the video protected by password 2015-12-12 10:44:21 +08:00
remitamine
bb4b8c57b9 [kaltura] extract more formats 2015-12-11 23:40:12 +01:00
Sergey M․
d63cfc3f0f [beeg] API v5 (Closes #7846) 2015-12-12 02:52:20 +06:00
Sergey M․
f377f44dae [funimation] Improve extraction 2015-12-12 01:02:54 +06:00
Sergey M․
0b1bb1ac3a [funimation] Add test for promotional video 2015-12-12 00:52:00 +06:00
Sergey M․
f208e52a76 [funimation] Fix promotional videos extraction 2015-12-12 00:48:09 +06:00
Sergey M․
b091529a3c [funimation] Extend _VALID_URL to match promotional videos 2015-12-12 00:43:03 +06:00
Sergey M․
b323a3cbff [funimation] Remove unused import 2015-12-12 00:39:44 +06:00
Sergey M․
b59623ef43 [funimation] Use mobile webpage for workaround hulu error 2015-12-12 00:38:58 +06:00
Sergey M․
9c163950da [funimation] Improve _VALID_URL 2015-12-11 23:20:10 +06:00
Sergey M․
d357bbd375 [funimation] Update test 2015-12-11 23:06:44 +06:00
Sergey M?
f542a3d26b [funimation] Improve extraction (Closes #7775) 2015-12-11 23:00:38 +06:00
Sergey M?
59a4ff482a [funimation] Real UA is required for login 2015-12-11 23:00:37 +06:00
Sergey M?
40ca5b04f4 [funimation] Remove unnecessary login form field 2015-12-11 23:00:37 +06:00
Sergey M?
411e5b88c9 [funimation] Fix login message 2015-12-11 23:00:37 +06:00
Sergey M?
b4c299bad0 [funimation] PEP 8 2015-12-11 23:00:36 +06:00
Muratcan Simsek
ab4bdc913f [funimation] Add new extractor
Update funimation.py

Update funimation.py

Removed unnecessary lines.

Update funimation.py

Added thumbnail and description.

Filename improvement.

fixed TEST.
2015-12-11 23:00:35 +06:00
remitamine
1fe248a51b Merge pull request #7833 from remitamine/ooyala
[ooyala] improve extraction
2015-12-11 17:55:32 +01:00
remitamine
2559b9d017 [wdr] extract all formats(closes #7788) 2015-12-11 17:31:33 +01:00
Sergey M․
4db43567e8 [downloader/f4m] Decode manifest before fixing 2015-12-11 20:28:44 +06:00
Celthi
5333842a1d According the blog and you-get fixed the issues #7627. 2015-12-11 20:08:14 +08:00
Celthi
98c3806b15 fix some not important codesnips 2015-12-11 19:18:14 +08:00
Yen Chi Hsuan
b6afc225c8 [vevo] Use _download_smil to provide informative error messages 2015-12-11 19:16:51 +08:00
Yen Chi Hsuan
ad30dc1e20 [vevo] Allow calling API without https
Not all proxies allow CONNECT
2015-12-11 19:07:13 +08:00
Yen Chi Hsuan
ff51983e15 [vevo] Handle videos without video_info (#7802) 2015-12-11 18:52:03 +08:00
Celthi
fdf01663d1 able to download first part of the video, but fail in the left part 2015-12-11 17:48:40 +08:00
Yen Chi Hsuan
4b94288301 [vevo] Use _match_id 2015-12-11 17:32:29 +08:00
Yen Chi Hsuan
4bf99ade15 [vevo] Catch the georestriction message (#7802) 2015-12-11 14:25:01 +08:00
remitamine
e3e166d8cf [ign] improve extraction and extract uploader_id 2015-12-11 00:08:54 +01:00
remitamine
3da3999612 [ultimedia] keep direct support for ultimedia videos 2015-12-10 23:04:28 +01:00
remitamine
d50116b8ac [vgtv] extract 5 digit length video ids using both xstream and vgtv 2015-12-10 22:18:42 +01:00
remitamine
75ed53320e [ooyala] improve extraction 2015-12-10 19:08:16 +01:00
Sergey M․
17b786ae73 [downloader/f4m] Fix malformed manifests (Closes #7823) 2015-12-10 22:59:50 +06:00
Sergey M
dfd42a43c3 Merge pull request #7821 from joksnet/patch-1
[FFmpegPostProcessor] Default of prefer ffmpeg
2015-12-10 22:10:20 +06:00
Philipp Hagemeister
f7b8dd63f0 release 2015.12.10 2015-12-10 17:05:13 +01:00
Sergey M․
a8abf124c8 [dailymotion] Add subtitles test URL for reference 2015-12-10 21:54:48 +06:00
Sergey M․
176ccefcd8 [pbs] PEP 8 2015-12-10 21:33:40 +06:00
Sergey M․
cbd2ffd031 [dailymotion] Fix subtitles extraction 2015-12-10 21:29:07 +06:00
Sergey M․
0b534d2adc [dailymotion] Restrict player v5 regex (Closes #7826) 2015-12-10 21:27:47 +06:00
Sergey M․
526a20bd16 [pbs] Clarify member stations' URLs 2015-12-10 21:04:26 +06:00
Celthi
51094b1b08 add cookie and referer in headers, change the video url 2015-12-10 21:42:12 +08:00
Philipp Hagemeister
f1ac2033ab Merge pull request #7827 from habi/master
Updating README.md
2015-12-10 13:54:18 +01:00
David Haberthür
a1b8d815f5 Reverting markup changes 2015-12-10 13:45:53 +01:00
David Haberthür
8b756bd98e Merge branch 'update-readme' 2015-12-10 13:20:25 +01:00
David Haberthür
46047c58d0 Updating README.md
- Harmonizing mentions of **youtube-dl** in the text
- Removing unnecessary Markdown markup for headers
- Adding some links
2015-12-10 13:19:26 +01:00
Juan M Martínez
374c761e77 [FFmpegPostProcessor] Default of prefer ffmpeg
When no `downloader` is passed to `FFmpegPostProcessor`
an exception was raised trying to get the prefer ffmpeg param.

    AttributeError: 'NoneType' object has no attribute 'params'

This fixes and defaults to `False`.
2015-12-09 20:56:00 -03:00
Sergey M․
6c7b26e13f [pbs] Make URLs lowercase 2015-12-09 21:28:04 +06:00
Sergey M․
b51b108045 [pbs] Clean up stations list from duplicates 2015-12-09 21:23:19 +06:00
Philipp Hagemeister
86e8c89488 release 2015.12.09 2015-12-09 15:32:26 +01:00
remitamine
41c3b34b1f [vgtv] add sortcut expressions to use the extractor 2015-12-09 10:50:11 +01:00
Jaime Marquínez Ferrándiz
47f48f5d85 [test/test_all_urls] Update pbs extractor name
It's in lowercase now (since e15e2ef7a0).
2015-12-08 21:12:13 +01:00
Sergey M․
e15e2ef7a0 [pbs] Add support for all member stations (#7674) 2015-12-09 01:51:34 +06:00
Sergey M․
d0c8b279da [pbs] Add another coveplayer pattern (Closes #7674) 2015-12-08 23:34:43 +06:00
Sergey M․
612d83b51d [pbs] Extend _VALID_URL 2015-12-08 23:28:36 +06:00
Sergey M
9c30efeb7e Merge pull request #7792 from jindaxia/fix_sohu_403forbidden
[sohu] Fix 403 forbidden
2015-12-08 22:54:14 +06:00
Sergey M․
39fa4cc107 [cliphunter] Fix extraction (Closes #7796) 2015-12-08 21:56:00 +06:00
Sergey M?
b09c122373 [nbc] Add another theplatform pattern 2015-12-08 21:35:42 +06:00
Sergey M
3348243b7b [README.md] Clarify verbose log requirements 2015-12-08 21:34:26 +06:00
Sergey M․
b46b65ed37 [nbc] Smuggle referer (Closes #7791) 2015-12-08 21:16:14 +06:00
Sergey M․
18e4088fad [theplatform] Add support for referer protected videos wuth explicit SMIL 2015-12-08 21:15:45 +06:00
虾哥哥
5fd6cd64f9 [sohu]fix 403 forbidden 2015-12-08 14:14:14 +08:00
Sergey M․
3d24bbfbe4 [YoutubeDL] Check formats for merge to be opposite (#7786) 2015-12-07 23:10:57 +06:00
Sergey M․
1775612512 [wimp] Improve video URL regex 2015-12-07 22:18:00 +06:00
Sergey M․
0d2d967cc7 [wimp] Fix extraction (Closes #7784) 2015-12-07 22:14:45 +06:00
Sergey M․
a5e52a1fd4 [vk] Add test for pladform embed 2015-12-07 22:05:54 +06:00
Sergey M․
291a93bafa [vk] Remove unnecessary message 2015-12-07 22:04:47 +06:00
Sergey M․
c4737bea17 [vk] Add support for pladform embeds (Closes #7780) 2015-12-07 22:03:52 +06:00
Sergey M․
45dad7ba1b [extractor/generic] Use _extract_url for pladform 2015-12-07 22:03:21 +06:00
Sergey M․
db7c9da871 [pladform] Add _extract_url routine 2015-12-07 22:02:45 +06:00
Philipp Hagemeister
bc92621ade release 2015.12.06 2015-12-06 18:51:25 +01:00
Sergey M․
fd8e559c3a [beeg] Switch to api v4 (Closes #7774) 2015-12-06 23:47:10 +06:00
Sergey M․
222e11d4ae [bbc] Add another pattern for playlist.sxml (Closes #7743) 2015-12-06 16:41:12 +06:00
Sergey M․
7d682f0acb [nowtv] Extend _VALID_URL to support jahr URLs (Closes #7755) 2015-12-06 16:18:59 +06:00
Yen Chi Hsuan
8364b6b0b1 [iqiyi] Update key
Closes #7772
2015-12-06 16:41:02 +08:00
remitamine
9f6b517671 [vgtv] extract all formats and improve extraction 2015-12-06 07:50:27 +01:00
Sergey M․
7ac40e5521 [nowvideo] Update test 2015-12-06 09:42:20 +06:00
Sergey M․
36066dd3ee [movshare] Rename to wholecloud 2015-12-06 09:42:00 +06:00
Sergey M․
636aa83ed3 [cloudtime] Add extractor 2015-12-06 09:37:38 +06:00
Sergey M․
33d152b6cc [novamov] Move all novamov based extractors to a single place
For easier navigation
2015-12-06 09:29:41 +06:00
remitamine
51c4fec0d5 [nba] use int_or_none for tbr 2015-12-05 21:04:22 +01:00
remitamine
0017486dca [nba] use int instead of int_or_none 2015-12-05 20:58:44 +01:00
Sergey M․
edc70f4aaf [pluralsight] Fix format code split while guessing quality 2015-12-06 01:40:13 +06:00
Sergey M․
756926ff00 [pluralsight] Add support for widescreen videos (Closes #7766) 2015-12-06 01:39:28 +06:00
remitamine
cb160dd531 [nba] handle format info properly 2015-12-05 18:47:15 +01:00
Jaime Marquínez Ferrándiz
77334ccb44 [metacafe] Fix age limit extraction 2015-12-05 16:12:50 +01:00
Jaime Marquínez Ferrándiz
796db21295 [metacafe] Fix video url extraction (closes #7763) 2015-12-05 16:12:02 +01:00
Philipp Hagemeister
535d7b681b release 2015.12.05 2015-12-05 16:01:37 +01:00
remitamine
7db2897ded [srgssr] handle all play urls only in SRGSSRIE and keep RTSIE for articles 2015-12-05 15:57:10 +01:00
Sergey M․
960e038886 [hypem] Modernize 2015-12-05 20:46:57 +06:00
Sergey M․
ea14422ff1 [hypem] Correctly handle cookies (Closes #7762) 2015-12-05 20:42:21 +06:00
Yen Chi Hsuan
38d05d17e5 [fc2] Fix test_FC2_1 2015-12-05 21:10:26 +08:00
Yen Chi Hsuan
db9bd5267f [keezmovies] Fix extraction
Also fixes #7752
2015-12-05 17:26:13 +08:00
remitamine
ab3b773bbe [acast] change tests into more stable casts and work with channel extractor only if it didn't match cast regex 2015-12-05 10:14:34 +01:00
Yen Chi Hsuan
0bc4ee60e0 [bbc] Fix test_BBC_6 2015-12-05 16:55:53 +08:00
Yen Chi Hsuan
a3ef0e1cdd [bbc.co.uk] Skip removed test video 2015-12-05 16:55:53 +08:00
Yen Chi Hsuan
679bacf0b5 [bbc.co.uk] Fix test_BBCCoUk
This is similar to the one in #7756, So also fixes #7756.
2015-12-05 16:55:53 +08:00
remitamine
02e3952f3b [trilulilu] handle errors 2015-12-05 09:42:00 +01:00
Yen Chi Hsuan
64b7e89c0c [srf] Support audios (closes #7760) 2015-12-05 16:26:30 +08:00
remitamine
bee4c5571a [clipfish] improve extraction 2015-12-04 16:38:05 +01:00
remitamine
96929dd1e8 [skynewsarabia] fix extractor name 2015-12-04 16:23:44 +01:00
remitamine
53e06b2507 [ooyala] fix duration scale 2015-12-04 16:18:02 +01:00
remitamine
b80d4bebf3 [nba] fix extraction errors 2015-12-04 16:04:22 +01:00
Jaime Marquínez Ferrándiz
55bec9b658 [clipfish] Remove unused import and style fix 2015-12-04 14:29:37 +01:00
Jaime Marquínez Ferrándiz
2a63b0f110 [mixcloud] Fix extraction of the audio url (fixes #7751) 2015-12-04 14:26:34 +01:00
remitamine
07b88cffce Merge pull request #7686 from remitamine/acast
[acast] Add new extractor
2015-12-04 09:10:02 +01:00
remitamine
58c8451f36 Merge pull request #7660 from remitamine/gameinformer
[gameinformer] Add new extractor(closes #3376)
2015-12-04 09:03:21 +01:00
remitamine
3047121c63 Merge pull request #7320 from remitamine/adobetv
[adobetv] improve extraction and add support specific language video,show and channel extraction
2015-12-04 08:54:06 +01:00
remitamine
7079f8ff1f [adobetv] use compat_str 2015-12-04 08:44:18 +01:00
remitamine
2c3b9f3570 [adobetv] use a variable for api base url 2015-12-04 08:37:08 +01:00
remitamine
fad2428f47 [gameinformer] split long line 2015-12-04 08:24:04 +01:00
remitamine
c3d3110f6a Merge pull request #7185 from remitamine/ooyala
[ooyala] extract more formats and metadata
2015-12-04 08:23:21 +01:00
remitamine
79ec00276c Merge pull request #7326 from remitamine/clipfish
[clipfish] improve info extraction
2015-12-04 07:57:58 +01:00
remitamine
9c117d345f [nba] improve(fixes #7068)
* extract more formats
* extract videos from team mini sites
* extract more metadata
2015-12-04 07:20:27 +01:00
remitamine
46cc1c65a4 [nba] use xpath utils 2015-12-04 07:09:48 +01:00
remitamine
71d9fe7818 [trilulilu] improve extraction 2015-12-04 06:53:33 +01:00
remitamine
4ccabf93db [trilulilu] fix info extraction 2015-12-04 00:51:02 +01:00
remitamine
6612a34939 [bilibili] flake8 2015-12-03 22:43:19 +01:00
remitamine
e5b4225f7c [audimedia] flake8 2015-12-03 22:25:08 +01:00
remitamine
b2ca35ddbc Merge pull request #7745 from remitamine/bilibili
[bilibili] use xpath_text and catch errors in xml document
2015-12-03 22:11:41 +01:00
remitamine
76ab842d9b [bilibili] use xpath_text and catch errors in xml document 2015-12-03 22:01:32 +01:00
remitamine
78653a33aa Merge remote-tracking branch 'upstream/master' into bliptv 2015-12-03 20:33:22 +01:00
remitamine
24dc1ed715 Merge pull request #7659 from remitamine/audimedia
[audimedia] Add new extractor(closes #7654)
2015-12-03 20:28:52 +01:00
remitamine
682d8dcd21 Merge pull request #7210 from remitamine/bilibili
[bilibili] fix info extraction(fixes #7182)
2015-12-03 20:16:54 +01:00
remitamine
640bb54e73 Merge branch 'master' of https://github.com/rg3/youtube-dl into bilibili 2015-12-03 20:05:11 +01:00
Sergey M․
e0977d7686 [beeg] Decrypt URL (Closes #7736) 2015-12-04 00:59:32 +06:00
remitamine
112ab398db Merge pull request #7681 from remitamine/skynewarabia
[skynewsarabia] Add new extractor
2015-12-03 18:41:38 +01:00
Sergey M․
af93fcfa05 [beeg] Update API URL (Closes #7736) 2015-12-03 23:23:36 +06:00
Sergey M․
62d231c004 [extractor/common] Clarify duration can be float 2015-12-03 20:55:02 +06:00
Sergey M․
49358274d7 [bbc] Fix _VALID_URL 2015-12-03 20:49:14 +06:00
Jaime Marquínez Ferrándiz
7b1e379ca9 [gametrailers] Fix extraction (fixes #7722)
They have stopped using the MTV system.
2015-12-03 13:47:21 +01:00
Sergey M․
22d7368dfb [bbc] Extract _ID_REGEX and ad one more video id pattern (Closes #7724) 2015-12-02 02:34:31 +06:00
Sergey M․
24121bc703 [udemy] Make lecture downloading fatal 2015-12-02 00:53:03 +06:00
Sergey M․
9fc87fa767 [udemy] Remove unused import 2015-12-02 00:51:47 +06:00
Sergey M․
328f82d59a [udemy] Semi-switch to api 2.0 (Closes #7704)
* Use api 2.0 to get lectures since it provides more formats
* Fix authorization for api 2.0
* Autotry enrolling in the course for single lectures
* Extract additional metadata rom asset['data']['outputs']
2015-12-02 00:48:27 +06:00
Sergey M․
78717fc328 [udemy] Allow authentication via cookies 2015-12-01 22:10:10 +06:00
Sergey M․
3b35c3425e [udemy] Extract formats from data.outputs (#7704) 2015-12-01 20:35:46 +06:00
Sergey M․
874ae0354e [nrk] Extract f4m formats and impose geo restriction only when not media URL (Closes #7715) 2015-12-01 18:35:24 +06:00
Sergey M․
4c6b4764f0 [youtube] Clarify itag 272 possible resolutions (#7699) 2015-11-30 20:42:05 +06:00
Sergey M․
59ee8a8647 [facebook] Make alternative title optional (Closes #7700) 2015-11-30 20:10:09 +06:00
Sergey M․
af284305d5 [vodlocker] Capture file not found error (Closes #7696) 2015-11-30 03:58:39 +06:00
Sergey M․
d53a4af1a4 [pornhub:playlist] Allow alphanumeric viewkeys (Closes #7695) 2015-11-30 03:47:01 +06:00
Sergey M․
2e1b928540 [youtube:playlist] Extend _VALID_URL 2015-11-29 21:04:11 +06:00
Sergey M․
040ac68679 [youtube] Extend _VALID_URL (Closes #7694) 2015-11-29 21:01:59 +06:00
Yen Chi Hsuan
049d71d874 [youtube] Simplify and make sure header values are strings 2015-11-29 19:52:48 +08:00
Sergey M․
bf2c8c8f82 [spiegel] Fix extraction (Closes #7693) 2015-11-29 17:03:33 +06:00
Yen Chi Hsuan
ef428960c9 Merge pull request #7691 from ryandesign/use-PYTHON-env-var
Always use PYTHON env var in Makefile
2015-11-29 13:08:46 +08:00
Yen Chi Hsuan
992fc9d6e1 [utils] Refactor handle_youtubedl_headers for future extension 2015-11-29 12:58:29 +08:00
Ryan Schmidt
8639f89f51 Always use PYTHON env var in Makefile 2015-11-28 22:56:24 -06:00
Yen Chi Hsuan
0424ec307b [utils] Correct docstring of YoutubeDLHandler 2015-11-29 12:46:04 +08:00
Yen Chi Hsuan
ac5a69af45 [youtube] Disable compression for live streams 2015-11-29 12:44:24 +08:00
Yen Chi Hsuan
94e8c80473 [downloader/hls] Respect Youtubedl-* headers 2015-11-29 12:43:59 +08:00
Yen Chi Hsuan
87f0e62d94 [utils] Separate codes for handling Youtubedl-* headers 2015-11-29 12:42:50 +08:00
remitamine
46b4070f3f Merge pull request #7057 from remitamine/cspan
[cspan] correct the clip info extraction (fixes #7335)
2015-11-28 21:36:52 +01:00
remitamine
2a776f9788 [cspan] change into a function 2015-11-28 20:22:31 +01:00
remitamine
f4c7ef9862 [skynewsarabia] return empty categories array if there is no topic 2015-11-28 18:20:44 +01:00
remitamine
50e12e9df1 [acast] Add new extractor 2015-11-28 18:10:37 +01:00
Sergey M․
b7faebbac8 [bloomberg] Improve formats extraction 2015-11-28 22:45:19 +06:00
Sergey M․
4191fdf147 [bloomberg] Improve video id regex 2015-11-28 22:41:39 +06:00
Sergey M․
9a4f12be98 [bloomberg] Modernize 2015-11-28 22:40:29 +06:00
Sergey M․
7ad4258add [bloomberg] Relax _VALID_URL even more (Closes #7685) 2015-11-28 22:39:36 +06:00
Sergey M․
9945c4994c Credit @reiv for soundcloud:search 2015-11-28 20:21:03 +06:00
Sergey M․
5faf9fed7e [youtube] Clarify rationale for yt:stretch validation 2015-11-28 18:50:21 +06:00
Sergey M
13a9b69b09 Merge pull request #7677 from lalinsky/yt-stretch-zero-height
[youtube] Ignore yt:stretch with zero width/height
2015-11-28 18:14:06 +06:00
remitamine
4975650e00 [skynewsarabia] fix IE_NAME 2015-11-28 12:20:39 +01:00
remitamine
0cc7178546 [skynewsarabia] Add new extractor 2015-11-28 11:48:18 +01:00
Lukáš Lalinský
41f24c321d [youtube] Use the existing w and h variables 2015-11-28 08:16:46 +01:00
Yen Chi Hsuan
4b3fbafdd2 [options] Changed wording for --list-formats
As proposed by @dstftw at 9bff48a0e7
2015-11-28 14:14:20 +08:00
Sergey M․
7ac40086f5 [dbtv] Expand _VALID_URL (Closes #7645) 2015-11-28 08:44:13 +06:00
Lukáš Lalinský
313dfc45f5 [youtube] Ignore yt:stretch with zero width/height 2015-11-28 01:07:07 +01:00
Philipp Hagemeister
78a55d7a28 release 2015.11.27.1 2015-11-27 16:39:59 +01:00
Philipp Hagemeister
bb6ac83698 release 2015.11.27 2015-11-27 16:32:51 +01:00
Yen Chi Hsuan
9d0e366880 [downloader/hls] Remove Accept-encoding from headers passed to ffmpeg
Fails for Youtube Gaming live streams (#7671)
2015-11-27 21:37:45 +08:00
Yen Chi Hsuan
9bff48a0e7 [options] Clarify --list-formats needs videos (closes #7669) 2015-11-27 21:24:39 +08:00
remitamine
60121eb514 [gameinformer] Add new extractor 2015-11-26 22:43:31 +01:00
remitamine
527ca1da4f [audimedia] Add new extractor(closes #7654) 2015-11-26 21:24:10 +01:00
Sergey M
7689413e42 [README.md] Mention mplayer and mpv in "other programs" question 2015-11-24 23:06:21 +06:00
Philipp Hagemeister
ba7a92b0ce release 2015.11.24 2015-11-24 07:46:38 +01:00
Philipp Hagemeister
4c7d816dd7 [jsinterp] Adapt to updated YouTube code generation (Fixes #7623, fixes #7624, fixes #7625, fixes #7626) 2015-11-24 07:45:38 +01:00
Philipp Hagemeister
032f2f260f README: Document which other programs may be helpful (Fixes #7621) 2015-11-24 03:38:46 +01:00
Philipp Hagemeister
20e98bf6c0 release 2015.11.23 2015-11-23 18:07:58 +01:00
Sergey M?
5c2266df4b Switch codebase to use sanitized_Request instead of
compat_urllib_request.Request

[downloader/dash] Use sanitized_Request

[downloader/http] Use sanitized_Request

[atresplayer] Use sanitized_Request

[bambuser] Use sanitized_Request

[bliptv] Use sanitized_Request

[brightcove] Use sanitized_Request

[cbs] Use sanitized_Request

[ceskatelevize] Use sanitized_Request

[collegerama] Use sanitized_Request

[extractor/common] Use sanitized_Request

[crunchyroll] Use sanitized_Request

[dailymotion] Use sanitized_Request

[dcn] Use sanitized_Request

[dramafever] Use sanitized_Request

[dumpert] Use sanitized_Request

[eitb] Use sanitized_Request

[escapist] Use sanitized_Request

[everyonesmixtape] Use sanitized_Request

[extremetube] Use sanitized_Request

[facebook] Use sanitized_Request

[fc2] Use sanitized_Request

[flickr] Use sanitized_Request

[4tube] Use sanitized_Request

[gdcvault] Use sanitized_Request

[extractor/generic] Use sanitized_Request

[hearthisat] Use sanitized_Request

[hotnewhiphop] Use sanitized_Request

[hypem] Use sanitized_Request

[iprima] Use sanitized_Request

[ivi] Use sanitized_Request

[keezmovies] Use sanitized_Request

[letv] Use sanitized_Request

[lynda] Use sanitized_Request

[metacafe] Use sanitized_Request

[minhateca] Use sanitized_Request

[miomio] Use sanitized_Request

[meovideo] Use sanitized_Request

[mofosex] Use sanitized_Request

[moniker] Use sanitized_Request

[mooshare] Use sanitized_Request

[movieclips] Use sanitized_Request

[mtv] Use sanitized_Request

[myvideo] Use sanitized_Request

[neteasemusic] Use sanitized_Request

[nfb] Use sanitized_Request

[niconico] Use sanitized_Request

[noco] Use sanitized_Request

[nosvideo] Use sanitized_Request

[novamov] Use sanitized_Request

[nowness] Use sanitized_Request

[nuvid] Use sanitized_Request

[played] Use sanitized_Request

[pluralsight] Use sanitized_Request

[pornhub] Use sanitized_Request

[pornotube] Use sanitized_Request

[primesharetv] Use sanitized_Request

[promptfile] Use sanitized_Request

[qqmusic] Use sanitized_Request

[rtve] Use sanitized_Request

[safari] Use sanitized_Request

[sandia] Use sanitized_Request

[shared] Use sanitized_Request

[sharesix] Use sanitized_Request

[sina] Use sanitized_Request

[smotri] Use sanitized_Request

[sohu] Use sanitized_Request

[spankwire] Use sanitized_Request

[sportdeutschland] Use sanitized_Request

[streamcloud] Use sanitized_Request

[streamcz] Use sanitized_Request

[tapely] Use sanitized_Request

[tube8] Use sanitized_Request

[tubitv] Use sanitized_Request

[twitch] Use sanitized_Request

[twitter] Use sanitized_Request

[udemy] Use sanitized_Request

[vbox7] Use sanitized_Request

[veoh] Use sanitized_Request

[vessel] Use sanitized_Request

[vevo] Use sanitized_Request

[viddler] Use sanitized_Request

[videomega] Use sanitized_Request

[viewvster] Use sanitized_Request

[viki] Use sanitized_Request

[vk] Use sanitized_Request

[vodlocker] Use sanitized_Request

[voicerepublic] Use sanitized_Request

[wistia] Use sanitized_Request

[xfileshare] Use sanitized_Request

[xtube] Use sanitized_Request

[xvideos] Use sanitized_Request

[yandexmusic] Use sanitized_Request

[youku] Use sanitized_Request

[youporn] Use sanitized_Request

[youtube] Use sanitized_Request

[patreon] Use sanitized_Request

[extractor/common] Remove unused import

[nfb] PEP 8
2015-11-23 21:56:23 +06:00
Sergey M․
67dda51722 Rename compat_urllib_request_Request to sanitized_Request and move to utils 2015-11-23 21:55:15 +06:00
Sergey M․
e4c4bcf36f [vimeo] Use compat_urllib_request_Request 2015-11-23 21:55:14 +06:00
Sergey M․
82d8a8b6e2 [YoutubeDL] Wrap plain-text URL requests in compat_urllib_request_Request 2015-11-23 21:55:13 +06:00
Sergey M․
13a10d5aa3 [compat] Add compat_urllib_request_Request
This is actually not a compatibility routine but rather a workaround for URLs without protocol specified.
The protocol-less URL is treated as HTTP one since it's most probable scenario and it will most likely to
redirect to HTTPS if HTTPS was actually expected. This routine could also be useful for any Request
preprocessing that may be added in future.
2015-11-23 21:55:12 +06:00
Sergey M․
9022726446 [youtube] Fix test 2015-11-23 21:37:21 +06:00
Sergey M․
94bfcd23b7 [youtube] Fix test 2015-11-23 21:35:23 +06:00
Sergey M․
526b3b0716 [youtube] Clarify ytplayer.config extraction rationale 2015-11-23 21:14:03 +06:00
Sergey M․
61f92af1cf [youtube] Add test with '};' in tags 2015-11-23 21:02:37 +06:00
Sergey M․
a72778d364 [youtube] Improve ytplayer.config extraction 2015-11-23 21:00:06 +06:00
Sergey M
5ae17037a3 Merge pull request #7599 from lalinsky/fix-youtube
[youtube] More explicit player config JSON extraction (fixes #7468)
2015-11-23 20:52:23 +06:00
Sergey M․
02f0da20b0 [pluralsight] Add support for alternative webpage layout (Closes #7607) 2015-11-23 03:08:38 +06:00
Lukáš Lalinský
b41631c4e6 [youtube] Send the list of patterns directly to _search_regex 2015-11-22 13:53:26 +01:00
Lukáš Lalinský
0e49d9a6b0 [youtube] Fall back to the original regex for ytplayer.config 2015-11-22 13:49:33 +01:00
Sergey M․
4a7d108ab3 [rutube] Remove unnecessary print 2015-11-22 18:24:17 +06:00
Lukáš Lalinský
3cfd000849 [youtube] More explicit player config JSON extraction (fixes #7468) 2015-11-22 13:14:35 +01:00
Sergey M․
1b38185361 [pornhd] Fix title extraction (Closes #7596) 2015-11-22 18:08:30 +06:00
Sergey M․
9cb9a5df77 [utils] Check ext with trailing slash against the list of known extensions 2015-11-22 17:27:13 +06:00
Sergey M․
5035536e3f [test_utils] Add tests for determine_ext 2015-11-22 06:33:52 +06:00
Sergey M․
3e12bc583a [utils] Improve determine_ext (Closes #7593) 2015-11-22 06:29:39 +06:00
Sergey M․
e568c2233e [youtube] Add test for multi page list of playlists 2015-11-22 05:03:23 +06:00
Sergey M․
061a75edd6 [youtube] Extract base for entry list extractors and support multi page lists of playlists 2015-11-22 05:01:01 +06:00
Philipp Hagemeister
82c4d7b0ce release 2015.11.21 2015-11-21 23:36:27 +01:00
Sergey M․
136dadde95 [youtube:show] Rework in terms of playlists base extractor 2015-11-22 04:18:20 +06:00
Sergey M․
0c14841585 [youtube:user:playlists] Add extractor (Closes #3817) 2015-11-22 04:17:07 +06:00
Sergey M․
0eebf34d9d [pluralsight] Rephrase 2015-11-22 00:58:25 +06:00
Sergey M․
cf186b77a7 [pluralsight] Clarify allowed qualities guessing rationale 2015-11-22 00:56:40 +06:00
Sergey M․
a3372437bf [soundcloud] Remove unused variable 2015-11-22 00:49:58 +06:00
Sergey M․
4c57b4853d [pluralsight] Until listing formats request only single format 2015-11-22 00:42:58 +06:00
Sergey M․
38eb2968ab [pluralsight] Clarify and randomize ViewClip sleep interval 2015-11-22 00:07:09 +06:00
Andrzej Lichnerowicz
bea56c9569 [pluralsight] prevent error 429 when sensing video formats 2015-11-21 23:49:58 +06:00
Sergey M․
7e508ff2cf [pluralsight] Improve login detection 2015-11-21 21:49:37 +06:00
Sergey M․
563772eda4 [pluralsight] Extract base class 2015-11-21 21:37:29 +06:00
Sergey M․
0533915aad [pluralsight] Update some more URLs 2015-11-21 21:35:08 +06:00
Sergey M․
c3a227d1c4 [pluralsight] Update _LOGIN_URL 2015-11-21 21:25:48 +06:00
Sergey M․
f6c903e708 [soundcloud:search] Simplify (Closes #7213) 2015-11-21 21:21:21 +06:00
Sergey M․
7dc011c063 [soundcloud:search] Remove no track results message 2015-11-21 21:00:42 +06:00
Sergey M․
4e3b303016 [soundcloud:search] Fix non-ASCII searches 2015-11-21 20:55:48 +06:00
Sergey M․
7e1f5447e7 [utils] Improve encode_dict 2015-11-21 20:46:33 +06:00
Sergey M․
7e3472758b [soundcloud:search] PEP 8 2015-11-21 20:04:35 +06:00
reiv
328a22e175 [soundcloud] Remove limit on search results 2015-11-21 19:41:36 +06:00
reiv
417b453699 [soundcloud] Use correct error message conventions 2015-11-21 19:41:31 +06:00
reiv
6ea7190a3e Rewrite as list comprehension. 2015-11-21 19:41:26 +06:00
reiv
b54b08c91b Simplify with itertools.islice(). 2015-11-21 19:41:19 +06:00
reiv
c30943b1c0 Fix some compatibility issues, cleanup. 2015-11-21 19:41:15 +06:00
reiv
2abf7cab80 [soundcloud] Add Soundcloud search extractor 2015-11-21 19:41:08 +06:00
Sergey M․
4137196899 [rutube] Extract all formats 2015-11-21 18:02:52 +06:00
Sergey M․
019839faaa [extractor/common] Use baseURL from f4m manifest for recursive manifest extraction 2015-11-21 18:01:39 +06:00
Sergey M․
f52183a878 [rutube:embed] Extend _VALID_URL (Closes #7588) 2015-11-21 17:39:24 +06:00
Yen Chi Hsuan
750b9ff032 [generic] Extract M3U8 formats (closes #7582) 2015-11-21 16:43:01 +08:00
Yen Chi Hsuan
28602e747c [generic] Refactor 2015-11-21 16:08:54 +08:00
Yen Chi Hsuan
6cc37c69e2 [generic] Unescape URLs from JWPlayer (#7582) 2015-11-21 14:12:34 +08:00
Sergey M․
a5cd0eb8a4 [pluralsight:course] Improve _VALID_URL 2015-11-21 08:32:48 +06:00
Sergey M․
c23e266427 [pluralsight] Do not require pluralsight account
Looks like some courses are available without pluralsight account
2015-11-21 08:25:52 +06:00
Sergey M․
651acffbe5 [pluralsight] Update ViewClip URL 2015-11-21 08:21:33 +06:00
Sergey M․
71bd93b89c [pluralsight] Do not rely on argument order in query (Closes #7583) 2015-11-21 08:08:34 +06:00
Sergey M․
6da620de58 [kaltura] Add test for referrer protected video (#7409) 2015-11-21 01:40:28 +06:00
Sergey M․
bdceea7afd [kaltura] Clean description 2015-11-21 01:39:29 +06:00
Sergey M․
d80a39cec8 [kaltura] Improve 2015-11-21 01:38:08 +06:00
Sergey M․
5b5fae5f20 [generic] Use referrer from source kaltura embed URLs (#7409) 2015-11-21 01:35:58 +06:00
Sergey M․
01b06aedcf [kaltura] Add support for referrer protected videos (#7409) 2015-11-21 01:34:02 +06:00
Sergey M
c711383811 Merge pull request #7579 from ashutosh-mishra/typo_fix
Typo fix, found while going through the code.
2015-11-20 23:24:54 +06:00
ashutosh-mishra
17cc153435 Typo fix, found while going through the code. 2015-11-20 22:51:46 +05:30
Sergey M․
67446fd49b [instagram] Improve _VALID_URL (Closes #7568) 2015-11-20 04:07:39 +06:00
Sergey M․
325bb615a7 [theplatform] Style 2015-11-19 22:58:43 +06:00
Sergey M․
ee5cd8418e [theplatform] Handle protocolless feed URLs (Closes #7532) 2015-11-19 22:58:29 +06:00
Sergey M․
342609a1b4 [bloomberg] Reax _VALID_URL (Closes #7546) 2015-11-19 22:55:06 +06:00
Sergey M
f270cf1a26 Merge pull request #7519 from barlik/master
Clarify that automatic subtitles are generated.
2015-11-19 22:44:08 +06:00
hedii
371c3b796c [YoutubeDL] Add playlist finished downloading message (Closes #7517)
Conflicts:
	youtube_dl/YoutubeDL.py
2015-11-19 22:39:02 +06:00
Sergey M․
6b7ceee1b9 [vimeo] Add test for #7552 2015-11-19 22:31:16 +06:00
Sergey M․
fdb20a27a3 [vimeo:group] Improve _VALID_URL (Closes #7552) 2015-11-19 22:30:58 +06:00
Sergey M․
2c94198eb6 [vimeo] Improve playlists extraction 2015-11-19 21:29:32 +06:00
Philipp Hagemeister
e8110b8125 release 2015.11.19 2015-11-19 15:35:13 +01:00
Yen Chi Hsuan
c39fd7b1ca [UDNEmbed] Fix generic UDN pages
Closes #7547
2015-11-19 22:32:56 +08:00
Sergey M․
a9c09a7c62 [pbs] Update API URL (Closes #7565) 2015-11-19 20:25:28 +06:00
Rastislav Barlik
741dd8ea65 Clarify that automatic subtitles are generated.
It wasn't clear what automatic word mean.
2015-11-16 14:15:25 +00:00
remitamine
63b728f06f [bleacherreport] Add new Extractor 2015-11-07 16:56:21 +01:00
remitamine
3793090b1b [amp] Add generic extractor for Akamai AMP feeds and use it in dramafever and foxnews extractors 2015-11-07 16:54:35 +01:00
remitamine
a641b24592 [cnet] skip hls_phone if hls_tablet is present 2015-11-06 07:23:03 +01:00
remitamine
967c9076a3 raise ExtractorError if the page doesn't contain a video 2015-11-05 18:01:13 +01:00
remitamine
f3003531a5 [flickr] handle error message 2015-11-01 13:38:11 +01:00
remitamine
146672254e [flickr] extract fresh api key and remove duplication in test 2015-11-01 13:23:23 +01:00
remitamine
02fb980451 [flickr] extract more info and formats 2015-11-01 02:08:19 +01:00
remitamine
50b9dd7344 [dcn] improve season info extraction 2015-10-31 15:40:11 +01:00
remitamine
720334659a [daum] improve info extraction 2015-10-31 01:08:37 +01:00
remitamine
240384afe6 [clipfish] improve info extraction 2015-10-30 20:06:38 +01:00
remitamine
957e0db1d2 [baidu] improve info extraction 2015-10-30 13:56:21 +01:00
remitamine
804afc5871 [vgtv] improve _VALID_URL regex 2015-10-30 10:20:38 +01:00
remitamine
00d24327ef [vgtv] extract videos from FTV, Aftenposten, Aftonbladet using VGTVIE 2015-10-30 09:48:56 +01:00
remitamine
9a605c8859 [adobetv] add support for show and channel extraction 2015-10-29 20:00:27 +01:00
remitamine
402ca40c9d [adobetv] extract AdobeTVVideo info from json directly 2015-10-29 19:55:04 +01:00
remitamine
30bd1c16c8 [adobetv] use api for extraction and add support specific language videos 2015-10-29 19:44:26 +01:00
remitamine
497f5fd93f [bilibili] extract multiple backup_urls 2015-10-21 08:24:05 +01:00
remitamine
4bf5614195 [cspan] move get_text_attr to CSpanIE 2015-10-20 07:43:39 +01:00
remitamine
520e753390 [bilibili] add support for specefic page extraction 2015-10-17 23:12:58 +01:00
remitamine
355c7ad361 [cspan] handle error massages and extract qualities 2015-10-17 21:30:38 +01:00
remitamine
55af2b26e0 [bilibili] extract backup url 2015-10-17 18:30:51 +01:00
remitamine
d90e40305b [bilibili] fix info extraction 2015-10-17 17:28:09 +01:00
remitamine
cce9d15d01 [ooyala] extract domain,handle errors and change related tests 2015-10-16 16:02:40 +01:00
remitamine
dd414c970b [ooyala] fix sorting and format id 2015-10-16 10:12:42 +01:00
remitamine
77302fe5c9 [bliptv] remove extractor and add support for site replacement(makertv) 2015-10-15 23:27:46 +01:00
remitamine
497ca088a6 [ooyala] remove print statment 2015-10-15 14:37:05 +01:00
remitamine
90bddb6cdd [ooyala] extract more formats and metadata 2015-10-15 14:28:56 +01:00
remitamine
db8e38b8cf [ign] add tests for me.ign specific language urls 2015-10-14 11:55:03 +01:00
remitamine
e09f58b3bc [srgssr] change the url chortcut, fix image extraction ,add a test and extract format id 2015-10-14 10:40:54 +01:00
remitamine
05ad5409b4 [srgssr] fix regex for swissinfo.ch 2015-10-09 20:34:03 +01:00
remitamine
1ef1563649 [srgssr] Add generic extractor for SRGSSR Group sites 2015-10-09 20:08:37 +01:00
remitamine
6a11bb77ba [nba] add support for team subsites 2015-10-07 12:17:32 +01:00
remitamine
ecf6de5b02 [nba] extract width,height and bitrate from format key 2015-10-07 07:09:45 +01:00
remitamine
139f27827e [nba] skip Legacy Video Files 2015-10-07 06:53:19 +01:00
remitamine
30787f7259 [cspan] correct the clip info extraction 2015-10-03 19:28:48 +01:00
remitamine
c233e6bcc3 [nba] extract video info from xml feed 2015-10-03 12:30:05 +01:00
remitamine
28809ab07a [nba] extract more formats 2015-10-03 09:47:19 +01:00
remitamine
adccf33632 [ign] add support for pcmag and extract all formats and more metadata 2015-10-02 21:58:20 +01:00
remitamine
8fc226ef99 [nba] extract all video formats and extract more info 2015-10-02 17:24:30 +01:00
remitamine
6aeba407db [jukebox] remove extractor and handle it using generic extractor 2015-09-25 10:52:48 +01:00
remitamine
b306c439d7 [cnet] fix extraction and extract more formats 2015-09-23 13:28:05 +01:00
remitamine
436416afe2 [tele13] skip test 2015-09-07 21:13:49 +01:00
remitamine
8b55cadc83 [canal13cl] fix info extraction 2015-09-07 16:39:01 +01:00
remitamine
486375154c correct season info extraction and simplify 2015-09-05 11:30:42 +01:00
remitamine
8e2898edf9 [dcn] add support for live streams and catchup videos 2015-09-04 15:42:09 +01:00
remitamine
b477da2094 correct the extractor name and id and remove unnecessary request 2015-09-03 16:59:10 +01:00
remitamine
9f4921bfa0 [dcn] add show extraction and support for other types of urls 2015-09-03 00:29:53 +01:00
remitamine
8e92d21ebf [googledrive] raise ExtractorError instead of warning 2015-07-23 11:59:13 +01:00
remitamine
36dbca8784 fix recursive error 2015-07-23 11:59:13 +01:00
remitamine
d1cc05e17e remove unnecessary regex group names 2015-07-23 11:59:12 +01:00
remitamine
3b3d531965 fix embed regex 2015-07-23 11:59:12 +01:00
remitamine
653789afc7 add google drive embeds 2015-07-23 11:59:12 +01:00
remitamine
2d651a2d02 import google drive embed class 2015-07-23 11:57:09 +01:00
remitamine
3e5f3df172 move the embed to a separate class 2015-07-23 11:57:09 +01:00
remitamine
f120a7ab5e change the _TEST info 2015-07-23 11:57:09 +01:00
remitamine
984e4d4875 [googledrive] Add new extractor 2015-07-23 11:57:08 +01:00
305 changed files with 10409 additions and 4538 deletions

10
AUTHORS
View File

@@ -146,3 +146,13 @@ Lukáš Lalinský
Qijiang Fan
Rémy Léone
Marco Ferragina
reiv
Muratcan Simsek
Evan Lu
flatgreen
Brian Foley
Vignesh Venkat
Tom Gijselinck
Founder Fang
Andrew Alexeyew
Saso Bezlaj

View File

@@ -1,4 +1,18 @@
**Please include the full output of youtube-dl when run with `-v`**.
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
```
**Do not post screenshots of verbose log only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@@ -14,13 +28,13 @@ So please elaborate on what feature you are requesting, or what bug you want to
- How it could be fixed
- How your proposed solution would look like
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a commiter myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?
@@ -28,7 +42,7 @@ Before reporting any issue, type `youtube-dl -U`. This should report that you're
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/rg3/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Why are existing options not enough?

View File

@@ -61,34 +61,34 @@ youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
chmod a+x youtube-dl
README.md: youtube_dl/*.py youtube_dl/*/*.py
COLUMNS=80 python youtube_dl/__main__.py --help | python devscripts/make_readme.py
COLUMNS=80 $(PYTHON) youtube_dl/__main__.py --help | $(PYTHON) devscripts/make_readme.py
CONTRIBUTING.md: README.md
python devscripts/make_contributing.py README.md CONTRIBUTING.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
supportedsites:
python devscripts/make_supportedsites.py docs/supportedsites.md
$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
$(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
python devscripts/bash-completion.py
$(PYTHON) devscripts/bash-completion.py
bash-completion: youtube-dl.bash-completion
youtube-dl.zsh: youtube_dl/*.py youtube_dl/*/*.py devscripts/zsh-completion.in
python devscripts/zsh-completion.py
$(PYTHON) devscripts/zsh-completion.py
zsh-completion: youtube-dl.zsh
youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
python devscripts/fish-completion.py
$(PYTHON) devscripts/fish-completion.py
fish-completion: youtube-dl.fish

137
README.md
View File

@@ -35,7 +35,7 @@ You can also use pip:
sudo pip install youtube-dl
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see https://rg3.github.io/youtube-dl/download.html .
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION
**youtube-dl** is a small command-line program to download videos from
@@ -173,6 +173,10 @@ which means you can modify it, redistribute it or use it however you like.
expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental)
--hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while
downloading (some players may not be able
to play it)
--external-downloader COMMAND Use the specified external downloader.
Currently supports
aria2c,axel,curl,httpie,wget
@@ -319,7 +323,8 @@ which means you can modify it, redistribute it or use it however you like.
--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific
one is requested
-F, --list-formats List all available formats
-F, --list-formats List all available formats of requested
videos
--youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g.
@@ -329,8 +334,8 @@ which means you can modify it, redistribute it or use it however you like.
## Subtitle Options:
--write-sub Write subtitle file
--write-auto-sub Write automatic subtitle file (YouTube
only)
--write-auto-sub Write automatically generated subtitle file
(YouTube only)
--all-subs Download all the available subtitles of the
video
--list-subs List all available subtitles for the video
@@ -338,8 +343,8 @@ which means you can modify it, redistribute it or use it however you like.
preference, for example: "srt" or
"ass/srt/best"
--sub-lang LANGS Languages of the subtitles to download
(optional) separated by commas, use IETF
language tags like 'en,pt'
(optional) separated by commas, use --list-
subs for available language tags
## Authentication Options:
-u, --username USERNAME Login with this account ID
@@ -399,7 +404,7 @@ which means you can modify it, redistribute it or use it however you like.
downloading, similar to find's -exec
syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
--convert-subtitles FORMAT Convert the subtitles to other format
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt)
# CONFIGURATION
@@ -413,7 +418,7 @@ You can configure youtube-dl by placing any supported command line option to a c
You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
### Authentication with `.netrc` file ###
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a`.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
```
@@ -450,6 +455,8 @@ The `-o` option allows users to indicate a template for the output file names. T
- `format_id`: The sequence will be replaced by the format code specified by `--format`.
- `duration`: The sequence will be replaced by the length of the video in seconds.
Note that some of the aforementioned sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
@@ -463,15 +470,77 @@ youtube-dl_test_video_.mp4 # A simple file name
# FORMAT SELECTION
By default youtube-dl tries to download the best quality, but sometimes you may want to download in a different format.
The simplest case is requesting a specific format, for example `-f 22`. You can get the list of available formats using `--list-formats`, you can also use a file extension (currently it supports aac, m4a, mp3, mp4, ogg, wav, webm) or the special names `best`, `bestvideo`, `bestaudio` and `worst`.
By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file.
You can also use special names to select particular edge case format:
- `best`: Select best quality format represented by single file with video and audio
- `worst`: Select worst quality format represented by single file with video and audio
- `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available
- `worstvideo`: Select worst quality video only format, may not be available
- `bestaudio`: Select best quality audio only format, may not be available
- `worstaudio`: Select worst quality audio only format, may not be available
For example, to download worst quality video only format you can use `-f worstvideo`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
- `filesize`: The number of bytes, if known in advance
- `width`: Width of the video, if known
- `height`: Height of the video, if known
- `tbr`: Average bitrate of audio and video in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `vbr`: Average video bitrate in KBit/s
- `asr`: Audio sampling rate in Hertz
- `fps`: Frame rate
Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and following string meta fields:
- `ext`: File extension
- `acodec`: Name of the audio codec in use
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
Examples (note on Windows you may need to use double quotes instead of single):
```bash
# Download best mp4 format available or any other best if no mp4 available
$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
# Download best format available but not better that 480p
$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
# Download best video only format but no bigger that 50 MB
$ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
```
# VIDEO SELECTION
Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
@@ -534,6 +603,12 @@ Most people asking this question are not aware that youtube-dl now defaults to d
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
### Do I need any other programs?
youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
@@ -552,11 +627,11 @@ If you want to play the video on a machine that is not running youtube-dl, you c
YouTube has switched to a new video info format in July 2011 which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
### ERROR: unable to download video ###
### ERROR: unable to download video
YouTube requires an additional signature since September 2012 which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
### Video URL contains an ampersand and I'm getting some strange output `[1] 2839` or `'v' is not recognized as an internal or external command` ###
### Video URL contains an ampersand and I'm getting some strange output `[1] 2839` or `'v' is not recognized as an internal or external command`
That's actually the output from your shell. Since ampersand is one of the special shell characters it's interpreted by the shell preventing you from passing the whole URL to youtube-dl. To disable your shell from interpreting the ampersands (or any other special characters) you have to either put the whole URL in quotes or escape them with a backslash (which approach will work depends on your shell).
@@ -580,7 +655,7 @@ In February 2015, the new YouTube player contained a character sequence in a str
These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
### SyntaxError: Non-ASCII character ###
### SyntaxError: Non-ASCII character
The error
@@ -609,7 +684,7 @@ From then on, after restarting your shell, you will be able to access both youtu
Use the `-o` to specify an [output template](#output-template), for example `-o "/home/user/videos/%(title)s-%(id)s.%(ext)s"`. If you want this for all of your downloads, put the option into your [configuration file](#configuration).
### How do I download a video starting with a `-` ?
### How do I download a video starting with a `-`?
Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
@@ -620,7 +695,7 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
### Can you add support for this anime video site, or site which shows current movies for free?
@@ -750,7 +825,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```
Most likely, you'll want to use various options. For a list of what can be done, have a look at [youtube_dl/YoutubeDL.py](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L117-L265). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
@@ -791,9 +866,23 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the irc channel #youtube-dl on freenode.
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**.
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
```
**Do not post screenshots of verbose log only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@@ -809,13 +898,13 @@ So please elaborate on what feature you are requesting, or what bug you want to
- How it could be fixed
- How your proposed solution would look like
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a commiter myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?
@@ -823,7 +912,7 @@ Before reporting any issue, type `youtube-dl -U`. This should report that you're
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/rg3/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Why are existing options not enough?
@@ -853,4 +942,4 @@ It may sound strange, but some bug reports we receive are completely unrelated t
youtube-dl is released into the public domain by the copyright holders.
This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.
This README file was originally written by [Daniel Bolton](https://github.com/dbbolton) and is likewise released into the public domain.

View File

@@ -5,7 +5,7 @@ from __future__ import with_statement, unicode_literals
import datetime
import glob
import io # For Python 2 compatibilty
import io # For Python 2 compatibility
import os
import re

View File

@@ -1,6 +1,7 @@
# Supported sites
- **1tv**: Первый канал
- **1up.com**
- **20min**
- **220.ro**
- **22tracks:genre**
- **22tracks:track**
@@ -15,11 +16,15 @@
- **abc.net.au**
- **Abc7News**
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
- **AddAnime**
- **AdobeTV**
- **AdobeTVChannel**
- **AdobeTVShow**
- **AdobeTVVideo**
- **AdultSwim**
- **Aftenposten**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
@@ -30,12 +35,15 @@
- **Aparat**
- **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報
- **AppleTrailers**
- **appletrailers**
- **appletrailers:section**
- **archive.org**: archive.org videos
- **ARD**
- **ARD:mediathek**: Saarländischer Rundfunk
- **ARD:mediathek**
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
- **arte.tv:concert**
- **arte.tv:creative**
- **arte.tv:ddc**
@@ -43,9 +51,11 @@
- **arte.tv:future**
- **AtresPlayer**
- **ATTTechChannel**
- **AudiMedia**
- **audiomack**
- **audiomack:album**
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
@@ -58,11 +68,12 @@
- **Beeg**
- **BehindKink**
- **Bet**
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
- **blip.tv:user**
- **BlipTV**
- **Bloomberg**
- **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek
@@ -75,11 +86,12 @@
- **BYUtv**
- **Camdemy**
- **CamdemyFolder**
- **Canal13cl**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBS**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CeskaTelevize**
- **channel9**: Channel 9
@@ -92,6 +104,7 @@
- **Clipfish**
- **cliphunter**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
- **Clyp**
@@ -114,18 +127,27 @@
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **CWTV**
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
- **DailymotionCloud**
- **daum.net**
- **daum.net:clip**
- **daum.net:playlist**
- **daum.net:user**
- **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv**
- **DeezerPlaylist**
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **Digiteka**
- **Discovery**
- **Dotsub**
- **DouyuTV**: 斗鱼
@@ -154,7 +176,7 @@
- **Eporner**
- **EroProfile**
- **Escapist**
- **ESPN** (Currently broken)
- **ESPN**
- **EsriVideo**
- **Europa**
- **EveryonesMixtape**
@@ -162,6 +184,7 @@
- **ExpoTV**
- **ExtremeTube**
- **facebook**
- **facebook:post**
- **faz.net**
- **fc2**
- **Fczenit**
@@ -171,18 +194,22 @@
- **Flickr**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **FOX**
- **Foxgay**
- **FoxNews**: Fox News and Fox Business Video
- **FoxSports**
- **france2.fr:generation-quoi**
- **FranceCulture**
- **FranceCultureEmission**
- **FranceInter**
- **francetv**: France 2, 3, 4, 5 and Ô
- **francetvinfo.fr**
- **Freesound**
- **freespeech.org**
- **FreeVideo**
- **Funimation**
- **FunnyOrDie**
- **GameInformer**
- **Gamekings**
- **GameOne**
- **gameone:playlist**
@@ -202,7 +229,9 @@
- **GodTube**
- **GoldenMoustache**
- **Golem**
- **GoogleDrive**
- **Goshgay**
- **GPUTechConf**
- **Groupon**
- **Hark**
- **HearThisAt**
@@ -211,11 +240,11 @@
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HistoricFilms**
- **History**
- **hitbox**
- **hitbox:live**
- **HornBunny**
- **HotNewHipHop**
- **HotStar**
- **Howcast**
- **HowStuffWorks**
- **HuffPost**: Huffington Post
@@ -233,17 +262,18 @@
- **Instagram**
- **instagram:user**: Instagram user profile
- **InternetVideoArchive**
- **IPrima**
- **IPrima** (Currently broken)
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
- **Jukebox**
- **JWPlatform**
- **Kaltura**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
@@ -265,7 +295,9 @@
- **la7.tv**
- **Laola1Tv**
- **Lecture2Go**
- **Lemonde**
- **Letv**: 乐视网
- **LetvCloud**: 乐视云
- **LetvPlaylist**
- **LetvTv**
- **Libsyn**
@@ -278,13 +310,16 @@
- **livestream**
- **livestream:original**
- **LnkGo**
- **LoveHomePorn**
- **lrt.lt**
- **lynda**: lynda.com videos
- **lynda:course**: lynda.com online courses
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **MakerTV**
- **Malemotion**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **metacafe**
@@ -307,7 +342,6 @@
- **MovieClips**
- **MovieFap**
- **Moviezine**
- **movshare**: MovShare
- **MPORA**
- **MSNBC**
- **MTV**
@@ -350,11 +384,13 @@
- **Newstube**
- **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nextmovie.com**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category
- **nick.com**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
- **njoy**: N-JOY
@@ -367,13 +403,14 @@
- **nowness**
- **nowness:playlist**
- **nowness:series**
- **NowTV**
- **NowTV** (Currently broken)
- **NowTVList**
- **nowvideo**: NowVideo
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
- **Npr**
- **NRK**
- **NRKPlaylist**
- **NRKTV**: NRK TV and NRK Radio
@@ -388,16 +425,19 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **OraTV**
- **orf:fm4**: radio FM4
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
- **pandora.tv**: 판도라TV
- **parliamentlive.tv**: UK parliament videos
- **Patreon**
- **PBS**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **Periscope**: Periscope
- **PhilharmonieDeParis**: Philharmonie de Paris
- **Phoenix**
- **phoenix.de**
- **Photobucket**
- **Pinkbike**
- **Pladform**
@@ -435,16 +475,20 @@
- **radiofrance**
- **RadioJavan**
- **Rai**
- **RaiTV**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedTube**
- **RegioTV**
- **Restudy**
- **ReverbNation**
- **Revision3**
- **RingTV**
- **RottenTomatoes**
- **Roxwel**
- **RTBF**
- **Rte**
- **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio
- **rtl.nl**: rtl.nl and rtlxl.nl
- **RTL2**
- **RTP**
@@ -454,6 +498,7 @@
- **rtve.es:live**: RTVE.es live streams
- **RTVNH**
- **RUHD**
- **RulePorn**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
- **rutube:embed**: Rutube embedded videos
@@ -467,6 +512,7 @@
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
- **schooltv**
- **SciVee**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
@@ -480,6 +526,8 @@
- **Shared**: shared.sx and vivo.sx
- **ShareSix**
- **Sina**
- **skynewsarabia:video**
- **skynewsarabia:video**
- **Slideshare**
- **Slutload**
- **smotri**: Smotri.com
@@ -490,10 +538,9 @@
- **SnagFilmsEmbed**
- **Snotr**
- **Sohu**
- **soompi**
- **soompi:show**
- **soundcloud**
- **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search
- **soundcloud:set**
- **soundcloud:user**
- **soundgasm**
@@ -515,8 +562,8 @@
- **SportBoxEmbed**
- **SportDeutschland**
- **Sportschau**
- **Srf**
- **SRMediathek**: Saarländischer Rundfunk
- **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
- **stanfordoc**: Stanford Open ClassRoom
- **Steam**
@@ -541,14 +588,15 @@
- **TechTalks**
- **techtv.mit.edu**
- **ted**
- **Tele13**
- **TeleBruxelles**
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **TenPlay**
- **TestTube**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
@@ -558,22 +606,28 @@
- **THVideo**
- **THVideoPlaylist**
- **tinypic**: tinypic.com videos
- **tlc.com**
- **tlc.de**
- **TMZ**
- **TMZArticle**
- **TNAFlix**
- **toggle**
- **tou.tv**
- **Toypics**: Toypics user profile
- **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken)
- **Trilulilu**
- **trollvids**
- **TruTube**
- **Tube8**
- **TubiTv**
- **Tudou**
- **tudou**
- **tudou:album**
- **tudou:playlist**
- **Tumblr**
- **TuneIn**
- **tunein:clip**
- **tunein:program**
- **tunein:station**
- **tunein:topic**
- **Turbo**
- **Tutv**
- **tv.dfb.de**
@@ -583,6 +637,7 @@
- **TVC**
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **tvp.pl**
- **tvp.pl:Series**
- **TVPlay**: TV3Play and related services
@@ -600,7 +655,6 @@
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **Ultimedia**
- **Unistra**
- **Urort**: NRK P3 Urørt
- **ustream**
@@ -612,7 +666,7 @@
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VGTV**: VGTV and BTTV
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **Vice**
- **Viddler**
@@ -620,9 +674,12 @@
- **video.mit.edu**
- **VideoDetective**
- **videofy.me**
- **VideoMega**
- **VideoMega** (Currently broken)
- **videomore**
- **videomore:season**
- **videomore:video**
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube
- **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
- **Vidme**
- **Vidzi**
@@ -664,6 +721,8 @@
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **WNL**
@@ -700,6 +759,7 @@
- **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
- **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
- **youtube:playlist**: YouTube.com playlists
- **youtube:playlists**: YouTube.com user/channel playlists
- **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)
- **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first
@@ -713,3 +773,4 @@
- **ZDFChannel**
- **zingmp3:album**: mp3.zing.vn albums
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**

View File

@@ -12,8 +12,9 @@ import copy
from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_str
from youtube_dl.compat import compat_str, compat_urllib_error
from youtube_dl.extractor import YoutubeIE
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import ExtractorError, match_filter_func
@@ -221,6 +222,16 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
formats = [
{'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestvideo[vcodec=avc1.123456]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
def test_youtube_format_selection(self):
order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
@@ -237,6 +248,17 @@ class TestFormatSelection(unittest.TestCase):
def format_info(f_id):
info = YoutubeIE._formats[f_id].copy()
# XXX: In real cases InfoExtractor._parse_mpd() fills up 'acodec'
# and 'vcodec', while in tests such information is incomplete since
# commit a6c2c24479e5f4827ceb06f64d855329c0a6f593
# test_YoutubeDL.test_youtube_format_selection is broken without
# this fix
if 'acodec' in info and 'vcodec' not in info:
info['vcodec'] = 'none'
elif 'vcodec' in info and 'acodec' not in info:
info['acodec'] = 'none'
info['format_id'] = f_id
info['url'] = 'url:' + f_id
return info
@@ -631,6 +653,47 @@ class TestYoutubeDL(unittest.TestCase):
result = get_ids({'playlist_items': '10'})
self.assertEqual(result, [])
def test_urlopen_no_file_protocol(self):
# see https://github.com/rg3/youtube-dl/issues/8227
ydl = YDL()
self.assertRaises(compat_urllib_error.URLError, ydl.urlopen, 'file:///etc/passwd')
def test_do_not_override_ie_key_in_url_transparent(self):
ydl = YDL()
class Foo1IE(InfoExtractor):
_VALID_URL = r'foo1:'
def _real_extract(self, url):
return {
'_type': 'url_transparent',
'url': 'foo2:',
'ie_key': 'Foo2',
}
class Foo2IE(InfoExtractor):
_VALID_URL = r'foo2:'
def _real_extract(self, url):
return {
'_type': 'url',
'url': 'foo3:',
'ie_key': 'Foo3',
}
class Foo3IE(InfoExtractor):
_VALID_URL = r'foo3:'
def _real_extract(self, url):
return _make_result([{'url': TEST_URL}])
ydl.add_info_extractor(Foo1IE(ydl))
ydl.add_info_extractor(Foo2IE(ydl))
ydl.add_info_extractor(Foo3IE(ydl))
ydl.extract_info('foo1:')
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['url'], TEST_URL)
if __name__ == '__main__':
unittest.main()

View File

@@ -56,7 +56,7 @@ class TestAllURLsMatching(unittest.TestCase):
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watchlater'])
@@ -121,8 +121,8 @@ class TestAllURLsMatching(unittest.TestCase):
def test_pbs(self):
# https://github.com/rg3/youtube-dl/issues/2350
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['PBS'])
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['pbs'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['pbs'])
def test_yahoo_https(self):
# https://github.com/rg3/youtube-dl/issues/2701

View File

@@ -11,7 +11,6 @@ from test.helper import FakeYDL, md5
from youtube_dl.extractor import (
BlipTVIE,
YoutubeIE,
DailymotionIE,
TEDIE,
@@ -22,7 +21,7 @@ from youtube_dl.extractor import (
NPOIE,
ComedyCentralIE,
NRKTVIE,
RaiIE,
RaiTVIE,
VikiIE,
ThePlatformIE,
ThePlatformFeedIE,
@@ -66,16 +65,16 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
for lang in ['it', 'fr', 'de']:
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_sbv_format(self):
def test_youtube_subtitles_ttml_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
self.DL.params['subtitlesformat'] = 'ttml'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
@@ -145,18 +144,6 @@ class TestTedSubtitles(BaseTestSubtitles):
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
class TestBlipTVSubtitles(BaseTestSubtitles):
url = 'http://blip.tv/a/a-6603250'
IE = BlipTVIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), '5b75c300af65fe4476dff79478bb93e4')
class TestVimeoSubtitles(BaseTestSubtitles):
url = 'http://vimeo.com/76979871'
IE = VimeoIE
@@ -273,7 +260,7 @@ class TestNRKSubtitles(BaseTestSubtitles):
class TestRaiSubtitles(BaseTestSubtitles):
url = 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html'
IE = RaiIE
IE = RaiTVIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True

30
test/test_update.py Normal file
View File

@@ -0,0 +1,30 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import json
from youtube_dl.update import rsa_verify
class TestUpdate(unittest.TestCase):
def test_rsa_verify(self):
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'versions.json'), 'rb') as f:
versions_info = f.read().decode()
versions_info = json.loads(versions_info)
signature = versions_info['signature']
del versions_info['signature']
self.assertTrue(rsa_verify(
json.dumps(versions_info, sort_keys=True).encode('utf-8'),
signature, UPDATES_RSA_KEY))
if __name__ == '__main__':
unittest.main()

View File

@@ -21,6 +21,8 @@ from youtube_dl.utils import (
clean_html,
DateRange,
detect_exe_version,
determine_ext,
encode_compat_str,
encodeFilename,
escape_rfc3986,
escape_url,
@@ -42,6 +44,7 @@ from youtube_dl.utils import (
sanitize_path,
prepend_extension,
replace_extension,
remove_quotes,
shell_quote,
smuggle_url,
str_to_int,
@@ -199,6 +202,15 @@ class TestUtil(unittest.TestCase):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_remove_quotes(self):
self.assertEqual(remove_quotes(None), None)
self.assertEqual(remove_quotes('"'), '"')
self.assertEqual(remove_quotes("'"), "'")
self.assertEqual(remove_quotes(';'), ';')
self.assertEqual(remove_quotes('";'), '";')
self.assertEqual(remove_quotes('""'), '')
self.assertEqual(remove_quotes('";"'), ';')
def test_ordered_set(self):
self.assertEqual(orderedSet([1, 1, 2, 3, 4, 4, 5, 6, 7, 3, 5]), [1, 2, 3, 4, 5, 6, 7])
self.assertEqual(orderedSet([]), [])
@@ -238,6 +250,13 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
def test_find_xpath_attr(self):
testxml = '''<root>
<node/>
@@ -431,6 +450,10 @@ class TestUtil(unittest.TestCase):
data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
self.assertTrue(isinstance(data, bytes))
def test_encode_compat_str(self):
self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
def test_parse_iso8601(self):
self.assertEqual(parse_iso8601('2014-03-23T23:04:26+0100'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26+0000'), 1395612266)
@@ -643,12 +666,13 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
{'like_count': 190, 'dislike_count': 10}))
def test_parse_dfxp_time_expr(self):
self.assertEqual(parse_dfxp_time_expr(None), 0.0)
self.assertEqual(parse_dfxp_time_expr(''), 0.0)
self.assertEqual(parse_dfxp_time_expr(None), None)
self.assertEqual(parse_dfxp_time_expr(''), None)
self.assertEqual(parse_dfxp_time_expr('0.1'), 0.1)
self.assertEqual(parse_dfxp_time_expr('0.1s'), 0.1)
self.assertEqual(parse_dfxp_time_expr('00:00:01'), 1.0)
self.assertEqual(parse_dfxp_time_expr('00:00:01.100'), 1.1)
self.assertEqual(parse_dfxp_time_expr('00:00:01:100'), 1.1)
def test_dfxp2srt(self):
dfxp_data = '''<?xml version="1.0" encoding="UTF-8"?>
@@ -658,6 +682,9 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
<p begin="0" end="1">The following line contains Chinese characters and special symbols</p>
<p begin="1" end="2">第二行<br/>♪♪</p>
<p begin="2" dur="1"><span>Third<br/>Line</span></p>
<p begin="3" end="-1">Lines with invalid timestamps are ignored</p>
<p begin="-1" end="-1">Ignore, two</p>
<p begin="3" dur="-1">Ignored, three</p>
</div>
</body>
</tt>'''

View File

@@ -66,7 +66,7 @@ class TestAnnotations(unittest.TestCase):
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) # assertIn only added in python 2.7
# remove the first occurance, there could be more than one annotation with the same text
# remove the first occurrence, there could be more than one annotation with the same text
expected.remove(text)
# We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')

View File

@@ -34,7 +34,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = result['entries']
entries = list(result['entries'])
self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')

34
test/versions.json Normal file
View File

@@ -0,0 +1,34 @@
{
"latest": "2013.01.06",
"signature": "72158cdba391628569ffdbea259afbcf279bbe3d8aeb7492690735dc1cfa6afa754f55c61196f3871d429599ab22f2667f1fec98865527b32632e7f4b3675a7ef0f0fbe084d359256ae4bba68f0d33854e531a70754712f244be71d4b92e664302aa99653ee4df19800d955b6c4149cd2b3f24288d6e4b40b16126e01f4c8ce6",
"versions": {
"2013.01.02": {
"bin": [
"http://youtube-dl.org/downloads/2013.01.02/youtube-dl",
"f5b502f8aaa77675c4884938b1e4871ebca2611813a0c0e74f60c0fbd6dcca6b"
],
"exe": [
"http://youtube-dl.org/downloads/2013.01.02/youtube-dl.exe",
"75fa89d2ce297d102ff27675aa9d92545bbc91013f52ec52868c069f4f9f0422"
],
"tar": [
"http://youtube-dl.org/downloads/2013.01.02/youtube-dl-2013.01.02.tar.gz",
"6a66d022ac8e1c13da284036288a133ec8dba003b7bd3a5179d0c0daca8c8196"
]
},
"2013.01.06": {
"bin": [
"http://youtube-dl.org/downloads/2013.01.06/youtube-dl",
"64b6ed8865735c6302e836d4d832577321b4519aa02640dc508580c1ee824049"
],
"exe": [
"http://youtube-dl.org/downloads/2013.01.06/youtube-dl.exe",
"58609baf91e4389d36e3ba586e21dab882daaaee537e4448b1265392ae86ff84"
],
"tar": [
"http://youtube-dl.org/downloads/2013.01.06/youtube-dl-2013.01.06.tar.gz",
"fe77ab20a95d980ed17a659aa67e371fdd4d656d19c4c7950e7b720b0c2f1a86"
]
}
}
}

View File

@@ -28,6 +28,7 @@ if os.name == 'nt':
import ctypes
from .compat import (
compat_basestring,
compat_cookiejar,
compat_expanduser,
compat_get_terminal_size,
@@ -45,8 +46,11 @@ from .utils import (
DateRange,
DEFAULT_OUTTMPL,
determine_ext,
determine_protocol,
DownloadError,
encode_compat_str,
encodeFilename,
error_to_compat_str,
ExtractorError,
format_bytes,
formatSeconds,
@@ -63,6 +67,7 @@ from .utils import (
SameFileError,
sanitize_filename,
sanitize_path,
sanitized_Request,
std_headers,
subtitles_filename,
UnavailableVideoError,
@@ -156,7 +161,7 @@ class YoutubeDL(object):
writethumbnail: Write the thumbnail image to a file
write_all_thumbnails: Write all thumbnail formats to files
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
@@ -258,7 +263,7 @@ class YoutubeDL(object):
the downloader (see youtube_dl/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args.
xattr_set_filesize, external_downloader_args, hls_use_mpegts.
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
@@ -493,7 +498,7 @@ class YoutubeDL(object):
tb = ''
if hasattr(sys.exc_info()[1], 'exc_info') and sys.exc_info()[1].exc_info[0]:
tb += ''.join(traceback.format_exception(*sys.exc_info()[1].exc_info))
tb += compat_str(traceback.format_exc())
tb += encode_compat_str(traceback.format_exc())
else:
tb_data = traceback.format_list(traceback.extract_stack())
tb = ''.join(tb_data)
@@ -672,14 +677,14 @@ class YoutubeDL(object):
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
except ExtractorError as de: # An error we somewhat expected
self.report_error(compat_str(de), de.format_traceback())
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
break
except MaxDownloadsReached:
raise
except Exception as e:
if self.params.get('ignoreerrors', False):
self.report_error(compat_str(e), tb=compat_str(traceback.format_exc()))
self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
break
else:
raise
@@ -702,7 +707,6 @@ class YoutubeDL(object):
It will also download the videos if 'download'.
Returns the resolved ie_result.
"""
result_type = ie_result.get('_type', 'video')
if result_type in ('url', 'url_transparent'):
@@ -731,7 +735,7 @@ class YoutubeDL(object):
force_properties = dict(
(k, v) for k, v in ie_result.items() if v is not None)
for f in ('_type', 'url'):
for f in ('_type', 'url', 'ie_key'):
if f in force_properties:
del force_properties[f]
new_result = info.copy()
@@ -833,6 +837,7 @@ class YoutubeDL(object):
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
elif result_type == 'compat_list':
self.report_warning(
@@ -893,11 +898,14 @@ class YoutubeDL(object):
STR_OPERATORS = {
'=': operator.eq,
'!=': operator.ne,
'^=': lambda attr, value: attr.startswith(value),
'$=': lambda attr, value: attr.endswith(value),
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9_-]+)
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(filter_spec)
@@ -937,7 +945,7 @@ class YoutubeDL(object):
filter_parts.append(string)
def _remove_unused_ops(tokens):
# Remove operators that we don't use and join them with the sourrounding strings
# Remove operators that we don't use and join them with the surrounding strings
# for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
ALLOWED_OPS = ('/', '+', ',', '(', ')')
last_string, last_start, last_end, last_line = None, None, None, None
@@ -1107,6 +1115,12 @@ class YoutubeDL(object):
'contain the video, try using '
'"-f %s+%s"' % (format_2, format_1))
return
# Formats must be opposite (video+audio)
if formats_info[0].get('acodec') == 'none' and formats_info[1].get('acodec') == 'none':
self.report_error(
'Both formats %s and %s are video-only, you must specify "-f video+audio"'
% (format_1, format_2))
return
output_ext = (
formats_info[0]['ext']
if self.params.get('merge_output_format') is None
@@ -1186,7 +1200,7 @@ class YoutubeDL(object):
return res
def _calc_cookies(self, info_dict):
pr = compat_urllib_request.Request(info_dict['url'])
pr = sanitized_Request(info_dict['url'])
self.cookiejar.add_cookie_header(pr)
return pr.get_header('Cookie')
@@ -1233,6 +1247,12 @@ class YoutubeDL(object):
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
for field in ('chapter', 'season', 'episode'):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
subtitles = info_dict.get('subtitles')
if subtitles:
for _, subtitle in subtitles.items():
@@ -1289,6 +1309,10 @@ class YoutubeDL(object):
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url']).lower()
# Automatically determine protocol if missing (useful for format
# selection purposes)
if 'protocol' not in format:
format['protocol'] = determine_protocol(format)
# Add HTTP headers, so that external programs can use them from the
# json output
full_format_info = info_dict.copy()
@@ -1301,7 +1325,7 @@ class YoutubeDL(object):
# only set the 'formats' fields if the original info_dict list them
# otherwise we end up with a circular reference, the first (and unique)
# element in the 'formats' field in info_dict is info_dict itself,
# wich can't be exported to json
# which can't be exported to json
info_dict['formats'] = formats
if self.params.get('listformats'):
self.list_formats(info_dict)
@@ -1450,7 +1474,7 @@ class YoutubeDL(object):
if dn and not os.path.exists(dn):
os.makedirs(dn)
except (OSError, IOError) as err:
self.report_error('unable to create directory ' + compat_str(err))
self.report_error('unable to create directory ' + error_to_compat_str(err))
return
if self.params.get('writedescription', False):
@@ -1501,7 +1525,7 @@ class YoutubeDL(object):
sub_info['url'], info_dict['id'], note=False)
except ExtractorError as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, compat_str(err.cause)))
(sub_lang, error_to_compat_str(err.cause)))
continue
try:
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
@@ -1780,6 +1804,10 @@ class YoutubeDL(object):
res = ''
if fdict.get('ext') in ['f4f', 'f4m']:
res += '(unsupported) '
if fdict.get('language'):
if res:
res += ' '
res += '[%s]' % fdict['language']
if fdict.get('format_note') is not None:
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:
@@ -1870,6 +1898,8 @@ class YoutubeDL(object):
def urlopen(self, req):
""" Start an HTTP download """
if isinstance(req, compat_basestring):
req = sanitized_Request(req)
return self._opener.open(req, timeout=self._socket_timeout)
def print_debug_header(self):
@@ -1969,8 +1999,19 @@ class YoutubeDL(object):
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
data_handler = compat_urllib_request_DataHandler()
# When passing our own FileHandler instance, build_opener won't add the
# default FileHandler and allows us to disable the file protocol, which
# can be used for malicious purposes (see
# https://github.com/rg3/youtube-dl/issues/8227)
file_handler = compat_urllib_request.FileHandler()
def file_open(*args, **kwargs):
raise compat_urllib_error.URLError('file:// scheme is explicitly disabled in youtube-dl for security reasons')
file_handler.file_open = file_open
opener = compat_urllib_request.build_opener(
proxy_handler, https_handler, cookie_processor, ydlh, data_handler)
proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
@@ -2028,4 +2069,4 @@ class YoutubeDL(object):
(info_dict['extractor'], info_dict['id'], thumb_display_id, thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning('Unable to download thumbnail "%s": %s' %
(t['url'], compat_str(err)))
(t['url'], error_to_compat_str(err)))

View File

@@ -369,6 +369,7 @@ def _real_main(argv=None):
'no_color': opts.no_color,
'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native,
'hls_use_mpegts': opts.hls_use_mpegts,
'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy,

View File

@@ -433,7 +433,7 @@ if sys.version_info < (3, 0) and sys.platform == 'win32':
else:
compat_getpass = getpass.getpass
# Old 2.6 and 2.7 releases require kwargs to be bytes
# Python < 2.6.5 require kwargs to be bytes
try:
def _testfunc(x):
pass

View File

@@ -5,9 +5,9 @@ import re
import sys
import time
from ..compat import compat_str
from ..utils import (
encodeFilename,
error_to_compat_str,
decodeArgument,
format_bytes,
timeconvert,
@@ -42,9 +42,10 @@ class FileDownloader(object):
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
(experimenatal)
(experimental)
external_downloader_args: A list of additional command-line arguments for the
external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos.
Subclasses of this one must re-define the real_download method.
"""
@@ -186,7 +187,7 @@ class FileDownloader(object):
return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err:
self.report_error('unable to rename file: %s' % compat_str(err))
self.report_error('unable to rename file: %s' % error_to_compat_str(err))
def try_utime(self, filename, last_modified_hdr):
"""Try to set the last-modified time of the given file."""
@@ -295,7 +296,7 @@ class FileDownloader(object):
def report_retry(self, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %d)...' % (count, retries))
self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %.0f)...' % (count, retries))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import FileDownloader
from ..compat import compat_urllib_request
from ..utils import sanitized_Request
class DashSegmentsFD(FileDownloader):
@@ -22,7 +22,7 @@ class DashSegmentsFD(FileDownloader):
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
req = compat_urllib_request.Request(target_url)
req = sanitized_Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))

View File

@@ -15,6 +15,7 @@ from ..compat import (
)
from ..utils import (
encodeFilename,
fix_xml_ampersands,
sanitize_open,
struct_pack,
struct_unpack,
@@ -272,15 +273,21 @@ class F4mFD(FragmentFD):
return fragments_list
def _parse_bootstrap_node(self, node, base_url):
if node.text is None:
# Sometimes non empty inline bootstrap info can be specified along
# with bootstrap url attribute (e.g. dummy inline bootstrap info
# contains whitespace characters in [1]). We will prefer bootstrap
# url over inline bootstrap info when present.
# 1. http://live-1-1.rutube.ru/stream/1024/HDS/SD/C2NKsS85HQNckgn5HdEmOQ/1454167650/S-s604419906/move/four/dirs/upper/1024-576p.f4m
bootstrap_url = node.get('url')
if bootstrap_url:
bootstrap_url = compat_urlparse.urljoin(
base_url, node.attrib['url'])
base_url, bootstrap_url)
boot_info = self._get_bootstrap_from_url(bootstrap_url)
else:
bootstrap_url = None
bootstrap = base64.b64decode(node.text.encode('ascii'))
boot_info = read_bootstrap_info(bootstrap)
return (boot_info, bootstrap_url)
return boot_info, bootstrap_url
def real_download(self, filename, info_dict):
man_url = info_dict['url']
@@ -288,7 +295,10 @@ class F4mFD(FragmentFD):
self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
urlh = self.ydl.urlopen(man_url)
man_url = urlh.geturl()
manifest = urlh.read()
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244
# and https://github.com/rg3/youtube-dl/issues/7823)
manifest = fix_xml_ampersands(urlh.read().decode('utf-8', 'ignore')).strip()
doc = compat_etree_fromstring(manifest)
formats = [(int(f.attrib.get('bitrate', -1)), f)
@@ -312,7 +322,8 @@ class F4mFD(FragmentFD):
metadata = None
fragments_list = build_fragments_list(boot_info)
if self.params.get('test', False):
test = self.params.get('test', False)
if test:
# We only download the first fragment
fragments_list = fragments_list[:1]
total_frags = len(fragments_list)
@@ -322,6 +333,7 @@ class F4mFD(FragmentFD):
ctx = {
'filename': filename,
'total_frags': total_frags,
'live': live,
}
self._prepare_frag_download(ctx)
@@ -376,7 +388,7 @@ class F4mFD(FragmentFD):
else:
raise
if not fragments_list and live and bootstrap_url:
if not fragments_list and not test and live and bootstrap_url:
fragments_list = self._update_live_fragments(bootstrap_url, frag_i)
total_frags += len(fragments_list)
if fragments_list and (fragments_list[0][1] > frag_i + 1):

View File

@@ -26,7 +26,11 @@ class FragmentFD(FileDownloader):
self._start_frag_download(ctx)
def _prepare_frag_download(self, ctx):
self.to_screen('[%s] Total fragments: %d' % (self.FD_NAME, ctx['total_frags']))
if 'live' not in ctx:
ctx['live'] = False
self.to_screen(
'[%s] Total fragments: %s'
% (self.FD_NAME, ctx['total_frags'] if not ctx['live'] else 'unknown (live)'))
self.report_destination(ctx['filename'])
dl = HttpQuietDownloader(
self.ydl,
@@ -59,37 +63,44 @@ class FragmentFD(FileDownloader):
'filename': ctx['filename'],
'tmpfilename': ctx['tmpfilename'],
}
start = time.time()
ctx['started'] = start
ctx.update({
'started': start,
# Total complete fragments downloaded so far in bytes
'complete_frags_downloaded_bytes': 0,
# Amount of fragment's bytes downloaded by the time of the previous
# frag progress hook invocation
'prev_frag_downloaded_bytes': 0,
})
def frag_progress_hook(s):
if s['status'] not in ('downloading', 'finished'):
return
frag_total_bytes = s.get('total_bytes', 0)
if s['status'] == 'finished':
state['downloaded_bytes'] += frag_total_bytes
state['frag_index'] += 1
estimated_size = (
(state['downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
time_now = time.time()
state['total_bytes_estimate'] = estimated_size
state['elapsed'] = time_now - start
frag_total_bytes = s.get('total_bytes') or 0
if not ctx['live']:
estimated_size = (
(ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
state['total_bytes_estimate'] = estimated_size
if s['status'] == 'finished':
progress = self.calc_percent(state['frag_index'], total_frags)
state['frag_index'] += 1
state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
ctx['prev_frag_downloaded_bytes'] = 0
else:
frag_downloaded_bytes = s['downloaded_bytes']
frag_progress = self.calc_percent(frag_downloaded_bytes,
frag_total_bytes)
progress = self.calc_percent(state['frag_index'], total_frags)
progress += frag_progress / float(total_frags)
state['eta'] = self.calc_eta(
start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size,
state['downloaded_bytes'])
state['speed'] = s.get('speed')
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
self._hook_progress(state)
ctx['dl'].add_progress_hook(frag_progress_hook)

View File

@@ -13,6 +13,7 @@ from ..utils import (
encodeArgument,
encodeFilename,
sanitize_open,
handle_youtubedl_headers,
)
@@ -33,18 +34,32 @@ class HlsFD(FileDownloader):
if info_dict['http_headers'] and re.match(r'^https?://', url):
# Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
# [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
headers = handle_youtubedl_headers(info_dict['http_headers'])
args += [
'-headers',
''.join('%s: %s\r\n' % (key, val) for key, val in info_dict['http_headers'].items())]
''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
args += ['-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc']
args += ['-i', url, '-c', 'copy']
if self.params.get('hls_use_mpegts', False):
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
args = [encodeArgument(opt) for opt in args]
args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
self._debug_cmd(args)
retval = subprocess.call(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE)
try:
retval = proc.wait()
except KeyboardInterrupt:
# subprocces.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams)
proc.communicate(b'q')
raise
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('\r[%s] %s bytes' % (args[0], fsize))

View File

@@ -7,14 +7,12 @@ import time
import re
from .common import FileDownloader
from ..compat import (
compat_urllib_request,
compat_urllib_error,
)
from ..compat import compat_urllib_error
from ..utils import (
ContentTooShortError,
encodeFilename,
sanitize_open,
sanitized_Request,
)
@@ -29,8 +27,8 @@ class HttpFD(FileDownloader):
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
basic_request = compat_urllib_request.Request(url, None, headers)
request = compat_urllib_request.Request(url, None, headers)
basic_request = sanitized_Request(url, None, headers)
request = sanitized_Request(url, None, headers)
is_test = self.params.get('test', False)

View File

@@ -117,7 +117,7 @@ class RtmpFD(FileDownloader):
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
# the connection was interrumpted and resuming appears to be
# the connection was interrupted and resuming appears to be
# possible. This is part of rtmpdump's normal usage, AFAIK.
basic_args = [
'rtmpdump', '--verbose', '-r', url,

View File

@@ -3,13 +3,19 @@ from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
ACastChannelIE,
)
from .addanime import AddAnimeIE
from .adobetv import (
AdobeTVIE,
AdobeTVShowIE,
AdobeTVChannelIE,
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
from .aftenposten import AftenpostenIE
from .aenetworks import AENetworksIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
@@ -20,7 +26,10 @@ from .aol import AolIE
from .allocine import AllocineIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import AppleTrailersIE
from .appletrailers import (
AppleTrailersIE,
AppleTrailersSectionIE,
)
from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
@@ -33,13 +42,15 @@ from .arte import (
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
@@ -52,10 +63,14 @@ from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE
from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blinkx import BlinkxIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .bpb import BpbIE
from .br import BRIE
@@ -71,11 +86,14 @@ from .camdemy import (
CamdemyIE,
CamdemyFolderIE
)
from .canal13cl import Canal13clIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .ceskatelevize import CeskaTelevizeIE
@@ -116,15 +134,27 @@ from .crunchyroll import (
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
DailymotionCloudIE,
)
from .daum import DaumIE
from .daum import (
DaumIE,
DaumClipIE,
DaumPlaylistIE,
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import DCNIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
@@ -171,7 +201,10 @@ from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .facebook import (
FacebookIE,
FacebookPostIE,
)
from .faz import FazIE
from .fc2 import FC2IE
from .fczenit import FczenitIE
@@ -184,10 +217,14 @@ from .flickr import FlickrIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxsports import FoxSportsIE
from .franceculture import FranceCultureIE
from .franceculture import (
FranceCultureIE,
FranceCultureEmissionIE,
)
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
@@ -199,7 +236,9 @@ from .francetv import (
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
@@ -223,9 +262,11 @@ from .globo import (
from .godtube import GodTubeIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hearthisat import HearThisAtIE
@@ -234,16 +275,20 @@ from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .historicfilms import HistoricFilmsIE
from .history import HistoryIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
from .hotnewhiphop import HotNewHipHopIE
from .hotstar import HotStarIE
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import IGNIE, OneUPIE
from .ign import (
IGNIE,
OneUPIE,
PCMagIE,
)
from .imdb import (
ImdbIE,
ImdbListIE
@@ -267,11 +312,12 @@ from .ivi import (
IviIE,
IviCompilationIE
)
from .ivideon import IvideonIE
from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jukebox import JukeboxIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
@@ -296,10 +342,12 @@ from .kuwo import (
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
from .letv import (
LetvIE,
LetvTvIE,
LetvPlaylistIE
LetvPlaylistIE,
LetvCloudIE,
)
from .libsyn import LibsynIE
from .lifenews import (
@@ -318,6 +366,7 @@ from .livestream import (
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
LyndaIE,
@@ -326,7 +375,9 @@ from .lynda import (
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
@@ -349,7 +400,6 @@ from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
from .movshare import MovShareIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
@@ -402,6 +452,7 @@ from .nextmedia import (
NextMediaActionNewsIE,
AppleDailyIE,
)
from .nextmovie import NextMovieIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import (
@@ -409,13 +460,20 @@ from .nhl import (
NHLNewsIE,
NHLVideocenterIE,
)
from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
from .novamov import NovaMovIE
from .novamov import (
NovaMovIE,
WholeCloudIE,
NowVideoIE,
VideoWeedIE,
CloudTimeIE,
)
from .nowness import (
NownessIE,
NownessPlaylistIE,
@@ -425,15 +483,16 @@ from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .nowvideo import NowVideoIE
from .npo import (
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
)
from .npr import NprIE
from .nrk import (
NRKIE,
NRKPlaylistIE,
@@ -453,12 +512,14 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
ORFOE1IE,
ORFFM4IE,
ORFIPTVIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
@@ -506,18 +567,23 @@ from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rai import (
RaiTVIE,
RaiIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
from .revision3 import Revision3IE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rte import RteIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
@@ -525,6 +591,7 @@ from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
@@ -554,6 +621,10 @@ from .shahid import ShahidIE
from .shared import SharedIE
from .sharesix import ShareSixIE
from .sina import SinaIE
from .skynewsarabia import (
SkyNewsArabiaIE,
SkyNewsArabiaArticleIE,
)
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
@@ -568,15 +639,12 @@ from .snagfilms import (
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soompi import (
SoompiIE,
SoompiShowIE,
)
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE
SoundcloudPlaylistIE,
SoundcloudSearchIE
)
from .soundgasm import (
SoundgasmIE,
@@ -602,7 +670,10 @@ from .sportbox import (
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
from .srf import SrfIE
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
)
from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE
@@ -629,6 +700,7 @@ from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
@@ -636,8 +708,8 @@ from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .testtube import TestTubeIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
from .theonion import TheOnionIE
from .theplatform import (
ThePlatformIE,
@@ -647,7 +719,7 @@ from .thesixtyone import TheSixtyOneIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcIE, TlcDeIE
from .tlc import TlcDeIE
from .tmz import (
TMZIE,
TMZArticleIE,
@@ -657,6 +729,7 @@ from .tnaflix import (
EMPFlixIE,
MovieFapIE,
)
from .toggle import ToggleIE
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
@@ -665,12 +738,23 @@ from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
from .tudou import TudouIE
from .tudou import (
TudouIE,
TudouPlaylistIE,
TudouAlbumIE,
)
from .tumblr import TumblrIE
from .tunein import TuneInIE
from .tunein import (
TuneInClipIE,
TuneInStationIE,
TuneInProgramIE,
TuneInTopicIE,
TuneInShortenerIE,
)
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
@@ -683,10 +767,12 @@ from .tvc import (
TVCArticleIE,
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import TvpIE, TvpSeriesIE
from .tvplay import TVPlayIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentytwotracks import (
TwentyTwoTracksIE,
TwentyTwoTracksGenreIE
@@ -707,7 +793,7 @@ from .udemy import (
UdemyCourseIE
)
from .udn import UDNEmbedIE
from .ultimedia import UltimediaIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .urort import UrortIE
from .ustream import UstreamIE, UstreamChannelIE
@@ -729,9 +815,13 @@ from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videomega import VideoMegaIE
from .videomore import (
VideomoreIE,
VideomoreVideoIE,
VideomoreSeasonIE,
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .videoweed import VideoWeedIE
from .vidme import VidmeIE
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
@@ -782,6 +872,7 @@ from .webofstories import (
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
@@ -833,6 +924,7 @@ from .youtube import (
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
YoutubePlaylistsIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
@@ -841,6 +933,7 @@ from .zingmp3 import (
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE
_ALL_CLASSES = [
klass

View File

@@ -23,6 +23,7 @@ class ABCIE(InfoExtractor):
'title': 'Australia to help staff Ebola treatment centre in Sierra Leone',
'description': 'md5:809ad29c67a05f54eb41f2a105693a67',
},
'skip': 'this video has expired',
}, {
'url': 'http://www.abc.net.au/news/2015-08-17/warren-entsch-introduces-same-sex-marriage-bill/6702326',
'md5': 'db2a5369238b51f9811ad815b69dc086',
@@ -36,6 +37,7 @@ class ABCIE(InfoExtractor):
'title': 'Marriage Equality: Warren Entsch introduces same sex marriage bill',
},
'add_ie': ['Youtube'],
'skip': 'Not accessible from Travis CI server',
}, {
'url': 'http://www.abc.net.au/news/2015-10-23/nab-lifts-interest-rates-following-westpac-and-cba/6880080',
'md5': 'b96eee7c9edf4fc5a358a0252881cc1f',
@@ -58,6 +60,9 @@ class ABCIE(InfoExtractor):
r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
webpage)
if mobj is None:
expired = self._html_search_regex(r'(?s)class="expired-(?:video|audio)".+?<span>(.+?)</span>', webpage, 'expired', None)
if expired:
raise ExtractorError('%s said: %s' % (self.IE_NAME, expired), expected=True)
raise ExtractorError('Unable to extract video urls')
urls_info = self._parse_json(

View File

@@ -44,7 +44,6 @@ class Abc7NewsIE(InfoExtractor):
'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()

View File

@@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class ACastIE(InfoExtractor):
IE_NAME = 'acast'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
_TEST = {
'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
'md5': 'ada3de5a1e3a2a381327d749854788bb',
'info_dict': {
'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
'ext': 'mp3',
'title': '"Where Are You?": Taipei 101, Taiwan',
'timestamp': 1196172000000,
'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
'duration': 211,
}
}
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
embed_page = self._download_webpage(
re.sub('(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
cast_data = self._parse_json(self._search_regex(
r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
display_id)['GetAcast/%s/%s' % (channel, display_id)]
return {
'id': compat_str(cast_data['id']),
'display_id': display_id,
'url': cast_data['blings'][0]['audio'],
'title': cast_data['name'],
'description': cast_data.get('description'),
'thumbnail': cast_data.get('image'),
'timestamp': int_or_none(cast_data.get('publishingDate')),
'duration': int_or_none(cast_data.get('duration')),
}
class ACastChannelIE(InfoExtractor):
IE_NAME = 'acast:channel'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)'
_TEST = {
'url': 'https://www.acast.com/condenasttraveler',
'info_dict': {
'id': '50544219-29bb-499e-a083-6087f4cb7797',
'title': 'Condé Nast Traveler Podcast',
'description': 'md5:98646dee22a5b386626ae31866638fbd',
},
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
channel_data = self._download_json(self._API_BASE_URL + 'channels/%s' % display_id, display_id)
casts = self._download_json(self._API_BASE_URL + 'channels/%s/acasts' % display_id, display_id)
entries = [self.url_result('https://www.acast.com/%s/%s' % (display_id, cast['url']), 'ACast') for cast in casts]
return self.playlist_result(entries, compat_str(channel_data['id']), channel_data['name'], channel_data.get('description'))

View File

@@ -1,23 +1,32 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
parse_duration,
unified_strdate,
str_to_int,
int_or_none,
float_or_none,
ISO639Utils,
determine_ext,
)
class AdobeTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.adobe\.com/watch/[^/]+/(?P<id>[^/]+)'
class AdobeTVBaseIE(InfoExtractor):
_API_BASE_URL = 'http://tv.adobe.com/api/v4/'
class AdobeTVIE(AdobeTVBaseIE):
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
_TEST = {
'url': 'http://tv.adobe.com/watch/the-complete-picture-with-julieanne-kost/quick-tip-how-to-draw-a-circle-around-an-object-in-photoshop/',
'md5': '9bc5727bcdd55251f35ad311ca74fa1e',
'info_dict': {
'id': 'quick-tip-how-to-draw-a-circle-around-an-object-in-photoshop',
'id': '10981',
'ext': 'mp4',
'title': 'Quick Tip - How to Draw a Circle Around an Object in Photoshop',
'description': 'md5:99ec318dc909d7ba2a1f2b038f7d2311',
@@ -29,50 +38,106 @@ class AdobeTVIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
language, show_urlname, urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
player = self._parse_json(
self._search_regex(r'html5player:\s*({.+?})\s*\n', webpage, 'player'),
video_id)
title = player.get('title') or self._search_regex(
r'data-title="([^"]+)"', webpage, 'title')
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(
self._html_search_meta('datepublished', webpage, 'upload date'))
duration = parse_duration(
self._html_search_meta('duration', webpage, 'duration') or
self._search_regex(
r'Runtime:\s*(\d{2}:\d{2}:\d{2})',
webpage, 'duration', fatal=False))
view_count = str_to_int(self._search_regex(
r'<div class="views">\s*Views?:\s*([\d,.]+)\s*</div>',
webpage, 'view count'))
video_data = self._download_json(
self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname),
urlname)['data'][0]
formats = [{
'url': source['src'],
'format_id': source.get('quality') or source['src'].split('-')[-1].split('.')[0] or None,
'tbr': source.get('bitrate'),
} for source in player['sources']]
'url': source['url'],
'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('video_data_rate')),
} for source in video_data['videos']]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'id': compat_str(video_data['id']),
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnail'),
'upload_date': unified_strdate(video_data.get('start_date')),
'duration': parse_duration(video_data.get('duration')),
'view_count': str_to_int(video_data.get('playcount')),
'formats': formats,
}
class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
def _parse_page_data(self, page_data):
return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
def _extract_playlist_entries(self, url, display_id):
page = self._download_json(url, display_id)
entries = self._parse_page_data(page['data'])
for page_num in range(2, page['paging']['pages'] + 1):
entries.extend(self._parse_page_data(
self._download_json(url + '&page=%d' % page_num, display_id)['data']))
return entries
class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
_TEST = {
'url': 'http://tv.adobe.com/show/the-complete-picture-with-julieanne-kost',
'info_dict': {
'id': '36',
'title': 'The Complete Picture with Julieanne Kost',
'description': 'md5:fa50867102dcd1aa0ddf2ab039311b27',
},
'playlist_mincount': 136,
}
def _get_element_url(self, element_data):
return element_data['urls'][0]
def _real_extract(self, url):
language, show_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&show_urlname=%s' % (language, show_urlname)
show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0]
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname),
compat_str(show_data['id']),
show_data['show_name'],
show_data['show_description'])
class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
_VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
_TEST = {
'url': 'http://tv.adobe.com/channel/development',
'info_dict': {
'id': 'development',
},
'playlist_mincount': 96,
}
def _get_element_url(self, element_data):
return element_data['url']
def _real_extract(self, url):
language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
if not language:
language = 'en'
query = 'language=%s&channel_urlname=%s' % (language, channel_urlname)
if category_urlname:
query += '&category_urlname=%s' % category_urlname
return self.playlist_result(
self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname),
channel_urlname)
class AdobeTVVideoIE(InfoExtractor):
_VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
@@ -91,28 +156,25 @@ class AdobeTVVideoIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_params = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'player parameters'),
video_id)
video_data = self._download_json(url + '?format=json', video_id)
formats = [{
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
'url': source['src'],
'width': source.get('width'),
'height': source.get('height'),
'tbr': source.get('bitrate'),
} for source in player_params['sources']]
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'tbr': int_or_none(source.get('bitrate')),
} for source in video_data['sources']]
self._sort_formats(formats)
# For both metadata and downloaded files the duration varies among
# formats. I just pick the max one
duration = max(filter(None, [
float_or_none(source.get('duration'), scale=1000)
for source in player_params['sources']]))
for source in video_data['sources']]))
subtitles = {}
for translation in player_params.get('translations', []):
for translation in video_data.get('translations', []):
lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
if lang_id not in subtitles:
subtitles[lang_id] = []
@@ -124,8 +186,9 @@ class AdobeTVVideoIE(InfoExtractor):
return {
'id': video_id,
'formats': formats,
'title': player_params['title'],
'description': self._og_search_description(webpage),
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': video_data['video'].get('poster'),
'duration': duration,
'subtitles': subtitles,
}

View File

@@ -68,7 +68,7 @@ class AdultSwimIE(InfoExtractor):
'md5': '3e346a2ab0087d687a05e1e7f3b3e529',
'info_dict': {
'id': 'sY3cMUR_TbuE4YmdjzbIcQ-0',
'ext': 'flv',
'ext': 'mp4',
'title': 'Tim and Eric Awesome Show Great Job! - Dr. Steve Brule, For Your Wine',
'description': 'Dr. Brule reports live from Wine Country with a special report on wines. \r\nWatch Tim and Eric Awesome Show Great Job! episode #20, "Embarrassed" on Adult Swim.\r\n\r\n',
},
@@ -79,6 +79,10 @@ class AdultSwimIE(InfoExtractor):
'title': 'Tim and Eric Awesome Show Great Job! - Dr. Steve Brule, For Your Wine',
'description': 'Dr. Brule reports live from Wine Country with a special report on wines. \r\nWatch Tim and Eric Awesome Show Great Job! episode #20, "Embarrassed" on Adult Swim.\r\n\r\n',
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
@staticmethod
@@ -183,7 +187,8 @@ class AdultSwimIE(InfoExtractor):
media_url = file_el.text
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, segment_title, 'mp4', preference=0, m3u8_id='hls'))
media_url, segment_title, 'mp4', preference=0,
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '%s_%s' % (bitrate, ftype),

View File

@@ -0,0 +1,66 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class AENetworksIE(InfoExtractor):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
'info_dict': {
'id': 'g12m5Gyt3fdR',
'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'expected_warnings': ['JSON-LD'],
}, {
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': "Winter Is Coming",
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
'only_matching': True
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
'only_matching': True
}, {
'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url_re = [
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = self._search_regex(video_url_re, webpage, 'video url')
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent',
'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
})
return info

View File

@@ -1,23 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class AftenpostenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aftenposten\.no/webtv/(?:#!/)?video/(?P<id>\d+)'
_TEST = {
'url': 'http://www.aftenposten.no/webtv/#!/video/21039/trailer-sweatshop-i-can-t-take-any-more',
'md5': 'fd828cd29774a729bf4d4425fe192972',
'info_dict': {
'id': '21039',
'ext': 'mov',
'title': 'TRAILER: "Sweatshop" - I can´t take any more',
'description': 'md5:21891f2b0dd7ec2f78d84a50e54f8238',
'timestamp': 1416927969,
'upload_date': '20141125',
}
}
def _real_extract(self, url):
return self.url_result('xstream:ap:%s' % self._match_id(url), 'Xstream')

View File

@@ -8,6 +8,8 @@ from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
qualities,
unescapeHTML,
xpath_element,
)
@@ -31,7 +33,7 @@ class AllocineIE(InfoExtractor):
'id': '19540403',
'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF',
'description': 'md5:eeaffe7c2d634525e21159b93acf3b1e',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -41,7 +43,7 @@ class AllocineIE(InfoExtractor):
'id': '19544709',
'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:71742e3a74b0d692c7fce0dd2017a4ac',
'description': 'md5:601d15393ac40f249648ef000720e7e3',
'thumbnail': 're:http://.*\.jpg',
},
}, {
@@ -59,14 +61,18 @@ class AllocineIE(InfoExtractor):
if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player')
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xml.find('.//AcVisionVideo').attrib
video = xpath_element(xml, './/AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd'])
formats = []

View File

@@ -0,0 +1,81 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class AMPIE(InfoExtractor):
# parse Akamai Adaptive Media Player feed
def _extract_feed_info(self, url):
item = self._download_json(
url, None, 'Downloading Akamai AMP feed',
'Unable to download Akamai AMP feed')['channel']['item']
video_id = item['guid']
def get_media_node(name, default=None):
media_name = 'media-%s' % name
media_group = item.get('media-group') or item
return media_group.get(media_name) or item.get(media_name) or item.get(name, default)
thumbnails = []
media_thumbnail = get_media_node('thumbnail')
if media_thumbnail:
if isinstance(media_thumbnail, dict):
media_thumbnail = [media_thumbnail]
for thumbnail_data in media_thumbnail:
thumbnail = thumbnail_data['@attributes']
thumbnails.append({
'url': self._proto_relative_url(thumbnail['url'], 'http:'),
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
subtitles = {}
media_subtitle = get_media_node('subTitle')
if media_subtitle:
if isinstance(media_subtitle, dict):
media_subtitle = [media_subtitle]
for subtitle_data in media_subtitle:
subtitle = subtitle_data['@attributes']
lang = subtitle.get('lang') or 'en'
subtitles[lang] = [{'url': subtitle['href']}]
formats = []
media_content = get_media_node('content')
if isinstance(media_content, dict):
media_content = [media_content]
for media_data in media_content:
media = media_data['@attributes']
media_type = media['type']
if media_type == 'video/f4m':
formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False))
elif media_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': media_data['media-category']['@attributes']['label'],
'url': media['url'],
'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': get_media_node('title'),
'description': get_media_node('description'),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '),
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -1,11 +1,9 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .nuevo import NuevoBaseIE
class AnitubeIE(InfoExtractor):
class AnitubeIE(NuevoBaseIE):
IE_NAME = 'anitube.se'
_VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
@@ -22,38 +20,11 @@ class AnitubeIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
key = self._search_regex(
r'src=["\']https?://[^/]+/embed/([A-Za-z0-9_-]+)', webpage, 'key')
config_xml = self._download_xml(
'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, key)
video_title = config_xml.find('title').text
thumbnail = config_xml.find('image').text
duration = float(config_xml.find('duration').text)
formats = []
video_url = config_xml.find('file')
if video_url is not None:
formats.append({
'format_id': 'sd',
'url': video_url.text,
})
video_url = config_xml.find('filehd')
if video_url is not None:
formats.append({
'format_id': 'hd',
'url': video_url.text,
})
return {
'id': video_id,
'title': video_title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats
}
return self._extract_nuevo(
'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, video_id)

View File

@@ -11,6 +11,7 @@ from ..utils import (
class AppleTrailersIE(InfoExtractor):
IE_NAME = 'appletrailers'
_VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
@@ -63,6 +64,12 @@ class AppleTrailersIE(InfoExtractor):
},
},
]
}, {
'url': 'http://trailers.apple.com/trailers/magnolia/blackthorn/',
'info_dict': {
'id': 'blackthorn',
},
'playlist_mincount': 2,
}, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True,
@@ -79,7 +86,7 @@ class AppleTrailersIE(InfoExtractor):
def fix_html(s):
s = re.sub(r'(?s)<script[^<]*?>.*?</script>', '', s)
s = re.sub(r'<img ([^<]*?)>', r'<img \1/>', s)
s = re.sub(r'<img ([^<]*?)/?>', r'<img \1/>', s)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# like: http://trailers.apple.com/trailers/wb/gravity/
@@ -96,6 +103,9 @@ class AppleTrailersIE(InfoExtractor):
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, 'trailer info')
trailer_info = json.loads(trailer_info_json)
first_url = trailer_info.get('url')
if not first_url:
continue
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
@@ -107,7 +117,6 @@ class AppleTrailersIE(InfoExtractor):
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings = self._download_json(settings_json_url, trailer_id, 'Downloading settings json')
@@ -144,3 +153,76 @@ class AppleTrailersIE(InfoExtractor):
'id': movie,
'entries': playlist,
}
class AppleTrailersSectionIE(InfoExtractor):
IE_NAME = 'appletrailers:section'
_SECTIONS = {
'justadded': {
'feed_path': 'just_added',
'title': 'Just Added',
},
'exclusive': {
'feed_path': 'exclusive',
'title': 'Exclusive',
},
'justhd': {
'feed_path': 'just_hd',
'title': 'Just HD',
},
'mostpopular': {
'feed_path': 'most_pop',
'title': 'Most Popular',
},
'moviestudios': {
'feed_path': 'studios',
'title': 'Movie Studios',
},
}
_VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/#section=(?P<id>%s)' % '|'.join(_SECTIONS)
_TESTS = [{
'url': 'http://trailers.apple.com/#section=justadded',
'info_dict': {
'title': 'Just Added',
'id': 'justadded',
},
'playlist_mincount': 80,
}, {
'url': 'http://trailers.apple.com/#section=exclusive',
'info_dict': {
'title': 'Exclusive',
'id': 'exclusive',
},
'playlist_mincount': 80,
}, {
'url': 'http://trailers.apple.com/#section=justhd',
'info_dict': {
'title': 'Just HD',
'id': 'justhd',
},
'playlist_mincount': 80,
}, {
'url': 'http://trailers.apple.com/#section=mostpopular',
'info_dict': {
'title': 'Most Popular',
'id': 'mostpopular',
},
'playlist_mincount': 80,
}, {
'url': 'http://trailers.apple.com/#section=moviestudios',
'info_dict': {
'title': 'Movie Studios',
'id': 'moviestudios',
},
'playlist_mincount': 80,
}]
def _real_extract(self, url):
section = self._match_id(url)
section_data = self._download_json(
'http://trailers.apple.com/trailers/home/feeds/%s.json' % self._SECTIONS[section]['feed_path'],
section)
entries = [
self.url_result('http://trailers.apple.com' + e['location'])
for e in section_data]
return self.playlist_result(entries, section, self._SECTIONS[section]['title'])

View File

@@ -110,13 +110,15 @@ class ARDMediathekIE(InfoExtractor):
server = stream.get('_server')
for stream_url in stream_urls:
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
video_id, preference=-1, f4m_id='hds'))
video_id, preference=-1, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls'))
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {

View File

@@ -68,9 +68,13 @@ class ArteTVPlus7IE(InfoExtractor):
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
if 'vid' in query:
video_id = query['vid'][0]
else:
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
@@ -79,9 +83,15 @@ class ArteTVPlus7IE(InfoExtractor):
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
ids = (video_id, '')
# some pages contain multiple videos (like
# http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
# so we first try to look for json URLs that contain the video id from
# the 'vid' parameter.
patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
json_url = self._html_search_regex(
[r'arte_vp_url=["\'](.*?)["\']', r'data-url=["\']([^"]+)["\']'],
webpage, 'json vp url', default=None)
patterns, webpage, 'json vp url', default=None)
if not json_url:
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
@@ -189,25 +199,19 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(?P<id>.+)'
_TEST = {
'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
'info_dict': {
'id': '5201',
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les champignons au secours de la planète',
'upload_date': '20131101',
'title': 'Les écrevisses aussi peuvent être anxieuses',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = self._search_regex(
r'(?s)id="%s"[^>]*>.+?(<div[^>]*arte_vp_url[^>]*>)' % anchor_id,
webpage, 'row')
return self._extract_from_webpage(row, anchor_id, lang)
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
'only_matching': True,
}]
class ArteTVDDCIE(ArteTVPlus7IE):
@@ -245,6 +249,23 @@ class ArteTVConcertIE(ArteTVPlus7IE):
}
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_TEST = {
'url': 'http://cinema.arte.tv/de/node/38291',
'md5': '6b275511a5107c60bacbeeda368c3aa1',
'info_dict': {
'id': '055876-000_PWA12025-D',
'ext': 'mp4',
'title': 'Tod auf dem Nil',
'upload_date': '20160122',
'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
},
}
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)

View File

@@ -2,16 +2,18 @@ from __future__ import unicode_literals
import time
import hmac
import hashlib
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
sanitized_Request,
xpath_text,
ExtractorError,
)
@@ -32,6 +34,19 @@ class AtresPlayerIE(InfoExtractor):
'duration': 5527.6,
'thumbnail': 're:^https?://.*\.jpg$',
},
'skip': 'This video is only available for registered users'
},
{
'url': 'http://www.atresplayer.com/television/especial/videoencuentros/temporada-1/capitulo-112-david-bustamante_2014121600375.html',
'md5': '0d0e918533bbd4b263f2de4d197d4aac',
'info_dict': {
'id': 'capitulo-112-david-bustamante',
'ext': 'flv',
'title': 'David Bustamante',
'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
'duration': 1439.0,
'thumbnail': 're:^https?://.*\.jpg$',
},
},
{
'url': 'http://www.atresplayer.com/television/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_2014122400174.html',
@@ -50,6 +65,13 @@ class AtresPlayerIE(InfoExtractor):
_LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
_ERRORS = {
'UNPUBLISHED': 'We\'re sorry, but this video is not yet available.',
'DELETED': 'This video has expired and is no longer available for online streaming.',
'GEOUNPUBLISHED': 'We\'re sorry, but this video is not available in your region due to right restrictions.',
# 'PREMIUM': 'PREMIUM',
}
def _real_initialize(self):
self._login()
@@ -63,7 +85,7 @@ class AtresPlayerIE(InfoExtractor):
'j_password': password,
}
request = compat_urllib_request.Request(
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
@@ -83,58 +105,72 @@ class AtresPlayerIE(InfoExtractor):
episode_id = self._search_regex(
r'episode="([^"]+)"', webpage, 'episode id')
request = sanitized_Request(
self._PLAYER_URL_TEMPLATE % episode_id,
headers={'User-Agent': self._USER_AGENT})
player = self._download_json(request, episode_id, 'Downloading player JSON')
episode_type = player.get('typeOfEpisode')
error_message = self._ERRORS.get(episode_type)
if error_message:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
formats = []
video_url = player.get('urlVideo')
if video_url:
format_info = {
'url': video_url,
'format_id': 'http',
}
mobj = re.search(r'(?P<bitrate>\d+)K_(?P<width>\d+)x(?P<height>\d+)', video_url)
if mobj:
format_info.update({
'width': int_or_none(mobj.group('width')),
'height': int_or_none(mobj.group('height')),
'tbr': int_or_none(mobj.group('bitrate')),
})
formats.append(format_info)
timestamp = int_or_none(self._download_webpage(
self._TIME_API_URL,
video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
timestamp_shifted = compat_str(timestamp + self._TIMESTAMP_SHIFT)
token = hmac.new(
self._MAGIC.encode('ascii'),
(episode_id + timestamp_shifted).encode('utf-8')
(episode_id + timestamp_shifted).encode('utf-8'), hashlib.md5
).hexdigest()
formats = []
for fmt in ['windows', 'android_tablet']:
request = compat_urllib_request.Request(
self._URL_VIDEO_TEMPLATE.format(fmt, episode_id, timestamp_shifted, token))
request.add_header('User-Agent', self._USER_AGENT)
request = sanitized_Request(
self._URL_VIDEO_TEMPLATE.format('windows', episode_id, timestamp_shifted, token),
headers={'User-Agent': self._USER_AGENT})
fmt_json = self._download_json(
request, video_id, 'Downloading %s video JSON' % fmt)
fmt_json = self._download_json(
request, video_id, 'Downloading windows video JSON')
result = fmt_json.get('resultDes')
if result.lower() != 'ok':
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, result), expected=True)
result = fmt_json.get('resultDes')
if result.lower() != 'ok':
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, result), expected=True)
for format_id, video_url in fmt_json['resultObject'].items():
if format_id == 'token' or not video_url.startswith('http'):
continue
if video_url.endswith('/Manifest'):
if 'geodeswowsmpra3player' in video_url:
f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
# this videos are protected by DRM, the f4m downloader doesn't support them
continue
else:
f4m_url = video_url[:-9] + '/manifest.f4m'
formats.extend(self._extract_f4m_formats(f4m_url, video_id))
else:
formats.append({
'url': video_url,
'format_id': 'android-%s' % format_id,
'preference': 1,
})
for format_id, video_url in fmt_json['resultObject'].items():
if format_id == 'token' or not video_url.startswith('http'):
continue
if 'geodeswowsmpra3player' in video_url:
f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
# this videos are protected by DRM, the f4m downloader doesn't support them
continue
else:
f4m_url = video_url[:-9] + '/manifest.f4m'
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
player = self._download_json(
self._PLAYER_URL_TEMPLATE % episode_id,
episode_id)
path_data = player.get('pathData')
episode = self._download_xml(
self._EPISODE_URL_TEMPLATE % path_data,
video_id, 'Downloading episode XML')
self._EPISODE_URL_TEMPLATE % path_data, video_id,
'Downloading episode XML')
duration = float_or_none(xpath_text(
episode, './media/asset/info/technical/contentDuration', 'duration'))

View File

@@ -0,0 +1,80 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
sanitized_Request,
)
class AudiMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audimedia\.tv/(?:en|de)/vid/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://audimedia.tv/en/vid/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test',
'md5': '79a8b71c46d49042609795ab59779b66',
'info_dict': {
'id': '1565',
'ext': 'mp4',
'title': '60 Seconds of Audi Sport 104/2015 - WEC Bahrain, Rookie Test',
'description': 'md5:60e5d30a78ced725f7b8d34370762941',
'upload_date': '20151124',
'timestamp': 1448354940,
'duration': 74022,
'view_count': int,
}
}
# extracted from https://audimedia.tv/assets/embed/embedded-player.js (dataSourceAuthToken)
_AUTH_TOKEN = 'e25b42847dba18c6c8816d5d8ce94c326e06823ebf0859ed164b3ba169be97f2'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
raw_payload = self._search_regex(r'<script[^>]+class="amtv-embed"[^>]+id="([^"]+)"', webpage, 'raw payload')
_, stage_mode, video_id, lang = raw_payload.split('-')
# TODO: handle s and e stage_mode (live streams and ended live streams)
if stage_mode not in ('s', 'e'):
request = sanitized_Request(
'https://audimedia.tv/api/video/v1/videos/%s?embed[]=video_versions&embed[]=thumbnail_image&where[content_language_iso]=%s' % (video_id, lang),
headers={'X-Auth-Token': self._AUTH_TOKEN})
json_data = self._download_json(request, video_id)['results']
formats = []
stream_url_hls = json_data.get('stream_url_hls')
if stream_url_hls:
formats.extend(self._extract_m3u8_formats(
stream_url_hls, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
stream_url_hds = json_data.get('stream_url_hds')
if stream_url_hds:
formats.extend(self._extract_f4m_formats(
stream_url_hds + '?hdcore=3.4.0',
video_id, f4m_id='hds', fatal=False))
for video_version in json_data.get('video_versions'):
video_version_url = video_version.get('download_url') or video_version.get('stream_url')
if not video_version_url:
continue
formats.append({
'url': video_version_url,
'width': int_or_none(video_version.get('width')),
'height': int_or_none(video_version.get('height')),
'abr': int_or_none(video_version.get('audio_bitrate')),
'vbr': int_or_none(video_version.get('video_bitrate')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': json_data['title'],
'description': json_data.get('subtitle'),
'thumbnail': json_data.get('thumbnail_image', {}).get('file'),
'timestamp': parse_iso8601(json_data.get('publication_date')),
'duration': int_or_none(json_data.get('duration')),
'view_count': int_or_none(json_data.get('view_count')),
'formats': formats,
}

View File

@@ -56,7 +56,7 @@ class AudiomackIE(InfoExtractor):
# API is inconsistent with errors
if 'url' not in api_response or not api_response['url'] or 'error' in api_response:
raise ExtractorError('Invalid url %s', url)
raise ExtractorError('Invalid url %s' % url)
# Audiomack wraps a lot of soundcloud tracks in their branded wrapper
# if so, pass the work off to the soundcloud extractor

View File

@@ -3,7 +3,11 @@ from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import float_or_none
from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
)
class AzubuIE(InfoExtractor):
@@ -91,3 +95,37 @@ class AzubuIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'http://www.azubu.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',
'only_matching': True,
}
def _real_extract(self, url):
user = self._match_id(url)
info = self._download_json(
'http://api.azubu.tv/public/modules/last-video/{0}/info'.format(user),
user)['data']
if info['type'] != 'STREAM':
raise ExtractorError('{0} is not streaming live'.format(user), expected=True)
req = sanitized_Request(
'https://edge-elb.api.brightcove.com/playback/v1/accounts/3361910549001/videos/ref:' + info['reference_id'])
req.add_header('Accept', 'application/json;pk=BCpkADawqM1gvI0oGWg8dxQHlgT8HkdE2LnAlWAZkOlznO39bSZX726u4JqnDsK3MDXcO01JxXK2tZtJbgQChxgaFzEVdHRjaDoxaOu8hHOO8NYhwdxw9BzvgkvLUlpbDNUuDoc4E4wxDToV')
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
return {
'id': info['id'],
'title': self._live_title(info['title']),
'uploader_id': user,
'formats': formats,
'is_live': True,
'thumbnail': bc_info['poster'],
}

View File

@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import unescapeHTML
class BaiduVideoIE(InfoExtractor):
@@ -14,8 +14,8 @@ class BaiduVideoIE(InfoExtractor):
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
'info_dict': {
'id': '1069',
'title': '中华小当家 TV版 (全52集)',
'description': 'md5:395a419e41215e531c857bb037bbaf80',
'title': '中华小当家 TV版国语',
'description': 'md5:51be07afe461cf99fa61231421b5397c',
},
'playlist_count': 52,
}, {
@@ -25,45 +25,32 @@ class BaiduVideoIE(InfoExtractor):
'title': 're:^奔跑吧兄弟',
'description': 'md5:1bf88bad6d850930f542d51547c089b8',
},
'playlist_mincount': 3,
'playlist_mincount': 12,
}]
def _call_api(self, path, category, playlist_id, note):
return self._download_json('http://app.video.baidu.com/%s/?worktype=adnative%s&id=%s' % (
path, category, playlist_id), playlist_id, note)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
category = category2 = mobj.group('type')
category, playlist_id = re.match(self._VALID_URL, url).groups()
if category == 'show':
category2 = 'tvshow'
category = 'tvshow'
if category == 'tv':
category = 'tvplay'
webpage = self._download_webpage(url, playlist_id)
playlist_detail = self._call_api(
'xqinfo', category, playlist_id, 'Download playlist JSON metadata')
playlist_title = self._html_search_regex(
r'title\s*:\s*(["\'])(?P<title>[^\']+)\1', webpage,
'playlist title', group='title')
playlist_description = self._html_search_regex(
r'<input[^>]+class="j-data-intro"[^>]+value="([^"]+)"/>', webpage,
playlist_id, 'playlist description')
playlist_title = playlist_detail['title']
playlist_description = unescapeHTML(playlist_detail.get('intro'))
site = self._html_search_regex(
r'filterSite\s*:\s*["\']([^"]*)["\']', webpage,
'primary provider site')
api_result = self._download_json(
'http://v.baidu.com/%s_intro/?dtype=%sPlayUrl&id=%s&site=%s' % (
category, category2, playlist_id, site),
playlist_id, 'Get playlist links')
episodes_detail = self._call_api(
'xqsingle', category, playlist_id, 'Download episodes JSON metadata')
entries = []
for episode in api_result[0]['episodes']:
episode_id = '%s_%s' % (playlist_id, episode['episode'])
redirect_page = self._download_webpage(
compat_urlparse.urljoin(url, episode['url']), episode_id,
note='Download Baidu redirect page')
real_url = self._html_search_regex(
r'location\.replace\("([^"]+)"\)', redirect_page, 'real URL')
entries.append(self.url_result(
real_url, video_title=episode['single_title']))
entries = [self.url_result(
episode['url'], video_title=episode['title']
) for episode in episodes_detail['videos']]
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)

View File

@@ -6,13 +6,13 @@ import itertools
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
sanitized_Request,
)
@@ -57,7 +57,7 @@ class BambuserIE(InfoExtractor):
'pass': password,
}
request = compat_urllib_request.Request(
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
@@ -126,7 +126,7 @@ class BambuserChannelIE(InfoExtractor):
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
req = compat_urllib_request.Request(req_url)
req = sanitized_Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
data = self._download_json(

View File

@@ -22,7 +22,18 @@ from ..compat import (
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:programmes/(?!articles/)|iplayer(?:/[^/]+)?/(?:episode/|playlist/))|music/clips[/#])(?P<id>[\da-z]{8})'
_ID_REGEX = r'[pb][\da-z]{7}'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?bbc\.co\.uk/
(?:
programmes/(?!articles/)|
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/clips[/#]|
radio/player/
)
(?P<id>%s)
''' % _ID_REGEX
_MEDIASELECTOR_URLS = [
# Provides HQ HLS streams with even better quality that pc mediaset but fails
@@ -46,9 +57,8 @@ class BBCCoUkIE(InfoExtractor):
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope, Leonard Cohen',
'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
# rtmp download
@@ -111,16 +121,17 @@ class BBCCoUkIE(InfoExtractor):
'params': {
# rtmp download
'skip_download': True,
}
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'url': 'http://www.bbc.co.uk/music/clips/p022h44b',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'id': 'p022h44j',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
'title': 'BBC Proms Music Guides, Rachmaninov: Symphonic Dances',
'description': "In this Proms Music Guide, Andrew McGregor looks at Rachmaninov's Symphonic Dances.",
'duration': 227,
},
'params': {
# rtmp download
@@ -171,13 +182,25 @@ class BBCCoUkIE(InfoExtractor):
}, {
# iptv-all mediaset fails with geolocation however there is no geo restriction
# for this programme at all
'url': 'http://www.bbc.co.uk/programmes/b06bp7lf',
'url': 'http://www.bbc.co.uk/programmes/b06rkn85',
'info_dict': {
'id': 'b06bp7kf',
'id': 'b06rkms3',
'ext': 'flv',
'title': "Annie Mac's Friday Night, B.Traits sits in for Annie",
'description': 'B.Traits sits in for Annie Mac with a Mini-Mix from Disclosure.',
'duration': 10800,
'title': "Best of the Mini-Mixes 2015: Part 3, Annie Mac's Friday Night - BBC Radio 1",
'description': "Annie has part three in the Best of the Mini-Mixes 2015, plus the year's Most Played!",
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# compact player (https://github.com/rg3/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
'ext': 'flv',
'title': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
'description': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
},
'params': {
# rtmp download
@@ -192,6 +215,9 @@ class BBCCoUkIE(InfoExtractor):
}, {
'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/radio/player/p03cchwf',
'only_matching': True,
}
]
@@ -222,11 +248,9 @@ class BBCCoUkIE(InfoExtractor):
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
m3u8_formats = self._extract_m3u8_formats(
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
@@ -453,6 +477,7 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = None
duration = None
tviplayer = self._search_regex(
r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
@@ -465,14 +490,19 @@ class BBCCoUkIE(InfoExtractor):
if not programme_id:
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
r'"vpid"\s*:\s*"(%s)"' % self._ID_REGEX, webpage, 'vpid', fatal=False, default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage)
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
(r'<h2[^>]+id="parent-title"[^>]*>(.+?)</h2>',
r'<div[^>]+class="info"[^>]*>\s*<h1>(.+?)</h1>'), webpage, 'title')
description = self._search_regex(
r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
webpage, 'description', fatal=False)
(r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
r'<div[^>]+class="info_+synopsis"[^>]*>([^<]+)</div>'),
webpage, 'description', default=None)
if not description:
description = self._html_search_meta('description', webpage)
else:
programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
@@ -586,6 +616,7 @@ class BBCIE(BBCCoUkIE):
'ext': 'mp4',
'title': '''Judge Mindy Glazer: "I'm sorry to see you here... I always wondered what happened to you"''',
'duration': 56,
'description': '''Judge Mindy Glazer: "I'm sorry to see you here... I always wondered what happened to you"''',
},
'params': {
'skip_download': True,
@@ -702,19 +733,10 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id)
timestamp = None
playlist_title = None
playlist_description = None
ld = self._parse_json(
self._search_regex(
r'(?s)<script type="application/ld\+json">(.+?)</script>',
webpage, 'ld json', default='{}'),
playlist_id, fatal=False)
if ld:
timestamp = parse_iso8601(ld.get('datePublished'))
playlist_title = ld.get('headline')
playlist_description = ld.get('articleBody')
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title')
playlist_description = json_ld_info.get('description')
if not timestamp:
timestamp = parse_iso8601(self._search_regex(
@@ -728,6 +750,7 @@ class BBCIE(BBCCoUkIE):
# article with multiple videos embedded with playlist.sxml (e.g.
# http://www.bbc.com/sport/0/football/34475836)
playlists = re.findall(r'<param[^>]+name="playlist"[^>]+value="([^"]+)"', webpage)
playlists.extend(re.findall(r'data-media-id="([^"]+/playlist\.sxml)"', webpage))
if playlists:
entries = [
self._extract_from_playlist_sxml(playlist_url, playlist_id, timestamp)
@@ -780,8 +803,9 @@ class BBCIE(BBCCoUkIE):
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="([\da-z]{8})"',
r'<param[^>]+name="externalIdentifier"[^>]+value="([\da-z]{8})"'],
[r'data-video-player-vpid="(%s)"' % self._ID_REGEX,
r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None)
if programme_id:
@@ -816,7 +840,7 @@ class BBCIE(BBCCoUkIE):
# Multiple video article (e.g.
# http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460)
EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+[\da-z]{8}(?:\b[^"]+)?'
EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+%s(?:\b[^"]+)?' % self._ID_REGEX
entries = []
for match in extract_all(r'new\s+SMP\(({.+?})\)'):
embed_url = match.get('playerSettings', {}).get('externalEmbedUrl')

View File

@@ -1,6 +1,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
)
from ..utils import (
int_or_none,
parse_iso8601,
@@ -29,7 +34,38 @@ class BeegIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'http://beeg.com/api/v1/video/%s' % video_id, video_id)
'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
def split(o, e):
def cut(s, x):
n.append(s[:x])
return s[x:]
n = []
r = len(o) % e
if r > 0:
o = cut(o, r)
while len(o) > e:
o = cut(o, e)
n.append(o)
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1105.js
a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB'
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
for n in range(len(e))])
return ''.join(split(o, 3)[::-1])
def decrypt_url(encrypted_url):
encrypted_url = self._proto_relative_url(
encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
key = self._search_regex(
r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
if not key:
return encrypted_url
return encrypted_url.replace(key, decrypt_key(key))
formats = []
for format_id, video_url in video.items():
@@ -40,7 +76,7 @@ class BeegIE(InfoExtractor):
if not height:
continue
formats.append({
'url': self._proto_relative_url(video_url.replace('{DATA_MARKERS}', ''), 'http:'),
'url': decrypt_url(video_url),
'format_id': format_id,
'height': int(height),
})

View File

@@ -0,0 +1,85 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
class BigflixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
'info_dict': {
'id': '16537',
'ext': 'mp4',
'title': 'Singham Returns',
'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
}
}, {
# 2 formats
'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
'info_dict': {
'id': '16070',
'ext': 'mp4',
'title': 'Madarasapatinam',
'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
'formats': 'mincount:2',
},
'params': {
'skip_download': True,
}
}, {
# multiple formats
'url': 'http://www.bigflix.com/Malayalam-movies/Drama-movies/Indian-Rupee/15967',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<div[^>]+class=["\']pagetitle["\'][^>]*>(.+?)</div>',
webpage, 'title')
def decode_url(quoted_b64_url):
return base64.b64decode(compat_urllib_parse_unquote(
quoted_b64_url).encode('ascii')).decode('utf-8')
formats = []
for height, encoded_url in re.findall(
r'ContentURL_(\d{3,4})[pP][^=]+=([^&]+)', webpage):
video_url = decode_url(encoded_url)
f = {
'url': video_url,
'format_id': '%sp' % height,
'height': int(height),
}
if video_url.startswith('rtmp'):
f['ext'] = 'flv'
formats.append(f)
file_url = self._search_regex(
r'file=([^&]+)', webpage, 'video url', default=None)
if file_url:
video_url = decode_url(file_url)
if all(f['url'] != video_url for f in formats):
formats.append({
'url': decode_url(file_url),
})
self._sort_formats(formats)
description = self._html_search_meta('description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats
}

View File

@@ -2,143 +2,109 @@
from __future__ import unicode_literals
import re
import itertools
import json
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
)
from ..compat import compat_str
from ..utils import (
int_or_none,
unified_strdate,
unescapeHTML,
ExtractorError,
xpath_text,
)
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '2c301e4dab317596e837c3e7633e7d86',
'info_dict': {
'id': '1074402_part1',
'id': '1554319',
'ext': 'flv',
'title': '【金坷垃】金泡沫',
'duration': 308,
'duration': 308313,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'timestamp': 1397983878,
'uploader': '菊子桑',
},
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'uploader': '枫叶逝去',
'timestamp': 1396501299,
},
'playlist_count': 9,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page_num = mobj.group('page_num') or '1'
if '(此视频不存在或被删除)' in webpage:
raise ExtractorError(
'The video does not exist or was deleted', expected=True)
view_data = self._download_json(
'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
video_id)
if 'error' in view_data:
raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
if '>你没有权限浏览! 由于版权相关问题 我们不对您所在的地区提供服务<' in webpage:
raise ExtractorError(
'The video is not available in your region due to copyright reasons',
expected=True)
cid = view_data['cid']
title = unescapeHTML(view_data['title'])
video_code = self._search_regex(
r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')
doc = self._download_xml(
'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
cid,
'Downloading page %s/%s' % (page_num, view_data['pages'])
)
title = self._html_search_meta(
'media:title', video_code, 'title', fatal=True)
duration_str = self._html_search_meta(
'duration', video_code, 'duration')
if duration_str is None:
duration = None
else:
duration_mobj = re.match(
r'^T(?:(?P<hours>[0-9]+)H)?(?P<minutes>[0-9]+)M(?P<seconds>[0-9]+)S$',
duration_str)
duration = (
int_or_none(duration_mobj.group('hours'), default=0) * 3600 +
int(duration_mobj.group('minutes')) * 60 +
int(duration_mobj.group('seconds')))
upload_date = unified_strdate(self._html_search_meta(
'uploadDate', video_code, fatal=False))
thumbnail = self._html_search_meta(
'thumbnailUrl', video_code, 'thumbnail', fatal=False)
cid = self._search_regex(r'cid=(\d+)', webpage, 'cid')
if xpath_text(doc, './result') == 'error':
raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
entries = []
lq_page = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play?appkey=1&cid=%s' % cid,
video_id,
note='Downloading LQ video info'
)
try:
err_info = json.loads(lq_page)
raise ExtractorError(
'BiliBili said: ' + err_info['error_text'], expected=True)
except ValueError:
pass
lq_doc = compat_etree_fromstring(lq_page)
lq_durls = lq_doc.findall('./durl')
hq_doc = self._download_xml(
'http://interface.bilibili.com/playurl?appkey=1&cid=%s' % cid,
video_id,
note='Downloading HQ video info',
fatal=False,
)
if hq_doc is not False:
hq_durls = hq_doc.findall('./durl')
assert len(lq_durls) == len(hq_durls)
else:
hq_durls = itertools.repeat(None)
i = 1
for lq_durl, hq_durl in zip(lq_durls, hq_durls):
for durl in doc.findall('./durl'):
size = xpath_text(durl, ['./filesize', './size'])
formats = [{
'format_id': 'lq',
'quality': 1,
'url': lq_durl.find('./url').text,
'filesize': int_or_none(
lq_durl.find('./size'), get_attr='text'),
'url': durl.find('./url').text,
'filesize': int_or_none(size),
'ext': 'flv',
}]
if hq_durl is not None:
formats.append({
'format_id': 'hq',
'quality': 2,
'ext': 'flv',
'url': hq_durl.find('./url').text,
'filesize': int_or_none(
hq_durl.find('./size'), get_attr='text'),
})
self._sort_formats(formats)
backup_urls = durl.find('./backup_url')
if backup_urls is not None:
for backup_url in backup_urls.findall('./url'):
formats.append({'url': backup_url.text})
formats.reverse()
entries.append({
'id': '%s_part%d' % (video_id, i),
'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
'title': title,
'duration': int_or_none(xpath_text(durl, './length'), 1000),
'formats': formats,
'duration': duration,
'upload_date': upload_date,
'thumbnail': thumbnail,
})
i += 1
return {
'_type': 'multi_video',
'entries': entries,
'id': video_id,
'title': title
info = {
'id': compat_str(cid),
'title': title,
'description': view_data.get('description'),
'thumbnail': view_data.get('pic'),
'uploader': view_data.get('author'),
'timestamp': int_or_none(view_data.get('created')),
'view_count': int_or_none(view_data.get('play')),
'duration': int_or_none(xpath_text(doc, './timelength')),
}
if len(entries) == 1:
entries[0].update(info)
return entries[0]
else:
info.update({
'_type': 'multi_video',
'id': video_id,
'entries': entries,
})
return info

View File

@@ -0,0 +1,106 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .amp import AMPIE
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
)
class BleacherReportIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/articles/(?P<id>\d+)'
_TESTS = [{
'url': 'http://bleacherreport.com/articles/2496438-fsu-stat-projections-is-jalen-ramsey-best-defensive-player-in-college-football',
'md5': 'a3ffc3dc73afdbc2010f02d98f990f20',
'info_dict': {
'id': '2496438',
'ext': 'mp4',
'title': 'FSU Stat Projections: Is Jalen Ramsey Best Defensive Player in College Football?',
'uploader_id': 3992341,
'description': 'CFB, ACC, Florida State',
'timestamp': 1434380212,
'upload_date': '20150615',
'uploader': 'Team Stream Now ',
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://bleacherreport.com/articles/2586817-aussie-golfers-get-fright-of-their-lives-after-being-chased-by-angry-kangaroo',
'md5': 'af5f90dc9c7ba1c19d0a3eac806bbf50',
'info_dict': {
'id': '2586817',
'ext': 'mp4',
'title': 'Aussie Golfers Get Fright of Their Lives After Being Chased by Angry Kangaroo',
'timestamp': 1446839961,
'uploader': 'Sean Fay',
'description': 'md5:825e94e0f3521df52fa83b2ed198fa20',
'uploader_id': 6466954,
'upload_date': '20151011',
},
'add_ie': ['Youtube'],
}]
def _real_extract(self, url):
article_id = self._match_id(url)
article_data = self._download_json('http://api.bleacherreport.com/api/v1/articles/%s' % article_id, article_id)['article']
thumbnails = []
primary_photo = article_data.get('primaryPhoto')
if primary_photo:
thumbnails = [{
'url': primary_photo['url'],
'width': primary_photo.get('width'),
'height': primary_photo.get('height'),
}]
info = {
'_type': 'url_transparent',
'id': article_id,
'title': article_data['title'],
'uploader': article_data.get('author', {}).get('name'),
'uploader_id': article_data.get('authorId'),
'timestamp': parse_iso8601(article_data.get('createdAt')),
'thumbnails': thumbnails,
'comment_count': int_or_none(article_data.get('commentsCount')),
'view_count': int_or_none(article_data.get('hitCount')),
}
video = article_data.get('video')
if video:
video_type = video['type']
if video_type == 'cms.bleacherreport.com':
info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
elif video_type == 'ooyala.com':
info['url'] = 'ooyala:%s' % video['id']
elif video_type == 'youtube.com':
info['url'] = video['id']
elif video_type == 'vine.co':
info['url'] = 'https://vine.co/v/%s' % video['id']
else:
info['url'] = video_type + video['id']
return info
else:
raise ExtractorError('no video in the article', expected=True)
class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'md5': '8c2c12e3af7805152675446c905d159b',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
info['id'] = video_id
return info

View File

@@ -1,292 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
clean_html,
int_or_none,
parse_iso8601,
unescapeHTML,
xpath_text,
xpath_with_ns,
)
class BlipTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?blip\.tv/(?:(?:.+-|rss/flash/)(?P<id>\d+)|((?:play/|api\.swf#)(?P<lookup_id>[\da-zA-Z+_]+)))'
_TESTS = [
{
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': '80baf1ec5c3d2019037c1c707d676b9f',
'info_dict': {
'id': '5779306',
'ext': 'm4v',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843,
'upload_date': '20111206',
'uploader': 'cbr',
'uploader_id': '679425',
'duration': 81,
}
},
{
# https://github.com/rg3/youtube-dl/pull/2274
'note': 'Video with subtitles',
'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
'md5': '309f9d25b820b086ca163ffac8031806',
'info_dict': {
'id': '6586561',
'ext': 'mp4',
'title': 'Red vs. Blue Season 11 Episode 1',
'description': 'One-Zero-One',
'timestamp': 1371261608,
'upload_date': '20130615',
'uploader': 'redvsblue',
'uploader_id': '792887',
'duration': 279,
}
},
{
# https://bugzilla.redhat.com/show_bug.cgi?id=967465
'url': 'http://a.blip.tv/api.swf#h6Uag5KbVwI',
'md5': '314e87b1ebe7a48fcbfdd51b791ce5a6',
'info_dict': {
'id': '6573122',
'ext': 'mov',
'upload_date': '20130520',
'description': 'Two hapless space marines argue over what to do when they realize they have an astronomically huge problem on their hands.',
'title': 'Red vs. Blue Season 11 Trailer',
'timestamp': 1369029609,
'uploader': 'redvsblue',
'uploader_id': '792887',
}
},
{
'url': 'http://blip.tv/play/gbk766dkj4Yn',
'md5': 'fe0a33f022d49399a241e84a8ea8b8e3',
'info_dict': {
'id': '1749452',
'ext': 'mp4',
'upload_date': '20090208',
'description': 'Witness the first appearance of the Nostalgia Critic character, as Doug reviews the movie Transformers.',
'title': 'Nostalgia Critic: Transformers',
'timestamp': 1234068723,
'uploader': 'NostalgiaCritic',
'uploader_id': '246467',
}
},
{
# https://github.com/rg3/youtube-dl/pull/4404
'note': 'Audio only',
'url': 'http://blip.tv/hilarios-productions/weekly-manga-recap-kingdom-7119982',
'md5': '76c0a56f24e769ceaab21fbb6416a351',
'info_dict': {
'id': '7103299',
'ext': 'flv',
'title': 'Weekly Manga Recap: Kingdom',
'description': 'And then Shin breaks the enemy line, and he&apos;s all like HWAH! And then he slices a guy and it&apos;s all like FWASHING! And... it&apos;s really hard to describe the best parts of this series without breaking down into sound effects, okay?',
'timestamp': 1417660321,
'upload_date': '20141204',
'uploader': 'The Rollo T',
'uploader_id': '407429',
'duration': 7251,
'vcodec': 'none',
}
},
{
# missing duration
'url': 'http://blip.tv/rss/flash/6700880',
'info_dict': {
'id': '6684191',
'ext': 'm4v',
'title': 'Cowboy Bebop: Gateway Shuffle Review',
'description': 'md5:3acc480c0f9ae157f5fe88547ecaf3f8',
'timestamp': 1386639757,
'upload_date': '20131210',
'uploader': 'sfdebris',
'uploader_id': '706520',
}
}
]
@staticmethod
def _extract_url(webpage):
mobj = re.search(r'<meta\s[^>]*https?://api\.blip\.tv/\w+/redirect/\w+/(\d+)', webpage)
if mobj:
return 'http://blip.tv/a/a-' + mobj.group(1)
mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*(https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)[a-zA-Z0-9_]+)', webpage)
if mobj:
return mobj.group(1)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
lookup_id = mobj.group('lookup_id')
# See https://github.com/rg3/youtube-dl/issues/857 and
# https://github.com/rg3/youtube-dl/issues/4197
if lookup_id:
urlh = self._request_webpage(
'http://blip.tv/play/%s' % lookup_id, lookup_id, 'Resolving lookup id')
url = compat_urlparse.urlparse(urlh.geturl())
qs = compat_urlparse.parse_qs(url.query)
mobj = re.match(self._VALID_URL, qs['file'][0])
video_id = mobj.group('id')
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def _x(p):
return xpath_with_ns(p, {
'blip': 'http://blip.tv/dtd/blip/1.0',
'media': 'http://search.yahoo.com/mrss/',
'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
})
item = rss.find('channel/item')
video_id = xpath_text(item, _x('blip:item_id'), 'video id') or lookup_id
title = xpath_text(item, 'title', 'title', fatal=True)
description = clean_html(xpath_text(item, _x('blip:puredescription'), 'description'))
timestamp = parse_iso8601(xpath_text(item, _x('blip:datestamp'), 'timestamp'))
uploader = xpath_text(item, _x('blip:user'), 'uploader')
uploader_id = xpath_text(item, _x('blip:userid'), 'uploader id')
duration = int_or_none(xpath_text(item, _x('blip:runtime'), 'duration'))
media_thumbnail = item.find(_x('media:thumbnail'))
thumbnail = (media_thumbnail.get('url') if media_thumbnail is not None
else xpath_text(item, 'image', 'thumbnail'))
categories = [category.text for category in item.findall('category') if category is not None]
formats = []
subtitles_urls = {}
media_group = item.find(_x('media:group'))
for media_content in media_group.findall(_x('media:content')):
url = media_content.get('url')
role = media_content.get(_x('blip:role'))
msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role)
real_url = compat_urlparse.parse_qs(msg.strip())['message'][0]
media_type = media_content.get('type')
if media_type == 'text/srt' or url.endswith('.srt'):
LANGS = {
'english': 'en',
}
lang = role.rpartition('-')[-1].strip().lower()
langcode = LANGS.get(lang, lang)
subtitles_urls[langcode] = url
elif media_type.startswith('video/'):
formats.append({
'url': real_url,
'format_id': role,
'format_note': media_type,
'vcodec': media_content.get(_x('blip:vcodec')) or 'none',
'acodec': media_content.get(_x('blip:acodec')),
'filesize': media_content.get('filesize'),
'width': int_or_none(media_content.get('width')),
'height': int_or_none(media_content.get('height')),
})
self._check_formats(formats, video_id)
self._sort_formats(formats)
subtitles = self.extract_subtitles(video_id, subtitles_urls)
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
'categories': categories,
'formats': formats,
'subtitles': subtitles,
}
def _get_subtitles(self, video_id, subtitles_urls):
subtitles = {}
for lang, url in subtitles_urls.items():
# For some weird reason, blip.tv serves a video instead of subtitles
# when we request with a common UA
req = compat_urllib_request.Request(url)
req.add_header('User-Agent', 'youtube-dl')
subtitles[lang] = [{
# The extension is 'srt' but it's actually an 'ass' file
'ext': 'ass',
'data': self._download_webpage(req, None, note=False),
}]
return subtitles
class BlipTVUserIE(InfoExtractor):
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?blip\.tv/)|bliptvuser:)(?!api\.swf)([^/]+)/*$'
_PAGE_SIZE = 12
IE_NAME = 'blip.tv:user'
_TEST = {
'url': 'http://blip.tv/actone',
'info_dict': {
'id': 'actone',
'title': 'Act One: The Series',
},
'playlist_count': 5,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
username = mobj.group(1)
page_base = 'http://m.blip.tv/pr/show_get_full_episode_list?users_id=%s&lite=0&esi=1'
page = self._download_webpage(url, username, 'Downloading user page')
mobj = re.search(r'data-users-id="([^"]+)"', page)
page_base = page_base % mobj.group(1)
title = self._og_search_title(page)
# Download video ids using BlipTV Ajax calls. Result size per
# query is limited (currently to 12 videos) so we need to query
# page by page until there are no video ids - it means we got
# all of them.
video_ids = []
pagenum = 1
while True:
url = page_base + "&page=" + str(pagenum)
page = self._download_webpage(
url, username, 'Downloading video ids from page %d' % pagenum)
# Extract video identifiers
ids_in_page = []
for mobj in re.finditer(r'href="/([^"]+)"', page):
if mobj.group(1) not in ids_in_page:
ids_in_page.append(unescapeHTML(mobj.group(1)))
video_ids.extend(ids_in_page)
# A little optimization - if current page is not
# "full", ie. does not contain PAGE_SIZE video ids then
# we can assume that this page is the last one - there
# are no more ids on further pages - no need to query
# again.
if len(ids_in_page) < self._PAGE_SIZE:
break
pagenum += 1
urls = ['http://blip.tv/%s' % video_id for video_id in video_ids]
url_entries = [self.url_result(vurl, 'BlipTV') for vurl in urls]
return self.playlist_result(
url_entries, playlist_title=title, playlist_id=username)

View File

@@ -6,9 +6,9 @@ from .common import InfoExtractor
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/news/videos/[^/]+/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?bloomberg\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2',
# The md5 checksum changes
'info_dict': {
@@ -17,22 +17,35 @@ class BloombergIE(InfoExtractor):
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
},
}
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,
}, {
'url': 'http://www.bloomberg.com/politics/videos/2015-11-25/karl-rove-on-jeb-bush-s-struggles-stopping-trump',
'only_matching': True,
}]
def _real_extract(self, url):
name = self._match_id(url)
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'"bmmrId":"(.+?)"', webpage, 'id')
video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url')
title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json(
'http://www.bloomberg.com/api/embed?id=%s' % video_id, video_id)
formats = []
for stream in embed_info['streams']:
if stream["muxing_format"] == "TS":
formats.extend(self._extract_m3u8_formats(stream['url'], video_id))
stream_url = stream.get('url')
if not stream_url:
continue
if stream['muxing_format'] == 'TS':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.extend(self._extract_f4m_formats(stream['url'], video_id))
formats.extend(self._extract_f4m_formats(
stream_url, video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return {

View File

@@ -1,7 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
determine_ext,
)
class BpbIE(InfoExtractor):
@@ -10,7 +16,8 @@ class BpbIE(InfoExtractor):
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',
'md5': '0792086e8e2bfbac9cdf27835d5f2093',
# md5 fails in Python 2.6 due to buggy server response and wrong handling of urllib2
'md5': 'c4f84c8a8044ca9ff68bb8441d300b3f',
'info_dict': {
'id': '297',
'ext': 'mp4',
@@ -25,13 +32,26 @@ class BpbIE(InfoExtractor):
title = self._html_search_regex(
r'<h2 class="white">(.*?)</h2>', webpage, 'title')
video_url = self._html_search_regex(
r'(http://film\.bpb\.de/player/dokument_[0-9]+\.mp4)',
webpage, 'video URL')
video_info_dicts = re.findall(
r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", webpage)
formats = []
for video_info in video_info_dicts:
video_info = self._parse_json(video_info, video_id, transform_source=js_to_json)
quality = video_info['quality']
video_url = video_info['src']
formats.append({
'url': video_url,
'preference': 10 if quality == 'high' else 0,
'format_note': quality,
'format_id': '%s-%s' % (quality, determine_ext(video_url)),
})
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'formats': formats,
'title': title,
'description': self._og_search_description(webpage),
}

View File

@@ -1,18 +1,21 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
xpath_element,
xpath_text,
)
class BRIE(InfoExtractor):
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
_BASE_URL = 'http://www.br.de'
_VALID_URL = r'(?P<base_url>https?://(?:www\.)?br(?:-klassik)?\.de)/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
_TESTS = [
{
@@ -22,7 +25,7 @@ class BRIE(InfoExtractor):
'id': '48f656ef-287e-486f-be86-459122db22cc',
'ext': 'mp4',
'title': 'Die böse Überraschung',
'description': 'Betriebliche Altersvorsorge: Die böse Überraschung',
'description': 'md5:ce9ac81b466ce775b8018f6801b48ac9',
'duration': 180,
'uploader': 'Reinhard Weber',
'upload_date': '20150422',
@@ -30,23 +33,23 @@ class BRIE(InfoExtractor):
},
{
'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
'md5': 'a44396d73ab6a68a69a568fae10705bb',
'md5': 'af3a3a4aa43ff0ce6a89504c67f427ef',
'info_dict': {
'id': 'a4b83e34-123d-4b81-9f4e-c0d3121a4e05',
'ext': 'mp4',
'ext': 'flv',
'title': 'Manfred Schreiber ist tot',
'description': 'Abendschau kompakt: Manfred Schreiber ist tot',
'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
'duration': 26,
}
},
{
'url': 'http://www.br.de/radio/br-klassik/sendungen/allegro/premiere-urauffuehrung-the-land-2015-dance-festival-muenchen-100.html',
'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
'md5': '8b5b27c0b090f3b35eac4ab3f7a73d3d',
'info_dict': {
'id': '74c603c9-26d3-48bb-b85b-079aeed66e0b',
'ext': 'aac',
'title': 'Kurzweilig und sehr bewegend',
'description': '"The Land" von Peeping Tom: Kurzweilig und sehr bewegend',
'description': 'md5:0351996e3283d64adeb38ede91fac54e',
'duration': 296,
}
},
@@ -57,7 +60,7 @@ class BRIE(InfoExtractor):
'id': '6ba73750-d405-45d3-861d-1ce8c524e059',
'ext': 'mp4',
'title': 'Umweltbewusster Häuslebauer',
'description': 'Uwe Erdelt: Umweltbewusster Häuslebauer',
'description': 'md5:d52dae9792d00226348c1dbb13c9bae2',
'duration': 116,
}
},
@@ -68,7 +71,7 @@ class BRIE(InfoExtractor):
'id': 'd982c9ce-8648-4753-b358-98abb8aec43d',
'ext': 'mp4',
'title': 'Folge 1 - Metaphysik',
'description': 'Kant für Anfänger: Folge 1 - Metaphysik',
'description': 'md5:bb659990e9e59905c3d41e369db1fbe3',
'duration': 893,
'uploader': 'Eva Maria Steimle',
'upload_date': '20140117',
@@ -77,28 +80,31 @@ class BRIE(InfoExtractor):
]
def _real_extract(self, url):
display_id = self._match_id(url)
base_url, display_id = re.search(self._VALID_URL, url).groups()
page = self._download_webpage(url, display_id)
xml_url = self._search_regex(
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/(?:[a-z0-9\-]+/)+[a-z0-9/~_.-]+)'}\)\);", page, 'XMLURL')
xml = self._download_xml(self._BASE_URL + xml_url, None)
xml = self._download_xml(base_url + xml_url, display_id)
medias = []
for xml_media in xml.findall('video') + xml.findall('audio'):
media_id = xml_media.get('externalId')
media = {
'id': xml_media.get('externalId'),
'title': xml_media.find('title').text,
'duration': parse_duration(xml_media.find('duration').text),
'formats': self._extract_formats(xml_media.find('assets')),
'thumbnails': self._extract_thumbnails(xml_media.find('teaserImage/variants')),
'description': ' '.join(xml_media.find('shareTitle').text.splitlines()),
'webpage_url': xml_media.find('permalink').text
'id': media_id,
'title': xpath_text(xml_media, 'title', 'title', True),
'duration': parse_duration(xpath_text(xml_media, 'duration')),
'formats': self._extract_formats(xpath_element(
xml_media, 'assets'), media_id),
'thumbnails': self._extract_thumbnails(xpath_element(
xml_media, 'teaserImage/variants'), base_url),
'description': xpath_text(xml_media, 'desc'),
'webpage_url': xpath_text(xml_media, 'permalink'),
'uploader': xpath_text(xml_media, 'author'),
}
if xml_media.find('author').text:
media['uploader'] = xml_media.find('author').text
if xml_media.find('broadcastDate').text:
media['upload_date'] = ''.join(reversed(xml_media.find('broadcastDate').text.split('.')))
broadcast_date = xpath_text(xml_media, 'broadcastDate')
if broadcast_date:
media['upload_date'] = ''.join(reversed(broadcast_date.split('.')))
medias.append(media)
if len(medias) > 1:
@@ -109,35 +115,54 @@ class BRIE(InfoExtractor):
raise ExtractorError('No media entries found')
return medias[0]
def _extract_formats(self, assets):
def text_or_none(asset, tag):
elem = asset.find(tag)
return None if elem is None else elem.text
formats = [{
'url': text_or_none(asset, 'downloadUrl'),
'ext': text_or_none(asset, 'mediaType'),
'format_id': asset.get('type'),
'width': int_or_none(text_or_none(asset, 'frameWidth')),
'height': int_or_none(text_or_none(asset, 'frameHeight')),
'tbr': int_or_none(text_or_none(asset, 'bitrateVideo')),
'abr': int_or_none(text_or_none(asset, 'bitrateAudio')),
'vcodec': text_or_none(asset, 'codecVideo'),
'acodec': text_or_none(asset, 'codecAudio'),
'container': text_or_none(asset, 'mediaType'),
'filesize': int_or_none(text_or_none(asset, 'size')),
} for asset in assets.findall('asset')
if asset.find('downloadUrl') is not None]
def _extract_formats(self, assets, media_id):
formats = []
for asset in assets.findall('asset'):
format_url = xpath_text(asset, ['downloadUrl', 'url'])
asset_type = asset.get('type')
if asset_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.2.0', media_id, f4m_id='hds', fatal=False))
elif asset_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, media_id, 'mp4', 'm3u8_native', m3u8_id='hds', fatal=False))
else:
format_info = {
'ext': xpath_text(asset, 'mediaType'),
'width': int_or_none(xpath_text(asset, 'frameWidth')),
'height': int_or_none(xpath_text(asset, 'frameHeight')),
'tbr': int_or_none(xpath_text(asset, 'bitrateVideo')),
'abr': int_or_none(xpath_text(asset, 'bitrateAudio')),
'vcodec': xpath_text(asset, 'codecVideo'),
'acodec': xpath_text(asset, 'codecAudio'),
'container': xpath_text(asset, 'mediaType'),
'filesize': int_or_none(xpath_text(asset, 'size')),
}
format_url = self._proto_relative_url(format_url)
if format_url:
http_format_info = format_info.copy()
http_format_info.update({
'url': format_url,
'format_id': 'http-%s' % asset_type,
})
formats.append(http_format_info)
server_prefix = xpath_text(asset, 'serverPrefix')
if server_prefix:
rtmp_format_info = format_info.copy()
rtmp_format_info.update({
'url': server_prefix,
'play_path': xpath_text(asset, 'fileName'),
'format_id': 'rtmp-%s' % asset_type,
})
formats.append(rtmp_format_info)
self._sort_formats(formats)
return formats
def _extract_thumbnails(self, variants):
def _extract_thumbnails(self, variants, base_url):
thumbnails = [{
'url': self._BASE_URL + variant.find('url').text,
'width': int_or_none(variant.find('width').text),
'height': int_or_none(variant.find('height').text),
} for variant in variants.findall('variant')]
'url': base_url + xpath_text(variant, 'url'),
'width': int_or_none(xpath_text(variant, 'width')),
'height': int_or_none(xpath_text(variant, 'height')),
} for variant in variants.findall('variant') if xpath_text(variant, 'url')]
thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True)
return thumbnails

View File

@@ -11,7 +11,6 @@ from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
@@ -24,6 +23,7 @@ from ..utils import (
js_to_json,
int_or_none,
parse_iso8601,
sanitized_Request,
unescapeHTML,
unsmuggle_url,
)
@@ -250,7 +250,7 @@ class BrightcoveLegacyIE(InfoExtractor):
def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
req = compat_urllib_request.Request(request_url)
req = sanitized_Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
@@ -355,7 +355,7 @@ class BrightcoveLegacyIE(InfoExtractor):
class BrightcoveNewIE(InfoExtractor):
IE_NAME = 'brightcove:new'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+)'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>(?:ref:)?\d+)'
_TESTS = [{
'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
'md5': 'c8100925723840d4b0d243f7025703be',
@@ -387,14 +387,24 @@ class BrightcoveNewIE(InfoExtractor):
'params': {
'skip_download': True,
}
}, {
# ref: prefixed video id
'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
urls = BrightcoveNewIE._extract_urls(webpage)
return urls[0] if urls else None
@staticmethod
def _extract_urls(webpage):
# Reference:
# 1. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideoiniframe
# 2. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideousingjavascript)
# 2. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideousingjavascript
# 3. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/embed-in-page.html
# 4. https://support.brightcove.com/en/video-cloud/docs/dynamically-assigning-videos-player
entries = []
@@ -407,9 +417,10 @@ class BrightcoveNewIE(InfoExtractor):
for video_id, account_id, player_id, embed in re.findall(
# According to examples from [3] it's unclear whether video id
# may be optional and what to do when it is
# According to [4] data-video-id may be prefixed with ref:
r'''(?sx)
<video[^>]+
data-video-id=["\'](\d+)["\'][^>]*>.*?
data-video-id=["\']((?:ref:)?\d+)["\'][^>]*>.*?
</video>.*?
<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
@@ -443,7 +454,7 @@ class BrightcoveNewIE(InfoExtractor):
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
req = compat_urllib_request.Request(
req = sanitized_Request(
'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
% (account_id, video_id),
headers={'Accept': 'application/json;pk=%s' % policy_key})
@@ -458,11 +469,9 @@ class BrightcoveNewIE(InfoExtractor):
if source_type == 'application/x-mpegURL':
if not src:
continue
m3u8_formats = self._extract_m3u8_formats(
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
m3u8_id='hls', fatal=False))
else:
streaming_src = source.get('streaming_src')
stream_name, app_name = source.get('stream_name'), source.get('app_name')

View File

@@ -14,9 +14,10 @@ class BYUtvIE(InfoExtractor):
'info_dict': {
'id': 'studio-c-season-5-episode-5',
'ext': 'mp4',
'description': 'md5:5438d33774b6bdc662f9485a340401cc',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5',
'thumbnail': 're:^https?://.*\.jpg$'
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486,
},
'params': {
'skip_download': True,

View File

@@ -1,48 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class Canal13clIE(InfoExtractor):
_VALID_URL = r'^http://(?:www\.)?13\.cl/(?:[^/?#]+/)*(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.13.cl/t13/nacional/el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
'md5': '4cb1fa38adcad8fea88487a078831755',
'info_dict': {
'id': '1403022125',
'display_id': 'el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
'ext': 'mp4',
'title': 'El "círculo de hierro" de Michelle Bachelet en su regreso a La Moneda',
'description': '(Foto: Agencia Uno) En nueve días más, Michelle Bachelet va a asumir por segunda vez como presidenta de la República. Entre aquellos que la acompañarán hay caras que se repiten y otras que se consolidan en su entorno de colaboradores más cercanos.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True)
description = self._html_search_meta(
'twitter:description', webpage, 'description')
url = self._html_search_regex(
r'articuloVideo = \"(.*?)\"', webpage, 'url')
real_id = self._search_regex(
r'[^0-9]([0-9]{7,})[^0-9]', url, 'id', default=display_id)
thumbnail = self._html_search_regex(
r'articuloImagen = \"(.*?)\"', webpage, 'thumbnail')
return {
'id': real_id,
'display_id': display_id,
'url': url,
'title': title,
'description': description,
'ext': 'mp4',
'thumbnail': thumbnail,
}

View File

@@ -9,9 +9,9 @@ from ..utils import parse_duration
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
_VALID_URL = r'https?://(?:www\.)?canalc2\.tv/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:(?:www\.)?canalc2\.tv/video/|archives-canalc2\.u-strasbg\.fr/video\.asp\?.*\bidVideo=)(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.canalc2.tv/video/12163',
'md5': '060158428b650f896c542dfbb3d6487f',
'info_dict': {
@@ -23,24 +23,36 @@ class Canalc2IE(InfoExtractor):
'params': {
'skip_download': True, # Requires rtmpdump
}
}
}, {
'url': 'http://archives-canalc2.u-strasbg.fr/video.asp?idVideo=11427&voir=oui',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'jwplayer\((["\'])Player\1\)\.setup\({[^}]*file\s*:\s*(["\'])(?P<file>.+?)\2',
webpage, 'video_url', group='file')
formats = [{'url': video_url}]
if video_url.startswith('rtmp://'):
rtmp = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)
formats[0].update({
'url': rtmp.group('url'),
'ext': 'flv',
'app': rtmp.group('app'),
'play_path': rtmp.group('play_path'),
'page_url': url,
})
webpage = self._download_webpage(
'http://www.canalc2.tv/video/%s' % video_id, video_id)
formats = []
for _, video_url in re.findall(r'file\s*=\s*(["\'])(.+?)\1', webpage):
if video_url.startswith('rtmp://'):
rtmp = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)
formats.append({
'url': rtmp.group('url'),
'format_id': 'rtmp',
'ext': 'flv',
'app': rtmp.group('app'),
'play_path': rtmp.group('play_path'),
'page_url': url,
})
else:
formats.append({
'url': video_url,
'format_id': 'http',
})
self._sort_formats(formats)
title = self._html_search_regex(
r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.*?)</h3>', webpage, 'title')

View File

@@ -10,13 +10,14 @@ from ..utils import (
unified_strdate,
url_basename,
qualities,
int_or_none,
)
class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
'canalplus.fr': 'cplus',
'piwiplus.fr': 'teletoon',
@@ -26,10 +27,10 @@ class CanalplusIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
'md5': 'b3481d7ca972f61e37420798d0a9d934',
'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
'info_dict': {
'id': '1263092',
'ext': 'flv',
'ext': 'mp4',
'title': 'Le Zapping - 13/05/15',
'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
'upload_date': '20150513',
@@ -56,10 +57,10 @@ class CanalplusIE(InfoExtractor):
'skip': 'videos get deleted after a while',
}, {
'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
'md5': 'f3a46edcdf28006598ffaf5b30e6a2d4',
'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
'info_dict': {
'id': '1213714',
'ext': 'flv',
'ext': 'mp4',
'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
'description': 'md5:8216206ec53426ea6321321f3b3c16db',
'upload_date': '20150211',
@@ -82,15 +83,16 @@ class CanalplusIE(InfoExtractor):
webpage, 'video id', group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
doc = self._download_xml(info_url, video_id, 'Downloading video XML')
video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
video_info = [video for video in doc if video.find('ID').text == video_id][0]
media = video_info.find('MEDIA')
infos = video_info.find('INFOS')
if isinstance(video_data, list):
video_data = [video for video in video_data if video.get('ID') == video_id][0]
media = video_data['MEDIA']
infos = video_data['INFOS']
preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS'])
preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD'])
fmt_url = next(iter(media.find('VIDEOS'))).text
fmt_url = next(iter(media.get('VIDEOS')))
if '/geo' in fmt_url.lower():
response = self._request_webpage(
HEADRequest(fmt_url), video_id,
@@ -101,35 +103,42 @@ class CanalplusIE(InfoExtractor):
expected=True)
formats = []
for fmt in media.find('VIDEOS'):
format_url = fmt.text
for format_id, format_url in media['VIDEOS'].items():
if not format_url:
continue
format_id = fmt.tag
if format_id == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', preference=preference(format_id)))
format_url, video_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
elif format_id == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=2.11.3', video_id, preference=preference(format_id)))
format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': format_url,
# the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
'format_id': format_id,
'preference': preference(format_id),
})
self._sort_formats(formats)
thumbnails = [{
'id': image_id,
'url': image_url,
} for image_id, image_url in media.get('images', {}).items()]
titrage = infos['TITRAGE']
return {
'id': video_id,
'display_id': display_id,
'title': '%s - %s' % (infos.find('TITRAGE/TITRE').text,
infos.find('TITRAGE/SOUS_TITRE').text),
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
'like_count': int(infos.find('NB_LIKES').text),
'comment_count': int(infos.find('NB_COMMENTS').text),
'title': '%s - %s' % (titrage['TITRE'],
titrage['SOUS_TITRE']),
'upload_date': unified_strdate(infos.get('PUBLICATION', {}).get('DATE')),
'thumbnails': thumbnails,
'description': infos.get('DESCRIPTION'),
'duration': int_or_none(infos.get('DURATION')),
'view_count': int_or_none(infos.get('NB_VUES')),
'like_count': int_or_none(infos.get('NB_LIKES')),
'comment_count': int_or_none(infos.get('NB_COMMENTS')),
'formats': formats,
}

View File

@@ -0,0 +1,65 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import float_or_none
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c',
'info_dict': {
'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'display_id': 'de-afspraak-veilt-voor-de-warmste-week',
'ext': 'mp4',
'title': 'De afspraak veilt voor de Warmste Week',
'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 49.02,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title(webpage)
video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, entry_protocol='m3u8_native',
ext='mp4', preference=0, fatal=False, m3u8_id=format_type))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url, display_id, f4m_id=format_type, fatal=False))
else:
formats.append({
'format_id': format_type,
'url': format_url,
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
}

View File

@@ -1,8 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import smuggle_url
from ..utils import (
sanitized_Request,
smuggle_url,
)
class CBSIE(InfoExtractor):
@@ -48,7 +50,7 @@ class CBSIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
request = compat_urllib_request.Request(url)
request = sanitized_Request(url)
# Android UA is served with higher quality (720p) streams (see
# https://github.com/rg3/youtube-dl/issues/7490)
request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')

View File

@@ -1,15 +1,14 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import parse_duration
class CBSNewsIE(InfoExtractor):
class CBSNewsIE(ThePlatformIE):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
@@ -30,56 +29,54 @@ class CBSNewsIE(InfoExtractor):
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'flv',
'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
'subtitles': {
'en': [{
'ext': 'ttml',
}],
},
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = json.loads(self._html_search_regex(
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'))
webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
if 'mpxRefId' in video_info:
subtitles['en'] = [{
'ext': 'ttml',
'url': 'http://www.cbsnews.com/videos/captions/%s.adb_xml' % video_info['mpxRefId'],
}]
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
uri = item.get('media' + format_id + 'URI')
if not uri:
pid = item.get('media' + format_id)
if not pid:
continue
fmt = {
'url': uri,
'format_id': format_id,
}
if uri.startswith('rtmp'):
play_path = re.sub(
r'{slistFilePath}', '',
uri.split('<break>')[-1].split('{break}')[-1])
fmt.update({
'app': 'ondemand?auth=cbs',
'play_path': 'mp4:' + play_path,
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
})
elif uri.endswith('.m3u8'):
fmt['ext'] = 'mp4'
formats.append(fmt)
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?format=SMIL&mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'id': video_id,
@@ -87,4 +84,43 @@ class CBSNewsIE(InfoExtractor):
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
return {
'id': video_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
}

View File

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
qualities,
unified_strdate,
)
@@ -12,21 +13,25 @@ from ..utils import (
class CCCIE(InfoExtractor):
IE_NAME = 'media.ccc.de'
_VALID_URL = r'https?://(?:www\.)?media\.ccc\.de/[^?#]+/[^?#/]*?_(?P<id>[0-9]{8,})._[^?#/]*\.html'
_VALID_URL = r'https?://(?:www\.)?media\.ccc\.de/v/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://media.ccc.de/browse/congress/2013/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor.html#video',
_TESTS = [{
'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
'md5': '3a1eda8f3a29515d27f5adb967d7e740',
'info_dict': {
'id': '20131228183',
'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor',
'ext': 'mp4',
'title': 'Introduction to Processor Design',
'description': 'md5:5ddbf8c734800267f2cee4eab187bc1b',
'description': 'md5:80be298773966f66d56cb11260b879af',
'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'upload_date': '20131229',
'upload_date': '20131228',
'duration': 3660,
}
}
}, {
'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -40,21 +45,25 @@ class CCCIE(InfoExtractor):
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r"(?s)<p class='description'>(.*?)</p>",
r"(?s)<h3>About</h3>(.+?)<h3>",
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span class='[^']*fa-calendar-o'></span>(.*?)</li>",
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
webpage, 'upload date', fatal=False))
view_count = int_or_none(self._html_search_regex(
r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
webpage, 'duration', fatal=False, group='duration'))
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>.*?)</(?:span|div)>\s*
<(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
<(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
<a\s+download\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+href='(?P<torrent_url>[^']+\.torrent)'
<a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
)?''', webpage)
formats = []
for m in matches:
@@ -62,12 +71,15 @@ class CCCIE(InfoExtractor):
format_id = self._search_regex(
r'.*/([a-z0-9_-]+)/[^/]*$',
m.group('http_url'), 'format id', default=None)
if format_id:
format_id = m.group('lang') + '-' + format_id
vcodec = 'h264' if 'h264' in format_id else (
'none' if format_id in ('mp3', 'opus') else None
)
formats.append({
'format_id': format_id,
'format': format,
'language': m.group('lang'),
'url': m.group('http_url'),
'vcodec': vcodec,
'preference': preference(format_id),
@@ -95,5 +107,6 @@ class CCCIE(InfoExtractor):
'thumbnail': thumbnail,
'view_count': view_count,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
}

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
@@ -13,6 +12,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
)
@@ -100,7 +100,7 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani',
}
req = compat_urllib_request.Request(
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=compat_urllib_parse.urlencode(data))
@@ -115,7 +115,7 @@ class CeskaTelevizeIE(InfoExtractor):
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage)

View File

@@ -23,6 +23,8 @@ class ChaturbateIE(InfoExtractor):
'only_matching': True,
}]
_ROOM_OFFLINE = 'Room is currently offline'
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -34,9 +36,16 @@ class ChaturbateIE(InfoExtractor):
if not m3u8_url:
error = self._search_regex(
r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
webpage, 'error', group='error')
raise ExtractorError(error, expected=True)
[r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
r'<div[^>]+id=(["\'])defchat\1[^>]*>\s*<p><strong>(?P<error>[^<]+)<'],
webpage, 'error', group='error', default=None)
if not error:
if any(p not in webpage for p in (
self._ROOM_OFFLINE, 'offline_tipping', 'tip_offline')):
error = self._ROOM_OFFLINE
if error:
raise ExtractorError(error, expected=True)
raise ExtractorError('Unable to find stream URL')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..utils import ExtractorError
from .bliptv import BlipTVIE
from .screenwavemedia import ScreenwaveMediaIE
@@ -34,18 +33,17 @@ class CinemassacreIE(InfoExtractor):
},
},
{
# blip.tv embedded video
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
'md5': 'ca9b3c8dd5a66f9375daeb5135f5a3de',
'md5': 'df4cf8a1dcedaec79a73d96d83b99023',
'info_dict': {
'id': '4065369',
'ext': 'flv',
'id': 'OEVzPCY2T-g',
'ext': 'mp4',
'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
'upload_date': '20061207',
'uploader': 'cinemassacre',
'uploader_id': '250778',
'timestamp': 1283233867,
'description': 'md5:0a108c78d130676b207d0f6d029ecffd',
'uploader': 'Cinemassacre',
'uploader_id': 'JamesNintendoNerd',
'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
}
},
{
@@ -88,8 +86,6 @@ class CinemassacreIE(InfoExtractor):
r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
],
webpage, 'player data URL', default=None, group='url')
if not playerdata_url:
playerdata_url = BlipTVIE._extract_url(webpage)
if not playerdata_url:
raise ExtractorError('Unable to find player data')

View File

@@ -1,14 +1,9 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
js_to_json,
parse_iso8601,
remove_end,
unified_strdate,
)
@@ -21,48 +16,47 @@ class ClipfishIE(InfoExtractor):
'id': '3966754',
'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer',
'timestamp': 1370938118,
'description': 'Video zu FIFA 14: E3 2013 Trailer',
'upload_date': '20130611',
'duration': 82,
'view_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(
js_to_json(self._html_search_regex(
'(?s)videoObject\s*=\s*({.+?});', webpage, 'video object')),
video_id)
video_info = self._download_json(
'http://www.clipfish.de/devapi/id/%s?format=json&apikey=hbbtv' % video_id,
video_id)['items'][0]
formats = []
for video_url in re.findall(r'var\s+videourl\s*=\s*"([^"]+)"', webpage):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.append({
'url': video_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
'ext': 'mp4',
'format_id': 'hls',
})
else:
formats.append({
'url': video_url,
'format_id': ext,
})
self._sort_formats(formats)
title = remove_end(self._og_search_title(webpage), ' - Video')
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(video_info.get('length'))
timestamp = parse_iso8601(self._html_search_meta('uploadDate', webpage, 'upload date'))
m3u8_url = video_info.get('media_videourl_hls')
if m3u8_url:
formats.append({
'url': m3u8_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
'ext': 'mp4',
'format_id': 'hls',
})
mp4_url = video_info.get('media_videourl')
if mp4_url:
formats.append({
'url': mp4_url,
'format_id': 'mp4',
'width': int_or_none(video_info.get('width')),
'height': int_or_none(video_info.get('height')),
'tbr': int_or_none(video_info.get('bitrate')),
})
return {
'id': video_id,
'title': title,
'title': video_info['title'],
'description': video_info.get('descr'),
'formats': formats,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
'duration': int_or_none(video_info.get('media_length')),
'upload_date': unified_strdate(video_info.get('pubDate')),
'view_count': int_or_none(video_info.get('media_views'))
}

View File

@@ -1,7 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import determine_ext
from ..utils import int_or_none
_translation_table = {
@@ -42,31 +42,26 @@ class CliphunterIE(InfoExtractor):
video_title = self._search_regex(
r'mediaTitle = "([^"]+)"', webpage, 'title')
fmts = {}
for fmt in ('mp4', 'flv'):
fmt_list = self._parse_json(self._search_regex(
r'var %sjson\s*=\s*(\[.*?\]);' % fmt, webpage, '%s formats' % fmt), video_id)
for f in fmt_list:
fmts[f['fname']] = _decode(f['sUrl'])
qualities = self._parse_json(self._search_regex(
r'var player_btns\s*=\s*(.*?);\n', webpage, 'quality info'), video_id)
gexo_files = self._parse_json(
self._search_regex(
r'var\s+gexoFiles\s*=\s*({.+?});', webpage, 'gexo files'),
video_id)
formats = []
for fname, url in fmts.items():
f = {
'url': url,
}
if fname in qualities:
qual = qualities[fname]
f.update({
'format_id': '%s_%sp' % (determine_ext(url), qual['h']),
'width': qual['w'],
'height': qual['h'],
'tbr': qual['br'],
})
formats.append(f)
for format_id, f in gexo_files.items():
video_url = f.get('url')
if not video_url:
continue
fmt = f.get('fmt')
height = f.get('h')
format_id = '%s_%sp' % (fmt, height) if fmt and height else format_id
formats.append({
'url': _decode(video_url),
'format_id': format_id,
'width': int_or_none(f.get('w')),
'height': int_or_none(height),
'tbr': int_or_none(f.get('br')),
})
self._sort_formats(formats)
thumbnail = self._search_regex(

View File

@@ -1,15 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
from .theplatform import ThePlatformIE
from ..utils import int_or_none
class CNETIE(InfoExtractor):
class CNETIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
_TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
@@ -18,25 +14,20 @@ class CNETIE(InfoExtractor):
'ext': 'flv',
'title': 'Hands-on with Microsoft Windows 8.1 Update',
'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
'thumbnail': 're:^http://.*/flmswindows8.jpg$',
'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
'uploader': 'Sarah Mitroff',
'duration': 70,
},
'params': {
'skip_download': 'requires rtmpdump',
}
}, {
'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
'info_dict': {
'id': '56527b93-d25d-44e3-b738-f989ce2e49ba',
'ext': 'flv',
'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
'description': 'Khail and Ashley wonder what other civic woes can be solved by self-tweeting objects, investigate a new kind of VR camera and watch an origami robot self-assemble, walk, climb, dig and dissolve. #TDPothole',
'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
'uploader': 'Ashley Esqueda',
'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
},
'params': {
'skip_download': True, # requires rtmpdump
'duration': 1482,
},
}]
@@ -45,26 +36,13 @@ class CNETIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"<div class=\"cnetVideoPlayer\"\s+.*?data-cnet-video-options='([^']+)'",
r"data-cnet-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json')
data = json.loads(data_json)
vdata = data['video']
if not vdata:
vdata = data['videos'][0]
if not vdata:
raise ExtractorError('Cannot find video data')
mpx_account = data['config']['players']['default']['mpx_account']
vid = vdata['files'].get('rtmp', vdata['files']['hds'])
tp_link = 'http://link.theplatform.com/s/%s/%s' % (mpx_account, vid)
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0]
video_id = vdata['id']
title = vdata.get('headline')
if title is None:
title = vdata.get('title')
if title is None:
raise ExtractorError('Cannot find title!')
thumbnail = vdata.get('image', {}).get('path')
title = vdata['title']
author = vdata.get('author')
if author:
uploader = '%s %s' % (author['firstName'], author['lastName'])
@@ -73,13 +51,34 @@ class CNETIE(InfoExtractor):
uploader = None
uploader_id = None
mpx_account = data['config']['uvpConfig']['default']['mpx_account']
metadata = self.get_metadata('%s/%s' % (mpx_account, list(vdata['files'].values())[0]), video_id)
description = vdata.get('description') or metadata.get('description')
duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
formats = []
subtitles = {}
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
release_url = 'http://link.theplatform.com/s/%s/%s?format=SMIL&mbr=true' % (mpx_account, vid)
if fkey == 'hds':
release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'_type': 'url_transparent',
'url': tp_link,
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': metadata.get('thumbnail'),
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'thumbnail': thumbnail,
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -3,10 +3,10 @@ from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
sanitized_Request,
)
@@ -52,7 +52,7 @@ class CollegeRamaIE(InfoExtractor):
}
}
request = compat_urllib_request.Request(
request = sanitized_Request(
'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
json.dumps(player_options_request))
request.add_header('Content-Type', 'application/json')

View File

@@ -1,10 +1,12 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import parse_iso8601
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
)
class ComCarCoffIE(InfoExtractor):
@@ -16,6 +18,7 @@ class ComCarCoffIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
@@ -31,9 +34,10 @@ class ComCarCoffIE(InfoExtractor):
display_id = 'comediansincarsgettingcoffee.com'
webpage = self._download_webpage(url, display_id)
full_data = json.loads(self._search_regex(
r'<script type="application/json" id="videoData">(?P<json>.+?)</script>',
webpage, 'full data json'))
full_data = self._parse_json(
self._search_regex(
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
video_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
@@ -45,12 +49,18 @@ class ComCarCoffIE(InfoExtractor):
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, ext='mp4')
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
duration = int_or_none(video_data.get('durationSeconds')) or parse_duration(
video_data.get('duration'))
return {
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
'description': video_data.get('description'),
'timestamp': parse_iso8601(video_data.get('pubDate')),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),

View File

@@ -18,8 +18,6 @@ from ..compat import (
compat_http_client,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urlparse,
compat_str,
compat_etree_fromstring,
@@ -31,17 +29,21 @@ from ..utils import (
clean_html,
compiled_regex_type,
determine_ext,
error_to_compat_str,
ExtractorError,
fix_xml_ampersands,
float_or_none,
int_or_none,
parse_iso8601,
RegexNotFoundError,
sanitize_filename,
sanitized_Request,
unescapeHTML,
unified_strdate,
url_basename,
xpath_text,
xpath_with_ns,
determine_protocol,
)
@@ -107,8 +109,9 @@ class InfoExtractor(object):
-2 or smaller for less than default.
< -1000 to hide the format (if there is
another one which is strictly better)
* language_preference Is this in the correct requested
language?
* language Language code, e.g. "de" or "en-US".
* language_preference Is this in the language mentioned in
the URL?
10 if it's what the URL is about,
-1 for default (don't know),
-10 otherwise, other values reserved for now.
@@ -167,7 +170,7 @@ class InfoExtractor(object):
"ext" will be calculated from URL if missing
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer.
duration: Length of the video in seconds, as an integer or float.
view_count: How many users have watched the video on the platform.
like_count: Number of positive ratings of the video
dislike_count: Number of negative ratings of the video
@@ -199,6 +202,26 @@ class InfoExtractor(object):
end_time: Time in seconds where the reproduction should end, as
specified in the URL.
The following fields should only be used when the video belongs to some logical
chapter or section:
chapter: Name or title of the chapter the video belongs to.
chapter_number: Number of the chapter the video belongs to, as an integer.
chapter_id: Id of the chapter the video belongs to, as a unicode string.
The following fields should only be used when the video is an episode of some
series or programme:
series: Title of the series or programme the video episode belongs to.
season: Title of the season the video episode belongs to.
season_number: Number of the season the video episode belongs to, as an integer.
season_id: Id of the season the video episode belongs to, as a unicode string.
episode: Title of the video episode. Unlike mandatory video title field,
this field should denote the exact title of the video episode
without any kind of decoration.
episode_number: Number of the video episode within a season, as an integer.
episode_id: Id of the video episode, as a unicode string.
Unless mentioned otherwise, the fields should be Unicode strings.
Unless mentioned otherwise, None is equivalent to absence of information.
@@ -291,9 +314,9 @@ class InfoExtractor(object):
except ExtractorError:
raise
except compat_http_client.IncompleteRead as e:
raise ExtractorError('A network error has occured.', cause=e, expected=True)
raise ExtractorError('A network error has occurred.', cause=e, expected=True)
except (KeyError, StopIteration) as e:
raise ExtractorError('An extractor error has occured.', cause=e)
raise ExtractorError('An extractor error has occurred.', cause=e)
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
@@ -332,7 +355,8 @@ class InfoExtractor(object):
return False
if errnote is None:
errnote = 'Unable to download webpage'
errmsg = '%s: %s' % (errnote, compat_str(err))
errmsg = '%s: %s' % (errnote, error_to_compat_str(err))
if fatal:
raise ExtractorError(errmsg, sys.exc_info()[2], cause=err)
else:
@@ -622,7 +646,7 @@ class InfoExtractor(object):
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % compat_str(err))
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
return (username, password)
@@ -739,6 +763,42 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html,
'twitter card player')
def _search_json_ld(self, html, video_id, **kwargs):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
if not json_ld:
return {}
return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
def _json_ld(self, json_ld, video_id, fatal=True):
if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld:
return {}
info = {}
if json_ld.get('@context') == 'http://schema.org':
item_type = json_ld.get('@type')
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(json_ld.get('name')),
'episode_number': int_or_none(json_ld.get('episodeNumber')),
'description': unescapeHTML(json_ld.get('description')),
})
part_of_season = json_ld.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = json_ld.get('partOfSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(json_ld.get('datePublished')),
'title': unescapeHTML(json_ld.get('headline')),
'description': unescapeHTML(json_ld.get('articleBody')),
})
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
@@ -765,6 +825,12 @@ class InfoExtractor(object):
if not formats:
raise ExtractorError('No video formats found')
for f in formats:
# Automatically determine tbr when missing based on abr and vbr (improves
# formats sorting in some cases)
if 'tbr' not in f and f.get('abr') is not None and f.get('vbr') is not None:
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# TODO remove the following workaround
from ..utils import determine_ext
@@ -776,14 +842,12 @@ class InfoExtractor(object):
preference = f.get('preference')
if preference is None:
proto = f.get('protocol')
if proto is None:
proto = compat_urllib_parse_urlparse(f.get('url', '')).scheme
preference = 0 if proto in ['http', 'https'] else -0.1
preference = 0
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
if f.get('vcodec') == 'none': # audio only
if self._downloader.params.get('prefer_free_formats'):
ORDER = ['aac', 'mp3', 'm4a', 'webm', 'ogg', 'opus']
@@ -814,6 +878,7 @@ class InfoExtractor(object):
f.get('vbr') if f.get('vbr') is not None else -1,
f.get('height') if f.get('height') is not None else -1,
f.get('width') if f.get('width') is not None else -1,
proto_preference,
ext_preference,
f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference,
@@ -883,7 +948,7 @@ class InfoExtractor(object):
fatal=fatal)
if manifest is False:
return manifest
return []
formats = []
manifest_version = '1.0'
@@ -891,6 +956,11 @@ class InfoExtractor(object):
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
base_url = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
'base URL', default=None)
if base_url:
base_url = base_url.strip()
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
@@ -898,16 +968,14 @@ class InfoExtractor(object):
continue
manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://')
else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
# bitrate in f4m downloader
if determine_ext(manifest_url) == 'f4m':
f4m_formats = self._extract_f4m_formats(
manifest_url, video_id, preference, f4m_id, fatal=fatal)
if f4m_formats:
formats.extend(f4m_formats)
formats.extend(self._extract_f4m_formats(
manifest_url, video_id, preference, f4m_id, fatal=fatal))
continue
tbr = int_or_none(media_el.attrib.get('bitrate'))
formats.append({
@@ -949,9 +1017,21 @@ class InfoExtractor(object):
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
if res is False:
return res
return []
m3u8_doc, urlh = res
m3u8_url = urlh.geturl()
# A Media Playlist Tag MUST NOT appear in a Master Playlist
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3
# The EXT-X-TARGETDURATION tag is REQUIRED for every M3U8 Media Playlists
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1
if '#EXT-X-TARGETDURATION' in m3u8_doc:
return [{
'url': m3u8_url,
'format_id': m3u8_id,
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}]
last_info = None
last_media = None
kv_rex = re.compile(
@@ -996,9 +1076,9 @@ class InfoExtractor(object):
# TODO: looks like video codec is not always necessarily goes first
va_codecs = codecs.split(',')
if va_codecs[0]:
f['vcodec'] = va_codecs[0].partition('.')[0]
f['vcodec'] = va_codecs[0]
if len(va_codecs) > 1 and va_codecs[1]:
f['acodec'] = va_codecs[1].partition('.')[0]
f['acodec'] = va_codecs[1]
resolution = last_info.get('RESOLUTION')
if resolution:
width_str, height_str = resolution.split('x')
@@ -1102,6 +1182,7 @@ class InfoExtractor(object):
formats = []
rtmp_count = 0
http_count = 0
m3u8_count = 0
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
@@ -1143,8 +1224,15 @@ class InfoExtractor(object):
if proto == 'm3u8' or src_ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
src_url, video_id, ext or 'mp4', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
if len(m3u8_formats) == 1:
m3u8_count += 1
m3u8_formats[0].update({
'format_id': 'hls-%d' % (m3u8_count if bitrate is None else bitrate),
'tbr': bitrate,
'width': width,
'height': height,
})
formats.extend(m3u8_formats)
continue
if src_ext == 'f4m':
@@ -1156,9 +1244,7 @@ class InfoExtractor(object):
}
f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse.urlencode(f4m_params)
f4m_formats = self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False)
if f4m_formats:
formats.extend(f4m_formats)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
continue
if src_url.startswith('http') and self._is_valid_url(src, video_id):
@@ -1244,6 +1330,83 @@ class InfoExtractor(object):
})
return entries
def _download_dash_manifest(self, dash_manifest_url, video_id, fatal=True):
return self._download_xml(
dash_manifest_url, video_id,
note='Downloading DASH manifest',
errnote='Could not download DASH manifest',
fatal=fatal)
def _extract_dash_manifest_formats(self, dash_manifest_url, video_id, fatal=True, namespace=None, formats_dict={}):
dash_doc = self._download_dash_manifest(dash_manifest_url, video_id, fatal)
if dash_doc is False:
return []
return self._parse_dash_manifest(
dash_doc, namespace=namespace, formats_dict=formats_dict)
def _parse_dash_manifest(self, dash_doc, namespace=None, formats_dict={}):
def _add_ns(path):
return self._xpath_ns(path, namespace)
formats = []
for a in dash_doc.findall('.//' + _add_ns('AdaptationSet')):
mime_type = a.attrib.get('mimeType')
for r in a.findall(_add_ns('Representation')):
mime_type = r.attrib.get('mimeType') or mime_type
url_el = r.find(_add_ns('BaseURL'))
if mime_type == 'text/vtt':
# TODO implement WebVTT downloading
pass
elif mime_type.startswith('audio/') or mime_type.startswith('video/'):
segment_list = r.find(_add_ns('SegmentList'))
format_id = r.attrib['id']
video_url = url_el.text if url_el is not None else None
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
f = {
'format_id': format_id,
'url': video_url,
'width': int_or_none(r.attrib.get('width')),
'height': int_or_none(r.attrib.get('height')),
'tbr': int_or_none(r.attrib.get('bandwidth'), 1000),
'asr': int_or_none(r.attrib.get('audioSamplingRate')),
'filesize': filesize,
'fps': int_or_none(r.attrib.get('frameRate')),
}
if segment_list is not None:
initialization_url = segment_list.find(_add_ns('Initialization')).attrib['sourceURL']
f.update({
'initialization_url': initialization_url,
'segment_urls': [segment.attrib.get('media') for segment in segment_list.findall(_add_ns('SegmentURL'))],
'protocol': 'http_dash_segments',
})
if not f.get('url'):
f['url'] = initialization_url
try:
existing_format = next(
fo for fo in formats
if fo['format_id'] == format_id)
except StopIteration:
full_info = formats_dict.get(format_id, {}).copy()
full_info.update(f)
codecs = r.attrib.get('codecs')
if codecs:
if mime_type.startswith('video/'):
vcodec, acodec = codecs, 'none'
else: # mime_type.startswith('audio/')
vcodec, acodec = 'none', codecs
full_info.update({
'vcodec': vcodec,
'acodec': acodec,
})
formats.append(full_info)
else:
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
@@ -1280,7 +1443,7 @@ class InfoExtractor(object):
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
req = compat_urllib_request.Request(url)
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))

View File

@@ -23,6 +23,7 @@ from ..utils import (
int_or_none,
lowercase_escape,
remove_end,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
@@ -46,7 +47,7 @@ class CrunchyrollBaseIE(InfoExtractor):
'name': username,
'password': password,
})
login_request = compat_urllib_request.Request(login_url, data)
login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(login_request, None, False, 'Wrong login info')
@@ -55,7 +56,7 @@ class CrunchyrollBaseIE(InfoExtractor):
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else compat_urllib_request.Request(url_or_request))
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/rg3/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
@@ -307,7 +308,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = compat_urllib_request.Request(playerdata_url)
playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
@@ -319,7 +320,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = compat_urllib_request.Request(
streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
@@ -328,8 +329,10 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
stream_info = streamdata.find('./{default}preload/stream_info')
video_url = stream_info.find('./host').text
video_play_path = stream_info.find('./file').text
video_url = xpath_text(stream_info, './host')
video_play_path = xpath_text(stream_info, './file')
if not video_url or not video_play_path:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,

View File

@@ -9,6 +9,7 @@ from ..utils import (
find_xpath_attr,
smuggle_url,
determine_ext,
ExtractorError,
)
from .senateisvp import SenateISVPIE
@@ -18,33 +19,32 @@ class CSpanIE(InfoExtractor):
IE_DESC = 'C-SPAN'
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
'md5': '8e44ce11f0f725527daccc453f553eb0',
'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
'info_dict': {
'id': '315139',
'ext': 'mp4',
'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
},
'skip': 'Regularly fails on travis, for unknown reasons',
}, {
'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
# For whatever reason, the served video alternates between
# two different ones
'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
'info_dict': {
'id': '340723',
'id': 'c4486943',
'ext': 'mp4',
'title': 'International Health Care Models',
'title': 'CSPAN - International Health Care Models',
'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
}
}, {
'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
'md5': '446562a736c6bf97118e389433ed88d4',
'md5': '2ae5051559169baadba13fc35345ae74',
'info_dict': {
'id': '342759',
'ext': 'mp4',
'title': 'General Motors Ignition Switch Recall',
'duration': 14848,
'description': 'md5:70c7c3b8fa63fa60d42772440596034c'
'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
},
}, {
# Video from senate.gov
@@ -57,67 +57,94 @@ class CSpanIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('id')
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')
video_id = self._match_id(url)
video_type = None
webpage = self._download_webpage(url, video_id)
# We first look for clipid, because clipprog always appears before
patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
results = list(filter(None, (re.search(p, webpage) for p in patterns)))
if results:
matches = results[0]
video_type, video_id = matches.groups()
video_type = 'clip' if video_type == 'id' else 'program'
else:
m = re.search(r'data-(?P<type>clip|prog)id=["\'](?P<id>\d+)', webpage)
if m:
video_id = m.group('id')
video_type = 'program' if m.group('type') == 'prog' else 'clip'
else:
senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
if senate_isvp_url:
title = self._og_search_title(webpage)
surl = smuggle_url(senate_isvp_url, {'force_title': title})
return self.url_result(surl, 'SenateISVP', video_id, title)
if video_type is None or video_id is None:
raise ExtractorError('unable to find video id and type')
description = self._html_search_regex(
[
# The full description
r'<div class=\'expandable\'>(.*?)<a href=\'#\'',
# If the description is small enough the other div is not
# present, otherwise this is a stripped version
r'<p class=\'initial\'>(.*?)</p>'
],
webpage, 'description', flags=re.DOTALL, default=None)
def get_text_attr(d, attr):
return d.get(attr, {}).get('#text')
info_url = 'http://c-spanvideo.org/videoLibrary/assets/player/ajax-player.php?os=android&html5=program&id=' + video_id
data = self._download_json(info_url, video_id)
data = self._download_json(
'http://www.c-span.org/assets/player/ajax-player.php?os=android&html5=%s&id=%s' % (video_type, video_id),
video_id)['video']
if data['@status'] != 'Success':
raise ExtractorError('%s said: %s' % (self.IE_NAME, get_text_attr(data, 'error')), expected=True)
doc = self._download_xml(
'http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
'http://www.c-span.org/common/services/flashXml.php?%sid=%s' % (video_type, video_id),
video_id)
description = self._html_search_meta('description', webpage)
title = find_xpath_attr(doc, './/string', 'name', 'title').text
thumbnail = find_xpath_attr(doc, './/string', 'name', 'poster').text
senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
if senate_isvp_url:
surl = smuggle_url(senate_isvp_url, {'force_title': title})
return self.url_result(surl, 'SenateISVP', video_id, title)
files = data['files']
capfile = get_text_attr(data, 'capfile')
files = data['video']['files']
try:
capfile = data['video']['capfile']['#text']
except KeyError:
capfile = None
entries = [{
'id': '%s_%d' % (video_id, partnum + 1),
'title': (
title if len(files) == 1 else
'%s part %d' % (title, partnum + 1)),
'url': unescapeHTML(f['path']['#text']),
'description': description,
'thumbnail': thumbnail,
'duration': int_or_none(f.get('length', {}).get('#text')),
'subtitles': {
'en': [{
'url': capfile,
'ext': determine_ext(capfile, 'dfxp')
}],
} if capfile else None,
} for partnum, f in enumerate(files)]
entries = []
for partnum, f in enumerate(files):
formats = []
for quality in f['qualities']:
formats.append({
'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
'url': unescapeHTML(get_text_attr(quality, 'file')),
'height': int_or_none(get_text_attr(quality, 'height')),
'tbr': int_or_none(get_text_attr(quality, 'bitrate')),
})
if not formats:
path = unescapeHTML(get_text_attr(f, 'path'))
if not path:
continue
formats = self._extract_m3u8_formats(
path, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
self._sort_formats(formats)
entries.append({
'id': '%s_%d' % (video_id, partnum + 1),
'title': (
title if len(files) == 1 else
'%s part %d' % (title, partnum + 1)),
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'duration': int_or_none(get_text_attr(f, 'length')),
'subtitles': {
'en': [{
'url': capfile,
'ext': determine_ext(capfile, 'dfxp')
}],
} if capfile else None,
})
if len(entries) == 1:
entry = dict(entries[0])
entry['id'] = video_id
entry['id'] = 'c' + video_id if video_type == 'clip' else video_id
return entry
else:
return {
'_type': 'playlist',
'entries': entries,
'title': title,
'id': video_id,
'id': 'c' + video_id if video_type == 'clip' else video_id,
}

View File

@@ -0,0 +1,63 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class CultureUnpluggedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
_TESTS = [{
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
'info_dict': {
'id': '53662',
'display_id': 'The-Next--Best-West',
'ext': 'mp4',
'title': 'The Next, Best West',
'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
'thumbnail': 're:^https?://.*\.jpg$',
'creator': 'Coldstream Creative',
'duration': 2203,
'view_count': int,
}
}, {
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
movie_data = self._download_json(
'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)
video_url = movie_data['url']
title = movie_data['title']
description = movie_data.get('synopsis')
creator = movie_data.get('producer')
duration = int_or_none(movie_data.get('duration'))
view_count = int_or_none(movie_data.get('views'))
thumbnails = [{
'url': movie_data['%s_thumb' % size],
'id': size,
'preference': preference,
} for preference, size in enumerate((
'small', 'large')) if movie_data.get('%s_thumb' % size)]
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'creator': creator,
'duration': duration,
'view_count': view_count,
'thumbnails': thumbnails,
}

View File

@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': {
'id': '6b15e985-9345-4f60-baf8-56e96be57c63',
'ext': 'mp4',
'title': 'Legends of Yesterday',
'description': 'Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote location to keep them hidden from Vandal Savage while they figure out how to defeat him.',
'duration': 2665,
'series': 'Arrow',
'season_number': 4,
'season': '4',
'episode_number': 8,
'upload_date': '20151203',
'timestamp': 1449122100,
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
'info_dict': {
'id': '24282b12-ead2-42f2-95ad-26770c2c6088',
'ext': 'mp4',
'title': 'Jeff Davis 4',
'description': 'Jeff Davis is back to make you laugh.',
'duration': 1263,
'series': 'Whose Line Is It Anyway?',
'season_number': 11,
'season': '11',
'episode_number': 20,
'upload_date': '20151006',
'timestamp': 1444107300,
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
thumbnails = [{
'url': image['uri'],
'width': image.get('width'),
'height': image.get('height'),
} for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None
video_metadata = video_data['assetFields']
subtitles = {
'en': [{
'url': video_metadata['UnicornCcUrl'],
}],
} if video_metadata.get('UnicornCcUrl') else None
return {
'id': video_id,
'title': video_metadata['title'],
'description': video_metadata.get('description'),
'duration': int_or_none(video_metadata.get('duration')),
'series': video_metadata.get('seriesName'),
'season_number': int_or_none(video_metadata.get('seasonNumber')),
'season': video_metadata.get('seasonName'),
'episode_number': int_or_none(video_metadata.get('episodeNumber')),
'timestamp': parse_iso8601(video_data.get('startTime')),
'thumbnails': thumbnails,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -7,15 +7,13 @@ import itertools
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
determine_ext,
error_to_compat_str,
ExtractorError,
int_or_none,
parse_iso8601,
sanitized_Request,
str_to_int,
unescapeHTML,
)
@@ -25,7 +23,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request = sanitized_Request(url)
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
@@ -39,7 +37,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
_VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:embed|swf|#)/)?video/(?P<id>[^/?_]+)'
IE_NAME = 'dailymotion'
_FORMATS = [
@@ -101,6 +99,15 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
{
'url': 'http://www.dailymotion.com/video/xhza0o',
'only_matching': True,
},
# with subtitles
{
'url': 'http://www.dailymotion.com/video/x20su5f_the-power-of-nightmares-1-the-rise-of-the-politics-of-fear-bbc-2004_news',
'only_matching': True,
},
{
'url': 'http://www.dailymotion.com/swf/video/x3n92nf',
'only_matching': True,
}
]
@@ -124,7 +131,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
webpage, 'comment count', fatal=False))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);', r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);'],
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
@@ -143,19 +152,16 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
continue
ext = determine_ext(media_url)
if type_ == 'application/x-mpegURL' or ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False))
elif type_ == 'application/f4m' or ext == 'f4m':
f4m_formats = self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False)
if f4m_formats:
formats.extend(f4m_formats)
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': quality,
'format_id': 'http-%s' % quality,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
@@ -174,11 +180,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
for subtitle_lang, subtitle in metadata.get('subtitles', {}).get('data', {}).items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
subtitles_data = metadata.get('subtitles', {}).get('data', {})
if subtitles_data and isinstance(subtitles_data, dict):
for subtitle_lang, subtitle in subtitles_data.items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
@@ -271,7 +279,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning('unable to download video subtitles: %s' % compat_str(err))
self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
@@ -332,7 +340,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',

View File

@@ -3,56 +3,91 @@
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
int_or_none,
str_to_int,
xpath_text,
unescapeHTML,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:v/|.*?clipid=)(?P<id>[^?#&]+)'
_VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
IE_NAME = 'daum.net'
_TESTS = [{
'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
'url': 'http://tvpot.daum.net/v/vab4dyeDBysyBssyukBUjBz',
'info_dict': {
'id': '52554690',
'id': 'vab4dyeDBysyBssyukBUjBz',
'ext': 'mp4',
'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
'upload_date': '20130831',
'duration': 3868,
'title': '마크 헌트 vs 안토니오 실바',
'description': 'Mark Hunt vs Antonio Silva',
'upload_date': '20131217',
'thumbnail': 're:^https?://.*\.(?:jpg|png)',
'duration': 2117,
'view_count': int,
'comment_count': int,
},
}, {
'url': 'http://tvpot.daum.net/v/vab4dyeDBysyBssyukBUjBz',
'only_matching': True,
'url': 'http://m.tvpot.daum.net/v/65139429',
'info_dict': {
'id': '65139429',
'ext': 'mp4',
'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
'description': 'md5:79794514261164ff27e36a21ad229fc5',
'upload_date': '20150604',
'thumbnail': 're:^https?://.*\.(?:jpg|png)',
'duration': 154,
'view_count': int,
'comment_count': int,
},
}, {
'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
'only_matching': True,
}, {
'url': 'http://videofarm.daum.net/controller/player/VodPlayer.swf?vid=vwIpVpCQsT8%24&ref=',
'info_dict': {
'id': 'vwIpVpCQsT8$',
'ext': 'flv',
'title': '01-Korean War ( Trouble on the horizon )',
'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
'upload_date': '20080223',
'thumbnail': 're:^https?://.*\.(?:jpg|png)',
'duration': 249,
'view_count': int,
'comment_count': int,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(
r'src=["\']http://videofarm\.daum\.net/controller/video/viewer/Video\.html\?.*?vid=(.+?)[&"\']',
webpage, 'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse.urlencode({'vid': video_id})
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
video_id, 'Downloading video formats info')
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
'Downloading video info')
urls = self._download_xml(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, 'Downloading video formats info')
formats = []
for format_el in urls.findall('result/output_list/output_list'):
profile = format_el.attrib['profile']
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
format_query = compat_urllib_parse.urlencode({
'vid': full_id,
'vid': video_id,
'profile': profile,
})
url_doc = self._download_xml(
@@ -62,14 +97,202 @@ class DaumIE(InfoExtractor):
formats.append({
'url': format_url,
'format_id': profile,
'width': int_or_none(format_el.get('width')),
'height': int_or_none(format_el.get('height')),
'filesize': int_or_none(format_el.get('filesize')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': info.find('CONTENTS').text,
'duration': int(info.find('DURATION').text),
'thumbnail': xpath_text(info, 'THUMB_URL'),
'description': xpath_text(info, 'CONTENTS'),
'duration': int_or_none(xpath_text(info, 'DURATION')),
'upload_date': info.find('REGDTTM').text[:8],
'view_count': str_to_int(xpath_text(info, 'PLAY_CNT')),
'comment_count': str_to_int(xpath_text(info, 'COMMENT_CNT')),
}
class DaumClipIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.(?:do|tv)|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
IE_NAME = 'daum.net:clip'
_URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
_TESTS = [{
'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
'info_dict': {
'id': '52554690',
'ext': 'mp4',
'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
'upload_date': '20130831',
'thumbnail': 're:^https?://.*\.(?:jpg|png)',
'duration': 3868,
'view_count': int,
},
}, {
'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if DaumPlaylistIE.suitable(url) or DaumUserIE.suitable(url) else super(DaumClipIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
clip_info = self._download_json(
'http://tvpot.daum.net/mypot/json/GetClipInfo.do?clipid=%s' % video_id,
video_id, 'Downloading clip info')['clip_bean']
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
'title': unescapeHTML(clip_info['title']),
'thumbnail': clip_info.get('thumb_url'),
'description': clip_info.get('contents'),
'duration': int_or_none(clip_info.get('duration')),
'upload_date': clip_info.get('up_date')[:8],
'view_count': int_or_none(clip_info.get('play_count')),
'ie_key': 'Daum',
}
class DaumListIE(InfoExtractor):
def _get_entries(self, list_id, list_id_type):
name = None
entries = []
for pagenum in itertools.count(1):
list_info = self._download_json(
'http://tvpot.daum.net/mypot/json/GetClipInfo.do?size=48&init=true&order=date&page=%d&%s=%s' % (
pagenum, list_id_type, list_id), list_id, 'Downloading list info - %s' % pagenum)
entries.extend([
self.url_result(
'http://tvpot.daum.net/v/%s' % clip['vid'])
for clip in list_info['clip_list']
])
if not name:
name = list_info.get('playlist_bean', {}).get('name') or \
list_info.get('potInfo', {}).get('name')
if not list_info.get('has_more'):
break
return name, entries
def _check_clip(self, url, list_id):
query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
if 'clipid' in query_dict:
clip_id = query_dict['clipid'][0]
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % clip_id)
return self.url_result(DaumClipIE._URL_TEMPLATE % clip_id, 'DaumClip')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % list_id)
class DaumPlaylistIE(DaumListIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View\.do|Top\.tv)\?.*?playlistid=(?P<id>[0-9]+)'
IE_NAME = 'daum.net:playlist'
_URL_TEMPLATE = 'http://tvpot.daum.net/mypot/View.do?playlistid=%s'
_TESTS = [{
'note': 'Playlist url with clipid',
'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
'info_dict': {
'id': '6213966',
'title': 'Woorissica Official',
},
'playlist_mincount': 181
}, {
'note': 'Playlist url with clipid - noplaylist',
'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
'info_dict': {
'id': '73806844',
'ext': 'mp4',
'title': '151017 Airport',
'upload_date': '20160117',
},
'params': {
'noplaylist': True,
'skip_download': True,
}
}]
@classmethod
def suitable(cls, url):
return False if DaumUserIE.suitable(url) else super(DaumPlaylistIE, cls).suitable(url)
def _real_extract(self, url):
list_id = self._match_id(url)
clip_result = self._check_clip(url, list_id)
if clip_result:
return clip_result
name, entries = self._get_entries(list_id, 'playlistid')
return self.playlist_result(entries, list_id, name)
class DaumUserIE(DaumListIE):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View|Top)\.(?:do|tv)\?.*?ownerid=(?P<id>[0-9a-zA-Z]+)'
IE_NAME = 'daum.net:user'
_TESTS = [{
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0',
'info_dict': {
'id': 'o2scDLIVbHc0',
'title': '마이 리틀 텔레비전',
},
'playlist_mincount': 213
}, {
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&clipid=73801156',
'info_dict': {
'id': '73801156',
'ext': 'mp4',
'title': '[미공개] 김구라, 오만석이 부릅니다 \'오케피\' - 마이 리틀 텔레비전 20160116',
'upload_date': '20160117',
'description': 'md5:5e91d2d6747f53575badd24bd62b9f36'
},
'params': {
'noplaylist': True,
'skip_download': True,
}
}, {
'note': 'Playlist url has ownerid and playlistid, playlistid takes precedence',
'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&playlistid=6196631',
'info_dict': {
'id': '6196631',
'title': '마이 리틀 텔레비전 - 20160109',
},
'playlist_count': 11
}, {
'url': 'http://tvpot.daum.net/mypot/Top.do?ownerid=o2scDLIVbHc0',
'only_matching': True,
}, {
'url': 'http://m.tvpot.daum.net/mypot/Top.tv?ownerid=45x1okb1If50&playlistid=3569733',
'only_matching': True,
}]
def _real_extract(self, url):
list_id = self._match_id(url)
clip_result = self._check_clip(url, list_id)
if clip_result:
return clip_result
query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
if 'playlistid' in query_dict:
playlist_id = query_dict['playlistid'][0]
return self.url_result(DaumPlaylistIE._URL_TEMPLATE % playlist_id, 'DaumPlaylist')
name, entries = self._get_entries(list_id, 'ownerid')
return self.playlist_result(entries, list_id, name)

View File

@@ -13,8 +13,8 @@ from ..utils import (
class DBTVIE(InfoExtractor):
_VALID_URL = r'http://dbtv\.no/(?P<id>[0-9]+)#(?P<display_id>.+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
'info_dict': {
@@ -30,12 +30,18 @@ class DBTVIE(InfoExtractor):
'view_count': int,
'categories': list,
}
}
}, {
'url': 'http://dbtv.no/3649835190001',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id = mobj.group('display_id') or video_id
data = self._download_json(
'http://api.dbtv.no/discovery/%s' % video_id, display_id)

View File

@@ -1,28 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
compat_str,
)
from ..utils import (
int_or_none,
parse_iso8601,
sanitized_Request,
smuggle_url,
unsmuggle_url,
)
class DCNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/.+|show/\d+/.+?)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
if video_id and int(video_id) > 0:
return self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo')
elif season_id and int(season_id) > 0:
return self.url_result(smuggle_url(
'http://www.dcndigital.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'DCNSeason')
else:
return self.url_result(
'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason')
class DCNBaseIE(InfoExtractor):
def _extract_video_info(self, video_data, video_id, is_live):
title = video_data.get('title_en') or video_data['title_ar']
img = video_data.get('img')
thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
duration = int_or_none(video_data.get('duration'))
description = video_data.get('description_en') or video_data.get('description_ar')
timestamp = parse_iso8601(video_data.get('create_time'), ' ')
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'is_live': is_live,
}
def _extract_video_formats(self, webpage, video_id, entry_protocol):
formats = []
m3u8_url = self._html_search_regex(
r'file\s*:\s*"([^"]+)', webpage, 'm3u8 url', fatal=False)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=None))
rtsp_url = self._search_regex(
r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
if rtsp_url:
formats.append({
'url': rtsp_url,
'format_id': 'rtsp',
})
self._sort_formats(formats)
return formats
class DCNVideoIE(DCNBaseIE):
IE_NAME = 'dcn:video'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/[^/]+|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TEST = {
'url': 'http://www.dcndigital.ae/#/show/199074/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375/6887',
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'info_dict':
{
'id': '17375',
'ext': 'mp4',
'title': 'رحلة العمر : الحلقة 1',
'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
@@ -36,49 +98,99 @@ class DCNIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
request = compat_urllib_request.Request(
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'})
video = self._download_json(request, video_id)
title = video.get('title_en') or video['title_ar']
video_data = self._download_json(request, video_id)
info = self._extract_video_info(video_data, video_id, False)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse.urlencode({
'id': video['id'],
'user_id': video['user_id'],
'signature': video['signature'],
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), video_id)
info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
return info
m3u8_url = self._html_search_regex(r'file:\s*"([^"]+)', webpage, 'm3u8 url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
rtsp_url = self._search_regex(
r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
if rtsp_url:
formats.append({
'url': rtsp_url,
'format_id': 'rtsp',
class DCNLiveIE(DCNBaseIE):
IE_NAME = 'dcn:live'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?live/(?P<id>\d+)'
def _real_extract(self, url):
channel_id = self._match_id(url)
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
headers={'Origin': 'http://www.dcndigital.ae'})
channel_data = self._download_json(request, channel_id)
info = self._extract_video_info(channel_data, channel_id, True)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse.urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), channel_id)
info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
return info
class DCNSeasonIE(InfoExtractor):
IE_NAME = 'dcn:season'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
'info_dict':
{
'id': '7910',
'title': 'محاضرات الشيخ الشعراوي',
},
'playlist_mincount': 27,
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
show_id, season_id = re.match(self._VALID_URL, url).groups()
data = {}
if season_id:
data['season'] = season_id
show_id = smuggled_data.get('show_id')
if show_id is None:
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
headers={'Origin': 'http://www.dcndigital.ae'})
season = self._download_json(request, season_id)
show_id = season['id']
data['show_id'] = show_id
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
compat_urllib_parse.urlencode(data),
{
'Origin': 'http://www.dcndigital.ae',
'Content-Type': 'application/x-www-form-urlencoded'
})
self._sort_formats(formats)
show = self._download_json(request, show_id)
if not season_id:
season_id = show['default_season']
for season in show['seasons']:
if season['id'] == season_id:
title = season.get('title_en') or season['title_ar']
img = video.get('img')
thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
duration = int_or_none(video.get('duration'))
description = video.get('description_en') or video.get('description_ar')
timestamp = parse_iso8601(video.get('create_time') or video.get('update_time'), ' ')
entries = []
for video in show['videos']:
video_id = compat_str(video['id'])
entries.append(self.url_result(
'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}
return self.playlist_result(entries, season_id, title)

View File

@@ -0,0 +1,112 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class DigitekaIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:www\.)?(?:digiteka\.net|ultimedia\.com)/
(?:
deliver/
(?P<embed_type>
generic|
musique
)
(?:/[^/]+)*/
(?:
src|
article
)|
default/index/video
(?P<site_type>
generic|
music
)
/id
)/(?P<id>[\d+a-z]+)'''
_TESTS = [{
# news
'url': 'https://www.ultimedia.com/default/index/videogeneric/id/s8uk0r',
'md5': '276a0e49de58c7e85d32b057837952a2',
'info_dict': {
'id': 's8uk0r',
'ext': 'mp4',
'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 74,
'upload_date': '20150317',
'timestamp': 1426604939,
'uploader_id': '3fszv',
},
}, {
# music
'url': 'https://www.ultimedia.com/default/index/videomusic/id/xvpfp8',
'md5': '2ea3513813cf230605c7e2ffe7eca61c',
'info_dict': {
'id': 'xvpfp8',
'ext': 'mp4',
'title': 'Two - C\'est La Vie (clip)',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 233,
'upload_date': '20150224',
'timestamp': 1424760500,
'uploader_id': '3rfzk',
},
}, {
'url': 'https://www.digiteka.net/deliver/generic/iframe/mdtk/01637594/src/lqm3kl/zone/1/showtitle/1/autoplay/yes',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<(?:iframe|script)[^>]+src=["\'](?P<url>(?:https?:)?//(?:www\.)?ultimedia\.com/deliver/(?:generic|musique)(?:/[^/]+)*/(?:src|article)/[\d+a-z]+)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_type = mobj.group('embed_type') or mobj.group('site_type')
if video_type == 'music':
video_type = 'musique'
deliver_info = self._download_json(
'http://www.ultimedia.com/deliver/video?video=%s&topic=%s' % (video_id, video_type),
video_id)
yt_id = deliver_info.get('yt_id')
if yt_id:
return self.url_result(yt_id, 'Youtube')
jwconf = deliver_info['jwconf']
formats = []
for source in jwconf['playlist'][0]['sources']:
formats.append({
'url': source['file'],
'format_id': source.get('label'),
})
self._sort_formats(formats)
title = deliver_info['title']
thumbnail = jwconf.get('image')
duration = int_or_none(deliver_info.get('duration'))
timestamp = int_or_none(deliver_info.get('release_time'))
uploader_id = deliver_info.get('owner_id')
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader_id': uploader_id,
'formats': formats,
}

View File

@@ -9,7 +9,17 @@ from ..compat import compat_str
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9_\-]*)(?:\.htm)?'
_VALID_URL = r'''(?x)http://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocity
)\.com/(?:[^/]+/)*(?P<id>[^./?#]+)'''
_TESTS = [{
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'info_dict': {
@@ -21,8 +31,8 @@ class DiscoveryIE(InfoExtractor):
'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
' back.'),
'duration': 156,
'timestamp': 1303099200,
'upload_date': '20110418',
'timestamp': 1302032462,
'upload_date': '20110405',
},
'params': {
'skip_download': True, # requires ffmpeg
@@ -33,27 +43,38 @@ class DiscoveryIE(InfoExtractor):
'id': 'mythbusters-the-simpsons',
'title': 'MythBusters: The Simpsons',
},
'playlist_count': 9,
'playlist_mincount': 10,
}, {
'url': 'http://www.animalplanet.com/longfin-eels-maneaters/',
'info_dict': {
'id': '78326',
'ext': 'mp4',
'title': 'Longfin Eels: Maneaters?',
'description': 'Jeremy Wade tests whether or not New Zealand\'s longfin eels are man-eaters by covering himself in fish guts and getting in the water with them.',
'upload_date': '20140725',
'timestamp': 1406246400,
'duration': 116,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(url + '?flat=1', video_id)
display_id = self._match_id(url)
info = self._download_json(url + '?flat=1', display_id)
video_title = info.get('playlist_title') or info.get('video_title')
entries = [{
'id': compat_str(video_info['id']),
'formats': self._extract_m3u8_formats(
video_info['src'], video_id, ext='mp4',
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1)),
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href'),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
} for idx, video_info in enumerate(info['playlist'])]
return self.playlist_result(entries, video_id, video_title)
return self.playlist_result(entries, display_id, video_title)

View File

@@ -3,23 +3,21 @@ from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from .amp import AMPIE
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
int_or_none,
parse_iso8601,
sanitized_Request,
)
class DramaFeverBaseIE(InfoExtractor):
class DramaFeverBaseIE(AMPIE):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever'
@@ -51,7 +49,7 @@ class DramaFeverBaseIE(InfoExtractor):
'password': password,
}
request = compat_urllib_request.Request(
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
@@ -69,71 +67,56 @@ class DramaFeverBaseIE(InfoExtractor):
class DramaFeverIE(DramaFeverBaseIE):
IE_NAME = 'dramafever'
_VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
_TEST = {
_TESTS = [{
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
'info_dict': {
'id': '4512.1',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cooking with Shin 4512.1',
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
'episode': 'Episode 1',
'episode_number': 1,
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1404336058,
'upload_date': '20140702',
'duration': 343,
}
}
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
'info_dict': {
'id': '4826.4',
'ext': 'mp4',
'title': 'Mnet Asian Music Awards 2015 4826.4',
'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
'episode': 'Mnet Asian Music Awards 2015 - Part 3',
'episode_number': 4,
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1450213200,
'upload_date': '20151215',
'duration': 5602,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url).replace('/', '.')
try:
feed = self._download_json(
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id,
video_id, 'Downloading episode JSON')['channel']['item']
info = self._extract_feed_info(
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
raise ExtractorError(
'Currently unavailable in your country.', expected=True)
raise
media_group = feed.get('media-group', {})
formats = []
for media_content in media_group['media-content']:
src = media_content.get('@attributes', {}).get('url')
if not src:
continue
ext = determine_ext(src)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls'))
else:
formats.append({
'url': src,
})
self._sort_formats(formats)
title = media_group.get('media-title')
description = media_group.get('media-description')
duration = int_or_none(media_group['media-content'][0].get('@attributes', {}).get('duration'))
thumbnail = self._proto_relative_url(
media_group.get('media-thumbnail', {}).get('@attributes', {}).get('url'))
timestamp = parse_iso8601(feed.get('pubDate'), ' ')
subtitles = {}
for media_subtitle in media_group.get('media-subTitle', []):
lang = media_subtitle.get('@attributes', {}).get('lang')
href = media_subtitle.get('@attributes', {}).get('href')
if not lang or not href:
continue
subtitles[lang] = [{
'ext': 'ttml',
'url': href,
}]
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need a single episode info, so restricting page size to one episode
@@ -143,24 +126,24 @@ class DramaFeverIE(DramaFeverBaseIE):
video_id, 'Downloading episode info JSON', fatal=False)
if episode_info:
value = episode_info.get('value')
if value:
subfile = value[0].get('subfile') or value[0].get('new_subfile')
if subfile and subfile != 'http://www.dramafever.com/st/':
subtitles.setdefault('English', []).append({
'ext': 'srt',
'url': subfile,
})
if isinstance(value, list):
for v in value:
if v.get('type') == 'Episode':
subfile = v.get('subfile') or v.get('new_subfile')
if subfile and subfile != 'http://www.dramafever.com/st/':
info.setdefault('subtitles', {}).setdefault('English', []).append({
'ext': 'srt',
'url': subfile,
})
episode_number = int_or_none(v.get('number'))
episode_fallback = 'Episode'
if episode_number:
episode_fallback += ' %d' % episode_number
info['episode'] = v.get('title') or episode_fallback
info['episode_number'] = episode_number
break
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
return info
class DramaFeverSeriesIE(DramaFeverBaseIE):

View File

@@ -2,14 +2,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unified_strdate,
)
from .zdf import ZDFIE
class DreiSatIE(InfoExtractor):
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TESTS = [
@@ -35,53 +31,4 @@ class DreiSatIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
details_doc = self._download_xml(details_url, video_id, 'Downloading video details')
status_code = details_doc.find('./status/statuscode')
if status_code is not None and status_code.text != 'ok':
code = status_code.text
if code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, code)
raise ExtractorError(message, expected=True)
thumbnail_els = details_doc.findall('.//teaserimage')
thumbnails = [{
'width': int(te.attrib['key'].partition('x')[0]),
'height': int(te.attrib['key'].partition('x')[2]),
'url': te.text,
} for te in thumbnail_els]
information_el = details_doc.find('.//information')
video_title = information_el.find('./title').text
video_description = information_el.find('./detail').text
details_el = details_doc.find('.//details')
video_uploader = details_el.find('./channel').text
upload_date = unified_strdate(details_el.find('./airtime').text)
format_els = details_doc.findall('.//formitaet')
formats = [{
'format_id': fe.attrib['basetype'],
'width': int(fe.find('./width').text),
'height': int(fe.find('./height').text),
'url': fe.find('./url').text,
'filesize': int(fe.find('./filesize').text),
'video_bitrate': int(fe.find('./videoBitrate').text),
} for fe in format_els
if not fe.find('./url').text.startswith('http://www.metafilegenerator.de/')]
self._sort_formats(formats)
return {
'_type': 'video',
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'thumbnails': thumbnails,
'thumbnail': thumbnails[-1]['url'],
'uploader': video_uploader,
'upload_date': upload_date,
}
return self.extract_from_xml_url(video_id, details_url)

View File

@@ -91,7 +91,7 @@ class DRTVIE(InfoExtractor):
subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list):
LANGS = {
'Danish': 'dk',
'Danish': 'da',
}
for subs in subtitles_list:
lang = subs['Language']

View File

@@ -5,8 +5,10 @@ import base64
import re
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import qualities
from ..utils import (
qualities,
sanitized_Request,
)
class DumpertIE(InfoExtractor):
@@ -32,7 +34,7 @@ class DumpertIE(InfoExtractor):
protocol = mobj.group('protocol')
url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
req = compat_urllib_request.Request(url)
req = sanitized_Request(url)
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)

View File

@@ -1,9 +1,12 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
remove_start,
sanitized_Request,
)
class EinthusanIE(InfoExtractor):
@@ -34,27 +37,33 @@ class EinthusanIE(InfoExtractor):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_id = self._match_id(url)
video_title = self._html_search_regex(
r'<h1><a class="movie-title".*?>(.*?)</a></h1>', webpage, 'title')
request = sanitized_Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
webpage = self._download_webpage(request, video_id)
video_url = self._html_search_regex(
r'''(?s)jwplayer\("mediaplayer"\)\.setup\({.*?'file': '([^']+)'.*?}\);''',
webpage, 'video url')
title = self._html_search_regex(
r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
webpage, 'title')
video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
video_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id)
description = self._html_search_meta('description', webpage)
thumbnail = self._html_search_regex(
r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''',
webpage, "thumbnail url", fatal=False)
if thumbnail is not None:
thumbnail = thumbnail.replace('..', 'http://www.einthusan.com')
thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..'))
return {
'id': video_id,
'title': video_title,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'description': description,

View File

@@ -2,11 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
sanitized_Request,
)
@@ -57,7 +57,7 @@ class EitbIE(InfoExtractor):
hls_url = media.get('HLS_SURL')
if hls_url:
request = compat_urllib_request.Request(
request = sanitized_Request(
'http://mam.eitb.eus/mam/REST/ServiceMultiweb/DomainRestrictedSecurity/TokenAuth/',
headers={'Referer': url})
token_data = self._download_json(
@@ -65,18 +65,14 @@ class EitbIE(InfoExtractor):
if token_data:
token = token_data.get('token')
if token:
m3u8_formats = self._extract_m3u8_formats(
'%s?hdnts=%s' % (hls_url, token), video_id, m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
formats.extend(self._extract_m3u8_formats(
'%s?hdnts=%s' % (hls_url, token), video_id, m3u8_id='hls', fatal=False))
hds_url = media.get('HDS_SURL')
if hds_url:
f4m_formats = self._extract_f4m_formats(
formats.extend(self._extract_f4m_formats(
'%s?hdcore=3.7.0' % hds_url.replace('euskalsvod', 'euskalvod'),
video_id, f4m_id='hds', fatal=False)
if f4m_formats:
formats.extend(f4m_formats)
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)

View File

@@ -13,12 +13,12 @@ class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '8e3c576bf2e9bfff4d76565f56f94c9c',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
'id': '0_ipq1gsai',
'ext': 'mp4',
'ext': 'mov',
'title': 'Fast Fingers of Fate',
'description': 'md5:587e79fbbd0d73b148bc596d99ce48e6',
'description': 'md5:3539013ddcbfa64b2a6d1b38d910868a',
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',

View File

@@ -3,13 +3,12 @@ from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
determine_ext,
clean_html,
int_or_none,
float_or_none,
sanitized_Request,
)
@@ -75,7 +74,7 @@ class EscapistIE(InfoExtractor):
video_id = ims_video['videoID']
key = ims_video['hash']
config_req = compat_urllib_request.Request(
config_req = sanitized_Request(
'http://www.escapistmagazine.com/videos/'
'vidconfig.php?videoID=%s&hash=%s' % (video_id, key))
config_req.add_header('Referer', url)

View File

@@ -1,18 +1,30 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_WORKING = False
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'info_dict': {
'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
'ext': 'mp4',
'title': 'dm_140128_30for30Shorts___JudgingJewellv2',
'description': '',
'title': '30 for 30 Shorts: Judging Jewell',
'description': None,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
'url': 'http://espn.go.com/video/clip?id=2743663',
'info_dict': {
'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
'ext': 'mp4',
'title': 'Must-See Moments: Best of the MLS season',
},
'params': {
# m3u8 download
@@ -41,15 +53,26 @@ class ESPNIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'class="video-play-button"[^>]+data-id="(\d+)',
webpage, 'video id')
r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
cms = 'espn'
if 'data-source="intl"' in webpage:
cms = 'intl'
player_url = 'https://espn.go.com/video/iframe/twitter/?id=%s&cms=%s' % (video_id, cms)
player = self._download_webpage(
'https://espn.go.com/video/iframe/twitter/?id=%s' % video_id, video_id)
player_url, video_id)
pcode = self._search_regex(
r'["\']pcode=([^"\']+)["\']', player, 'pcode')
return self.url_result(
'ooyalaexternal:espn:%s:%s' % (video_id, pcode),
'OoyalaExternal')
title = remove_end(
self._og_search_title(webpage),
'- ESPN Video').strip()
return {
'_type': 'url_transparent',
'url': 'ooyalaexternal:%s:%s:%s' % (cms, video_id, pcode),
'ie_key': 'OoyalaExternal',
'title': title,
}

View File

@@ -61,7 +61,7 @@ class EsriVideoIE(InfoExtractor):
webpage, 'duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'last-modified', webpage, 'upload date', fatal=None))
'last-modified', webpage, 'upload date', fatal=False))
return {
'id': video_id,

View File

@@ -3,11 +3,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
sanitized_Request,
)
@@ -42,7 +40,7 @@ class EveryonesMixtapeIE(InfoExtractor):
playlist_id = mobj.group('id')
pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id
pllist_req = compat_urllib_request.Request(pllist_url)
pllist_req = sanitized_Request(pllist_url)
pllist_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist_list = self._download_json(
@@ -55,7 +53,7 @@ class EveryonesMixtapeIE(InfoExtractor):
raise ExtractorError('Playlist id not found')
pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no
pl_req = compat_urllib_request.Request(pl_url)
pl_req = sanitized_Request(pl_url)
pl_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist = self._download_json(
pl_req, playlist_id, note='Downloading playlist info')

View File

@@ -3,9 +3,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
int_or_none,
sanitized_Request,
str_to_int,
)
@@ -37,7 +37,7 @@ class ExtremeTubeIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
req = compat_urllib_request.Request(url)
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)

View File

@@ -6,15 +6,17 @@ import socket
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse_unquote,
compat_urllib_request,
compat_urllib_parse_unquote_plus,
)
from ..utils import (
error_to_compat_str,
ExtractorError,
limit_length,
sanitized_Request,
urlencode_postdata,
get_element_by_id,
clean_html,
@@ -23,19 +25,30 @@ from ..utils import (
class FacebookIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:\w+\.)?facebook\.com/
(?:[^#]*?\#!/)?
(?:
(?:video/video\.php|photo\.php|video\.php|video/embed)\?(?:.*?)
(?:v|video_id)=|
[^/]+/videos/(?:[^/]+/)?
)
(?P<id>[0-9]+)
(?:.*)'''
(?:
https?://
(?:\w+\.)?facebook\.com/
(?:[^#]*?\#!/)?
(?:
(?:
video/video\.php|
photo\.php|
video\.php|
video/embed
)\?(?:.*?)(?:v|video_id)=|
[^/]+/videos/(?:[^/]+/)?
)|
facebook:
)
(?P<id>[0-9]+)
'''
_LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook'
IE_NAME = 'facebook'
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
'md5': '6a40d33c0eccbb1af76cf0485a052659',
@@ -57,6 +70,16 @@ class FacebookIE(InfoExtractor):
'expected_warnings': [
'title'
]
}, {
'note': 'Video with DASH manifest',
'url': 'https://www.facebook.com/video.php?v=957955867617029',
'md5': '54706e4db4f5ad58fbad82dde1f1213f',
'info_dict': {
'id': '957955867617029',
'ext': 'mp4',
'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...',
'uploader': 'Demy de Zeeuw',
},
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,
@@ -66,6 +89,9 @@ class FacebookIE(InfoExtractor):
}, {
'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater',
'only_matching': True,
}, {
'url': 'facebook:544765982287235',
'only_matching': True,
}]
def _login(self):
@@ -73,8 +99,8 @@ class FacebookIE(InfoExtractor):
if useremail is None:
return
login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
login_page_req = sanitized_Request(self._LOGIN_URL)
self._set_cookie('facebook.com', 'locale', 'en_US')
login_page = self._download_webpage(login_page_req, None,
note='Downloading login page',
errnote='Unable to download login page')
@@ -94,29 +120,41 @@ class FacebookIE(InfoExtractor):
'timezone': '-60',
'trynum': '1',
}
request = compat_urllib_request.Request(self._LOGIN_URL, urlencode_postdata(login_form))
request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
login_results = self._download_webpage(request, None,
note='Logging in', errnote='unable to fetch login page')
if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
self._downloader.report_warning('unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
error = self._html_search_regex(
r'(?s)<div[^>]+class=(["\']).*?login_error_box.*?\1[^>]*><div[^>]*>.*?</div><div[^>]*>(?P<error>.+?)</div>',
login_results, 'login error', default=None, group='error')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
self._downloader.report_warning('unable to log in: bad username/password, or exceeded login rate limit (~3/min). Check credentials or wait.')
return
fb_dtsg = self._search_regex(
r'name="fb_dtsg" value="(.+?)"', login_results, 'fb_dtsg', default=None)
h = self._search_regex(
r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h', default=None)
if not fb_dtsg or not h:
return
check_form = {
'fb_dtsg': self._search_regex(r'name="fb_dtsg" value="(.+?)"', login_results, 'fb_dtsg'),
'h': self._search_regex(
r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h'),
'fb_dtsg': fb_dtsg,
'h': h,
'name_action_selected': 'dont_save',
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
check_req = sanitized_Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = self._download_webpage(check_req, None,
note='Confirming login')
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
self._downloader.report_warning('Unable to confirm login, you have to login in your brower and authorize the login.')
self._downloader.report_warning('Unable to confirm login, you have to login in your browser and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning('unable to log in: %s' % compat_str(err))
self._downloader.report_warning('unable to log in: %s' % error_to_compat_str(err))
return
def _real_initialize(self):
@@ -124,13 +162,36 @@ class FacebookIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'https://www.facebook.com/video/video.php?v=%s' % video_id
webpage = self._download_webpage(url, video_id)
req = sanitized_Request('https://www.facebook.com/video/video.php?v=%s' % video_id)
req.add_header('User-Agent', self._CHROME_USER_AGENT)
webpage = self._download_webpage(req, video_id)
video_data = None
BEFORE = '{swf.addParam(param[0], param[1]);});\n'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m:
if m:
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data']
def video_data_list2dict(video_data):
ret = {}
for item in video_data:
format_id = item['stream_type']
ret.setdefault(format_id, []).append(item)
return ret
if not video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})\);', webpage, 'server js data'), video_id)
for item in server_js_data['instances']:
if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData'])
break
if not video_data:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
@@ -138,12 +199,9 @@ class FacebookIE(InfoExtractor):
expected=True)
else:
raise ExtractorError('Cannot parse data')
data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse_unquote(data['params'])
params = json.loads(params_raw)
formats = []
for format_id, f in params['video_data'].items():
for format_id, f in video_data.items():
if not f or not isinstance(f, list):
continue
for quality in ('sd', 'hd'):
@@ -155,16 +213,23 @@ class FacebookIE(InfoExtractor):
'url': src,
'preference': -10 if format_id == 'progressive' else 0,
})
dash_manifest = f[0].get('dash_manifest')
if dash_manifest:
formats.extend(self._parse_dash_manifest(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest)),
namespace='urn:mpeg:dash:schema:mpd:2011'))
if not formats:
raise ExtractorError('Cannot find video formats')
self._sort_formats(formats)
video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, 'title',
default=None)
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', fatal=False)
webpage, 'alternative title', default=None)
video_title = limit_length(video_title, 80)
if not video_title:
video_title = 'Facebook video #%s' % video_id
@@ -176,3 +241,33 @@ class FacebookIE(InfoExtractor):
'formats': formats,
'uploader': uploader,
}
class FacebookPostIE(InfoExtractor):
IE_NAME = 'facebook:post'
_VALID_URL = r'https?://(?:\w+\.)?facebook\.com/[^/]+/posts/(?P<id>\d+)'
_TEST = {
'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
'info_dict': {
'id': '544765982287235',
'ext': 'mp4',
'title': '"What are you doing running in the snow?"',
'uploader': 'FailArmy',
}
}
def _real_extract(self, url):
post_id = self._match_id(url)
webpage = self._download_webpage(url, post_id)
entries = [
self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
for video_id in self._parse_json(
self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
webpage, 'video ids', group='ids'),
post_id)]
return self.playlist_result(entries, post_id)

View File

@@ -2,6 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
)
class FazIE(InfoExtractor):
@@ -37,31 +42,32 @@ class FazIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
description = self._og_search_description(webpage)
config_xml_url = self._search_regex(
r'writeFLV\(\'(.+?)\',', webpage, 'config xml url')
r'videoXMLURL\s*=\s*"([^"]+)', webpage, 'config xml url')
config = self._download_xml(
config_xml_url, video_id, 'Downloading config xml')
encodings = config.find('ENCODINGS')
encodings = xpath_element(config, 'ENCODINGS', 'encodings', True)
formats = []
for pref, code in enumerate(['LOW', 'HIGH', 'HQ']):
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'format_id': code.lower(),
'quality': pref,
})
encoding = xpath_element(encodings, code)
if encoding:
encoding_url = xpath_text(encoding, 'FILENAME')
if encoding_url:
formats.append({
'url': encoding_url,
'format_id': code.lower(),
'quality': pref,
'tbr': int_or_none(xpath_text(encoding, 'AVERAGEBITRATE')),
})
self._sort_formats(formats)
descr = self._html_search_regex(
r'<p class="Content Copy">(.*?)</p>', webpage, 'description', fatal=False)
return {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': descr,
'thumbnail': config.find('STILL/STILL_BIG').text,
'description': description.strip() if description else None,
'thumbnail': xpath_text(config, 'STILL/STILL_BIG'),
'duration': int_or_none(xpath_text(config, 'DURATION')),
}

View File

@@ -12,6 +12,7 @@ from ..compat import (
from ..utils import (
encode_dict,
ExtractorError,
sanitized_Request,
)
@@ -36,8 +37,8 @@ class FC2IE(InfoExtractor):
'params': {
'username': 'ytdl@yt-dl.org',
'password': '(snip)',
'skip': 'requires actual password'
}
},
'skip': 'requires actual password',
}, {
'url': 'http://video.fc2.com/en/a/content/20130926eZpARwsF',
'only_matching': True,
@@ -57,7 +58,7 @@ class FC2IE(InfoExtractor):
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
request = compat_urllib_request.Request(
request = sanitized_Request(
'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)
login_results = self._download_webpage(request, None, note='Logging in', errnote='Unable to log in')
@@ -66,7 +67,7 @@ class FC2IE(InfoExtractor):
return False
# this is also needed
login_redir = compat_urllib_request.Request('http://id.fc2.com/?mode=redirect&login=done')
login_redir = sanitized_Request('http://id.fc2.com/?mode=redirect&login=done')
self._download_webpage(
login_redir, None, note='Login redirect', errnote='Login redirect failed')

View File

@@ -1,12 +1,10 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
ExtractorError,
js_to_json,
)
@@ -32,24 +30,22 @@ class FKTVIE(InfoExtractor):
'http://fernsehkritik.tv/folge-%s/play' % episode, episode)
title = clean_html(self._html_search_regex(
'<h3>([^<]+)</h3>', webpage, 'title'))
matches = re.search(
r'(?s)<video(?:(?!poster)[^>])+(?:poster="([^"]+)")?[^>]*>(.*)</video>',
webpage)
if matches is None:
raise ExtractorError('Unable to extract the video')
thumbnail = self._search_regex(r'POSTER\s*=\s*"([^"]+)', webpage, 'thumbnail', fatal=False)
sources = self._parse_json(self._search_regex(r'(?s)MEDIA\s*=\s*(\[.+?\]);', webpage, 'media'), episode, js_to_json)
poster, sources = matches.groups()
if poster is None:
self.report_warning('unable to extract thumbnail')
formats = []
for source in sources:
furl = source.get('src')
if furl:
formats.append({
'url': furl,
'format_id': determine_ext(furl),
})
self._sort_formats(formats)
urls = re.findall(r'<source[^>]+src="([^"]+)"', sources)
formats = [{
'url': furl,
'format_id': determine_ext(furl),
} for furl in urls]
return {
'id': episode,
'title': title,
'formats': formats,
'thumbnail': poster,
'thumbnail': thumbnail,
}

View File

@@ -1,67 +1,93 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
find_xpath_attr,
int_or_none,
qualities,
)
class FlickrIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_VALID_URL = r'https?://(?:www\.|secure\.)?flickr\.com/photos/[\w\-_@]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
'md5': '6fdc01adbc89d72fc9c4f15b4a4ba87b',
'md5': '164fe3fa6c22e18d448d4d5af2330f31',
'info_dict': {
'id': '5645318632',
'ext': 'mp4',
"description": "Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.",
"uploader_id": "forestwander-nature-pictures",
"title": "Dark Hollow Waterfalls"
'ext': 'mpg',
'description': 'Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.',
'title': 'Dark Hollow Waterfalls',
'duration': 19,
'timestamp': 1303528740,
'upload_date': '20110423',
'uploader_id': '10922353@N03',
'uploader': 'Forest Wander',
'comment_count': int,
'view_count': int,
'tags': list,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
_API_BASE_URL = 'https://api.flickr.com/services/rest?'
video_id = mobj.group('id')
video_uploader_id = mobj.group('uploader_id')
webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id
req = compat_urllib_request.Request(webpage_url)
req.add_header(
'User-Agent',
# it needs a more recent version
'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20150101 Firefox/38.0 (Chrome)')
webpage = self._download_webpage(req, video_id)
secret = self._search_regex(r'secret"\s*:\s*"(\w+)"', webpage, 'secret')
first_url = 'https://secure.flickr.com/apps/video/video_mtl_xml.gne?v=x&photo_id=' + video_id + '&secret=' + secret + '&bitrate=700&target=_self'
first_xml = self._download_xml(first_url, video_id, 'Downloading first data webpage')
node_id = find_xpath_attr(
first_xml, './/{http://video.yahoo.com/YEP/1.0/}Item', 'id',
'id').text
second_url = 'https://secure.flickr.com/video_playlist.gne?node_id=' + node_id + '&tech=flash&mode=playlist&bitrate=700&secret=' + secret + '&rd=video.yahoo.com&noad=1'
second_xml = self._download_xml(second_url, video_id, 'Downloading second data webpage')
self.report_extraction(video_id)
stream = second_xml.find('.//STREAM')
if stream is None:
raise ExtractorError('Unable to extract video url')
video_url = stream.attrib['APP'] + stream.attrib['FULLPATH']
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader_id': video_uploader_id,
def _call_api(self, method, video_id, api_key, note, secret=None):
query = {
'photo_id': video_id,
'method': 'flickr.%s' % method,
'api_key': api_key,
'format': 'json',
'nojsoncallback': 1,
}
if secret:
query['secret'] = secret
data = self._download_json(self._API_BASE_URL + compat_urllib_parse.urlencode(query), video_id, note)
if data['stat'] != 'ok':
raise ExtractorError(data['message'])
return data
def _real_extract(self, url):
video_id = self._match_id(url)
api_key = self._download_json(
'https://www.flickr.com/hermes_error_beacon.gne', video_id,
'Downloading api key')['site_key']
video_info = self._call_api(
'photos.getInfo', video_id, api_key, 'Downloading video info')['photo']
if video_info['media'] == 'video':
streams = self._call_api(
'video.getStreamInfo', video_id, api_key,
'Downloading streams info', video_info['secret'])['streams']
preference = qualities(
['288p', 'iphone_wifi', '100', '300', '700', '360p', 'appletv', '720p', '1080p', 'orig'])
formats = []
for stream in streams['stream']:
stream_type = str(stream.get('type'))
formats.append({
'format_id': stream_type,
'url': stream['_content'],
'preference': preference(stream_type),
})
self._sort_formats(formats)
owner = video_info.get('owner', {})
return {
'id': video_id,
'title': video_info['title']['_content'],
'description': video_info.get('description', {}).get('_content'),
'formats': formats,
'timestamp': int_or_none(video_info.get('dateuploaded')),
'duration': int_or_none(video_info.get('video', {}).get('duration')),
'uploader_id': owner.get('nsid'),
'uploader': owner.get('realname'),
'comment_count': int_or_none(video_info.get('comments', {}).get('_content')),
'view_count': int_or_none(video_info.get('views')),
'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])]
}
else:
raise ExtractorError('not a video', expected=True)

View File

@@ -13,6 +13,7 @@ class FootyRoomIE(InfoExtractor):
'title': 'Schalke 04 0 2 Real Madrid',
},
'playlist_count': 3,
'skip': 'Video for this match is not available',
}, {
'url': 'http://footyroom.com/georgia-0-2-germany-2015-03/',
'info_dict': {

View File

@@ -3,12 +3,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import (
parse_duration,
parse_iso8601,
sanitized_Request,
str_to_int,
)
@@ -93,7 +91,7 @@ class FourTubeIE(InfoExtractor):
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
token_req = compat_urllib_request.Request(token_url, b'{}', headers)
token_req = sanitized_Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
formats = [{
'url': tokens[format]['token'],

Some files were not shown because too many files have changed in this diff Show More