Compare commits


479 Commits

Author SHA1 Message Date
Sergey M․
c485959034 release 2016.07.13 2016-07-13 23:58:01 +07:00
Sergey M․
a0560d8ab8 [ellentv] Improve extraction (Closes #10067) 2016-07-13 22:42:53 +07:00
Remita Amine
0385aa6199 [bbc] extract more and better qualities from Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Remita Amine
00f4764cb7 [common] extract vbr, abr and fps for Unified Streaming Platform m3u8 manifests 2016-07-13 15:58:24 +01:00
Sergey M․
51c2cd0b83 [extractors] Add vk:wallpost extractor import 2016-07-13 21:53:23 +07:00
Sergey M․
5f5a9d6158 [vk] Improve login 2016-07-13 21:52:52 +07:00
Sergey M․
2d19fb5072 [vk:wallpost] Add extractor 2016-07-13 21:51:44 +07:00
Yen Chi Hsuan
9d865a1af6 [travis] Skip downloading srelay
SOCKS tests never run on Travis CI for unknown reasons, and
downloading srelay broke some tests (e.g.
https://travis-ci.org/rg3/youtube-dl/builds/144306425)
2016-07-13 14:27:14 +08:00
Remita Amine
41aa44259d [shahid] try to bypass geo restriction and extract more metadata(closes #10062) 2016-07-12 23:15:38 +01:00
Philipp Hagemeister
381ff44756 [devscripts/generate-download] Remove MD5 and SHA1 2016-07-12 09:09:54 +02:00
Sergey M․
7f29cf545a [youtube] Add YouTube Red paid video reference test (#10059) 2016-07-12 02:10:35 +07:00
Remita Amine
7d1219f3e0 [tmz] delegate extraction to KalturaIE 2016-07-11 19:08:22 +01:00
Remita Amine
f1b4af7d79 [brightcove:new] remove html tags from description 2016-07-11 19:06:50 +01:00
Remita Amine
8a8590a617 [dbtv] delegate extraction to BrightcoveNewIE 2016-07-11 16:30:24 +01:00
Remita Amine
4a7a5e41f7 [tvplay] improve extraction 2016-07-11 14:51:44 +01:00
Yen Chi Hsuan
2a49d01600 [playvid] Update _TESTS
Blocks https://travis-ci.org/rg3/youtube-dl/jobs/143809100
2016-07-11 15:15:28 +08:00
Yen Chi Hsuan
b99af8a51c [biobiochiletv] Fix extraction and update _TESTS 2016-07-11 13:23:57 +08:00
Yen Chi Hsuan
8e7020daef [rudo] Add new extractor
Used in biobiochile.tv
2016-07-11 13:19:25 +08:00
Sergey M․
a26bcc61c1 release 2016.07.11 2016-07-11 03:17:12 +07:00
Sergey M․
5c4dcf8172 [vidzi] Add support for embed URLs (Closes #10058) 2016-07-11 03:14:39 +07:00
Sergey M․
e9fb6a4bbe [youtube] Relax TFA regexes 2016-07-11 03:08:38 +07:00
Yen Chi Hsuan
e2dbcaa1bf [vuclip] Fix extraction 2016-07-11 00:52:25 +08:00
Yen Chi Hsuan
ae01850165 [miomio] Fix _TESTS 2016-07-11 00:03:24 +08:00
Yen Chi Hsuan
c3baaedfc8 [miomio] Support new 'h5' player (closes #9605)
Depends on #8876
2016-07-10 23:46:48 +08:00
Yen Chi Hsuan
0b68de3cc1 Merge pull request #8876 from remitamine/html5_media
[extractor/common] add helper method to extract html5 media entries
2016-07-10 23:40:45 +08:00
Sergey M․
39e9d524e5 Credit @nehalvpatel for roosterteeth (#9864) 2016-07-10 01:30:12 +07:00
Sergey M․
865b087224 [roosterteeth] Improve (Closes #9864) 2016-07-10 01:30:12 +07:00
Nehal Patel
3121b25639 [roosterteeth] Add extractor 2016-07-10 01:30:12 +07:00
Sergey M․
0286b85c79 release 2016.07.09.2 2016-07-09 22:22:24 +07:00
Sergey M․
ab52bb5137 [animeondemand] Fix typo 2016-07-09 22:20:34 +07:00
Sergey M․
61a98b8623 [lynda] Remove md5 from test (Closes #10047) 2016-07-09 21:29:11 +07:00
Sergey M․
6daf34a045 [facebook] Fix typo and break when found video_data (Closes #10048) 2016-07-09 21:25:07 +07:00
Yen Chi Hsuan
c03adf90bd [generic] Add the test. Closes #1638 2016-07-09 14:39:01 +08:00
Yen Chi Hsuan
0ece114b7b [vimeo] Recognize non-standard embeds (#1638) 2016-07-09 14:38:27 +08:00
Yen Chi Hsuan
5b6a74856b Merge pull request #9288 from reyyed/issue#9063fix
[ffmpeg] Fix embedding subtitles (#9063)
2016-07-09 14:29:53 +08:00
Sergey M․
ce43100a01 release 2016.07.09.1 2016-07-09 10:06:40 +07:00
Remita Amine
8cc9b4016d [srmediathek] extend _VALID_URL(closes #9373) 2016-07-09 03:22:09 +01:00
Remita Amine
31eeab9f41 [ard] fix f4m extraction and skip tests with 404 errors 2016-07-09 03:22:09 +01:00
Sergey M․
9558dcec9c [youtube:user] Preserve user/c path segment 2016-07-09 08:37:19 +07:00
Sergey M․
6e6b70d65f [extractor/generic] Properly comment out a test 2016-07-09 08:37:19 +07:00
Sergey M․
d417fd88d0 release 2016.07.09 2016-07-09 07:16:47 +07:00
Sergey M․
9e4f5dc1e9 [animeondemand] Pass num for episode based videos 2016-07-09 07:13:32 +07:00
Sergey M․
1251565ee0 [options] Rollback old behavior for configuration files' encoding
Until some solution is agreed upon
2016-07-09 07:12:52 +07:00
Sergey M․
1f7258a367 [animeondemand] Add support for full length films (Closes #10031) 2016-07-09 06:57:04 +07:00
Sergey M․
0af985069b [flipagram] Improve extraction (Closes #9898) 2016-07-09 03:31:17 +07:00
Sergey M․
0de168f7ed [extractor/generic] Detect schema.org/VideoObject embeds 2016-07-09 03:29:07 +07:00
Sergey M․
95b31e266b [extractor/common] Add expected_type in json ld routines 2016-07-09 03:28:04 +07:00
Sergey M․
6b3a3098b5 [extractor/common] Extract more metadata for VideoObject in _json_ld 2016-07-09 03:27:11 +07:00
Sergey M․
2de624fdd5 [extractor/common] Introduce filesize metafield for thumbnails 2016-07-09 03:24:36 +07:00
Déstin Reed
3fee7f636c [flipagram] Add extractor 2016-07-09 03:23:32 +07:00
Remita Amine
89e2fff2b7 [mgtv] pass geo verification headers for api request 2016-07-08 20:18:25 +01:00
Sergey M․
cedc70b292 [facebook] Fix invalid video being extracted (Closes #9851) 2016-07-09 00:28:07 +07:00
Remita Amine
07d7689f2e [le] extract http formats 2016-07-08 15:35:20 +01:00
Yen Chi Hsuan
ae8cb5328d Merge branch 'JakubAdamWieczorek-polskie-radio' 2016-07-08 19:35:21 +08:00
Yen Chi Hsuan
2e32ac0b9a [polskieradio] Fix regex in _TESTS 2016-07-08 19:34:53 +08:00
Yen Chi Hsuan
672f01c370 Merge branch 'polskie-radio' of https://github.com/JakubAdamWieczorek/youtube-dl into JakubAdamWieczorek-polskie-radio 2016-07-08 19:33:28 +08:00
Jakub Adam Wieczorek
e2d616dd30 [polskieradio] Add thumbnails. 2016-07-08 13:23:00 +02:00
Yen Chi Hsuan
0ab7f4fe2b [nick] support nickjr.com (closes #7542) 2016-07-08 15:11:28 +08:00
Sergey M․
29c4a07776 [lynda] Fix test 2016-07-08 03:33:53 +07:00
Philipp Hagemeister
826e911e41 Merge branch 'master' of github.com:rg3/youtube-dl 2016-07-07 19:42:22 +02:00
Philipp Hagemeister
30d22dae8e [options] Do not decode Unicode on Python 2.x
The configuration file contents are being returned as unicode now, so decoding them is no longer necessary.
(Run python2 with -3 to see the warning before this commit)
2016-07-07 19:41:00 +02:00
Yen Chi Hsuan
ec3518725b [compat] Fix test_cmdline_umlauts on Python 2.6
The original statement raises an uncaught UnicodeWarning on Python 2.6
2016-07-07 22:30:58 +08:00
Remita Amine
5f87d845eb [tweakers] fix info extraction(closes #9516) 2016-07-07 12:51:42 +01:00
Philipp Hagemeister
571808a7aa document comments in configuration file (fixes #10024) 2016-07-07 12:12:21 +02:00
Yen Chi Hsuan
dfe5fa49ae [compat] Fix compat_shlex_split for non-ASCII input
Closes #9871
2016-07-07 17:37:29 +08:00
Remita Amine
01a0c511eb [radiocanada] extract more formats 2016-07-07 03:46:12 +01:00
remitamine
b3d30315ce Merge pull request #9597 from remitamine/toutv
[toutv] fix info extraction(closes #1792)(closes #2082)
2016-07-07 01:51:01 +01:00
remitamine
882af14d7d [toutv] fix info extraction(closes #1792)(closes #2082) 2016-07-07 01:47:28 +01:00
Remita Amine
47335a0efa [telecinco] fix info extraction 2016-07-06 23:09:13 +01:00
Sergey M․
34bc2d9dfd release 2016.07.07 2016-07-07 01:54:29 +07:00
Sergey M․
08c7af4afa [kamcord] Add extractor (Closes #10001) 2016-07-07 01:50:39 +07:00
Yen Chi Hsuan
f7291a0b7c [daum.net] Fix extraction for specific examples
Closes #9972
2016-07-07 01:26:14 +08:00
Yen Chi Hsuan
c65aa4e9e1 [brightcove:legacy] Support 'playlistTabs' and skip a dead test
Closes #9965
2016-07-07 01:13:37 +08:00
Yen Chi Hsuan
ad213a1d74 [francetv] Recognize more Dailymotion embedded videos
Closes #9955
2016-07-06 23:37:54 +08:00
Yen Chi Hsuan
43f1e4e41e [onet] Add MD5 checksum 2016-07-06 20:32:03 +08:00
Yen Chi Hsuan
54b0e909d5 [amp] Fix a typo 2016-07-06 20:10:47 +08:00
Yen Chi Hsuan
f8752b86ac [Onet,ClipRs] Add new extractor for onet.tv and use it for clip.rs
Closes #9950
2016-07-06 20:09:05 +08:00
Yen Chi Hsuan
84c237fb8a [utils] Add get_element_by_class
For #9950
2016-07-06 20:02:52 +08:00
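A usage sketch of the new helper, for reference (assuming it mirrors the existing get_element_by_id helper and returns the content of the first tag carrying the given class):

```python
from youtube_dl.utils import get_element_by_class

webpage = '<h1 class="title">Some video</h1>'
print(get_element_by_class('title', webpage))  # -> 'Some video'
```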
Remita Amine
ab49d7a9fa use mimetype2ext to determine manifest ext in multiple extractors 2016-07-06 09:11:46 +01:00
Remita Amine
b4173f1551 [utils] add mimetypes to determine manifest ext(m3u8, f4m, mpd) 2016-07-06 09:06:28 +01:00
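The idea, sketched with assumed table keys (the real mapping lives in mimetype2ext in youtube_dl/utils.py):

```python
# Manifest MIME types resolve to the extensions the downloaders expect.
MANIFEST_EXT = {
    'application/x-mpegurl': 'm3u8',  # HLS
    'application/f4m+xml': 'f4m',     # HDS
    'application/dash+xml': 'mpd',    # DASH
}

def manifest_ext(mimetype):
    return MANIFEST_EXT.get((mimetype or '').lower())

print(manifest_ext('application/x-mpegURL'))  # -> 'm3u8'
```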
Remita Amine
2817b99cf2 [metacafe] fix info extraction(closes #8539)(closes #3253) 2016-07-06 02:19:55 +01:00
Remita Amine
001fffd004 [spiegel:article] update test(closes #10018) 2016-07-06 00:16:41 +01:00
Sergey M․
0e94b4713d release 2016.07.06 2016-07-06 00:54:23 +07:00
Sergey M․
a6d3b89feb [prosiebensat1] Make downloading urls JSON non fatal 2016-07-06 00:52:48 +07:00
Remita Amine
6c26815d63 [onionstudios] fix info extraction 2016-07-05 18:05:07 +01:00
Sergey M․
73c4ac2c95 [youtube:channel] Improve channel id extraction and detect unavailable channels (Closes #10009) 2016-07-05 23:30:44 +07:00
Remita Amine
84f214d840 [prosiebensat1] extract all formats 2016-07-05 17:11:45 +01:00
Remita Amine
e3f88be7a9 [rtvnh] extract all formats 2016-07-05 14:45:39 +01:00
Remita Amine
31af3e35e0 [sandia] remove unused imports 2016-07-05 13:39:24 +01:00
Remita Amine
94a5cff91d [sendia] fix info extraction 2016-07-05 13:37:46 +01:00
Remita Amine
77082c7b9e [slideshare] fix description extraction 2016-07-05 12:01:04 +01:00
Remita Amine
252a1f75d2 [spiegel] improve info extraction 2016-07-05 11:46:25 +01:00
Remita Amine
5abf513cf8 [stitcher] fix episode config extraction 2016-07-05 10:44:16 +01:00
Yen Chi Hsuan
c6054e3201 [xuite] Support videos with already encoded media id 2016-07-05 14:26:42 +08:00
Yen Chi Hsuan
4080530624 [youtube:shared] Recognize the new 'shared' URLs
Closes #10007
2016-07-05 13:15:05 +08:00
Sergey M․
c25f1a9b63 release 2016.07.05 2016-07-05 06:32:46 +07:00
Remita Amine
dfaa86b75e [test_utils] add test for smuggling a smuggled url 2016-07-04 21:36:32 +01:00
Remita Amine
d9163ae3b6 [kaltura] fix extraction error for videos from multiple kaltura servers 2016-07-04 21:34:27 +01:00
Remita Amine
dafafe7cf1 [la7] extract more info from a kaltura custom server 2016-07-04 17:59:58 +01:00
Remita Amine
81953d1ae5 [kaltura] add support for videos stored on custom kaltura servers(closes #5557) 2016-07-04 17:59:58 +01:00
Yen Chi Hsuan
3a212ed62e [iqiyi] Skip an unstable MD5 checksum 2016-07-04 11:25:46 +08:00
Sergey M․
195f084542 [pornhub] Detect private videos (Closes #9987) 2016-07-04 03:27:00 +07:00
Sergey M․
aa7a455b2e [README.md] Clarify configuration file may not exist by default 2016-07-04 01:24:33 +07:00
Sergey M․
6a4e659c93 [yahoo] Recognize brightcove embed (Closes #9995) 2016-07-03 23:00:36 +07:00
Yen Chi Hsuan
40f3666f6b [test/test_http] Update tests for 38cce791c7 2016-07-03 23:50:55 +08:00
Remita Amine
dd801bbe18 [brightcove] improve error detection 2016-07-03 16:37:22 +01:00
Yen Chi Hsuan
38cce791c7 Rename --cn-verification-proxy to --geo-verification-proxy
And deprecate the former one

Since commit f138873900, this option is
not limited to China websites, so rename it.
2016-07-03 23:29:56 +08:00
Sergey M․
bf3ae6a543 [devscripts/show-downloads-statistics] Add script for displaying downloads statistics 2016-07-03 22:20:14 +07:00
Sergey M․
bff98341d5 release 2016.07.03.1 2016-07-03 21:28:55 +07:00
Yen Chi Hsuan
2644e911be [iqiyi] Fix extraction
See https://github.com/soimort/you-get/issues/1211#issuecomment-229011559
2016-07-03 22:19:56 +08:00
Remita Amine
a5f67895d3 [nationalgeographic] restore http formats
there was a misunderstanding about the reason for the 403 response;
the problem happens only when the user uses aria2c as the downloader
a1f6f5c768 (commitcomment-18107559)
2016-07-03 14:10:25 +01:00
Yen Chi Hsuan
15e4b6b758 [rai] Support an alternative form of embedded relinker URL
Closes #8551
2016-07-03 19:52:11 +08:00
Yen Chi Hsuan
2b28b892d8 [rai] Support videos with embedded content item ID (#8551) 2016-07-03 19:52:11 +08:00
Sergey M․
7507fc98cb [README.md] Fix some typos in coding conventions section 2016-07-03 18:35:28 +07:00
Yen Chi Hsuan
477b7a8474 [downloader/f4m] Fix for Rai live streams 2016-07-03 19:26:39 +08:00
Yen Chi Hsuan
034a884957 [rai] Support direct relinker URLs (closes #8552) 2016-07-03 19:26:39 +08:00
Remita Amine
64436cb1a4 [nationalgeographic] skip download for national geographic channel tests(closes #9991) 2016-07-03 10:43:36 +01:00
Yen Chi Hsuan
f138873900 [rai] Fix extraction and update _TESTS
Closes #8617
Closes #9157
Closes #9232
2016-07-03 15:49:35 +08:00
Yen Chi Hsuan
e793338c88 [buzzfeed] Detect Facebook embed and update _TESTS
Closes #5701
2016-07-03 14:12:02 +08:00
Yen Chi Hsuan
369bb06206 [facebook] Improve embed detection (#5701) 2016-07-03 14:11:29 +08:00
Sergey M․
2cb31d288e [history:topic] Relax _VALID_URL 2016-07-03 13:01:04 +07:00
Sergey M․
c723d1cd8d [README.md] Update some codebase links 2016-07-03 11:35:13 +07:00
Sergey M․
1f55234057 Add PULL_REQUEST_TEMPLATE.md 2016-07-03 11:31:49 +07:00
Sergey M․
04006fae8d [README.md] Start writing youtube-dl coding conventions 2016-07-03 11:31:07 +07:00
Jaime Marquínez Ferrándiz
4cb13d0d6a [hrti] Don't redefine variable in list comprehension 2016-07-02 23:02:14 +02:00
Remita Amine
a1f6f5c768 [nationalgeographic] add support for Adobe Pass auth 2016-07-02 21:24:22 +01:00
Remita Amine
05c7feec77 [aenetworks] add support for Adobe Pass auth 2016-07-02 21:24:22 +01:00
Remita Amine
bf83024826 [theplatform] add basic support for Adobe Pass 2016-07-02 21:24:22 +01:00
Sergey M․
a0cfd82dda release 2016.07.03 2016-07-03 03:19:22 +07:00
Sergey M․
1b734adb2d [xtube] Fix extraction (Closes #9953, closes #9961) 2016-07-03 03:17:35 +07:00
Sergey M․
9b724d7277 [extractors] Add hrti:playlist import 2016-07-03 02:25:39 +07:00
Sergey M․
c3a5dd3b5d Credit @atopuzov for hrti (#9482) 2016-07-03 02:22:59 +07:00
Sergey M․
e3755a624b [hrti] Improve and add support for playlists (Closes #9482) 2016-07-03 02:22:14 +07:00
Sergey M․
95cf60e826 [utils] Add PUTRequest 2016-07-03 02:21:32 +07:00
Aleksandar Topuzovic
6b03e1e25d [HRTi] Implement extractor for Croatian Radiotelevision 2016-07-03 02:20:41 +07:00
Yen Chi Hsuan
712b0b5b70 [la7.it] Fix the extractor 2016-07-02 23:49:03 +08:00
Yen Chi Hsuan
6a424391d9 [facebook] Make embed detection stricter to prevent false-positives 2016-07-02 23:15:55 +08:00
Yen Chi Hsuan
dbf0157a26 [generic] Add MD5 checksums 2016-07-02 21:58:07 +08:00
Yen Chi Hsuan
7deef1ba67 [generic] Support WordPress "YouTube Video Importer" plugin
Closes #9938
2016-07-02 21:58:07 +08:00
Yen Chi Hsuan
fd6ca38262 [facebook] Improve Facebook embedded detection
Related to #9938.

Another example comes from 9834872bf6.
2016-07-02 21:58:07 +08:00
Sergey M․
bdafd88da0 [vk] Extend _VALID_URLs to support new domain (Closes #9981) 2016-07-02 16:43:19 +07:00
Sergey M․
7a1e71575e release 2016.07.02 2016-07-02 02:47:42 +07:00
Sergey M․
ac2d8f54d1 [vine] Remove superfluous whitespace 2016-07-02 02:45:00 +07:00
Sergey M․
14ff6baa0e [fusion] Improve 2016-07-02 02:44:37 +07:00
TRox1972
bb08101ec4 [Fusion] Add new extractor 2016-07-02 02:37:28 +07:00
Sergey M․
bc4b2d75ba [pornhub] Add support for thumbzilla (Closes #8696) 2016-07-02 02:11:07 +07:00
Sergey M․
35fc3021ba [periscope] Add another fallback source 2016-07-02 01:35:57 +07:00
cant-think-of-a-name
347227237b [periscope] fix playlist extraction (#9967)
The JSON response changed and the extractor needed to be updated in order to gather the video IDs.
2016-07-02 01:29:11 +07:00
Sergey M․
564dc3c6e8 [vine] Fix extraction (Closes #9970) 2016-07-02 01:24:57 +07:00
Sergey M․
9f4576a7eb [twitch] Update usher URL (Closes #9975) 2016-07-01 23:16:43 +07:00
Sergey M․
f11315e8d4 release 2016.07.01 2016-07-01 03:59:57 +07:00
Sergey M․
0c2ac64bb8 [sixplay] Rename preference key to quality in format dict 2016-07-01 03:57:59 +07:00
Jaime Marquínez Ferrándiz
a9eede3913 [test/compat] compat_shlex_split: test with newlines 2016-07-01 03:30:35 +07:00
Jaime Marquínez Ferrándiz
9e29ef13a3 [options] Accept quoted string across multiple lines (#9940)
Like:

    -f "
    bestvideo+bestaudio/
    best
    "
2016-07-01 03:30:31 +07:00
Sergey M․
eaaaaec042 [pornhub] Add more tests with removed videos 2016-07-01 03:18:27 +07:00
Sergey M․
3cb3b60064 [pornhub] Relax removed message regex (Closes #9964) 2016-07-01 03:14:23 +07:00
kidol
044e3d91b5 [Pornhub] Fix error detection 2016-07-01 02:59:50 +07:00
Remita Amine
c9e538a3b1 [ctvnews] use orderedSet, increase the number of items for playlists and use smaller bin list for test 2016-06-30 19:52:32 +01:00
Remita Amine
76dad392f5 [meta] Clarify the source of uppod st decryption algorithm 2016-06-30 18:27:57 +01:00
Remita Amine
9617b557aa [ctv] Add new extractor(closes #4077) 2016-06-30 18:22:35 +01:00
Remita Amine
bf4fa24414 [ctvnews] Add new extractor(closes #2156) 2016-06-30 18:22:35 +01:00
Remita Amine
20361b4f25 [rds] extract 9c9media formats 2016-06-30 18:22:35 +01:00
Remita Amine
05a0068a76 [9c9media] Add new extractor 2016-06-30 18:22:35 +01:00
Sergey M․
66a42309fa release 2016.06.30 2016-06-30 23:56:55 +07:00
Sergey M․
fd94e2671a [meta] Add support for pladform embeds 2016-06-30 23:20:44 +07:00
Sergey M․
8ff6697861 [pladform] Improve embed detection 2016-06-30 23:19:29 +07:00
Sergey M․
eafa643715 [meta] Make duration and description optional
For iframe URLs
2016-06-30 23:06:13 +07:00
Sergey M․
049da7cb6c [meta] Extend _VALID_URL 2016-06-30 23:04:18 +07:00
Remita Amine
7dbeee7e22 [generic] make twitter:player extraction non fatal 2016-06-30 14:11:55 +01:00
Remita Amine
93ad6c6bfa [sixplay] Add new extractor(closes #2183) 2016-06-30 13:50:49 +01:00
Remita Amine
329179073b [generic] add generic support for twitter:player embeds 2016-06-30 12:01:30 +01:00
Remita Amine
4d86d2008e [urplay] fix typo and check with flake8 2016-06-30 11:30:42 +01:00
Remita Amine
ab47b6e881 [theatlantic] Add new extractor(closes #6611) 2016-06-30 04:08:56 +01:00
Remita Amine
df43389ade [skysports] Add new extractor(closes #7066) 2016-06-30 02:54:21 +01:00
Remita Amine
397b305cfe [meta] Add new extractor(closes #8789) 2016-06-30 00:21:03 +01:00
Remita Amine
e496fa50cd [urplay] Add new extractor(closes #9332) 2016-06-29 20:19:31 +01:00
Sergey M․
06a96da15b [eagleplatform] Improve embed detection and extract in separate routine (Closes #9926) 2016-06-29 23:01:34 +07:00
Remita Amine
70157c2c43 [aenetworks] add support for movie pages 2016-06-29 16:55:17 +01:00
Remita Amine
c58ed8563d [aenetworks] extract history topic playlist title 2016-06-29 16:18:16 +01:00
Remita Amine
4c7821227c [aenetworks:historytopic] fix topic video url 2016-06-29 16:03:32 +01:00
Remita Amine
42362fdb5e [aenetworks] add support for show and season for A&E Network sites and History topics(closes #9816) 2016-06-29 15:49:17 +01:00
Sergey M․
97124e572d [arte:playlist] Fix test 2016-06-28 22:39:53 +07:00
Remita Amine
32616c14cc [vrt] extract all formats 2016-06-28 14:02:03 +01:00
Sergey M․
8174d0fe95 release 2016.06.27 2016-06-27 23:09:39 +07:00
Sergey M․
8704778d95 [pbs] Check manually constructed http links (Closes #9921) 2016-06-27 23:06:42 +07:00
Sergey M․
c287f2bc60 [extractor/generic] Use _extract_url for kaltura embeds (Closes #9922) 2016-06-27 22:45:26 +07:00
Sergey M․
9ea5c04c0d [kaltura] Add _extract_url with fixed regex 2016-06-27 22:44:17 +07:00
Sergey M․
fd7a7498a4 [test_all_urls] PEP 8 and change wording 2016-06-27 22:11:45 +07:00
Matthieu Muffato
e3a6747d8f New test-case: extractor names are supposed to be unique
@dstftw explained in
https://github.com/rg3/youtube-dl/pull/9918#issuecomment-228625878 that
extractor names are supposed to be unique. @dstftw has fixed the two
offending extractors, and here I add a test to ensure this does not
happen in the future.
2016-06-27 22:09:29 +07:00
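The check that test performs boils down to something like this (a sketch, not the literal test code):

```python
# Collect every extractor's IE_NAME and assert there are no duplicates.
from youtube_dl.extractor import gen_extractors

names = [ie.IE_NAME for ie in gen_extractors()]
duplicates = set(n for n in names if names.count(n) > 1)
assert not duplicates, 'duplicate extractor names: %s' % duplicates
```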
Sergey M․
f41ffc00d1 [skynewsarabia:article] Clarify IE_NAME 2016-06-27 05:08:09 +07:00
Sergey M․
81fda15369 [sr:mediathek] Clarify IE_NAME 2016-06-27 05:07:12 +07:00
Sergey M․
427cd050a3 [extractor/generic] Improve kaltura embed detection (Closes #9911) 2016-06-27 04:11:53 +07:00
Sergey M․
b0c200f1ec [msn] Add test URL with non-alphanumeric characters 2016-06-26 22:03:36 +07:00
Sergey M․
92747e664a release 2016.06.26 2016-06-26 21:15:24 +07:00
Sergey M․
f1f336322d [msn] Fix extraction (Closes #8960, closes #9542) 2016-06-26 21:10:05 +07:00
Sergey M․
bf8dd79045 [extractor/common] Fix sorting with custom field preference 2016-06-26 21:09:07 +07:00
TRox1972
c6781156aa [MSN] add new extractor 2016-06-26 21:07:59 +07:00
remitamine
59bbe4911a [extractor/common] add helper method to extract html5 media entries 2016-06-26 14:04:08 +01:00
remitamine
4f3c5e0627 [utils] add helper function for parsing codecs 2016-06-26 14:03:58 +01:00
Sergey M․
f484c5fa25 [vidbit] Improve (Closes #9759) 2016-06-26 16:59:28 +07:00
Sergey M․
88d9f6c0c4 [utils] Add support for name list in _html_search_meta 2016-06-26 16:57:14 +07:00
TRox1972
3c9c088f9c [Vidbit] Add new extractor 2016-06-26 16:52:52 +07:00
Yen Chi Hsuan
fc3996bfe1 [iqiyi] Remove codes for debugging 2016-06-26 15:45:41 +08:00
Yen Chi Hsuan
5b6ad8630c [iqiyi] Partially fix IqiyiIE
Use the HTML5 API. Only low-resolution formats available

Related: #9839

Thanks @zhangn1985 for the overall algorithm (soimort/you-get#1224)
2016-06-26 15:18:32 +08:00
Yen Chi Hsuan
30105f4ac0 [le] Move urshift() to utils.py 2016-06-26 15:17:26 +08:00
Yen Chi Hsuan
1143535d76 [utils] Add urshift()
Used in IqiyiIE and LeIE
2016-06-26 15:16:49 +08:00
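urshift() emulates JavaScript's unsigned right shift (>>>) for 32-bit values; roughly:

```python
# Python's >> is arithmetic, so negative values would otherwise keep
# their sign bit; add 2**32 first to treat the value as unsigned.
def urshift(val, n):
    return val >> n if val >= 0 else (val + 0x100000000) >> n

print(urshift(-1, 28))  # -> 15, matching JavaScript's -1 >>> 28
```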
Yen Chi Hsuan
7d52c052ef [generic] Fix test_Generic_76
Broken: https://travis-ci.org/rg3/youtube-dl/jobs/140251658
2016-06-26 11:56:27 +08:00
stepshal
a2406fce3c Fix misspelling 2016-06-26 01:28:55 +07:00
Sergey M․
3b34ab538c [svtplay] Extend _VALID_URL (#9900) 2016-06-26 00:29:53 +07:00
Sergey M․
ac782306f1 [iqiyi] Mark broken 2016-06-26 00:25:41 +07:00
Sergey M․
0c00e889f3 Credit @JakubAdamWieczorek for #9813 2016-06-25 23:35:57 +07:00
Sergey M․
ce96ed05f4 [polskieradio] Add test with video 2016-06-25 23:31:21 +07:00
Sergey M․
0463b77a1f [polskieradio] Improve extraction (Closes #9813) 2016-06-25 23:19:18 +07:00
Jakub Adam Wieczorek
2d185706ea [polskieradio] Add support for Polskie Radio.
Polskie Radio is the main Polish state-funded radio broadcasting service.
2016-06-25 23:19:18 +07:00
Sergey M․
b72b44318c [utils] Add strip_or_none 2016-06-25 23:19:18 +07:00
Sergey M․
46f59e89ea [utils] Add unified_timestamp 2016-06-25 23:19:18 +07:00
Sergey M․
b4241e308e release 2016.06.25 2016-06-25 03:03:20 +07:00
Sergey M․
3d4b08dfc7 [setup.py] Add file version information and quotes consistency (Closes #9878) 2016-06-25 02:50:12 +07:00
Sergey M․
be49068d65 [youtube] Fix and skip some tests 2016-06-24 22:47:19 +07:00
Sergey M․
525cedb971 [youtube] Relax URL expansion in description 2016-06-24 22:37:13 +07:00
Sergey M․
de3c7fe0d4 [youtube] Fix 141 format tests 2016-06-24 22:27:55 +07:00
Yen Chi Hsuan
896cc72750 [mixcloud] View count and like count may be absent
Closes #9874
2016-06-24 17:26:12 +08:00
Yen Chi Hsuan
c1ff6e1ad0 [vimeo:review] Fix extraction for password-protected videos
Closes #9853
2016-06-24 16:48:37 +08:00
Remita Amine
fee70322d7 [appletrailers] correct thumbnail fallback 2016-06-23 19:03:34 +01:00
Remita Amine
8065d6c55f [dcn] extend _VALID_URL for awaan.ae and extract all available formats 2016-06-23 17:22:15 +01:00
Remita Amine
494172d2e5 [appletrailers] extract info from an alternative source if available(closes #8422) 2016-06-23 15:49:42 +01:00
Remita Amine
6e3c2047f8 [tvp] extract all formats and detect errors 2016-06-23 04:36:16 +01:00
Sergey M․
011bd3221b release 2016.06.23.1 2016-06-23 09:42:56 +07:00
Sergey M․
b46eabecd3 [jsinterp] Relax JS function regex (Closes #9863) 2016-06-23 09:41:34 +07:00
Remita Amine
0437307a41 [nbc:nbcnews] improve extraction and add msnbc to the extractor 2016-06-23 01:36:19 +01:00
Remita Amine
22b7ac13ef [tf1] fix wat id extraction(closes #9862) 2016-06-23 00:14:34 +01:00
Sergey M․
96f88e91b7 release 2016.06.23 2016-06-23 04:29:34 +07:00
Sergey M․
3331a4644d [vk] Remove unused import 2016-06-23 04:27:10 +07:00
Sergey M․
adf1921dc1 [xnxx] Improve _VALID_URL (Closes #9858) 2016-06-23 04:26:49 +07:00
Sergey M․
97674f0419 [xnxx] Replace test 2016-06-23 04:24:00 +07:00
rr-
73843ae8ac [xnxx] fix url regex
The pattern has changed from "video123412" to "video-o8xa19".
The changes maintain backwards compatibility with old-style URLs.
2016-06-23 04:19:55 +07:00
Sergey M․
f2bb8c036a [vk] Modernize 2016-06-23 04:18:43 +07:00
Sergey M․
75ca6bcee2 [vk] Workaround buggy new.vk.com Set-Cookie headers 2016-06-23 04:17:13 +07:00
Sergey M․
089657ed1f [vimeo:album] Add paged example URL 2016-06-23 02:00:03 +07:00
Sergey M․
b5eab86c24 [vimeo:album] Improve _VALID_URL 2016-06-23 01:56:58 +07:00
Sergey M․
c8e3e0974b [vimeo:channel] Improve playlist extraction 2016-06-23 01:28:36 +07:00
Purdea Andrei
dfc8f46e1c [vimeo:channel] Add video id to url_result
This will allow us to decide much faster that we don't want an already archived video,
and will avoid having to download webpages for each video that has already been downloaded,
thus significantly speeding up the archival of channels that have no new content.
2016-06-23 01:26:27 +07:00
Sergey M․
c143ddce5d [vimeo] Override original URL only when necessary 2016-06-23 00:51:36 +07:00
Jaime Marquínez Ferrándiz
169d836feb lazy-extractors: Fix after commit 6e6b9f600f
The problem was in the following code:

    class ArteTVPlus7IE(ArteTVBaseIE):

        ...

        @classmethod
        def suitable(cls, url):
            return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)

And its subclasses like ArteTVCinemaIE.

Since in the lazy_extractors.py file ArteTVCinemaIE was not a subclass of ArteTVPlus7IE, super(ArteTVPlus7IE, cls) failed.

To fix it we have to make it a subclass. Since the order of _ALL_CLASSES is arbitrary we must sort them so that the base classes are defined first. We also must add base classes like YoutubeBaseInfoExtractor.
2016-06-22 19:20:50 +02:00
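The ordering requirement can be illustrated like this (class names from the commit above, but not the literal devscripts code): sorting by MRO length is one way to guarantee every base class is emitted before its subclasses.

```python
class InfoExtractor(object): pass
class ArteTVBaseIE(InfoExtractor): pass
class ArteTVPlus7IE(ArteTVBaseIE): pass
class ArteTVCinemaIE(ArteTVPlus7IE): pass

# _ALL_CLASSES comes back in arbitrary order...
_ALL_CLASSES = [ArteTVCinemaIE, ArteTVPlus7IE, InfoExtractor, ArteTVBaseIE]
# ...so order parents first before writing lazy_extractors.py.
ordered = sorted(_ALL_CLASSES, key=lambda cls: len(cls.mro()))
print([cls.__name__ for cls in ordered])
# ['InfoExtractor', 'ArteTVBaseIE', 'ArteTVPlus7IE', 'ArteTVCinemaIE']
```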
TRox1972
6ae938b295 [Vine] Extract view count 2016-06-22 23:57:35 +07:00
Sergey M․
cf40fdf5c1 release 2016.06.22 2016-06-22 23:43:24 +07:00
Sergey M․
23bdae0955 [svt] Various improvements
+ [svt:play] Add fallback path looking for video id and fix extraction for oppetarkiv
* [svt:base] Detect geo restriction
* [svt:base] Extract series related metadata
2016-06-22 23:36:07 +07:00
Shai Coleman
ca74c90bf5 Fix issue downloading facebook videos
youtube-dl expects the format items to be returned as a list,
but when there's only one item Facebook returns a dict instead;
this commit wraps the dict in a list if necessary
2016-06-22 12:52:15 +01:00
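The fix amounts to a small normalization step, roughly (key name assumed):

```python
video_data = {'sd_src': 'http://example.com/video.mp4'}  # sometimes a dict
if isinstance(video_data, dict):
    video_data = [video_data]  # normalize to the list the code expects
for f in video_data:
    print(f.get('sd_src'))
```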
Sergey M․
7cfc1e2a10 [gametrailers] Remove extractor
gametrailers closed (see http://www.polygon.com/2016/2/8/10944452/gametrailers-shuts-down-after-13-year-run)
2016-06-21 22:31:41 +07:00
Remita Amine
1ac5705f62 [gamespot] extract all formats 2016-06-21 13:37:57 +01:00
Yen Chi Hsuan
e4f90ea0a7 [svt] Fix extraction for SVTPlay (closes #9809) 2016-06-21 17:55:53 +08:00
Sergey M․
cdfc187cd5 [cbs] Remove unused import 2016-06-20 22:40:33 +07:00
Sergey M․
feef925f49 [streamcloud] Capture error message (#9840) 2016-06-20 22:40:22 +07:00
Sergey M․
19e2d1cdea release 2016.06.20 2016-06-20 20:50:01 +07:00
Sergey M․
8369a4fe76 [downloader/hls] Simplify and carry long lines 2016-06-20 21:55:17 +07:00
Philipp Hagemeister
1f749b6658 Revert "[jsinterp] Avoid double key lookup for setting new key"
This reverts commit 7c05097633.
2016-06-20 13:29:13 +02:00
Remita Amine
819707920a [cbs] fix _VALID_URL 2016-06-19 23:55:19 +01:00
Remita Amine
43518503a6 [cbs,cbsnews,cbssports] reduce requests while extracting all formats 2016-06-19 23:40:00 +01:00
Remita Amine
5839d556e4 [theplatform] reduce requests for theplatform feed info extraction 2016-06-19 23:37:05 +01:00
Yen Chi Hsuan
6c83e583b3 [radiojavan] PEP8
E275 is added in pycodestyle 2.6

See https://github.com/PyCQA/pycodestyle/pull/491
2016-06-19 13:32:08 +08:00
Yen Chi Hsuan
6aeb64b673 Merge pull request #8201 from remitamine/hls-aes
[downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader
2016-06-19 13:25:08 +08:00
Remita Amine
6cd64b6806 [foxsports] extract http formats 2016-06-19 05:45:48 +01:00
remitamine
e154c65128 [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader 2016-06-19 01:01:40 +01:00
Sergey M․
a50fd6e026 release 2016.06.19.1 2016-06-19 03:57:14 +07:00
Sergey M․
6a55bb66ee [vimeo] Fix rented videos (Closes #9830) 2016-06-19 03:56:01 +07:00
Lucas Moura
7c05097633 [jsinterp] Avoid double key lookup for setting new key
In order to add a new key to both the __objects and __functions dicts in jsinterp.py, it was
necessary to first check whether the key was present and, if not, create it and
assign it a value.

However, this can be done in a single step using the dict setdefault method.
2016-06-19 03:29:45 +07:00
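The simplification in question, side by side:

```python
# Before: check for the key, create it if absent, then use it.
objects = {}
if 'v' not in objects:
    objects['v'] = []
objects['v'].append(1)

# After: setdefault creates-if-absent and returns the value in one step.
functions = {}
functions.setdefault('f', []).append(1)
```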
Sergey M․
589568789f release 2016.06.19 2016-06-19 02:30:29 +07:00
Sergey M․
7577d849a6 [r7] Fix extraction and add support for articles (Closes #9826) 2016-06-19 02:25:34 +07:00
Sergey M․
cb23192bc4 [closertotruth] Update and improve (Closes #8680) 2016-06-19 00:35:29 +07:00
Steven Gosseling
41c1023300 [closertotruth] Add extractor
Removed print statement from code.

Replaced two regex searches with the correct ones.

Removed some unnecessary semicolons.

Fixed title extraction.

Refactored everything to use search_regex.

Processed comments on commit 5650b0d, fixed feedback from flake8.

Improved regexes; returns an info dict now.

Added support for closertotruth interview URLs.

Added support for the episodes page.
2016-06-18 23:19:56 +07:00
Sergey M․
90b6288cce [arte:+7] Simplify _VALID_URL 2016-06-18 22:23:48 +07:00
Sergey M․
c1823c8ad9 [README.md] Remove 'small' from description (#9814) 2016-06-18 22:08:48 +07:00
Sergey M․
d7c6c656c5 [arte:+7] Expand _VALID_URL (Closes #9820) 2016-06-18 21:42:17 +07:00
Yen Chi Hsuan
b0b128049a [extractors] Update references to sportschau (#9799) 2016-06-18 13:43:47 +08:00
Yen Chi Hsuan
e8f13f2637 [sportschau.de] Fix extraction and moved to its own file (closes #9799) 2016-06-18 13:42:58 +08:00
Yen Chi Hsuan
b5aad37f6b [ard] Remove SportschauIE, which is now based on WDR (#9799) 2016-06-18 13:42:39 +08:00
Yen Chi Hsuan
6d0d4fc26d [wdr] Add WDRBaseIE, for Sportschau (#9799) 2016-06-18 13:40:55 +08:00
Yen Chi Hsuan
0278aa443f [br] Skip invalid tests 2016-06-18 12:53:48 +08:00
Yen Chi Hsuan
1f35745758 [azubu] Don't fail on optional fields 2016-06-18 12:39:08 +08:00
Yen Chi Hsuan
573c35272f [bbc] Skip a geo-restricted test case 2016-06-18 12:35:55 +08:00
Yen Chi Hsuan
09e3f91e40 [arte] Update _TESTS and fix for pages with multiple YouTube videos
Some tests are from #6895 and #6613
2016-06-18 12:34:58 +08:00
Yen Chi Hsuan
1b6cf16be7 [aftonbladet] Fix extraction 2016-06-18 12:27:39 +08:00
Yen Chi Hsuan
26264cb056 [adobetv] Use embedded data in the webpage
Sometimes the HTML webpage is returned even with '?format=json'
2016-06-18 12:21:40 +08:00
Yen Chi Hsuan
a72df5f36f [mtvservices] Fix ext for RTMP streams 2016-06-18 12:19:06 +08:00
Yen Chi Hsuan
c878e635de [bet] Moved to MTVServices 2016-06-18 12:17:24 +08:00
Sergey M․
0f47cc2e92 release 2016.06.18.1 2016-06-18 06:20:34 +07:00
Sergey M․
5fc2757682 release 2016.06.18 2016-06-18 06:00:05 +07:00
Sergey M․
e3944c2621 [pornhd] Add working test 2016-06-18 05:50:17 +07:00
Sergey M․
667d96480b [pornhd] Detect removed videos and modernize 2016-06-18 05:42:20 +07:00
Sergey M․
e6fe993c31 [pornhd] Improve formats extraction 2016-06-18 05:37:53 +07:00
Sergey M․
d0d93f76ea [pornhd] Fix metadata extraction 2016-06-18 05:30:46 +07:00
Sergey M․
20a6a154fe [mtv] Use compat_xpath and fix FutureWarning 2016-06-18 04:46:26 +07:00
Sergey M․
f011876076 [nickde] Add extractor (Closes #9778) 2016-06-18 04:40:48 +07:00
Sergey M․
6929569403 [mitele] Extract series metadata and make title more robust (Closes #9758) 2016-06-18 04:06:19 +07:00
Sergey M․
eb451890da [carambatv] Add extractor (Closes #9815) 2016-06-18 03:04:14 +07:00
Sergey M․
ded7511a70 [bbccouk] Add support for playlists (Closes #9812) 2016-06-17 23:42:52 +07:00
Sergey M․
d2161cade5 release 2016.06.16 2016-06-16 22:40:55 +07:00
Sergey M․
27e5fa8198 [cda] Fix extraction (Closes #9803) 2016-06-16 22:33:12 +07:00
Yen Chi Hsuan
efbd1eb51a [wimp] Fix extraction and update _TESTS 2016-06-16 12:27:21 +08:00
Yen Chi Hsuan
369ff75081 [jwplatform] Improved JWPlayer support 2016-06-16 12:26:45 +08:00
Yen Chi Hsuan
47212f7bcb [utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
2016-06-16 11:00:54 +08:00
Sergey M․
4c93ee8d14 [imdb] Improve _VALID_URL (Closes #9788) 2016-06-15 22:34:55 +07:00
Yen Chi Hsuan
8bc4dbb1af [wrzuta.pl] Detect error and update _TESTS 2016-06-14 11:14:59 +08:00
Sergey M․
6c3760292c [pornhub] Improve title extraction (Closes #9777) 2016-06-14 04:57:59 +07:00
Sergey M․
4cef70db6c [devscripts/release.sh] Add flag for gpg-sign commits 2016-06-14 03:16:56 +07:00
Sergey M․
ff4af6ec59 [lynda] Remove superfluous _NETRC_MACHINE 2016-06-14 02:49:33 +07:00
Sergey M․
d01fb21d4c release 2016.06.14 2016-06-14 02:19:42 +07:00
Sergey M․
a4ea28eee6 Credit @venth for wrzuta:playlist (#9341) 2016-06-14 02:15:47 +07:00
Sergey M․
bc2a871f3e Credit @dracony for rockstargames (#9737) 2016-06-14 02:15:09 +07:00
Sergey M․
1759672eed [wrzuta:playlist] Improve and simplify (Closes #9341) 2016-06-14 02:13:54 +07:00
venth
fea55ef4a9 [wrzuta.pl:playlist] Added playlist extraction from wrzuta.pl 2016-06-14 02:10:48 +07:00
Sergey M․
16b6bd01d2 [rockstargames] Improve and add Youtube fallback (Closes #9737) 2016-06-14 01:11:24 +07:00
Dracony
14d0f4e0f3 Added extractor for rockstargames.com 2016-06-14 01:09:35 +07:00
Sergey M․
778f969447 [twitch:clips] Add extractor (Closes #9767) 2016-06-14 00:06:31 +07:00
Sergey M․
79cd8b3d8a [README.md] Suggest checking extractor code under all Python versions 2016-06-13 10:04:04 +07:00
Sergey M․
b4663f12b1 [README.md] Update links to info dict metafields 2016-06-13 07:16:35 +07:00
Sergey M․
b50e02c1e4 [README.md] Update links to options available for YoutubeDL 2016-06-13 07:05:32 +07:00
Sergey M․
33b72ce64e [xfileshare] Improve removed videos detection 2016-06-13 01:19:54 +07:00
Sergey M․
cf2bf840ba [xfileshare] Fix test 2016-06-13 01:11:14 +07:00
Sergey M․
bccdac6874 [xfileshare:xvidstage] Add support for videos with packed codes (Closes #4335) 2016-06-13 01:11:04 +07:00
Sergey M․
e69f9f5d68 [downloader/external] Decode error string before writing to stderr 2016-06-12 16:45:07 +07:00
Sergey M․
77a9a9c295 release 2016.06.12 2016-06-12 12:06:48 +07:00
Sergey M․
84dcd1c4e4 [streamcloud] Detect removed videos (Closes #3768) 2016-06-12 11:08:39 +07:00
Sergey M․
971e3b7520 [nrk:skole] Fix extraction 2016-06-12 07:20:37 +07:00
Sergey M․
4e79011729 [nrktv] Fix tests 2016-06-12 06:57:04 +07:00
Sergey M․
a936ac321c [README.md] Document using output template in batch files (Closes #9717) 2016-06-12 06:39:31 +07:00
Sergey M․
98960c911c [instagram] Extract metadata from JSON 2016-06-12 06:06:04 +07:00
Sergey M․
329ca3bef6 [utils] Add try_get
To reduce boilerplate when accessing JSON
2016-06-12 06:05:34 +07:00
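A usage sketch (try_get takes the source object, a getter callable and an optional expected type, and returns None if any step fails):

```python
from youtube_dl.utils import try_get

data = {'video': {'title': 'Example'}}
# Instead of chained `if` checks around data['video']['title']:
print(try_get(data, lambda x: x['video']['title'], str))     # -> 'Example'
print(try_get(data, lambda x: x['video']['duration'], int))  # -> None
```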
Sergey M․
2c3322e36e [youporn] Fix metadata extraction 2016-06-12 04:49:37 +07:00
Sergey M․
80ae228b34 [matchtv] Modernize 2016-06-12 01:57:23 +07:00
Yen Chi Hsuan
6d28c408cf [viki] Do not use a fallback language for title in the first try
In test_Viki_3, 'titles' gives a Hebrew title.
2016-06-11 23:00:44 +08:00
Yen Chi Hsuan
c83b35d4aa [viki] Update _TESTS 2016-06-11 22:39:13 +08:00
Yen Chi Hsuan
94e5d6aedb [viki] Skip a geo-restricted test 2016-06-11 21:49:01 +08:00
Yen Chi Hsuan
531a74968c [vimeo] Fix extraction for VimeoReview videos 2016-06-11 21:35:08 +08:00
Yen Chi Hsuan
c5edd147d1 [generic] Remove an invalid test
Now handled by telewebion.py
2016-06-11 18:39:58 +08:00
Yen Chi Hsuan
856150d056 [telewebion] Add new extractor (closes #5135) 2016-06-11 18:39:58 +08:00
Yen Chi Hsuan
03ebea89b0 Merge pull request #9755 from vxbinaca/patch-2
[utils] Change Firefox 44 to 47
2016-06-11 17:38:45 +08:00
Paul Henning
15d106787e [utils] Change Firefox 44 to 47
See commit title.
2016-06-11 05:36:31 -04:00
Yen Chi Hsuan
7aab3696dd [kuwo] Update _TESTS 2016-06-11 15:37:04 +08:00
Yen Chi Hsuan
47787efa2b [leeco] Recognize Le Sports URLs (fixes #9750) 2016-06-11 13:14:41 +08:00
Sergey M․
4a420119a6 release 2016.06.11.3 2016-06-11 08:34:30 +07:00
Sergey M․
33751818d3 release 2016.06.11.2 2016-06-11 08:28:51 +07:00
Sergey M․
698f127c1a [setup.py] Add python 3.5 classifier 2016-06-11 06:14:22 +07:00
Sergey M․
fe458b6596 [limelight] Extract ttml subtitles (Closes #9739) 2016-06-11 05:57:27 +07:00
Sergey M․
21ac1a8ac3 [limelight] Fix typo 2016-06-11 05:52:50 +07:00
Sergey M․
79027c0ea0 [limelight] Improve _VALID_URLs 2016-06-11 05:40:02 +07:00
Sergey M․
4cad2929cd [limelight] Fix _VALID_URLs 2016-06-11 05:30:44 +07:00
Sergey M․
62666af99f [indavideo] Fix formats' height (Closes #9744) 2016-06-11 05:13:05 +07:00
Sergey M․
9ddc289f88 [README.md] Document missing playlist fields in output template 2016-06-11 04:59:47 +07:00
Sergey M․
6626c214e1 release 2016.06.11.1 2016-06-11 03:00:08 +07:00
Sergey M․
d845622b2e release 2016.06.11 2016-06-11 02:41:48 +07:00
Sergey M
1058f56e96 Merge pull request #9747 from TRox1972/lynda
[Lynda] Extract course description
2016-06-11 02:34:58 +07:00
TRox1972
0434358823 [Lynda] Extract course description 2016-06-10 19:17:58 +02:00
Sergey M․
3841256c2c [lynda] Skip login if already logged in 2016-06-10 23:01:52 +07:00
Sergey M․
bdf16f8140 [lynda] Add support for new authentication (Closes #9740) 2016-06-10 22:40:18 +07:00
Yen Chi Hsuan
836ab0c554 [compat] Import html5 entities correctly 2016-06-10 18:12:57 +08:00
Yen Chi Hsuan
6c0376fe4f [dw] Skip an invalid test
DW documentaries only last for one or two weeks. See #9475
2016-06-10 16:53:40 +08:00
Yen Chi Hsuan
1fa309da40 [generic] Update test_Generic_40
The original link now redirects to a YouTube user channel.
2016-06-10 16:39:31 +08:00
Yen Chi Hsuan
daa0df9e8b [youtube:user] Support another URL form
Such a URL comes from http://www.gametrailers.com/. This was originally
a test case in GenericIE, but now it seems all GameTrailers videos are on
YouTube.
2016-06-10 16:37:12 +08:00
Yen Chi Hsuan
09728d5fbc [audiomack:album] Force video_id to be strings
Related: be6217b261
2016-06-10 16:11:28 +08:00
Yen Chi Hsuan
c16f8a4659 [voicerepublic] Force video_id to be strings
Related: be6217b261
2016-06-10 16:04:28 +08:00
Yen Chi Hsuan
a225238530 [vporn] Improve error detection and update _TESTS 2016-06-10 15:12:53 +08:00
Yen Chi Hsuan
55b2f099c0 [utils] Decode HTML5 entities
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:11:55 +08:00
Yen Chi Hsuan
9631a94fb5 [compat] Add compat_html_entities_html5
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:05:24 +08:00
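On Python 3 the table ships in the standard library; the compat shim is assumed to expose the same mapping on Python 2. Note the HTML5 entity names are keyed with their trailing semicolon:

```python
from youtube_dl.compat import compat_html_entities_html5

print(compat_html_entities_html5['gt;'])      # -> '>'
print(compat_html_entities_html5['middot;'])  # -> '·'
```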
Yen Chi Hsuan
cc4444662c [generic] Remove Vulture embed detection
Vulture.com videos are now hosted on YouTube, Vimeo, MTV, NBC News or Hulu.
Here's an example of Hulu:
http://www.vulture.com/2016/06/kimmel-interviews-mariah-carey-in-a-bathtub.html
2016-06-10 13:40:57 +08:00
Yen Chi Hsuan
de3eb07ed6 [generic] Detect NBC News embeds 2016-06-10 13:32:59 +08:00
Yen Chi Hsuan
5de008e8c3 [nbcnews] Support embed widgets
Used in some Vulture videos
2016-06-10 13:31:55 +08:00
Yen Chi Hsuan
3e74b444e7 [vulture] Remove the extractor
The first 10 URLs in a Google search for "site:http://video.vulture.com/video"
are dead. I guess Vulture does not host videos on their own anymore.
2016-06-10 13:13:59 +08:00
Yen Chi Hsuan
e1e0a10c56 [weibo] Remove the extractor
The Weibo weishipin (微視頻, tiny videos) service is dead and now all
videos are hosted on Sina videos, which is covered by sina.py
2016-06-10 13:01:22 +08:00
Yen Chi Hsuan
436214baf7 [xfileshare] Skip an invalid test 2016-06-10 12:31:06 +08:00
Yen Chi Hsuan
506d0e9693 [xuite] Skip the invalid test 2016-06-10 12:29:58 +08:00
Yen Chi Hsuan
55290788d3 [yahoo] Yahoo doesn't like region names in lower case
Fix test_Yahoo_7
2016-06-10 12:28:56 +08:00
Yen Chi Hsuan
bc7e7adf51 [wdr] Subtitles are TTML 2016-06-10 00:22:41 +08:00
Sergey M․
b0aebe702c [godtv] Relax _VALID_URL 2016-06-09 21:34:47 +07:00
Sergey M․
416878f41f [godtv] Add more tests 2016-06-09 21:33:51 +07:00
Sergey M․
c0fed3bda5 [godtv] Improve and add support for playlists (Closes #9608) 2016-06-09 21:29:41 +07:00
TRox1972
bb1e44cc8e [godtv] Add extractor
[GodTV] Improvements
2016-06-09 21:27:27 +07:00
N1k145
21efee5f8b [openload] Relax _VALID_URL
[openload] added to _TESTS, removed escape
2016-06-09 20:46:54 +07:00
Yen Chi Hsuan
e2713d32f4 [openload] Fix extraction. Thanks @perron375 for the solution
Closes #9706
2016-06-09 19:00:13 +08:00
Yen Chi Hsuan
e21c26daf9 Merge pull request #9395 from pmrowla/afreecatv
[afreecatv] Add new extractor for afreecatv.com VODs
2016-06-09 17:20:16 +08:00
Yen Chi Hsuan
1594a4932f [wdr] Misc changes 2016-06-09 13:49:35 +08:00
Yen Chi Hsuan
6869d634c6 [wdr] Simplify extraction 2016-06-09 13:41:12 +08:00
Yen Chi Hsuan
50918c4ee0 [wdr] Support radio players (closes #6147) 2016-06-09 13:04:30 +08:00
Yen Chi Hsuan
6c33d24b46 [utils] Add audio/mpeg to mimetype2ext()
Used in WDR live radios (#6147)
2016-06-09 12:58:24 +08:00
Sergey M․
be6217b261 [YoutubeDL] Force string conversion on non string video ids 2016-06-09 05:34:19 +07:00
Sergey M․
9d51a0a9a1 [vessel] Make hls formats non fatal 2016-06-09 04:13:38 +07:00
Sergey M․
39da509f67 [vessel] Extract DASH formats 2016-06-09 04:12:48 +07:00
Sergey M․
a479b8f687 [vessel] Use native hls by default 2016-06-09 04:09:32 +07:00
Sergey M․
48a5eabc48 [extractor/generic] Add support for vessel embeds (Closes #7083) 2016-06-09 04:02:27 +07:00
Sergey M․
11380753b5 [vessel] Add support for embed urls and improve extraction 2016-06-09 04:00:47 +07:00
Yen Chi Hsuan
411c590a1f [youku:show] Add new extractor 2016-06-08 23:45:46 +08:00
Yen Chi Hsuan
6da8d7de69 [twitter] Update _TESTS 2016-06-08 21:48:12 +08:00
Yen Chi Hsuan
c6308b3153 [twitter] Fix extraction for videos with HLS streams
Closes #9623
2016-06-08 21:28:10 +08:00
Yen Chi Hsuan
fc0a45fa41 [twitter] Detect suspended accounts and update _TESTS 2016-06-08 21:12:14 +08:00
Yen Chi Hsuan
e6e90515db [nbc] Add the test case from #9578
Closes #9578
2016-06-08 20:50:01 +08:00
Yen Chi Hsuan
22a0a95247 [theplatform] Some NBC videos require an additional cookie
Related: #9578
2016-06-08 20:47:39 +08:00
Yen Chi Hsuan
50ce1c331c [downloader/external] Add another env for proxies in ffmpeg/avconv
Related sources:
https://git.libav.org/?p=libav.git;a=blob;f=libavformat/http.c;h=8fe8d11e1edfdbb04a8726db2c49cfef3f572aac;hb=HEAD#l152
https://git.libav.org/?p=libav.git;a=blob;f=libavformat/tls.c;h=fab243e93e20034e88e619188c13a44a5d8ccdb9;hb=HEAD#l63
https://github.com/FFmpeg/FFmpeg/blob/f8e89d8/libavformat/http.c#L191
https://github.com/FFmpeg/FFmpeg/blob/f8e89d8/libavformat/tls.c#L92
2016-06-08 14:43:52 +08:00
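Per the sources linked above, ffmpeg/avconv read the proxy from the environment rather than from a command-line flag, so the change boils down to something like this sketch (proxy address hypothetical):

```python
import os
import subprocess

env = os.environ.copy()
# libav's http.c reads the lowercase variable; set both spellings to be safe.
env['HTTP_PROXY'] = 'http://127.0.0.1:3128'
env['http_proxy'] = 'http://127.0.0.1:3128'
subprocess.call(['ffmpeg', '-i', 'http://example.com/stream.m3u8', 'out.mp4'], env=env)
```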
Yen Chi Hsuan
7264e38591 [bilibili] Fix for videos without upload time (closes #9710) 2016-06-08 14:31:40 +08:00
Sergey M․
33d9f3707c [thesixtyone] Relax _VALID_URL (Closes #9714) 2016-06-08 02:22:04 +07:00
Sergey M․
a26a9d6239 [livestream:event] Ensure video id is string (Closes #9721) 2016-06-07 23:53:08 +07:00
Yen Chi Hsuan
a4a8201c02 [wdr] Update _TESTS 2016-06-08 00:25:51 +08:00
Yen Chi Hsuan
a6571f1073 [common] Fix <bootstrapInfo> detection in F4M manifests
Regression since 0a5685b26f
2016-06-08 00:19:33 +08:00
Sergey M․
57b6e9652e [canal+] Add support for d17.tv 2016-06-07 22:32:08 +07:00
Sergey M․
3d9b3605a3 [canal+] Update tests 2016-06-07 22:26:18 +07:00
Sergey M․
74193838f7 [canal+] Improve extraction (Closes #9718) 2016-06-07 22:12:20 +07:00
Sergey M
fb94e260b5 Merge pull request #9720 from Kagami/vlive-new-statuses
[vlive] Acknowledge vlive+ streams statuses
2016-06-07 21:22:53 +07:00
Kagami Hiiragi
345dec937f [vlive] Acknowledge vlive+ streams statuses
Same as the common statuses, just with a "PRODUCT_" prefix:
PRODUCT_LIVE_END, PRODUCT_COMING_SOON, etc.
2016-06-07 17:12:13 +03:00
Philipp Hagemeister
4315f74fa8 Merge remote-tracking branch 'Boris-de/wdrmaus_fix#8562' 2016-06-07 12:29:18 +02:00
Jaime Marquínez Ferrándiz
e67f688025 [compat] Add 'compat_input' to __all__ 2016-06-05 23:16:08 +02:00
Sergey M․
db59b37d0b [devscripts/create-github-release] Make full published releases by default 2016-06-06 03:02:11 +07:00
Sergey M․
244fe977fe [options] Add --load-info-json alias for symmetry with --write-info-json 2016-06-06 02:52:58 +07:00
Sergey M․
7b0d1c2859 [__init__] Use write_string instead of compat_string (Closes #9689) 2016-06-05 21:01:20 +07:00
Yen Chi Hsuan
21d0a8e48b Merge pull request #9702 from Eun/patch-1
curl: follow redirect
2016-06-05 17:43:26 +08:00
Tobias Salzmann
47f12ad3e3 curl: follow redirect 2016-06-05 11:04:55 +02:00
Sergey M
8f1aaa97a1 [README.md] Update pypi instructions 2016-06-05 11:19:44 +07:00
Sergey M
9d78524cbe Merge pull request #9697 from ryandesign/ryandesign-README-MacPorts
Update README.md to mention MacPorts
2016-06-05 11:09:20 +07:00
Ryan Schmidt
bc270284b5 Update README.md to mention MacPorts 2016-06-04 21:30:22 -05:00
Philipp Hagemeister
c93b4eaceb Merge branch 'master' of github.com:rg3/youtube-dl 2016-06-04 22:55:21 +02:00
Philipp Hagemeister
71b9cb3107 extend FAQ (#9696) 2016-06-04 22:55:15 +02:00
Sergey M․
633b444fd2 [downloader/hls] Correct comment on twitch vods 2016-06-05 03:31:10 +07:00
Sergey M․
51c4d85ce7 [downloader/hls] PEP 8 2016-06-05 03:21:43 +07:00
Sergey M․
631d4c87ee [twitch:vod] Use native hls 2016-06-05 03:19:44 +07:00
Sergey M․
1e236d7e23 [downloader/hls] Do not rely on EXT-X-PLAYLIST-TYPE:EVENT 2016-06-05 03:16:05 +07:00
Sergey M․
2c34735267 [youtube] Add itags 256 and 258 2016-06-05 01:44:13 +07:00
Sergey M․
39b32571df [devscripts/release.sh] Release to GitHub 2016-06-05 00:48:33 +07:00
Sergey M․
db56f281d9 [devscripts/create-github-release] Add script for releasing on GitHub
For now only Basic authentication is supported, either via .netrc or by manual input
2016-06-05 00:47:26 +07:00
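A sketch of the credential lookup the note describes (machine name and prompts assumed):

```python
import netrc
from getpass import getpass

try:
    auth = netrc.netrc().authenticators('github.com')
except (IOError, netrc.NetrcParseError):
    auth = None
if auth:
    username, _, password = auth
else:  # fall back to manual input (compat_input in the real script)
    username = input('GitHub username: ')
    password = getpass('GitHub password: ')
```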
Sergey M․
e92b552a10 [devscripts/buildserver] Use compat_input from compat 2016-06-05 00:44:51 +07:00
Sergey M․
1ae6c83bce [compat] Add compat_input 2016-06-05 00:43:55 +07:00
Sergey M․
0fc832e1b2 [vidio] Improve (Closes #9562) 2016-06-04 16:48:24 +07:00
TRox1972
7def35712a [vidio] Add extractor (Closes #7195)
[Vidio] fix fallback value and wrap duration in int_or_none

[Vidio] don't use video_id for _html_search_regex()
2016-06-04 16:48:24 +07:00
Philipp Hagemeister
cad88f96dc disable uploading to yt-dl.org for now 2016-06-04 11:42:52 +02:00
Sergey M․
762d44c956 [channel9] Add support for rss links (Closes #9673) 2016-06-04 04:57:16 +07:00
Sergey M․
4d8856d511 [loc] Extract direct download links 2016-06-04 00:26:03 +07:00
Sergey M․
c917106be4 [loc] Extract subtites 2016-06-03 23:55:22 +07:00
Sergey M․
76e9cd7f24 [loc] Add support for another URL schema and simplify 2016-06-03 23:43:34 +07:00
Sergey M․
bf4c6a38e1 release 2016.06.03 2016-06-03 23:25:24 +07:00
Sergey M․
7f3c3dfa52 [loc] Improve (Closes #9521) 2016-06-03 23:19:11 +07:00
TRox1972
9c3c447eb3 [loc] Add extractor (Closes #3188)
Added an extractor for loc.gov, which closes #3188. I am not an experienced programmer, so I am sure I made a bunch of mistakes, but the extractor works (for me at least).

[LibraryOfCongress] don't use video_id for _search_regex()

[LibraryOfCongress] Improvements
2016-06-03 22:17:35 +07:00
Yen Chi Hsuan
ad73083ff0 [bilibili] Add _part%d suffixes back (closes #9660) 2016-06-02 19:29:27 +08:00
Yen Chi Hsuan
1e8b59243f Merge pull request #9669 from bzc6p/master
Added sanitization support for Hungarian letters Ő and Ű
2016-06-02 18:23:54 +08:00
bzc6p
c88270271e Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:51:48 +02:00
bzc6p
b96f007eeb Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:39:32 +02:00
Yen Chi Hsuan
9a4aec8b7e [utils] Use bytes-like objects as header values on Python 2 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
54fb199681 [test/test_http] Fix getsockname() on Jython 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
8c32e5dc32 [test/test_utils] Add test for #9588 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
0ea590076f [utils] Always decode Location header
escape_url is broken for bytes-like objects
2016-06-02 15:00:49 +08:00
Remita Amine
4a684895c0 [seeker] Add new extractor(closes #9619) 2016-06-01 21:20:25 +01:00
Remita Amine
f4e4aa9b6b [revision3:embed] Add new extractor 2016-06-01 21:20:25 +01:00
Sergey M․
5e3856a2c5 release 2016.06.02 2016-06-02 01:19:57 +07:00
Sergey M․
6e6b9f600f [arte] Add support for playlists and rework tests (Closes #9632) 2016-06-02 01:10:23 +07:00
Sergey M․
6a1df4fb5f [spankwire] Add support for new URL format (Closes #9657) 2016-06-01 21:23:58 +07:00
Yen Chi Hsuan
dde1ce7c06 [tf1] Fix a regular expression (closes #9656)
This is a Python bug fixed in 2.7.6 [1]

[1] https://github.com/rg3/youtube-dl/issues/9656#issuecomment-222968594
2016-06-01 20:04:43 +08:00
Yen Chi Hsuan
811586ebcf [generic] Update the UDNEmbed test case 2016-06-01 19:23:44 +08:00
Yen Chi Hsuan
0ff3749bfe [udn] Fix m3u8 and f4m extraction as well as improve 2016-06-01 19:23:09 +08:00
Yen Chi Hsuan
28bab13348 [generic,viewlift] Move a test case to the specialized extractor 2016-06-01 19:18:01 +08:00
Yen Chi Hsuan
877032314f [generic] Improve Kaltura detection
Closes #4004
2016-06-01 18:37:34 +08:00
Peter Rowlands
e7d85c4ef7 use /track/video/file to determine if video exists 2016-05-31 17:28:49 +09:00
Sergey M․
8ec2b2c41c [options] Add --limit-rate alias for rate limiting option
Closes #9644
In order to follow the regular --verb-noun pattern and for better conformity with wget and curl
2016-05-30 21:48:35 +07:00
Sergey M․
197a5da1d0 [yandexmusic] Improve captcha detection 2016-05-30 03:26:26 +07:00
Sergey M․
abbb2938fa release 2016.05.30.2 2016-05-30 03:12:12 +07:00
Boris Wachtmeister
3a686853e1 [WDR] fixed parsing of playlists 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
949fc42e00 [WDR] the other wdrmaus.de pages also changed to the new player 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
33a1ff7113 [WDR] extract jsonp-url by parsing data-extension of mediaLink 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
bec2c14f2c [WDR] add special handling if alt-url is a m3u8 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
37f972954d [WDR] use _download_json with a strip_jsonp 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
3874e6ea66 [WDR] use single quotes for strings 2016-05-26 20:54:51 +02:00
Peter Rowlands
93fdb14177 don't use selection by attribute 2016-05-08 10:33:17 +09:00
Peter Rowlands
370d4eb8ad use stricter file selector
in case of empty ./track/video/file entries
2016-05-08 10:02:48 +09:00
Peter Rowlands
3452c3a27c update tests 2016-05-08 10:02:19 +09:00
Peter Rowlands
81f35fee2f fix extractors.py import order 2016-05-08 08:57:16 +09:00
Peter Rowlands
0fdbe3146c use dict.get in case upload_date does not exist 2016-05-08 08:56:22 +09:00
Peter Rowlands
8d93c21466 add multi_video test case 2016-05-06 12:08:43 +09:00
Peter Rowlands
1dbfd78754 fix multi_video part naming, add upload_date field 2016-05-06 12:07:29 +09:00
Peter Rowlands
22e35adefd use url instead of single formats entry 2016-05-06 10:41:30 +09:00
Peter Rowlands
833b644fff use xpath_text 2016-05-06 01:24:02 +09:00
Peter Rowlands
57cf9b7f06 [afreecatv] Add new extractor for afreecatv.com VODs 2016-05-05 03:59:23 +09:00
Wang Jun Tham
ccff2c404d [ffmpeg] Fix embedding subtitles (#9063)
Changed command line parameters for ffmpeg when embedding subtitles.
Changed to '-map 0:v -c:v copy -map 0:a -c:a copy'
2016-04-24 00:08:02 +08:00
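As a standalone command the new mapping looks roughly like this (file names hypothetical; the real arguments are assembled in youtube_dl/postprocessor/ffmpeg.py):

```python
import subprocess

subprocess.check_call([
    'ffmpeg', '-i', 'video.mp4', '-i', 'video.en.srt',
    '-map', '0:v', '-c:v', 'copy',        # copy video streams from input 0
    '-map', '0:a', '-c:a', 'copy',        # copy audio streams from input 0
    '-map', '1:0', '-c:s:0', 'mov_text',  # embed the subtitle as a new stream
    'out.mp4',
])
```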
Boris Wachtmeister
14f7a2b8af [WDRMaus] switch current show to new WDR extractor (fixes #8562)
It seems that the "current show" already uses the new WDR video-player,
while all the others videos still use the old player.

I just added the current show URL to the normal WDR-extractor, which
works fine. This commit needs my changes from PR #8842 that fix the
support for WDR.
2016-04-23 11:53:22 +02:00
Boris Wachtmeister
c0837a12c8 [WDR] complete overhaul after relaunch of the site
The WDR relaunched their site on 2016-02-23, which not only changed the
URL schema completely but also the layout of their pages.

Apparently the whole "mediathek" now runs on the wdr-domain, so no
separate URL for funkhauseuropa anymore.
There seems to be no explicit handling of video-sizes on the page or in
the URLs anymore. There seems to be only one size for HTML5, but still
several sizes for flash. The extractor adds all to the list of formats.

There is no metadata for the HTML5-stream, so that the best flash-stream
will always be considered as the "best" format. At least in my tests
this seemed to be true anyway.
2016-04-23 11:42:18 +02:00
190 changed files with 10525 additions and 3488 deletions


.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.30.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.30.1**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.13**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.05.30.1
+[debug] youtube-dl version 2016.07.13
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

.github/PULL_REQUEST_TEMPLATE.md (new file)

@@ -0,0 +1,22 @@
## Please follow the guide below
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
- Use *Preview* tab to see how your *pull request* will actually look like
---
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] New extractor
- [ ] New feature
---
### Description of your *pull request* and other information
Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.


.travis.yml

@@ -7,9 +7,6 @@ python:
   - "3.4"
   - "3.5"
 sudo: false
-install:
-  - bash ./devscripts/install_srelay.sh
-  - export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
 script: nosetests test --verbose
 notifications:
   email:


AUTHORS

@@ -173,3 +173,8 @@ Kevin Deldycke
 inondle
 Tomáš Čech
 Déstin Reed
+Roman Tsiupa
+Artur Krysiak
+Jakub Adam Wieczorek
+Aleksandar Topuzović
+Nehal Patel


@@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):

1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
-2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
-3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
+2. Check out the source code with:
+
+        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3. Start a new git branch with:
+
+        cd youtube-dl
+        git checkout -b yourextractor
+
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python
# coding: utf-8
from __future__ import unicode_literals
@@ -142,17 +150,149 @@ After you have ensured this site is distributing it's content legally, you can f
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
-10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:

        $ git add youtube_dl/extractor/extractors.py
        $ git add youtube_dl/extractor/yourextractor.py
        $ git commit -m '[yourextractor] Add new extractor'
        $ git push origin yourextractor

-11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.

In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of source data provided by a 3rd-party media hoster that is out of your control, and this layout tends to change. As an extractor implementer your task is not only to write code that extracts media links and metadata correctly, but also to minimize the code's dependency on the source's layout and even to anticipate potential future changes. This is important because it allows the extractor to survive minor layout changes and thus keeps old youtube-dl versions working. Even though a breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all previous versions remain broken in all repositories and distros' packages, which may not be so prompt in fetching the update from us. Needless to say, some may never receive an update at all, as can happen with non-rolling-release distros.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on the metadata your extractor extracts and provides to youtube-dl in the form of an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned meta fields are the critical data without which extraction does not make any sense, and if any of them fails to be extracted the extractor is considered completely broken.
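A minimal sketch of what that means in practice (a hypothetical extractor, for illustration only; the helpers used are the standard `InfoExtractor` methods):

```python
class YourExtractorIE(InfoExtractor):
    # hypothetical extractor: returning only the mandatory fields
    # is already enough for youtube-dl to download the media
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        return {
            'id': video_id,  # mandatory
            'title': self._og_search_title(webpage),  # mandatory
            # mandatory: either a direct media URL or a `formats` list
            'url': self._og_search_video_url(webpage),
        }
```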
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of the general-purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
    ...
    "summary": "some fancy summary text",
    ...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be prepared for this key to be missing from the `meta` dict, so you should extract it like this:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, while with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember that `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', fatal=False)
```
With `fatal` set to `False`, if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for meta fields that may or may not be present.
### Provide fallbacks
When extracting metadata, try to provide several scenarios. For example, if `title` is present in several places/sources, try extracting it from at least some of them. This makes the extractor more future-proof should some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappears from `meta` in the future due to some changes on the hoster's side, the extraction would fail since `title` is mandatory. That's expected.
Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta tag of the `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will first try to extract `title` from `meta`, and if that fails it will fall back to extracting `og:title` from the `webpage`.
### Make regular expressions flexible
When using regular expressions, try to write them in a fuzzy and flexible way.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
    webpage, 'title', group='title')
```
Note how this tolerates potential changes in the `style` attribute's value or a switch from double to single quotes around the `class` attribute's value.
The code definitely should not look like:
```python
title = self._search_regex(
    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
    webpage, 'title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
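For instance, a short sketch under the assumption of a `meta` dict like the one above, with hypothetical `views` and `duration` keys:

```python
from youtube_dl.utils import (
    float_or_none,
    int_or_none,
)

# '1234' -> 1234; a missing or malformed value safely becomes None
view_count = int_or_none(meta.get('views'))
# scale converts units, e.g. a duration given in milliseconds to seconds
duration = float_or_none(meta.get('duration'), scale=1000)
```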

README.md

@@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type:

-    sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
+    sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
    sudo chmod a+rx /usr/local/bin/youtube-dl

If you do not have curl, you can alternatively use a recent wget:
@@ -27,18 +27,24 @@ If you do not have curl, you can alternatively use a recent wget:
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).

-OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
+You can also use pip:
+
+    sudo pip install --upgrade youtube-dl
+
+This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
+
+OS X users can install youtube-dl with [Homebrew](http://brew.sh/):

    brew install youtube-dl

-You can also use pip:
+Or with [MacPorts](https://www.macports.org/):

-    sudo pip install youtube-dl
+    sudo port install youtube-dl

Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).

# DESCRIPTION

-**youtube-dl** is a small command-line program to download videos from
+**youtube-dl** is a command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version
2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
your Unix box, on Windows or on Mac OS X. It is released to the public domain,
@@ -97,9 +103,9 @@ which means you can modify it, redistribute it or use it however you like.
                                     (experimental)
    -6, --force-ipv6                 Make all connections via IPv6
                                     (experimental)
-    --cn-verification-proxy URL     Use this proxy to verify the IP address for
-                                     some Chinese sites. The default proxy
-                                     specified by --proxy (or none, if the
-                                     options is not present) is used for the
+    --geo-verification-proxy URL    Use this proxy to verify the IP address for
+                                     some geo-restricted sites. The default
+                                     proxy specified by --proxy (or none, if the
+                                     option is not present) is used for the
                                     actual downloading. (experimental)
@@ -162,7 +168,7 @@ which means you can modify it, redistribute it or use it however you like.
                                     (experimental)

## Download Options:
-    -r, --rate-limit LIMIT           Maximum download rate in bytes per second
+    -r, --limit-rate RATE            Maximum download rate in bytes per second
                                     (e.g. 50K or 4.2M)
    -R, --retries RETRIES            Number of retries (default is 10), or
                                     "infinite".
@@ -249,7 +255,7 @@ which means you can modify it, redistribute it or use it however you like.
    --write-info-json                Write video metadata to a .info.json file
    --write-annotations              Write video annotations to a
                                     .annotations.xml file
-    --load-info FILE                 JSON file containing the video information
+    --load-info-json FILE            JSON file containing the video information
                                     (created with the "--write-info-json"
                                     option)
    --cookies FILE                   File to read cookies from and dump cookie
@@ -418,7 +424,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION

-You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
+You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that the configuration file may not exist by default, so you may need to create it yourself.

For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
```
@@ -426,6 +432,7 @@ For example, with the following configuration file youtube-dl will always extrac
--no-mtime
--proxy 127.0.0.1:3128
-o ~/Movies/%(title)s.%(ext)s
+# Lines starting with # are comments
```

Note that options in a configuration file are just the same options aka switches used in regular command line calls; thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
@@ -505,6 +512,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
+- `playlist_id`: Playlist identifier
+- `playlist_title`: Playlist title

Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
@@ -544,6 +554,10 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:

+#### Output template and Windows batch files
+
+If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling them, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
+
#### Output template examples

Note on Windows you may need to use double quotes instead of single.
@@ -842,6 +856,12 @@ It is *not* possible to detect whether a URL is supported or not. That's because
If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.

+# Why do I need to go through that much red tape when filing bugs?
+
+Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download, and many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
+
+youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident of being able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
# DEVELOPER INSTRUCTIONS

Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
@@ -871,9 +891,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):

1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
-2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
-3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
+2. Check out the source code with:
+
+        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3. Start a new git branch with:
+
+        cd youtube-dl
+        git checkout -b yourextractor
+
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python
# coding: utf-8
from __future__ import unicode_literals
@@ -916,20 +944,152 @@ After you have ensured this site is distributing it's content legally, you can f
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
-10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:

        $ git add youtube_dl/extractor/extractors.py
        $ git add youtube_dl/extractor/yourextractor.py
        $ git commit -m '[yourextractor] Add new extractor'
        $ git push origin yourextractor

-11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.

In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of source data provided by a 3rd-party media hoster that is out of your control, and this layout tends to change. As an extractor implementer your task is not only to write code that extracts media links and metadata correctly, but also to minimize the code's dependency on the source's layout and even to anticipate potential future changes. This is important because it allows the extractor to survive minor layout changes and thus keeps old youtube-dl versions working. Even though a breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all previous versions remain broken in all repositories and distros' packages, which may not be so prompt in fetching the update from us. Needless to say, some may never receive an update at all, as can happen with non-rolling-release distros.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on the metadata your extractor extracts and provides to youtube-dl in the form of an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned meta fields are the critical data without which extraction does not make any sense, and if any of them fails to be extracted the extractor is considered completely broken.
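A minimal sketch of what that means in practice (a hypothetical extractor, for illustration only; the helpers used are the standard `InfoExtractor` methods):

```python
class YourExtractorIE(InfoExtractor):
    # hypothetical extractor: returning only the mandatory fields
    # is already enough for youtube-dl to download the media
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        return {
            'id': video_id,  # mandatory
            'title': self._og_search_title(webpage),  # mandatory
            # mandatory: either a direct media URL or a `formats` list
            'url': self._og_search_video_url(webpage),
        }
```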
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of the general-purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
    ...
    "summary": "some fancy summary text",
    ...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be prepared for this key to be missing from the `meta` dict, so you should extract it like this:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, while with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember that `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', fatal=False)
```
With `fatal` set to `False`, if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for meta fields that may or may not be present.
### Provide fallbacks
When extracting metadata, try to provide several scenarios. For example, if `title` is present in several places/sources, try extracting it from at least some of them. This makes the extractor more future-proof should some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappears from `meta` in the future due to some changes on the hoster's side, the extraction would fail since `title` is mandatory. That's expected.
Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta tag of the `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will first try to extract `title` from `meta`, and if that fails it will fall back to extracting `og:title` from the `webpage`.
### Make regular expressions flexible
When using regular expressions, try to write them in a fuzzy and flexible way.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
    webpage, 'title', group='title')
```
Note how this tolerates potential changes in the `style` attribute's value or a switch from double to single quotes around the `class` attribute's value.
The code definitely should not look like:
```python
title = self._search_regex(
    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
    webpage, 'title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
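For instance, a short sketch under the assumption of a `meta` dict like the one above, with hypothetical `views` and `duration` keys:

```python
from youtube_dl.utils import (
    float_or_none,
    int_or_none,
)

# '1234' -> 1234; a missing or malformed value safely becomes None
view_count = int_or_none(meta.get('views'))
# scale converts units, e.g. a duration given in milliseconds to seconds
duration = float_or_none(meta.get('duration'), scale=1000)
```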
# EMBEDDING YOUTUBE-DL

youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
@@ -945,7 +1105,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```

-Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.

Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
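For instance, a minimal sketch of intercepting output with a `logger` object (the class name is illustrative; the `logger` option itself is the documented hook):

```python
from __future__ import unicode_literals
import youtube_dl


class MyLogger(object):
    def debug(self, msg):
        pass  # suppress debug messages

    def warning(self, msg):
        pass  # suppress warnings

    def error(self, msg):
        print(msg)  # let only errors through


ydl_opts = {
    'logger': MyLogger(),
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```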

devscripts/buildserver.py

@@ -13,6 +13,7 @@ import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))

from youtube_dl.compat import (
+    compat_input,
    compat_http_server,
    compat_str,
    compat_urlparse,
@@ -30,11 +31,6 @@ try:
except ImportError:  # Python 2
    import SocketServer as compat_socketserver

-try:
-    compat_input = raw_input
-except NameError:  # Python 3
-    compat_input = input
-
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
    allow_reuse_address = True

devscripts/create-github-release.py (new file)

@@ -0,0 +1,111 @@
#!/usr/bin/env python
from __future__ import unicode_literals

import base64
import json
import mimetypes
import netrc
import optparse
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.compat import (
    compat_basestring,
    compat_input,
    compat_getpass,
    compat_print,
    compat_urllib_request,
)
from youtube_dl.utils import (
    make_HTTPS_handler,
    sanitized_Request,
)


class GitHubReleaser(object):
    _API_URL = 'https://api.github.com/repos/rg3/youtube-dl/releases'
    _UPLOADS_URL = 'https://uploads.github.com/repos/rg3/youtube-dl/releases/%s/assets?name=%s'
    _NETRC_MACHINE = 'github.com'

    def __init__(self, debuglevel=0):
        self._init_github_account()
        https_handler = make_HTTPS_handler({}, debuglevel=debuglevel)
        self._opener = compat_urllib_request.build_opener(https_handler)

    def _init_github_account(self):
        try:
            info = netrc.netrc().authenticators(self._NETRC_MACHINE)
            if info is not None:
                self._username = info[0]
                self._password = info[2]
                compat_print('Using GitHub credentials found in .netrc...')
                return
            else:
                compat_print('No GitHub credentials found in .netrc')
        except (IOError, netrc.NetrcParseError):
            compat_print('Unable to parse .netrc')
        self._username = compat_input(
            'Type your GitHub username or email address and press [Return]: ')
        self._password = compat_getpass(
            'Type your GitHub password and press [Return]: ')

    def _call(self, req):
        if isinstance(req, compat_basestring):
            req = sanitized_Request(req)
        # Authorizing manually since GitHub does not respond with 401 with
        # WWW-Authenticate header set (see
        # https://developer.github.com/v3/#basic-authentication)
        b64 = base64.b64encode(
            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
        req.add_header('Authorization', 'Basic %s' % b64)
        response = self._opener.open(req).read().decode('utf-8')
        return json.loads(response)

    def list_releases(self):
        return self._call(self._API_URL)

    def create_release(self, tag_name, name=None, body='', draft=False, prerelease=False):
        data = {
            'tag_name': tag_name,
            'target_commitish': 'master',
            'name': name,
            'body': body,
            'draft': draft,
            'prerelease': prerelease,
        }
        req = sanitized_Request(self._API_URL, json.dumps(data).encode('utf-8'))
        return self._call(req)

    def create_asset(self, release_id, asset):
        asset_name = os.path.basename(asset)
        url = self._UPLOADS_URL % (release_id, asset_name)
        # Our files are small enough to be loaded directly into memory.
        data = open(asset, 'rb').read()
        req = sanitized_Request(url, data)
        mime_type, _ = mimetypes.guess_type(asset_name)
        req.add_header('Content-Type', mime_type or 'application/octet-stream')
        return self._call(req)


def main():
    parser = optparse.OptionParser(usage='%prog VERSION BUILDPATH')
    options, args = parser.parse_args()
    if len(args) != 2:
        parser.error('Expected a version and a build directory')

    version, build_path = args

    releaser = GitHubReleaser()

    new_release = releaser.create_release(version, name='youtube-dl %s' % version)
    release_id = new_release['id']

    for asset in os.listdir(build_path):
        compat_print('Uploading %s...' % asset)
        releaser.create_asset(release_id, os.path.join(build_path, asset))


if __name__ == '__main__':
    main()

devscripts/gh-pages/generate-download.py

@@ -15,13 +15,9 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
    template = tmplf.read()

-md5sum = hashlib.md5(data).hexdigest()
-sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL)
-template = template.replace('@PROGRAM_MD5SUM@', md5sum)
-template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])

devscripts/install_srelay.sh (deleted file)

@@ -1,8 +0,0 @@
-#!/bin/bash
-mkdir -p tmp && cd tmp
-wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
-tar zxvf srelay-0.4.8b6.tar.gz
-cd srelay-0.4.8b6
-./configure
-make

devscripts/make_lazy_extractors.py

@@ -14,15 +14,17 @@ if os.path.exists(lazy_extractors_filename):
    os.remove(lazy_extractors_filename)

from youtube_dl.extractor import _ALL_CLASSES
-from youtube_dl.extractor.common import InfoExtractor
+from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor

with open('devscripts/lazy_load_template.py', 'rt') as f:
    module_template = f.read()

-module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
+module_contents = [
+    module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
+    'class LazyLoadSearchExtractor(LazyLoadExtractor):\n    pass\n']

ie_template = '''
-class {name}(LazyLoadExtractor):
+class {name}({bases}):
    _VALID_URL = {valid_url!r}
    _module = '{module}'
'''
@@ -34,10 +36,20 @@ make_valid_template = '''
'''

+def get_base_name(base):
+    if base is InfoExtractor:
+        return 'LazyLoadExtractor'
+    elif base is SearchInfoExtractor:
+        return 'LazyLoadSearchExtractor'
+    else:
+        return base.__name__
+
def build_lazy_ie(ie, name):
    valid_url = getattr(ie, '_VALID_URL', None)
    s = ie_template.format(
        name=name,
+        bases=', '.join(map(get_base_name, ie.__bases__)),
        valid_url=valid_url,
        module=ie.__module__)
    if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
@@ -47,12 +59,35 @@ def build_lazy_ie(ie, name):
        s += make_valid_template.format(valid_url=ie._make_valid_url())
    return s

+# find the correct sorting and add the required base classes so that
+# subclasses can be correctly created
+classes = _ALL_CLASSES[:-1]
+ordered_cls = []
+while classes:
+    for c in classes[:]:
+        bases = set(c.__bases__) - set((object, InfoExtractor, SearchInfoExtractor))
+        stop = False
+        for b in bases:
+            if b not in classes and b not in ordered_cls:
+                if b.__name__ == 'GenericIE':
+                    exit()
+                classes.insert(0, b)
+                stop = True
+        if stop:
+            break
+        if all(b in ordered_cls for b in bases):
+            ordered_cls.append(c)
+            classes.remove(c)
+            break
+ordered_cls.append(_ALL_CLASSES[-1])
+
names = []
-for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
-    name = ie.ie_key() + 'IE'
+for ie in ordered_cls:
+    name = ie.__name__
    src = build_lazy_ie(ie, name)
    module_contents.append(src)
-    names.append(name)
+    if ie in _ALL_CLASSES:
+        names.append(name)

module_contents.append(
    '_ALL_CLASSES = [{0}]'.format(', '.join(names)))

devscripts/release.sh

@@ -15,6 +15,7 @@
set -e

skip_tests=true
+gpg_sign_commits=""
buildserver='localhost:8142'

while true
@@ -24,6 +25,10 @@ case "$1" in
        skip_tests=false
        shift
        ;;
+    --gpg-sign-commits|-S)
+        gpg_sign_commits="-S"
+        shift
+        ;;
    --buildserver)
        buildserver="$2"
        shift 2
@@ -69,7 +74,7 @@ sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
-git commit -m "release $version"
+git commit $gpg_sign_commits -m "release $version"

/bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version"
@@ -95,15 +100,16 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)

-/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
+/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
-scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
-ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
+
+ROOT=$(pwd)
+python devscripts/create-github-release.py $version "$ROOT/build/$version"
+
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"

/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
-ROOT=$(pwd)
(
    set -e
    ORIGIN_URL=$(git config --get remote.origin.url)
@@ -115,7 +121,7 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/update-copyright.py" "$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py" "$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update git add *.html *.html.in update
git commit -m "release $version" git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages git push "$ORIGIN_URL" gh-pages
) )

devscripts/show-downloads-statistics.py (new file)

@@ -0,0 +1,41 @@
#!/usr/bin/env python
from __future__ import unicode_literals

import json
import os
import re
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.compat import (
    compat_print,
    compat_urllib_request,
)
from youtube_dl.utils import format_bytes


def format_size(bytes):
    return '%s (%d bytes)' % (format_bytes(bytes), bytes)


total_bytes = 0

releases = json.loads(compat_urllib_request.urlopen(
    'https://api.github.com/repos/rg3/youtube-dl/releases').read().decode('utf-8'))

for release in releases:
    compat_print(release['name'])
    for asset in release['assets']:
        asset_name = asset['name']
        total_bytes += asset['download_count'] * asset['size']
        if all(not re.match(p, asset_name) for p in (
                r'^youtube-dl$',
                r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
                r'^youtube-dl\.exe$')):
            continue
        compat_print(
            ' %s size: %s downloads: %d'
            % (asset_name, format_size(asset['size']), asset['download_count']))

compat_print('total downloads traffic: %s' % format_size(total_bytes))

docs/supportedsites.md

@@ -28,6 +28,7 @@
- **AdobeTVVideo**
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
+- **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
@@ -44,7 +45,6 @@
- **archive.org**: archive.org videos
- **ARD**
- **ARD:mediathek**
-- **ARD:mediathek**: Saarländischer Rundfunk
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
@@ -55,6 +55,7 @@
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
+- **arte.tv:playlist**
- **AtresPlayer**
- **ATTTechChannel**
- **AudiMedia**
@@ -72,6 +73,8 @@
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
+- **bbc.co.uk:iplayer:playlist**
+- **bbc.co.uk:playlist**
- **BeatportPro**
- **Beeg**
- **BehindKink**
@@ -102,6 +105,8 @@
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
+- **CarambaTV**
+- **CarambaTVPage**
- **CBC**
- **CBCPlayer**
- **CBS**
@@ -122,6 +127,7 @@
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
+- **CloserToTruth**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
@@ -146,6 +152,8 @@
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
+- **CTV**
+- **CTVNews**
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **CWTV**
@@ -216,6 +224,7 @@
- **Firstpost**
- **FiveTV**
- **Flickr**
+- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
@@ -234,6 +243,7 @@
- **FreeVideo**
- **Funimation**
- **FunnyOrDie**
+- **Fusion**
- **GameInformer**
- **Gamekings**
- **GameOne**
@@ -241,7 +251,6 @@
- **Gamersyde**
- **GameSpot**
- **GameStar**
-- **Gametrailers**
- **Gazeta**
- **GDCVault**
- **generic**: Generic downloader that works on some sites
@@ -252,6 +261,7 @@
- **Globo**
- **GloboArticle**
- **GodTube**
+- **GodTV**
- **GoldenMoustache**
- **Golem**
- **GoogleDrive**
@@ -266,6 +276,7 @@
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HistoricFilms**
+- **history:topic**: History.com Topic
- **hitbox**
- **hitbox:live**
- **HornBunny**
@@ -273,6 +284,8 @@
- **HotStar**
- **Howcast**
- **HowStuffWorks**
+- **HRTi**
+- **HRTiPlaylist**
- **HuffPost**: Huffington Post
- **Hypem**
- **Iconosquare**
@@ -300,6 +313,7 @@
- **jpopsuki.tv**
- **JWPlatform**
- **Kaltura**
+- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
@@ -319,7 +333,7 @@
- **kuwo:mv**: 酷我音乐 - MV
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
-- **la7.tv**
+- **la7.it**
- **Laola1Tv**
- **Le**: 乐视网
- **Learnr**
@@ -338,6 +352,7 @@
- **livestream**
- **livestream:original**
- **LnkGo**
+- **loc**: Library of Congress
- **LocalNews8**
- **LoveHomePorn**
- **lrt.lt**
@@ -351,6 +366,7 @@
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
+- **META**
- **metacafe**
- **Metacritic**
- **Mgoon**
@@ -377,7 +393,7 @@
- **MovieFap**
- **Moviezine**
- **MPORA**
-- **MSNBC**
+- **MSN**
- **MTV**
- **mtv.de**
- **mtviggy.com**
@@ -428,8 +444,10 @@
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
+- **nick.de**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
+- **NineCNineMedia**
- **njoy**: N-JOY
- **njoy:embed**
- **Noco**
@@ -460,6 +478,8 @@
- **Odnoklassniki**
- **OktoberfestTV**
- **on.aol.com**
+- **onet.tv**
+- **onet.tv:channel**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@@ -493,8 +513,9 @@
- **plus.google**: Google Plus
- **pluzz.francetv.fr**
- **podomatic**
+- **PolskieRadio**
- **PornHd**
-- **PornHub**
+- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
- **PornHubUserVideos**
- **Pornotube**
@@ -512,6 +533,7 @@
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **R7**
+- **R7Article**
- **radio.de**
- **radiobremen**
- **radiocanada**
@@ -527,9 +549,12 @@
- **Restudy**
- **Reuters**
- **ReverbNation**
-- **Revision3**
+- **revision**
+- **revision3:embed**
- **RICE**
- **RingTV**
+- **RockstarGames**
+- **RoosterTeeth**
- **RottenTomatoes**
- **Roxwel**
- **RTBF**
@@ -543,6 +568,7 @@
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams
- **RTVNH**
+- **Rudo**
- **RUHD**
- **RulePorn**
- **rutube**: Rutube videos
@@ -566,6 +592,7 @@
- **ScreencastOMatic**
- **ScreenJunkies**
- **ScreenwaveMedia**
+- **Seeker**
- **SenateISVP**
- **SendtoNews**
- **ServingSys**
@@ -574,8 +601,10 @@
- **Shared**: shared.sx and vivo.sx
- **ShareSix**
- **Sina**
+- **SixPlay**
+- **skynewsarabia:article**
- **skynewsarabia:video**
-- **skynewsarabia:video**
+- **SkySports**
- **Slideshare**
- **Slutload**
- **smotri**: Smotri.com
@@ -607,6 +636,7 @@
- **SportBoxEmbed**
- **SportDeutschland**
- **Sportschau**
+- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
@@ -641,6 +671,7 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
+- **Telewebion**
- **TF1**
- **TheIntercept**
- **ThePlatform**
@@ -692,6 +723,7 @@
- **TVPlay**: TV3Play and related services
- **Tweakers**
- **twitch:chapter**
+- **twitch:clips**
- **twitch:past_broadcasts**
- **twitch:profile**
- **twitch:stream**
@@ -705,6 +737,7 @@
- **UDNEmbed**: 聯合影音
- **Unistra**
- **Urort**: NRK P3 Urørt
+- **URPlay**
- **USAToday**
- **ustream**
- **ustream:channel**
@@ -722,6 +755,7 @@
- **vh1.com**
- **Vice**
- **ViceShow**
+- **Vidbit**
- **Viddler**
- **video.google:search**: Google Video search
- **video.mit.edu**
@@ -734,6 +768,7 @@
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
+- **Vidio**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
@@ -760,6 +795,7 @@
- **vine:user**
- **vk**: VK
- **vk:uservideos**: VK - User's Videos
+- **vk:wallpost**
- **vlive**
- **Vodlocker**
- **VoiceRepublic**
@@ -769,7 +805,6 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
-- **vulture.com**
- **Walla**
- **washingtonpost**
- **washingtonpost:article**
@@ -777,10 +812,8 @@
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
-- **WDRMaus**: Sendung mit der Maus
- **WebOfStories**
- **WebOfStoriesPlaylist**
-- **Weibo**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
@@ -788,10 +821,11 @@
- **WNL** - **WNL**
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑 - **xiami:album**: 虾米音乐 - 专辑
@@ -816,6 +850,7 @@
- **Ynet** - **Ynet**
- **YouJizz** - **YouJizz**
- **youku**: 优酷 - **youku**: 优酷
- **youku:show**
- **YouPorn** - **YouPorn**
- **YourUpload** - **YourUpload**
- **youtube**: YouTube.com - **youtube**: YouTube.com
@@ -829,6 +864,7 @@
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)


@@ -21,25 +21,37 @@ try:
     import py2exe
 except ImportError:
     if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
-        print("Cannot import py2exe", file=sys.stderr)
+        print('Cannot import py2exe', file=sys.stderr)
         exit(1)
 
 py2exe_options = {
-    "bundle_files": 1,
-    "compressed": 1,
-    "optimize": 2,
-    "dist_dir": '.',
-    "dll_excludes": ['w9xpopen.exe', 'crypt32.dll'],
+    'bundle_files': 1,
+    'compressed': 1,
+    'optimize': 2,
+    'dist_dir': '.',
+    'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
 }
 
+# Get the version from youtube_dl/version.py without importing the package
+exec(compile(open('youtube_dl/version.py').read(),
+             'youtube_dl/version.py', 'exec'))
+
+DESCRIPTION = 'YouTube video downloader'
+LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
+
 py2exe_console = [{
-    "script": "./youtube_dl/__main__.py",
-    "dest_base": "youtube-dl",
+    'script': './youtube_dl/__main__.py',
+    'dest_base': 'youtube-dl',
+    'version': __version__,
+    'description': DESCRIPTION,
+    'comments': LONG_DESCRIPTION,
+    'product_name': 'youtube-dl',
+    'product_version': __version__,
 }]
 
 py2exe_params = {
     'console': py2exe_console,
-    'options': {"py2exe": py2exe_options},
+    'options': {'py2exe': py2exe_options},
     'zipfile': None
 }
@@ -72,7 +84,7 @@ else:
     params['scripts'] = ['bin/youtube-dl']
 
 class build_lazy_extractors(Command):
-    description = "Build the extractor lazy loading module"
+    description = 'Build the extractor lazy loading module'
     user_options = []
 
     def initialize_options(self):
@@ -87,16 +99,11 @@ class build_lazy_extractors(Command):
             dry_run=self.dry_run,
         )
 
-# Get the version from youtube_dl/version.py without importing the package
-exec(compile(open('youtube_dl/version.py').read(),
-             'youtube_dl/version.py', 'exec'))
-
 setup(
     name='youtube_dl',
     version=__version__,
-    description='YouTube video downloader',
-    long_description='Small command-line program to download videos from'
-    ' YouTube.com and other video sites.',
+    description=DESCRIPTION,
+    long_description=LONG_DESCRIPTION,
     url='https://github.com/rg3/youtube-dl',
     author='Ricardo Garcia',
    author_email='ytdl@yt-dl.org',
@@ -112,16 +119,17 @@ setup(
     # test_requires = ['nosetest'],
 
     classifiers=[
-        "Topic :: Multimedia :: Video",
-        "Development Status :: 5 - Production/Stable",
-        "Environment :: Console",
-        "License :: Public Domain",
-        "Programming Language :: Python :: 2.6",
-        "Programming Language :: Python :: 2.7",
-        "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.2",
-        "Programming Language :: Python :: 3.3",
-        "Programming Language :: Python :: 3.4",
+        'Topic :: Multimedia :: Video',
+        'Development Status :: 5 - Production/Stable',
+        'Environment :: Console',
+        'License :: Public Domain',
+        'Programming Language :: Python :: 2.6',
+        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3',
+        'Programming Language :: Python :: 3.2',
+        'Programming Language :: Python :: 3.3',
+        'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.5',
     ],
 
     cmdclass={'build_lazy_extractors': build_lazy_extractors},
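Both hunks hinge on the same idiom: executing version.py in the current namespace so `__version__` is available without importing the not-yet-installed package. A minimal Python sketch of just that idiom, assuming a version.py that contains a single `__version__ = '...'` assignment:

    # Evaluate version.py in this namespace; no package import required.
    exec(compile(open('youtube_dl/version.py').read(),
                 'youtube_dl/version.py', 'exec'))
    print(__version__)  # defined by the exec'd file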


@@ -11,7 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
 from test.helper import FakeYDL
 from youtube_dl.extractor.common import InfoExtractor
 from youtube_dl.extractor import YoutubeIE, get_info_extractor
-from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
+from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
 
 
 class TestIE(InfoExtractor):
@@ -66,6 +66,11 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._html_search_meta('d', html), '4')
         self.assertEqual(ie._html_search_meta('e', html), '5')
         self.assertEqual(ie._html_search_meta('f', html), '6')
+        self.assertEqual(ie._html_search_meta(('a', 'b', 'c'), html), '1')
+        self.assertEqual(ie._html_search_meta(('c', 'b', 'a'), html), '3')
+        self.assertEqual(ie._html_search_meta(('z', 'x', 'c'), html), '3')
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
 
     def test_download_json(self):
         uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
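The new assertions cover fallback lookups in `_html_search_meta`: given a tuple of meta names, it returns the content of the first name present, and raises `RegexNotFoundError` only when `fatal=True` and nothing matches. A rough sketch of the fallback behaviour (hedged: `InfoExtractor` is instantiated bare here, which suffices for this helper since no download is involved):

    from youtube_dl.extractor.common import InfoExtractor

    ie = InfoExtractor()
    html = '<meta name="b" content="2"><meta name="c" content="3">'
    # 'a' is absent, so the lookup falls through to 'b'
    assert ie._html_search_meta(('a', 'b', 'c'), html) == '2'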


@@ -6,6 +6,7 @@ from __future__ import unicode_literals
 import os
 import sys
 import unittest
+import collections
 
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
@@ -130,6 +131,15 @@ class TestAllURLsMatching(unittest.TestCase):
             'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
             ['Yahoo'])
 
+    def test_no_duplicated_ie_names(self):
+        name_accu = collections.defaultdict(list)
+        for ie in self.ies:
+            name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
+        for (ie_name, ie_list) in name_accu.items():
+            self.assertEqual(
+                len(ie_list), 1,
+                'Multiple extractors with the same IE_NAME "%s" (%s)' % (ie_name, ', '.join(ie_list)))
+
 
 if __name__ == '__main__':
     unittest.main()


@@ -87,6 +87,8 @@ class TestCompat(unittest.TestCase):
 
     def test_compat_shlex_split(self):
         self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
+        self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
+        self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
 
     def test_compat_etree_fromstring(self):
         xml = '''


@@ -16,6 +16,15 @@ import threading
 TEST_DIR = os.path.dirname(os.path.abspath(__file__))
 
 
+def http_server_port(httpd):
+    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
+        # In Jython SSLSocket is not a subclass of socket.socket
+        sock = httpd.socket.sock
+    else:
+        sock = httpd.socket
+    return sock.getsockname()[1]
+
+
 class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
     def log_message(self, format, *args):
         pass
@@ -31,6 +40,22 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
             self.send_header('Content-Type', 'video/mp4')
             self.end_headers()
             self.wfile.write(b'\x00\x00\x00\x00\x20\x66\x74[video]')
+        elif self.path == '/302':
+            if sys.version_info[0] == 3:
+                # XXX: Python 3 http server does not allow non-ASCII header values
+                self.send_response(404)
+                self.end_headers()
+                return
+
+            new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
+            self.send_response(302)
+            self.send_header(b'Location', new_url.encode('utf-8'))
+            self.end_headers()
+        elif self.path == '/%E4%B8%AD%E6%96%87.html':
+            self.send_response(200)
+            self.send_header('Content-Type', 'text/html; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
         else:
             assert False
@@ -47,18 +72,32 @@ class FakeLogger(object):
 
 
 class TestHTTP(unittest.TestCase):
+    def setUp(self):
+        self.httpd = compat_http_server.HTTPServer(
+            ('localhost', 0), HTTPTestRequestHandler)
+        self.port = http_server_port(self.httpd)
+        self.server_thread = threading.Thread(target=self.httpd.serve_forever)
+        self.server_thread.daemon = True
+        self.server_thread.start()
+
+    def test_unicode_path_redirection(self):
+        # XXX: Python 3 http server does not allow non-ASCII header values
+        if sys.version_info[0] == 3:
+            return
+
+        ydl = YoutubeDL({'logger': FakeLogger()})
+        r = ydl.extract_info('http://localhost:%d/302' % self.port)
+        self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
+
+
+class TestHTTPS(unittest.TestCase):
     def setUp(self):
         certfn = os.path.join(TEST_DIR, 'testcert.pem')
         self.httpd = compat_http_server.HTTPServer(
             ('localhost', 0), HTTPTestRequestHandler)
         self.httpd.socket = ssl.wrap_socket(
             self.httpd.socket, certfile=certfn, server_side=True)
-        if os.name == 'java':
-            # In Jython SSLSocket is not a subclass of socket.socket
-            sock = self.httpd.socket.sock
-        else:
-            sock = self.httpd.socket
-        self.port = sock.getsockname()[1]
+        self.port = http_server_port(self.httpd)
         self.server_thread = threading.Thread(target=self.httpd.serve_forever)
         self.server_thread.daemon = True
         self.server_thread.start()
@@ -94,32 +133,32 @@ class TestProxy(unittest.TestCase):
     def setUp(self):
         self.proxy = compat_http_server.HTTPServer(
             ('localhost', 0), _build_proxy_handler('normal'))
-        self.port = self.proxy.socket.getsockname()[1]
+        self.port = http_server_port(self.proxy)
         self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
         self.proxy_thread.daemon = True
         self.proxy_thread.start()
 
-        self.cn_proxy = compat_http_server.HTTPServer(
-            ('localhost', 0), _build_proxy_handler('cn'))
-        self.cn_port = self.cn_proxy.socket.getsockname()[1]
-        self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
-        self.cn_proxy_thread.daemon = True
-        self.cn_proxy_thread.start()
+        self.geo_proxy = compat_http_server.HTTPServer(
+            ('localhost', 0), _build_proxy_handler('geo'))
+        self.geo_port = http_server_port(self.geo_proxy)
+        self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
+        self.geo_proxy_thread.daemon = True
+        self.geo_proxy_thread.start()
 
     def test_proxy(self):
-        cn_proxy = 'localhost:{0}'.format(self.cn_port)
+        geo_proxy = 'localhost:{0}'.format(self.geo_port)
         ydl = YoutubeDL({
             'proxy': 'localhost:{0}'.format(self.port),
-            'cn_verification_proxy': cn_proxy,
+            'geo_verification_proxy': geo_proxy,
         })
         url = 'http://foo.com/bar'
         response = ydl.urlopen(url).read().decode('utf-8')
         self.assertEqual(response, 'normal: {0}'.format(url))
 
         req = compat_urllib_request.Request(url)
-        req.add_header('Ytdl-request-proxy', cn_proxy)
+        req.add_header('Ytdl-request-proxy', geo_proxy)
         response = ydl.urlopen(req).read().decode('utf-8')
-        self.assertEqual(response, 'cn: {0}'.format(url))
+        self.assertEqual(response, 'geo: {0}'.format(url))
 
     def test_proxy_with_idn(self):
         ydl = YoutubeDL({
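For context, this is how the renamed option looks from the caller's side; a minimal sketch with placeholder proxy addresses (nothing is fetched at construction time):

    from youtube_dl import YoutubeDL

    ydl = YoutubeDL({
        'proxy': 'localhost:3128',                   # regular traffic
        'geo_verification_proxy': 'localhost:3129',  # IP-verification requests only (experimental)
    })

As the test above shows, individual requests can also be rerouted by setting the internal Ytdl-request-proxy header on a request object.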


@@ -33,6 +33,7 @@ from youtube_dl.utils import (
     ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
+    get_element_by_class,
     InAdvancePagedList,
     intlist_to_bytes,
     is_html,
@@ -60,11 +61,13 @@ from youtube_dl.utils import (
     timeconvert,
     unescapeHTML,
     unified_strdate,
+    unified_timestamp,
     unsmuggle_url,
     uppercase_escape,
     lowercase_escape,
     url_basename,
     urlencode_postdata,
+    urshift,
     update_url_query,
     version_tuple,
     xpath_with_ns,
@@ -78,6 +81,7 @@ from youtube_dl.utils import (
     cli_option,
     cli_valueless_option,
     cli_bool_option,
+    parse_codecs,
 )
 from youtube_dl.compat import (
     compat_chr,
@@ -157,8 +161,8 @@ class TestUtil(unittest.TestCase):
         self.assertTrue(sanitize_filename(':', restricted=True) != '')
 
         self.assertEqual(sanitize_filename(
-            'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ', restricted=True),
-            'AAAAAAAECEEEEIIIIDNOOOOOOOEUUUUYPssaaaaaaaeceeeeiiiionoooooooeuuuuypy')
+            'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ', restricted=True),
+            'AAAAAAAECEEEEIIIIDNOOOOOOOOEUUUUUYPssaaaaaaaeceeeeiiiionooooooooeuuuuuypy')
 
     def test_sanitize_ids(self):
         self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')
@@ -249,6 +253,8 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unescapeHTML('&#47;'), '/')
         self.assertEqual(unescapeHTML('&eacute;'), 'é')
         self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
+        # HTML5 entities
+        self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')
 
     def test_date_from_str(self):
         self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
@@ -281,8 +287,28 @@ class TestUtil(unittest.TestCase):
             '20150202')
         self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
         self.assertEqual(unified_strdate('25-09-2014'), '20140925')
+        self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
         self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
 
+    def test_unified_timestamps(self):
+        self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
+        self.assertEqual(unified_timestamp('8/7/2009'), 1247011200)
+        self.assertEqual(unified_timestamp('Dec 14, 2012'), 1355443200)
+        self.assertEqual(unified_timestamp('2012/10/11 01:56:38 +0000'), 1349920598)
+        self.assertEqual(unified_timestamp('1968 12 10'), -33436800)
+        self.assertEqual(unified_timestamp('1968-12-10'), -33436800)
+        self.assertEqual(unified_timestamp('28/01/2014 21:00:00 +0100'), 1390939200)
+        self.assertEqual(
+            unified_timestamp('11/26/2014 11:30:00 AM PST', day_first=False),
+            1417001400)
+        self.assertEqual(
+            unified_timestamp('2/2/2015 6:47:40 PM', day_first=False),
+            1422902860)
+        self.assertEqual(unified_timestamp('Feb 14th 2016 5:45PM'), 1455471900)
+        self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
+        self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
+        self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
+
     def test_determine_ext(self):
         self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
         self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
@@ -381,6 +407,12 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(res_url, url)
         self.assertEqual(res_data, None)
 
+        smug_url = smuggle_url(url, {'a': 'b'})
+        smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
+        res_url, res_data = unsmuggle_url(smug_smug_url)
+        self.assertEqual(res_url, url)
+        self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
+
     def test_shell_quote(self):
         args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
         self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@@ -577,6 +609,29 @@ class TestUtil(unittest.TestCase):
             limit_length('foo bar baz asd', 12).startswith('foo bar'))
         self.assertTrue('...' in limit_length('foo bar baz asd', 12))
 
+    def test_parse_codecs(self):
+        self.assertEqual(parse_codecs(''), {})
+        self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
+            'vcodec': 'avc1.77.30',
+            'acodec': 'mp4a.40.2',
+        })
+        self.assertEqual(parse_codecs('mp4a.40.2'), {
+            'vcodec': 'none',
+            'acodec': 'mp4a.40.2',
+        })
+        self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
+            'vcodec': 'avc1.42001e',
+            'acodec': 'mp4a.40.5',
+        })
+        self.assertEqual(parse_codecs('avc3.640028'), {
+            'vcodec': 'avc3.640028',
+            'acodec': 'none',
+        })
+        self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
+            'vcodec': 'h264',
+            'acodec': 'aac',
+        })
+
     def test_escape_rfc3986(self):
         reserved = "!*'();:@&=+$,/?#[]"
         unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@@ -638,6 +693,9 @@ class TestUtil(unittest.TestCase):
             "1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
         }''')
 
+        inp = '''{"foo":101}'''
+        self.assertEqual(js_to_json(inp), '''{"foo":101}''')
+
     def test_js_to_json_edgecases(self):
         on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
         self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -954,5 +1012,17 @@ The first line
         self.assertRaises(ValueError, encode_base_n, 0, 70)
         self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
 
+    def test_urshift(self):
+        self.assertEqual(urshift(3, 1), 1)
+        self.assertEqual(urshift(-3, 1), 2147483646)
+
+    def test_get_element_by_class(self):
+        html = '''
+            <span class="foo bar">nice</span>
+        '''
+
+        self.assertEqual(get_element_by_class('foo', html), 'nice')
+        self.assertEqual(get_element_by_class('no-such-class', html), None)
+
 
 if __name__ == '__main__':
     unittest.main()
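A quick sketch of the two most notable new helpers exercised above, assuming youtube_dl is importable; the expected values are taken straight from the test cases:

    from youtube_dl.utils import parse_codecs, unified_timestamp

    # Splits an RFC 6381 CODECS string into video/audio parts;
    # audio-only input yields vcodec 'none'.
    print(parse_codecs('avc1.77.30, mp4a.40.2'))
    # {'vcodec': 'avc1.77.30', 'acodec': 'mp4a.40.2'}

    # Normalizes many date spellings to a POSIX timestamp (UTC).
    print(unified_timestamp('Dec 14, 2012'))  # 1355443200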


@@ -196,8 +196,8 @@ class YoutubeDL(object):
     prefer_insecure:   Use HTTP instead of HTTPS to retrieve information.
                        At the moment, this is only supported by YouTube.
     proxy:             URL of the proxy server to use
-    cn_verification_proxy:  URL of the proxy to use for IP address verification
-                       on Chinese sites. (Experimental)
+    geo_verification_proxy:  URL of the proxy to use for IP address verification
+                       on geo-restricted sites. (Experimental)
     socket_timeout:    Time to wait for unresponsive hosts, in seconds
     bidi_workaround:   Work around buggy terminals without bidirectional text
                        support, using fridibi
@@ -304,6 +304,11 @@ class YoutubeDL(object):
         self.params.update(params)
         self.cache = Cache(self)
 
+        if self.params.get('cn_verification_proxy') is not None:
+            self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
+            if self.params.get('geo_verification_proxy') is None:
+                self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
+
         if params.get('bidi_workaround', False):
             try:
                 import pty
@@ -1223,6 +1228,10 @@ class YoutubeDL(object):
         if 'title' not in info_dict:
             raise ExtractorError('Missing "title" field in extractor result')
 
+        if not isinstance(info_dict['id'], compat_str):
+            self.report_warning('"id" field is not a string - forcing string conversion')
+            info_dict['id'] = compat_str(info_dict['id'])
+
         if 'playlist' not in info_dict:
             # It isn't part of a playlist
             info_dict['playlist'] = None
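The deprecation shim keeps old option dictionaries working by copying the value across. A minimal sketch of the observable behaviour (assuming youtube_dl is importable; the warning goes to stderr):

    from youtube_dl import YoutubeDL

    ydl = YoutubeDL({'cn_verification_proxy': 'localhost:3129'})  # deprecated spelling
    assert ydl.params['geo_verification_proxy'] == 'localhost:3129'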


@@ -18,7 +18,6 @@ from .options import (
 from .compat import (
     compat_expanduser,
     compat_getpass,
-    compat_print,
     compat_shlex_split,
     workaround_optparse_bug9161,
 )
@@ -76,7 +75,7 @@ def _real_main(argv=None):
     # Dump user agent
     if opts.dump_user_agent:
-        compat_print(std_headers['User-Agent'])
+        write_string(std_headers['User-Agent'] + '\n', out=sys.stdout)
         sys.exit(0)
 
     # Batch file verification
@@ -101,10 +100,10 @@ def _real_main(argv=None):
     if opts.list_extractors:
         for ie in list_extractors(opts.age_limit):
-            compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
+            write_string(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '') + '\n', out=sys.stdout)
             matchedUrls = [url for url in all_urls if ie.suitable(url)]
             for mu in matchedUrls:
-                compat_print('  ' + mu)
+                write_string('  ' + mu + '\n', out=sys.stdout)
         sys.exit(0)
     if opts.list_extractor_descriptions:
         for ie in list_extractors(opts.age_limit):
@@ -117,7 +116,7 @@ def _real_main(argv=None):
                 _SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow')
                 _COUNTS = ('', '5', '10', 'all')
                 desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
-            compat_print(desc)
+            write_string(desc + '\n', out=sys.stdout)
         sys.exit(0)
 
     # Conflicting, missing and erroneous options
@@ -383,6 +382,8 @@ def _real_main(argv=None):
         'external_downloader_args': external_downloader_args,
         'postprocessor_args': postprocessor_args,
         'cn_verification_proxy': opts.cn_verification_proxy,
+        'geo_verification_proxy': opts.geo_verification_proxy,
     }
 
     with YoutubeDL(ydl_opts) as ydl:

File diff suppressed because it is too large


@@ -85,7 +85,7 @@ class ExternalFD(FileDownloader):
             cmd, stderr=subprocess.PIPE)
         _, stderr = p.communicate()
         if p.returncode != 0:
-            self.to_stderr(stderr)
+            self.to_stderr(stderr.decode('utf-8', 'replace'))
         return p.returncode
 
@@ -210,6 +210,7 @@ class FFmpegFD(ExternalFD):
             # args += ['-http_proxy', proxy]
             env = os.environ.copy()
             compat_setenv('HTTP_PROXY', proxy, env=env)
+            compat_setenv('http_proxy', proxy, env=env)
 
         protocol = info_dict.get('protocol')
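The second hunk exports the proxy under both spellings before spawning ffmpeg; proxy environment lookup is case-sensitive and different tools read different variable names. A standalone sketch of the pattern (hypothetical proxy address, POSIX `env` used as a stand-in child command):

    import os
    import subprocess

    env = os.environ.copy()
    env['HTTP_PROXY'] = env['http_proxy'] = 'http://localhost:3128'
    # The child process sees the proxy regardless of which spelling it reads.
    subprocess.call(['env'], env=env)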


@@ -196,6 +196,11 @@ def build_fragments_list(boot_info):
     first_frag_number = fragment_run_entry_table[0]['first']
     fragments_counter = itertools.count(first_frag_number)
     for segment, fragments_count in segment_run_table['segment_run']:
+        # In some live HDS streams (for example Rai), `fragments_count` is
+        # abnormal and causing out-of-memory errors. It's OK to change the
+        # number of fragments for live streams as they are updated periodically
+        if fragments_count == 4294967295 and boot_info['live']:
+            fragments_count = 2
         for _ in range(fragments_count):
             res.append((segment, next(fragments_counter)))
 
@@ -329,7 +334,11 @@ class F4mFD(FragmentFD):
         base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
 
         bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
-        boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url)
+        # From Adobe F4M 3.0 spec:
+        # The <baseURL> element SHALL be the base URL for all relative
+        # (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
+        # URLs should be relative to the location of the containing document.
+        boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
         live = boot_info['live']
         metadata_node = media.find(_add_ns('metadata'))
         if metadata_node is not None:
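The first hunk guards against bogus fragment counts; a self-contained sketch of the enumeration logic with the live-stream cap applied (0xFFFFFFFF is the sentinel value observed in the wild):

    import itertools

    def build_fragments(first_frag_number, segment_runs, live):
        fragments_counter = itertools.count(first_frag_number)
        res = []
        for segment, fragments_count in segment_runs:
            # A live manifest may claim 2**32 - 1 fragments; cap it, since
            # live fragment lists are rebuilt on every manifest refresh anyway.
            if fragments_count == 4294967295 and live:
                fragments_count = 2
            for _ in range(fragments_count):
                res.append((segment, next(fragments_counter)))
        return res

    print(build_fragments(1, [(1, 4294967295)], live=True))  # [(1, 1), (1, 2)]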


@@ -2,14 +2,24 @@ from __future__ import unicode_literals
 
 import os.path
 import re
+import binascii
+try:
+    from Crypto.Cipher import AES
+    can_decrypt_frag = True
+except ImportError:
+    can_decrypt_frag = False
 
 from .fragment import FragmentFD
 from .external import FFmpegFD
 
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urlparse,
+    compat_struct_pack,
+)
 from ..utils import (
     encodeFilename,
     sanitize_open,
+    parse_m3u8_attributes,
 )
 
 
@@ -21,19 +31,27 @@ class HlsFD(FragmentFD):
     @staticmethod
     def can_download(manifest):
         UNSUPPORTED_FEATURES = (
-            r'#EXT-X-KEY:METHOD=(?!NONE)',  # encrypted streams [1]
+            r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
             r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
 
             # Live streams heuristic does not always work (e.g. geo restricted to Germany
             # http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
             # r'#EXT-X-MEDIA-SEQUENCE:(?!0$)',  # live streams [3]
-            r'#EXT-X-PLAYLIST-TYPE:EVENT',  # media segments may be appended to the end of
-                                            # event media playlists [4]
+
+            # This heuristic also is not correct since segments may not be appended as well.
+            # Twitch vods of finished streams have EXT-X-PLAYLIST-TYPE:EVENT despite
+            # no segments will definitely be appended to the end of the playlist.
+            # r'#EXT-X-PLAYLIST-TYPE:EVENT',  # media segments may be appended to the end of
+            #                                 # event media playlists [4]
 
             # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
             # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
             # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
             # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
         )
-        return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
+        check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
+        check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        return all(check_results)
 
     def real_download(self, filename, info_dict):
         man_url = info_dict['url']
@@ -51,36 +69,60 @@ class HlsFD(FragmentFD):
                 fd.add_progress_hook(ph)
             return fd.real_download(filename, info_dict)
 
-        fragment_urls = []
+        total_frags = 0
         for line in s.splitlines():
             line = line.strip()
             if line and not line.startswith('#'):
-                segment_url = (
-                    line
-                    if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(man_url, line))
-                fragment_urls.append(segment_url)
-                # We only download the first fragment during the test
-                if self.params.get('test', False):
-                    break
+                total_frags += 1
 
         ctx = {
             'filename': filename,
-            'total_frags': len(fragment_urls),
+            'total_frags': total_frags,
         }
 
         self._prepare_and_start_frag_download(ctx)
 
+        i = 0
+        media_sequence = 0
+        decrypt_info = {'METHOD': 'NONE'}
         frags_filenames = []
-        for i, frag_url in enumerate(fragment_urls):
-            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
-            success = ctx['dl'].download(frag_filename, {'url': frag_url})
-            if not success:
-                return False
-            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
-            ctx['dest_stream'].write(down.read())
-            down.close()
-            frags_filenames.append(frag_sanitized)
+        for line in s.splitlines():
+            line = line.strip()
+            if line:
+                if not line.startswith('#'):
+                    frag_url = (
+                        line
+                        if re.match(r'^https?://', line)
+                        else compat_urlparse.urljoin(man_url, line))
+                    frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
+                    success = ctx['dl'].download(frag_filename, {'url': frag_url})
+                    if not success:
+                        return False
+                    down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+                    frag_content = down.read()
+                    down.close()
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    ctx['dest_stream'].write(frag_content)
+                    frags_filenames.append(frag_sanitized)
+                    # We only download the first fragment during the test
+                    if self.params.get('test', False):
+                        break
+                    i += 1
+                    media_sequence += 1
+                elif line.startswith('#EXT-X-KEY'):
+                    decrypt_info = parse_m3u8_attributes(line[11:])
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        if 'IV' in decrypt_info:
+                            decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:])
+                        if not re.match(r'^https?://', decrypt_info['URI']):
+                            decrypt_info['URI'] = compat_urlparse.urljoin(
+                                man_url, decrypt_info['URI'])
+                        decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
+                elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
+                    media_sequence = int(line[22:])
 
         self._finish_frag_download(ctx)
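The core of the new decryption path fits in a few lines. A sketch of how a single AES-128 segment is decrypted (requires pycrypto; the ciphertext length must be a multiple of 16): when the playlist carries no explicit IV, HLS defines the IV as the 16-byte big-endian media sequence number, which is exactly what the '>8xq' struct format produces (8 zero pad bytes plus an 8-byte big-endian integer):

    import struct
    from Crypto.Cipher import AES

    def decrypt_fragment(frag_content, key, iv, media_sequence):
        # Fall back to the media-sequence-derived IV when none was declared.
        iv = iv or struct.pack('>8xq', media_sequence)
        return AES.new(key, AES.MODE_CBC, iv).decrypt(frag_content)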


@@ -156,7 +156,10 @@ class AdobeTVVideoIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        video_data = self._download_json(url + '?format=json', video_id)
+        webpage = self._download_webpage(url, video_id)
+
+        video_data = self._parse_json(self._search_regex(
+            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
 
         formats = [{
             'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
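The replacement extraction is the common regex-plus-JSON pattern: grab the right-hand side of a JavaScript assignment and parse it. A standalone sketch with a made-up page snippet:

    import json
    import re

    webpage = 'var bridge = {"title": "demo", "sources": []};'
    bridge = json.loads(
        re.search(r'var\s+bridge\s*=\s*([^;]+);', webpage).group(1))
    print(bridge['title'])  # demo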


@@ -2,23 +2,137 @@ from __future__ import unicode_literals
 
 import re
 
-from .common import InfoExtractor
+from .theplatform import ThePlatformIE
 from ..utils import (
     smuggle_url,
     update_url_query,
     unescapeHTML,
+    extract_attributes,
+    get_element_by_attribute,
+)
+from ..compat import (
+    compat_urlparse,
 )
 
 
-class AENetworksIE(InfoExtractor):
+class AENetworksBaseIE(ThePlatformIE):
+    _THEPLATFORM_KEY = 'crazyjava'
+    _THEPLATFORM_SECRET = 's3cr3t'
+
+
+class AENetworksIE(AENetworksBaseIE):
     IE_NAME = 'aenetworks'
     IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
+    _TESTS = [{
+        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
+        'info_dict': {
+            'id': '22253814',
+            'ext': 'mp4',
+            'title': 'Winter Is Coming',
+            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+            'timestamp': 1338306241,
+            'upload_date': '20120529',
+            'uploader': 'AENE-NEW',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'http://www.history.com/shows/ancient-aliens/season-1',
+        'info_dict': {
+            'id': '71889446852',
+        },
+        'playlist_mincount': 5,
+    }, {
+        'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
+        'info_dict': {
+            'id': 'SERIES4317',
+            'title': 'Atlanta Plastic',
+        },
+        'playlist_mincount': 2,
+    }, {
+        'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
+        'only_matching': True
+    }, {
+        'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/shows/project-runway-junior/season-1/episode-6',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
+        'only_matching': True
+    }]
+    _DOMAIN_TO_REQUESTOR_ID = {
+        'history.com': 'HISTORY',
+        'aetv.com': 'AETV',
+        'mylifetime.com': 'LIFETIME',
+        'fyi.tv': 'FYI',
+    }
+
+    def _real_extract(self, url):
+        domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
+        display_id = show_path or movie_display_id
+        webpage = self._download_webpage(url, display_id)
+        if show_path:
+            url_parts = show_path.split('/')
+            url_parts_len = len(url_parts)
+            if url_parts_len == 1:
+                entries = []
+                for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
+                    entries.append(self.url_result(
+                        compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeriesId', webpage),
+                    self._html_search_meta('aetn:SeriesTitle', webpage))
+            elif url_parts_len == 2:
+                entries = []
+                for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+                    episode_attributes = extract_attributes(episode_item)
+                    episode_url = compat_urlparse.urljoin(
+                        url, episode_attributes['data-canonical'])
+                    entries.append(self.url_result(
+                        episode_url, 'AENetworks',
+                        episode_attributes['data-videoid']))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeasonId', webpage))
+
+        query = {
+            'mbr': 'true',
+            'assetTypes': 'medium_video_s3'
+        }
+        video_id = self._html_search_meta('aetn:VideoID', webpage)
+        media_url = self._search_regex(
+            r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
+        theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
+            r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
+        info = self._parse_theplatform_metadata(theplatform_metadata)
+        if theplatform_metadata.get('AETN$isBehindWall'):
+            requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
+            resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating'])
+            query['auth'] = self._extract_mvpd_auth(
+                url, video_id, requestor_id, resource)
+        info.update(self._search_json_ld(webpage, video_id, fatal=False))
+        media_url = update_url_query(media_url, query)
+        media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
+        formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
+        self._sort_formats(formats)
+        info.update({
+            'id': video_id,
+            'formats': formats,
+            'subtitles': subtitles,
+        })
+        return info
+
+
+class HistoryTopicIE(AENetworksBaseIE):
+    IE_NAME = 'history:topic'
+    IE_DESC = 'History.com Topic'
+    _VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
     _TESTS = [{
         'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
         'info_dict': {
-            'id': 'g12m5Gyt3fdR',
+            'id': '40700995724',
             'ext': 'mp4',
             'title': "Bet You Didn't Know: Valentine's Day",
             'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
@@ -31,57 +145,61 @@ class AENetworksIE(InfoExtractor):
             'skip_download': True,
         },
         'add_ie': ['ThePlatform'],
+        'expected_warnings': ['JSON-LD'],
     }, {
-        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
-        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
-        'info_dict': {
-            'id': 'eg47EERs_JsZ',
-            'ext': 'mp4',
-            'title': 'Winter Is Coming',
-            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
-            'timestamp': 1338306241,
-            'upload_date': '20120529',
-            'uploader': 'AENE-NEW',
-        },
-        'add_ie': ['ThePlatform'],
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/videos',
+        'info_dict':
+        {
+            'id': 'world-war-i-history',
+            'title': 'World War I History',
+        },
+        'playlist_mincount': 24,
     }, {
-        'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
-        'only_matching': True
+        'url': 'http://www.history.com/topics/world-war-i-history/videos',
+        'only_matching': True,
     }, {
-        'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
-        'only_matching': True
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history',
+        'only_matching': True,
    }, {
-        'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
-        'only_matching': True
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches',
+        'only_matching': True,
     }]
 
-    def _real_extract(self, url):
-        page_type, video_id = re.match(self._VALID_URL, url).groups()
-
-        webpage = self._download_webpage(url, video_id)
-
-        video_url_re = [
-            r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
-            r"media_url\s*=\s*'([^']+)'"
-        ]
-        video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
-
-        query = {'mbr': 'true'}
-        if page_type == 'shows':
-            query['assetTypes'] = 'medium_video_s3'
-        if 'switch=hds' in video_url:
-            query['switch'] = 'hls'
-
-        info = self._search_json_ld(webpage, video_id, fatal=False)
-        info.update({
+    def theplatform_url_result(self, theplatform_url, video_id, query):
+        return {
             '_type': 'url_transparent',
+            'id': video_id,
             'url': smuggle_url(
-                update_url_query(video_url, query),
+                update_url_query(theplatform_url, query),
                 {
                     'sig': {
-                        'key': 'crazyjava',
-                        'secret': 's3cr3t'},
+                        'key': self._THEPLATFORM_KEY,
+                        'secret': self._THEPLATFORM_SECRET,
+                    },
                     'force_smil_url': True
                 }),
-        })
-        return info
+            'ie_key': 'ThePlatform',
+        }
+
+    def _real_extract(self, url):
+        topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
+        if video_display_id:
+            webpage = self._download_webpage(url, video_display_id)
+            release_url, video_id = re.search(r"_videoPlayer.play\('([^']+)'\s*,\s*'[^']+'\s*,\s*'(\d+)'\)", webpage).groups()
+            release_url = unescapeHTML(release_url)
+
+            return self.theplatform_url_result(
+                release_url, video_id, {
+                    'mbr': 'true',
+                    'switch': 'hls'
+                })
+        else:
+            webpage = self._download_webpage(url, topic_id)
+            entries = []
+            for episode_item in re.findall(r'<a.+?data-release-url="[^"]+"[^>]*>', webpage):
+                video_attributes = extract_attributes(episode_item)
+                entries.append(self.theplatform_url_result(
+                    video_attributes['data-release-url'], video_attributes['data-id'], {
+                        'mbr': 'true',
+                        'switch': 'hls'
+                    }))
            return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))
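`theplatform_url_result` relies on URL smuggling to hand extra data to the ThePlatform extractor. A minimal sketch of the mechanism (assuming youtube_dl is importable; the URL is a placeholder):

    from youtube_dl.utils import smuggle_url, unsmuggle_url

    smugged = smuggle_url('http://link.theplatform.com/s/xxx/yyy',
                          {'force_smil_url': True})
    url, data = unsmuggle_url(smugged)
    print(url)   # the original URL
    print(data)  # {'force_smil_url': True}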


@@ -0,0 +1,133 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlparse,
+    compat_urlparse,
+)
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    xpath_element,
+    xpath_text,
+)
+
+
+class AfreecaTVIE(InfoExtractor):
+    IE_DESC = 'afreecatv.com'
+    _VALID_URL = r'''(?x)^
+        https?://(?:(live|afbbs|www)\.)?afreeca(?:tv)?\.com(?::\d+)?
+        (?:
+            /app/(?:index|read_ucc_bbs)\.cgi|
+            /player/[Pp]layer\.(?:swf|html))
+        \?.*?\bnTitleNo=(?P<id>\d+)'''
+    _TESTS = [{
+        'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
+        'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
+        'info_dict': {
+            'id': '36164052',
+            'ext': 'mp4',
+            'title': '데일리 에이프릴 요정들의 시상식!',
+            'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
+            'uploader': 'dailyapril',
+            'uploader_id': 'dailyapril',
+            'upload_date': '20160503',
+        }
+    }, {
+        'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867',
+        'info_dict': {
+            'id': '36153164',
+            'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+            'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
+            'uploader': 'dailyapril',
+            'uploader_id': 'dailyapril',
+        },
+        'playlist_count': 2,
+        'playlist': [{
+            'md5': 'd8b7c174568da61d774ef0203159bf97',
+            'info_dict': {
+                'id': '36153164_1',
+                'ext': 'mp4',
+                'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+                'upload_date': '20160502',
+            },
+        }, {
+            'md5': '58f2ce7f6044e34439ab2d50612ab02b',
+            'info_dict': {
+                'id': '36153164_2',
+                'ext': 'mp4',
+                'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
+                'upload_date': '20160502',
+            },
+        }],
+    }, {
+        'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def parse_video_key(key):
+        video_key = {}
+        m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
+        if m:
+            video_key['upload_date'] = m.group('upload_date')
+            video_key['part'] = m.group('part')
+        return video_key
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        parsed_url = compat_urllib_parse_urlparse(url)
+        info_url = compat_urlparse.urlunparse(parsed_url._replace(
+            netloc='afbbs.afreecatv.com:8080',
+            path='/api/video/get_video_info.php'))
+        video_xml = self._download_xml(info_url, video_id)
+
+        if xpath_element(video_xml, './track/video/file') is None:
+            raise ExtractorError('Specified AfreecaTV video does not exist',
+                                 expected=True)
+
+        title = xpath_text(video_xml, './track/title', 'title')
+        uploader = xpath_text(video_xml, './track/nickname', 'uploader')
+        uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
+        duration = int_or_none(xpath_text(video_xml, './track/duration',
+                                          'duration'))
+        thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
+
+        entries = []
+        for i, video_file in enumerate(video_xml.findall('./track/video/file')):
+            video_key = self.parse_video_key(video_file.get('key', ''))
+            if not video_key:
+                continue
+            entries.append({
+                'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
+                'title': title,
+                'upload_date': video_key.get('upload_date'),
+                'duration': int_or_none(video_file.get('duration')),
+                'url': video_file.text,
+            })
+
+        info = {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'duration': duration,
+            'thumbnail': thumbnail,
+        }
+
+        if len(entries) > 1:
+            info['_type'] = 'multi_video'
+            info['entries'] = entries
+        elif len(entries) == 1:
+            info['url'] = entries[0]['url']
+            info['upload_date'] = entries[0].get('upload_date')
+        else:
+            raise ExtractorError(
+                'No files found for the specified AfreecaTV video, either'
+                ' the URL is incorrect or the video has been made private.',
+                expected=True)
+
+        return info


@@ -24,10 +24,10 @@ class AftonbladetIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         # find internal video meta data
-        meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
+        meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
         player_config = self._parse_json(self._html_search_regex(
             r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
-        internal_meta_id = player_config['videoId']
+        internal_meta_id = player_config['aptomaVideoId']
         internal_meta_url = meta_url % internal_meta_id
         internal_meta_json = self._download_json(
             internal_meta_url, video_id, 'Downloading video meta data')


@@ -5,6 +5,8 @@ from .common import InfoExtractor
 from ..utils import (
     int_or_none,
     parse_iso8601,
+    mimetype2ext,
+    determine_ext,
 )
 
 
@@ -50,21 +52,25 @@ class AMPIE(InfoExtractor):
             if isinstance(media_content, dict):
                 media_content = [media_content]
             for media_data in media_content:
-                media = media_data['@attributes']
-                media_type = media['type']
-                if media_type in ('video/f4m', 'application/f4m+xml'):
+                media = media_data.get('@attributes', {})
+                media_url = media.get('url')
+                if not media_url:
+                    continue
+                ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
+                if ext == 'f4m':
                     formats.extend(self._extract_f4m_formats(
-                        media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
+                        media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
                         video_id, f4m_id='hds', fatal=False))
-                elif media_type == 'application/x-mpegURL':
+                elif ext == 'm3u8':
                     formats.extend(self._extract_m3u8_formats(
-                        media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
+                        media_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
                 else:
                     formats.append({
                         'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
                         'url': media['url'],
                         'tbr': int_or_none(media.get('bitrate')),
                         'filesize': int_or_none(media.get('fileSize')),
+                        'ext': ext,
                     })
 
             self._sort_formats(formats)
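The rewritten loop infers the container from the declared MIME type first and only then falls back to the URL extension. A sketch of that fallback, assuming youtube_dl is importable:

    from youtube_dl.utils import determine_ext, mimetype2ext

    media_type = None  # e.g. the feed omitted the 'type' attribute
    media_url = 'http://example.com/video/master.m3u8'
    ext = mimetype2ext(media_type) or determine_ext(media_url)
    print(ext)  # m3u8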


@@ -22,6 +22,7 @@ class AnimeOnDemandIE(InfoExtractor):
    _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
    _NETRC_MACHINE = 'animeondemand'
    _TESTS = [{
+       # jap, OmU
        'url': 'https://www.anime-on-demand.de/anime/161',
        'info_dict': {
            'id': '161',
@@ -30,17 +31,21 @@ class AnimeOnDemandIE(InfoExtractor):
        },
        'playlist_mincount': 4,
    }, {
-       # Film wording is used instead of Episode
+       # Film wording is used instead of Episode, ger/jap, Dub/OmU
        'url': 'https://www.anime-on-demand.de/anime/39',
        'only_matching': True,
    }, {
-       # Episodes without titles
+       # Episodes without titles, jap, OmU
        'url': 'https://www.anime-on-demand.de/anime/162',
        'only_matching': True,
    }, {
        # ger/jap, Dub/OmU, account required
        'url': 'https://www.anime-on-demand.de/anime/169',
        'only_matching': True,
+   }, {
+       # Full length film, non-series, ger/jap, Dub/OmU, account required
+       'url': 'https://www.anime-on-demand.de/anime/185',
+       'only_matching': True,
    }]

    def _login(self):
@@ -110,35 +115,12 @@ class AnimeOnDemandIE(InfoExtractor):
        entries = []

-       for num, episode_html in enumerate(re.findall(
-               r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
-           episodebox_title = self._search_regex(
-               (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
-                r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
-               episode_html, 'episodebox title', default=None, group='title')
-           if not episodebox_title:
-               continue
-
-           episode_number = int(self._search_regex(
-               r'(?:Episode|Film)\s*(\d+)',
-               episodebox_title, 'episode number', default=num))
-           episode_title = self._search_regex(
-               r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
-               episodebox_title, 'episode title', default=None)
-
-           video_id = 'episode-%d' % episode_number
-
-           common_info = {
-               'id': video_id,
-               'series': anime_title,
-               'episode': episode_title,
-               'episode_number': episode_number,
-           }
+       def extract_info(html, video_id, num=None):
+           title, description = [None] * 2

            formats = []

            for input_ in re.findall(
-                   r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
+                   r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
                attributes = extract_attributes(input_)
                playlist_urls = []
                for playlist_key in ('data-playlist', 'data-otherplaylist'):
@@ -161,7 +143,7 @@ class AnimeOnDemandIE(InfoExtractor):
                    format_id_list.append(lang)
                if kind:
                    format_id_list.append(kind)
-               if not format_id_list:
+               if not format_id_list and num is not None:
                    format_id_list.append(compat_str(num))
                format_id = '-'.join(format_id_list)
                format_note = ', '.join(filter(None, (kind, lang_note)))
@@ -215,28 +197,74 @@ class AnimeOnDemandIE(InfoExtractor):
                        })
                    formats.extend(file_formats)

-           if formats:
-               self._sort_formats(formats)
+           return {
+               'title': title,
+               'description': description,
+               'formats': formats,
+           }
+
+       def extract_entries(html, video_id, common_info, num=None):
+           info = extract_info(html, video_id, num)
+
+           if info['formats']:
+               self._sort_formats(info['formats'])
                f = common_info.copy()
-               f.update({
-                   'title': title,
-                   'description': description,
-                   'formats': formats,
-               })
+               f.update(info)
                entries.append(f)

-           # Extract teaser only when full episode is not available
-           if not formats:
+           # Extract teaser/trailer only when full episode is not available
+           if not info['formats']:
                m = re.search(
-                   r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
-                   episode_html)
+                   r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<',
+                   html)
                if m:
                    f = common_info.copy()
                    f.update({
-                       'id': '%s-teaser' % f['id'],
+                       'id': '%s-%s' % (f['id'], m.group('kind').lower()),
                        'title': m.group('title'),
                        'url': compat_urlparse.urljoin(url, m.group('href')),
                    })
                    entries.append(f)
+
+       def extract_episodes(html):
+           for num, episode_html in enumerate(re.findall(
+                   r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
+               episodebox_title = self._search_regex(
+                   (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
+                    r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
+                   episode_html, 'episodebox title', default=None, group='title')
+               if not episodebox_title:
+                   continue
+
+               episode_number = int(self._search_regex(
+                   r'(?:Episode|Film)\s*(\d+)',
+                   episodebox_title, 'episode number', default=num))
+               episode_title = self._search_regex(
+                   r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
+                   episodebox_title, 'episode title', default=None)
+
+               video_id = 'episode-%d' % episode_number
+
+               common_info = {
+                   'id': video_id,
+                   'series': anime_title,
+                   'episode': episode_title,
+                   'episode_number': episode_number,
+               }
+
+               extract_entries(episode_html, video_id, common_info)
+
+       def extract_film(html, video_id):
+           common_info = {
+               'id': anime_id,
+               'title': anime_title,
+               'description': anime_description,
+           }
+           extract_entries(html, video_id, common_info)
+
+       extract_episodes(webpage)
+
+       if not entries:
+           extract_film(webpage, anime_id)

        return self.playlist_result(entries, anime_id, anime_title, anime_description)
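The refactor above folds the old monolithic loop into small closures that append to a shared entries list, so film-only pages can reuse the same entry-building path. A standalone sketch of that dispatch shape (the names and parse helpers here are illustrative, not from the diff):

# Sketch only: parse_episodes/parse_film stand in for the real regex-driven extraction.
def build_playlist(webpage, parse_episodes, parse_film):
    entries = []

    def extract_episodes(html):
        entries.extend(parse_episodes(html))

    def extract_film(html):
        entries.extend(parse_film(html))

    extract_episodes(webpage)
    # full-length films have no episode boxes, so fall back to a single film entry
    if not entries:
        extract_film(webpage)
    return entries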


@@ -7,6 +7,8 @@ from .common import InfoExtractor
 from ..compat import compat_urlparse
 from ..utils import (
     int_or_none,
+    parse_duration,
+    unified_strdate,
 )

@@ -16,7 +18,8 @@ class AppleTrailersIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
         'info_dict': {
-            'id': 'manofsteel',
+            'id': '5111',
+            'title': 'Man of Steel',
         },
         'playlist': [
             {
@@ -70,6 +73,15 @@ class AppleTrailersIE(InfoExtractor):
             'id': 'blackthorn',
         },
         'playlist_mincount': 2,
+        'expected_warnings': ['Unable to download JSON metadata'],
+    }, {
+        # json data only available from http://trailers.apple.com/trailers/feeds/data/15881.json
+        'url': 'http://trailers.apple.com/trailers/fox/kungfupanda3/',
+        'info_dict': {
+            'id': '15881',
+            'title': 'Kung Fu Panda 3',
+        },
+        'playlist_mincount': 4,
     }, {
         'url': 'http://trailers.apple.com/ca/metropole/autrui/',
         'only_matching': True,
@@ -85,6 +97,45 @@ class AppleTrailersIE(InfoExtractor):
         movie = mobj.group('movie')
         uploader_id = mobj.group('company')

+        webpage = self._download_webpage(url, movie)
+        film_id = self._search_regex(r"FilmId\s*=\s*'(\d+)'", webpage, 'film id')
+        film_data = self._download_json(
+            'http://trailers.apple.com/trailers/feeds/data/%s.json' % film_id,
+            film_id, fatal=False)
+
+        if film_data:
+            entries = []
+            for clip in film_data.get('clips', []):
+                clip_title = clip['title']
+
+                formats = []
+                for version, version_data in clip.get('versions', {}).items():
+                    for size, size_data in version_data.get('sizes', {}).items():
+                        src = size_data.get('src')
+                        if not src:
+                            continue
+                        formats.append({
+                            'format_id': '%s-%s' % (version, size),
+                            'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
+                            'width': int_or_none(size_data.get('width')),
+                            'height': int_or_none(size_data.get('height')),
+                            'language': version[:2],
+                        })
+                self._sort_formats(formats)
+
+                entries.append({
+                    'id': movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', clip_title).lower(),
+                    'formats': formats,
+                    'title': clip_title,
+                    'thumbnail': clip.get('screen') or clip.get('thumb'),
+                    'duration': parse_duration(clip.get('runtime') or clip.get('faded')),
+                    'upload_date': unified_strdate(clip.get('posted')),
+                    'uploader_id': uploader_id,
+                })
+
+            page_data = film_data.get('page', {})
+            return self.playlist_result(entries, film_id, page_data.get('movie_title'))
+
         playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')

         def fix_html(s):
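The re.sub in the new format loop swaps the size suffix of the listed clip URL for its `_h`-prefixed counterpart before requesting it; in isolation (the sample URL is invented):

import re

src = 'http://movietrailers.example.com/movies/wb/manofsteel/manofsteel-tlr1_720p.mov'
print(re.sub(r'_(\d+p.mov)', r'_h\1', src))
# -> http://movietrailers.example.com/movies/wb/manofsteel/manofsteel-tlr1_h720p.mov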


@@ -8,12 +8,12 @@ from .generic import GenericIE
 from ..utils import (
     determine_ext,
     ExtractorError,
-    get_element_by_attribute,
     qualities,
     int_or_none,
     parse_duration,
     unified_strdate,
     xpath_text,
+    update_url_query,
 )
 from ..compat import compat_etree_fromstring

@@ -35,6 +35,7 @@ class ARDMediathekIE(InfoExtractor):
             # m3u8 download
             'skip_download': True,
         },
+        'skip': 'HTTP Error 404: Not Found',
     }, {
         'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
         'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@@ -45,6 +46,7 @@ class ARDMediathekIE(InfoExtractor):
             'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
             'duration': 5252,
         },
+        'skip': 'HTTP Error 404: Not Found',
     }, {
         # audio
         'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@@ -56,6 +58,7 @@ class ARDMediathekIE(InfoExtractor):
             'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
             'duration': 3240,
         },
+        'skip': 'HTTP Error 404: Not Found',
     }, {
         'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
         'only_matching': True,
@@ -114,11 +117,14 @@ class ARDMediathekIE(InfoExtractor):
                     continue
                 if ext == 'f4m':
                     formats.extend(self._extract_f4m_formats(
-                        stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
-                        video_id, preference=-1, f4m_id='hds', fatal=False))
+                        update_url_query(stream_url, {
+                            'hdcore': '3.1.1',
+                            'plugin': 'aasp-3.1.1.69.124'
+                        }),
+                        video_id, f4m_id='hds', fatal=False))
                 elif ext == 'm3u8':
                     formats.extend(self._extract_m3u8_formats(
-                        stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
+                        stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
                 else:
                     if server and server.startswith('rtmp'):
                         f = {
@@ -232,7 +238,8 @@ class ARDIE(InfoExtractor):
         'title': 'Die Story im Ersten: Mission unter falscher Flagge',
         'upload_date': '20140804',
         'thumbnail': 're:^https?://.*\.jpg$',
-    }
+    },
+    'skip': 'HTTP Error 404: Not Found',
 }

 def _real_extract(self, url):
@@ -274,41 +281,3 @@
             'upload_date': upload_date,
             'thumbnail': thumbnail,
         }
-
-
-class SportschauIE(ARDMediathekIE):
-    IE_NAME = 'Sportschau'
-    _VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
-    _TESTS = [{
-        'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
-        'info_dict': {
-            'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
-            'ext': 'mp4',
-            'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        base_url = mobj.group('baseurl')
-
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_attribute('class', 'headline', webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-
-        info = self._extract_media_info(
-            base_url + '-mc_defaultQuality-h.json', webpage, video_id)
-
-        info.update({
-            'title': title,
-            'description': description,
-        })
-
-        return info
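Switching to `update_url_query` (from youtube_dl.utils) rebuilds the query string properly, so the hdcore/plugin parameters also survive stream URLs that already carry parameters, which plain '?' concatenation would break. Roughly:

from youtube_dl.utils import update_url_query

# hypothetical manifest URL that already has a token parameter
print(update_url_query('http://example.com/video.f4m?token=abc', {
    'hdcore': '3.1.1',
    'plugin': 'aasp-3.1.1.69.124',
}))
# token=abc is kept alongside hdcore and plugin (parameter order may vary)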


@@ -61,10 +61,7 @@ class ArteTvIE(InfoExtractor):
     }


-class ArteTVPlus7IE(InfoExtractor):
-    IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
-
+class ArteTVBaseIE(InfoExtractor):
     @classmethod
     def _extract_url_info(cls, url):
         mobj = re.match(cls._VALID_URL, url)
@@ -78,60 +75,6 @@ class ArteTVPlus7IE(InfoExtractor):
             video_id = mobj.group('id')
         return video_id, lang

-    def _real_extract(self, url):
-        video_id, lang = self._extract_url_info(url)
-        webpage = self._download_webpage(url, video_id)
-        return self._extract_from_webpage(webpage, video_id, lang)
-
-    def _extract_from_webpage(self, webpage, video_id, lang):
-        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
-        ids = (video_id, '')
-        # some pages contain multiple videos (like
-        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
-        # so we first try to look for json URLs that contain the video id from
-        # the 'vid' parameter.
-        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
-        json_url = self._html_search_regex(
-            patterns, webpage, 'json vp url', default=None)
-        if not json_url:
-            def find_iframe_url(webpage, default=NO_DEFAULT):
-                return self._html_search_regex(
-                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
-                    webpage, 'iframe url', group='url', default=default)
-
-            iframe_url = find_iframe_url(webpage, None)
-            if not iframe_url:
-                embed_url = self._html_search_regex(
-                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
-                if embed_url:
-                    player = self._download_json(
-                        embed_url, video_id, 'Downloading player page')
-                    iframe_url = find_iframe_url(player['html'])
-            # en and es URLs produce react-based pages with different layout (e.g.
-            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
-            if not iframe_url:
-                program = self._search_regex(
-                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
-                    webpage, 'program', default=None)
-                if program:
-                    embed_html = self._parse_json(program, video_id)
-                    if embed_html:
-                        iframe_url = find_iframe_url(embed_html['embed_html'])
-            if iframe_url:
-                json_url = compat_parse_qs(
-                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
-        if json_url:
-            title = self._search_regex(
-                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
-                webpage, 'title', default=None, group='title')
-            return self._extract_from_json_url(json_url, video_id, lang, title=title)
-        # Different kind of embed URL (e.g.
-        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        embed_url = self._search_regex(
-            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'embed url', group='url')
-        return self.url_result(embed_url)
-
     def _extract_from_json_url(self, json_url, video_id, lang, title=None):
         info = self._download_json(json_url, video_id)
         player_info = info['videoJsonPlayer']
@@ -235,28 +178,94 @@ class ArteTVPlus7IE(InfoExtractor):
         return info_dict


+class ArteTVPlus7IE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:+7'
+    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
+        'only_matching': True,
+    }, {
+        'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        video_id, lang = self._extract_url_info(url)
+        webpage = self._download_webpage(url, video_id)
+        return self._extract_from_webpage(webpage, video_id, lang)
+
+    def _extract_from_webpage(self, webpage, video_id, lang):
+        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
+        ids = (video_id, '')
+        # some pages contain multiple videos (like
+        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
+        # so we first try to look for json URLs that contain the video id from
+        # the 'vid' parameter.
+        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
+        json_url = self._html_search_regex(
+            patterns, webpage, 'json vp url', default=None)
+        if not json_url:
+            def find_iframe_url(webpage, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
+                    webpage, 'iframe url', group='url', default=default)
+
+            iframe_url = find_iframe_url(webpage, None)
+            if not iframe_url:
+                embed_url = self._html_search_regex(
+                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
+                if embed_url:
+                    player = self._download_json(
+                        embed_url, video_id, 'Downloading player page')
+                    iframe_url = find_iframe_url(player['html'])
+            # en and es URLs produce react-based pages with different layout (e.g.
+            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
+            if not iframe_url:
+                program = self._search_regex(
+                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
+                    webpage, 'program', default=None)
+                if program:
+                    embed_html = self._parse_json(program, video_id)
+                    if embed_html:
+                        iframe_url = find_iframe_url(embed_html['embed_html'])
+            if iframe_url:
+                json_url = compat_parse_qs(
+                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
+        if json_url:
+            title = self._search_regex(
+                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
+                webpage, 'title', default=None, group='title')
+            return self._extract_from_json_url(json_url, video_id, lang, title=title)
+        # Different kind of embed URL (e.g.
+        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
+        entries = [
+            self.url_result(url)
+            for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
+        return self.playlist_result(entries)
+
+
 # It also uses the arte_vp_url url from the webpage to extract the information
 class ArteTVCreativeIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:creative'
     _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

     _TESTS = [{
-        'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
+        'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
         'info_dict': {
-            'id': '72176',
+            'id': '057405-001-A',
             'ext': 'mp4',
-            'title': 'Folge 2 - Corporate Design',
-            'upload_date': '20131004',
+            'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
+            'upload_date': '20150716',
         },
     }, {
         'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
-        'info_dict': {
-            'id': '160676',
-            'ext': 'mp4',
-            'title': 'Monty Python live (mostly)',
-            'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
-            'upload_date': '20140805',
-        }
+        'playlist_count': 11,
+        'add_ie': ['Youtube'],
     }, {
         'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
         'only_matching': True,
@@ -267,7 +276,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:info'
     _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
         'info_dict': {
             'id': '067528-000-A',
@@ -275,7 +284,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
             'title': 'Service civique, un cache misère ?',
             'upload_date': '20160403',
         },
-    }
+    }]


 class ArteTVFutureIE(ArteTVPlus7IE):
@@ -300,6 +309,8 @@ class ArteTVDDCIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:ddc'
     _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'

+    _TESTS = []
+
     def _real_extract(self, url):
         video_id, lang = self._extract_url_info(url)
         if lang == 'folge':
@@ -318,7 +329,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:concert'
     _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
         'md5': '9ea035b7bd69696b67aa2ccaaa218161',
         'info_dict': {
@@ -328,24 +339,23 @@ class ArteTVConcertIE(ArteTVPlus7IE):
             'upload_date': '20140128',
             'description': 'md5:486eb08f991552ade77439fe6d82c305',
         },
-    }
+    }]


 class ArteTVCinemaIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:cinema'
     _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'

-    _TEST = {
-        'url': 'http://cinema.arte.tv/de/node/38291',
-        'md5': '6b275511a5107c60bacbeeda368c3aa1',
+    _TESTS = [{
+        'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
+        'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
         'info_dict': {
-            'id': '055876-000_PWA12025-D',
+            'id': '062494-000-A',
             'ext': 'mp4',
-            'title': 'Tod auf dem Nil',
-            'upload_date': '20160122',
+            'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
+            'upload_date': '20150807',
+            'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
         },
-    }
+    }]


 class ArteTVMagazineIE(ArteTVPlus7IE):
@@ -390,9 +400,42 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
         )
     '''

+    _TESTS = []
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         lang = mobj.group('lang')
         json_url = mobj.group('json_url')
         return self._extract_from_json_url(json_url, video_id, lang)
+
+
+class ArteTVPlaylistIE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:playlist'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
+        'info_dict': {
+            'id': 'PL-013263',
+            'title': 'Areva & Uramin',
+            'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
+        },
+        'playlist_mincount': 6,
+    }, {
+        'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id, lang = self._extract_url_info(url)
+        collection = self._download_json(
+            'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
+            % (lang, playlist_id), playlist_id)
+        title = collection.get('title')
+        description = collection.get('shortDescription') or collection.get('teaserText')
+        entries = [
+            self._extract_from_json_url(
+                video['jsonUrl'], video.get('programId') or playlist_id, lang)
+            for video in collection['videos'] if video.get('jsonUrl')]
+        return self.playlist_result(entries, playlist_id, title, description)
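The `suitable()` override is the usual youtube-dl way of resolving overlapping `_VALID_URL` patterns: ArteTVPlus7IE steps aside whenever the more specific playlist pattern matches. A toy version of the mechanism, with simplified stand-in classes rather than the real ones:

import re

class PlaylistIE(object):
    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?:fr|de|en|es)/[^#]*#collection/PL-\d+'

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None

class Plus7IE(object):
    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?:fr|de|en|es)/.+'

    @classmethod
    def suitable(cls, url):
        # defer to the more specific extractor first
        if PlaylistIE.suitable(url):
            return False
        return re.match(cls._VALID_URL, url) is not None

print(Plus7IE.suitable('http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV'))  # False
print(Plus7IE.suitable('http://sites.arte.tv/karambolage/de/video/karambolage-22'))  # True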


@@ -6,6 +6,7 @@ import time

 from .common import InfoExtractor
 from .soundcloud import SoundcloudIE
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
     url_basename,
@@ -136,7 +137,7 @@ class AudiomackAlbumIE(InfoExtractor):
                     result[resultkey] = api_response[apikey]
             song_id = url_basename(api_response['url']).rpartition('.')[0]
             result['entries'].append({
-                'id': api_response.get('id', song_id),
+                'id': compat_str(api_response.get('id', song_id)),
                 'uploader': api_response.get('artist'),
                 'title': api_response.get('title', song_id),
                 'url': api_response['url'],
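The `compat_str` wrapper matters because the album API may return the song id as an integer while the url_basename fallback is already a string; coercing keeps the entry's 'id' a text type either way. In isolation (the payload below is invented):

from youtube_dl.compat import compat_str

api_response = {'id': 123456, 'url': 'http://example.com/track.mp3'}  # invented payload
song_id = 'track'  # stand-in for url_basename(...).rpartition('.')[0]
entry_id = compat_str(api_response.get('id', song_id))
print(entry_id, type(entry_id))  # '123456' as a text type, not an int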


@@ -46,6 +46,7 @@ class AzubuIE(InfoExtractor):
             'uploader_id': 272749,
             'view_count': int,
         },
+        'skip': 'Channel offline',
     },
 ]

@@ -56,22 +57,26 @@
             'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']

         title = data['title'].strip()
-        description = data['description']
-        thumbnail = data['thumbnail']
-        view_count = data['view_count']
-        uploader = data['user']['username']
-        uploader_id = data['user']['id']
+        description = data.get('description')
+        thumbnail = data.get('thumbnail')
+        view_count = data.get('view_count')
+        user = data.get('user', {})
+        uploader = user.get('username')
+        uploader_id = user.get('id')

         stream_params = json.loads(data['stream_params'])

-        timestamp = float_or_none(stream_params['creationDate'], 1000)
-        duration = float_or_none(stream_params['length'], 1000)
+        timestamp = float_or_none(stream_params.get('creationDate'), 1000)
+        duration = float_or_none(stream_params.get('length'), 1000)

         renditions = stream_params.get('renditions') or []
         video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
         if video:
             renditions.append(video)

+        if not renditions and not user.get('channel', {}).get('is_live', True):
+            raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
+
         formats = [{
             'url': fmt['url'],
             'width': fmt['frameWidth'],


@@ -31,7 +31,7 @@ class BBCCoUkIE(InfoExtractor):
                             music/clips[/#]|
                             radio/player/
                         )
-                        (?P<id>%s)
+                        (?P<id>%s)(?!/(?:episodes|broadcasts|clips))
                     ''' % _ID_REGEX

     _MEDIASELECTOR_URLS = [
@@ -44,6 +44,8 @@ class BBCCoUkIE(InfoExtractor):

     _MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
     _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
+    # Unified Streaming Platform
+    _USP_RE = r'/([^/]+)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'

     _NAMESPACES = (
         _MEDIASELECTION_NS,
@@ -55,12 +57,11 @@ class BBCCoUkIE(InfoExtractor):
         'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
         'info_dict': {
             'id': 'b039d07m',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
             'description': 'The Canadian poet and songwriter reflects on his musical career.',
         },
         'params': {
-            # rtmp download
             'skip_download': True,
         }
     },
@@ -92,7 +93,7 @@ class BBCCoUkIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         },
-        'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
+        'skip': 'this episode is not currently available',
     },
     {
         'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
@@ -107,7 +108,7 @@ class BBCCoUkIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         },
-        'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
+        'skip': 'this episode is not currently available',
     }, {
         'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
         'info_dict': {
@@ -127,13 +128,12 @@ class BBCCoUkIE(InfoExtractor):
         'note': 'Audio',
         'info_dict': {
             'id': 'p022h44j',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'BBC Proms Music Guides, Rachmaninov: Symphonic Dances',
             'description': "In this Proms Music Guide, Andrew McGregor looks at Rachmaninov's Symphonic Dances.",
             'duration': 227,
         },
         'params': {
-            # rtmp download
             'skip_download': True,
         }
     }, {
@@ -141,13 +141,12 @@ class BBCCoUkIE(InfoExtractor):
         'note': 'Video',
         'info_dict': {
             'id': 'p025c103',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
             'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
             'duration': 226,
         },
         'params': {
-            # rtmp download
             'skip_download': True,
         }
     }, {
@@ -163,7 +162,7 @@ class BBCCoUkIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         },
-        'skip': 'geolocation',
+        'skip': 'this episode is not currently available',
     }, {
         'url': 'http://www.bbc.co.uk/iplayer/episode/b05zmgwn/royal-academy-summer-exhibition',
         'info_dict': {
@@ -177,7 +176,7 @@ class BBCCoUkIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         },
-        'skip': 'geolocation',
+        'skip': 'this episode is not currently available',
     }, {
         # iptv-all mediaset fails with geolocation however there is no geo restriction
         # for this programme at all
@@ -192,17 +191,17 @@ class BBCCoUkIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         },
+        'skip': 'this episode is not currently available on BBC iPlayer Radio',
     }, {
         # compact player (https://github.com/rg3/youtube-dl/issues/8147)
         'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
         'info_dict': {
             'id': 'p028bfkj',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
             'description': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
         },
         'params': {
-            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -247,9 +246,15 @@ class BBCCoUkIE(InfoExtractor):
                 elif transfer_format == 'dash':
                     pass
                 elif transfer_format == 'hls':
-                    formats.extend(self._extract_m3u8_formats(
+                    is_unified_streaming = re.search(self._USP_RE, href)
+                    if is_unified_streaming:
+                        href = re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href)
+                    m3u8_formats = self._extract_m3u8_formats(
                         href, programme_id, ext='mp4', entry_protocol='m3u8_native',
-                        m3u8_id=supplier, fatal=False))
+                        m3u8_id=supplier, fatal=False)
+                    if is_unified_streaming:
+                        self._check_formats(m3u8_formats, programme_id)
+                    formats.extend(m3u8_formats)
                 # Direct link
                 else:
                     formats.append({
@@ -304,13 +309,14 @@ class BBCCoUkIE(InfoExtractor):
             for connection in self._extract_connections(media):
                 conn_formats = self._extract_connection(connection, programme_id)
                 for format in conn_formats:
-                    format.update({
-                        'width': width,
-                        'height': height,
-                        'vbr': vbr,
-                        'vcodec': vcodec,
-                        'filesize': file_size,
-                    })
+                    if format.get('protocol') != 'm3u8_native':
+                        format.update({
+                            'width': width,
+                            'height': height,
+                            'vbr': vbr,
+                            'vcodec': vcodec,
+                            'filesize': file_size,
+                        })
                     if service:
                         format['format_id'] = '%s_%s' % (service, format['format_id'])
                 formats.extend(conn_formats)
@@ -698,7 +704,9 @@ class BBCIE(BBCCoUkIE):
     @classmethod
     def suitable(cls, url):
-        return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url)
+        EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
+        return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
+                else super(BBCIE, cls).suitable(url))

     def _extract_from_media_meta(self, media_meta, video_id):
         # Direct links to media in media metadata (e.g.
@@ -975,3 +983,72 @@ class BBCCoUkArticleIE(InfoExtractor):
             r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]

         return self.playlist_result(entries, playlist_id, title, description)
+
+
+class BBCCoUkPlaylistBaseIE(InfoExtractor):
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
+            for video_id in re.findall(
+                self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
+
+        title, description = self._extract_title_and_description(webpage)
+
+        return self.playlist_result(entries, playlist_id, title, description)
+
+
+class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:iplayer:playlist'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
+    _URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
+    _VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
+    _TEST = {
+        'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
+        'info_dict': {
+            'id': 'b05rcz9v',
+            'title': 'The Disappearance',
+            'description': 'French thriller serial about a missing teenager.',
+        },
+        'playlist_mincount': 6,
+    }
+
+    def _extract_title_and_description(self, webpage):
+        title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
+        description = self._search_regex(
+            r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
+            webpage, 'description', fatal=False, group='value')
+        return title, description
+
+
+class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
+    IE_NAME = 'bbc.co.uk:playlist'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)/(?:episodes|broadcasts|clips)' % BBCCoUkIE._ID_REGEX
+    _URL_TEMPLATE = 'http://www.bbc.co.uk/programmes/%s'
+    _VIDEO_ID_TEMPLATE = r'data-pid=["\'](%s)'
+    _TESTS = [{
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
+        'info_dict': {
+            'id': 'b05rcz9v',
+            'title': 'The Disappearance - Clips - BBC Four',
+            'description': 'French thriller serial about a missing teenager.',
+        },
+        'playlist_mincount': 7,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
+        'only_matching': True,
+    }]
+
+    def _extract_title_and_description(self, webpage):
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._og_search_description(webpage)
+        return title, description
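The Unified Streaming Platform branch rewrites whatever single-rendition HLS URL the mediaselector returned into the server's master `.m3u8`, which advertises every quality, and then probes the results with `_check_formats`. The rewrite itself reduces to a re.sub (host and path below are made up):

import re

USP_RE = r'/([^/]+)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'  # _USP_RE from the diff

href = 'http://vod-hls.example.com/usp/media/abc123.ism/low-variant.m3u8'
if re.search(USP_RE, href):
    href = re.sub(USP_RE, r'/\1.ism/\1.m3u8', href)
print(href)  # -> http://vod-hls.example.com/usp/media/abc123.ism/abc123.m3u8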


@@ -1,31 +1,27 @@
 from __future__ import unicode_literals

-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
-    xpath_text,
-    xpath_with_ns,
-    int_or_none,
-    parse_iso8601,
-)
+from .mtv import MTVServicesInfoExtractor
+from ..utils import unified_strdate
+from ..compat import compat_urllib_parse_urlencode


-class BetIE(InfoExtractor):
+class BetIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
     _TESTS = [
         {
             'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
             'info_dict': {
-                'id': 'news/national/2014/a-conversation-with-president-obama',
+                'id': '07e96bd3-8850-3051-b856-271b457f0ab8',
                 'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
                 'ext': 'flv',
                 'title': 'A Conversation With President Obama',
-                'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
+                'description': 'President Obama urges persistence in confronting racism and bias.',
                 'duration': 1534,
-                'timestamp': 1418075340,
                 'upload_date': '20141208',
-                'uploader': 'admin',
                 'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
             },
             'params': {
                 # rtmp download
@@ -35,16 +31,17 @@ class BetIE(InfoExtractor):
         {
             'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
             'info_dict': {
-                'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
+                'id': '9f516bf1-7543-39c4-8076-dd441b459ba9',
                 'display_id': 'justice-for-ferguson-a-community-reacts',
                 'ext': 'flv',
                 'title': 'Justice for Ferguson: A Community Reacts',
                 'description': 'A BET News special.',
                 'duration': 1696,
-                'timestamp': 1416942360,
                 'upload_date': '20141125',
-                'uploader': 'admin',
                 'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
             },
             'params': {
                 # rtmp download
@@ -53,57 +50,32 @@ class BetIE(InfoExtractor):
         }
     ]

+    _FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
+
+    def _get_feed_query(self, uri):
+        return compat_urllib_parse_urlencode({
+            'uuid': uri,
+        })
+
+    def _extract_mgid(self, webpage):
+        return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
+
     def _real_extract(self, url):
         display_id = self._match_id(url)
+
         webpage = self._download_webpage(url, display_id)
+        mgid = self._extract_mgid(webpage)
+        videos_info = self._get_videos_info(mgid)

-        media_url = compat_urllib_parse_unquote(self._search_regex(
-            [r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
-            webpage, 'media URL'))
-
-        video_id = self._search_regex(
-            r'/video/(.*)/_jcr_content/', media_url, 'video id')
-
-        mrss = self._download_xml(media_url, display_id)
-
-        item = mrss.find('./channel/item')
-
-        NS_MAP = {
-            'dc': 'http://purl.org/dc/elements/1.1/',
-            'media': 'http://search.yahoo.com/mrss/',
-            'ka': 'http://kickapps.com/karss',
-        }
-
-        title = xpath_text(item, './title', 'title')
-        description = xpath_text(
-            item, './description', 'description', fatal=False)
-
-        timestamp = parse_iso8601(xpath_text(
-            item, xpath_with_ns('./dc:date', NS_MAP),
-            'upload date', fatal=False))
-
-        uploader = xpath_text(
-            item, xpath_with_ns('./dc:creator', NS_MAP),
-            'uploader', fatal=False)
-
-        media_content = item.find(
-            xpath_with_ns('./media:content', NS_MAP))
-
-        duration = int_or_none(media_content.get('duration'))
-        smil_url = media_content.get('url')
-
-        thumbnail = media_content.find(
-            xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
-
-        formats = self._extract_smil_formats(smil_url, display_id)
-
-        self._sort_formats(formats)
+        info_dict = videos_info['entries'][0]
+
+        upload_date = unified_strdate(self._html_search_meta('date', webpage))
+        description = self._html_search_meta('description', webpage)

-        return {
-            'id': video_id,
+        info_dict.update({
             'display_id': display_id,
-            'title': title,
             'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'duration': duration,
-            'formats': formats,
-        }
+            'upload_date': upload_date,
+        })
+
+        return info_dict
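With the move onto MTVServicesInfoExtractor, the heavy lifting (feed download, formats, subtitles) happens in the shared base class; BET itself only supplies the feed URL, the uuid query and the mgid scraped from `data-uri`. The query helper is plain URL encoding, e.g. (uuid taken from the test's id purely for illustration):

from youtube_dl.compat import compat_urllib_parse_urlencode

print(compat_urllib_parse_urlencode({'uuid': '07e96bd3-8850-3051-b856-271b457f0ab8'}))
# -> uuid=07e96bd3-8850-3051-b856-271b457f0ab8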


@@ -46,6 +46,78 @@ class BiliBiliIE(InfoExtractor):
         'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
     },
     'playlist_count': 9,
+}, {
+    'url': 'http://www.bilibili.com/video/av4808130/',
+    'info_dict': {
+        'id': '4808130',
+        'title': '【长篇】哆啦A梦443【钉铛】',
+        'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
+    },
+    'playlist': [{
+        'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
+        'info_dict': {
+            'id': '4808130_part1',
+            'ext': 'flv',
+            'title': '【长篇】哆啦A梦443【钉铛】',
+            'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
+            'timestamp': 1464564180,
+            'upload_date': '20160529',
+            'uploader': '喜欢拉面',
+            'uploader_id': '151066',
+        },
+    }, {
+        'md5': '926f9f67d0c482091872fbd8eca7ea3d',
+        'info_dict': {
+            'id': '4808130_part2',
+            'ext': 'flv',
+            'title': '【长篇】哆啦A梦443【钉铛】',
+            'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
+            'timestamp': 1464564180,
+            'upload_date': '20160529',
+            'uploader': '喜欢拉面',
+            'uploader_id': '151066',
+        },
+    }, {
+        'md5': '4b7b225b968402d7c32348c646f1fd83',
+        'info_dict': {
+            'id': '4808130_part3',
+            'ext': 'flv',
+            'title': '【长篇】哆啦A梦443【钉铛】',
+            'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
+            'timestamp': 1464564180,
+            'upload_date': '20160529',
+            'uploader': '喜欢拉面',
+            'uploader_id': '151066',
+        },
+    }, {
+        'md5': '7b795e214166501e9141139eea236e91',
+        'info_dict': {
+            'id': '4808130_part4',
+            'ext': 'flv',
+            'title': '【长篇】哆啦A梦443【钉铛】',
+            'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
+            'timestamp': 1464564180,
+            'upload_date': '20160529',
+            'uploader': '喜欢拉面',
+            'uploader_id': '151066',
+        },
+    }],
+}, {
+    # Missing upload time
+    'url': 'http://www.bilibili.com/video/av1867637/',
+    'info_dict': {
+        'id': '2880301',
+        'ext': 'flv',
+        'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
+        'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
+        'uploader': '黑夜为猫',
+        'uploader_id': '610729',
+    },
+    'params': {
+        # Just to test metadata extraction
+        'skip_download': True,
+    },
+    'expected_warnings': ['upload time'],
 }]

 # BiliBili blocks keys from time to time. The current key is extracted from
@@ -116,6 +188,7 @@ class BiliBiliIE(InfoExtractor):
         description = self._html_search_meta('description', webpage)
         datetime_str = self._html_search_regex(
             r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)
+        timestamp = None
         if datetime_str:
             timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
@@ -144,6 +217,9 @@ class BiliBiliIE(InfoExtractor):
         if len(entries) == 1:
             return entries[0]
         else:
+            for idx, entry in enumerate(entries):
+                entry['id'] = '%s_part%d' % (video_id, (idx + 1))
+
             return {
                 '_type': 'multi_video',
                 'id': video_id,
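The loop added at the end gives each part of a multi_video page its own id instead of repeating the page id, which is exactly what the new `_part<n>` test ids assert. In isolation:

video_id = '4808130'
entries = [{'id': video_id}, {'id': video_id}]  # two hypothetical parts
for idx, entry in enumerate(entries):
    entry['id'] = '%s_part%d' % (video_id, (idx + 1))
print([entry['id'] for entry in entries])  # ['4808130_part1', '4808130_part2']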


@@ -2,11 +2,15 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import remove_end
+from ..utils import (
+    ExtractorError,
+    remove_end,
+)
+from .rudo import RudoIE


 class BioBioChileTVIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
+    _VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'

     _TESTS = [{
         'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@@ -18,6 +22,7 @@ class BioBioChileTVIE(InfoExtractor):
             'thumbnail': 're:^https?://.*\.jpg$',
             'uploader': 'Fernando Atria',
         },
+        'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
     }, {
         # different uploader layout
         'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@@ -32,6 +37,16 @@ class BioBioChileTVIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+        'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
+    }, {
+        'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
+        'info_dict': {
+            'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
+            'ext': 'mp4',
+            'uploader': '(none)',
+            'upload_date': '20160708',
+            'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
+        },
     }, {
         'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
         'only_matching': True,
@@ -45,42 +60,22 @@ class BioBioChileTVIE(InfoExtractor):

         webpage = self._download_webpage(url, video_id)

+        rudo_url = RudoIE._extract_url(webpage)
+        if not rudo_url:
+            raise ExtractorError('No videos found')
+
         title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')

-        file_url = self._search_regex(
-            r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
-            webpage, 'file url', group='url')
-
-        base_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
-            'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
-            group='url')
-
-        formats = self._extract_m3u8_formats(
-            '%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
-            entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
-        f = {
-            'url': '%s%s' % (base_url, file_url),
-            'format_id': 'http',
-            'protocol': 'http',
-            'preference': 1,
-        }
-        if formats:
-            f_copy = formats[-1].copy()
-            f_copy.update(f)
-            f = f_copy
-        formats.append(f)
-        self._sort_formats(formats)
-
         thumbnail = self._og_search_thumbnail(webpage)
         uploader = self._html_search_regex(
-            r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
+            r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
            webpage, 'uploader', fatal=False)

         return {
+            '_type': 'url_transparent',
+            'url': rudo_url,
             'id': video_id,
             'title': title,
             'thumbnail': thumbnail,
             'uploader': uploader,
-            'formats': formats,
         }
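The page is now just a wrapper around an embedded Rudo player, so the extractor returns a url_transparent result: RudoIE resolves the actual formats while the title, thumbnail and uploader scraped here take precedence in the merged info dict. Schematically (the field values are placeholders):

def make_result(rudo_url, video_id, title, thumbnail, uploader):
    return {
        '_type': 'url_transparent',
        'url': rudo_url,        # RudoIE is invoked on this URL for the media
        'id': video_id,
        'title': title,         # non-None fields here override what RudoIE returns
        'thumbnail': thumbnail,
        'uploader': uploader,
    }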


@@ -29,7 +29,8 @@ class BRIE(InfoExtractor):
             'duration': 180,
             'uploader': 'Reinhard Weber',
             'upload_date': '20150422',
-        }
+        },
+        'skip': '404 not found',
     },
     {
         'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@@ -40,7 +41,8 @@ class BRIE(InfoExtractor):
             'title': 'Manfred Schreiber ist tot',
             'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
             'duration': 26,
-        }
+        },
+        'skip': '404 not found',
     },
     {
         'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@@ -51,7 +53,8 @@ class BRIE(InfoExtractor):
             'title': 'Kurzweilig und sehr bewegend',
             'description': 'md5:0351996e3283d64adeb38ede91fac54e',
             'duration': 296,
-        }
+        },
+        'skip': '404 not found',
     },
     {
         'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',


@@ -26,6 +26,7 @@ from ..utils import (
     unescapeHTML,
     unsmuggle_url,
     update_url_query,
+    clean_html,
 )

@@ -90,6 +91,7 @@ class BrightcoveLegacyIE(InfoExtractor):
             'description': 'md5:363109c02998fee92ec02211bd8000df',
             'uploader': 'National Ballet of Canada',
         },
+        'skip': 'Video gone',
     },
     {
         # test flv videos served by akamaihd.net
@@ -108,7 +110,7 @@ class BrightcoveLegacyIE(InfoExtractor):
         },
     },
     {
-        # playlist test
+        # playlist with 'videoList'
         # from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
         'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
         'info_dict': {
@@ -117,6 +119,15 @@ class BrightcoveLegacyIE(InfoExtractor):
         },
         'playlist_mincount': 7,
     },
+    {
+        # playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
+        'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
+        'info_dict': {
+            'id': '1522758701001',
+            'title': 'Lesson 08',
+        },
+        'playlist_mincount': 10,
+    },
 ]
 FLV_VCODECS = {
     1: 'SORENSON',
@@ -298,13 +309,19 @@ class BrightcoveLegacyIE(InfoExtractor):
             info_url, player_key, 'Downloading playlist information')

         json_data = json.loads(playlist_info)
-        if 'videoList' not in json_data:
+        if 'videoList' in json_data:
+            playlist_info = json_data['videoList']
+            playlist_dto = playlist_info['mediaCollectionDTO']
+        elif 'playlistTabs' in json_data:
+            playlist_info = json_data['playlistTabs']
+            playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
+        else:
             raise ExtractorError('Empty playlist')
-        playlist_info = json_data['videoList']
-        videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
+
+        videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']]

         return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
-                                    playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
+                                    playlist_title=playlist_dto['displayName'])

     def _extract_video_info(self, video_info):
         video_id = compat_str(video_info['id'])
@@ -585,6 +602,13 @@ class BrightcoveNewIE(InfoExtractor):
                     'format_id': build_format_id('rtmp'),
                 })
             formats.append(f)
+
+        errors = json_data.get('errors')
+        if not formats and errors:
+            error = errors[0]
+            raise ExtractorError(
+                error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
+
         self._sort_formats(formats)

         subtitles = {}
@@ -597,7 +621,7 @@ class BrightcoveNewIE(InfoExtractor):
         return {
             'id': video_id,
             'title': title,
-            'description': json_data.get('description'),
+            'description': clean_html(json_data.get('description')),
             'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
             'duration': float_or_none(json_data.get('duration'), 1000),
             'timestamp': parse_iso8601(json_data.get('published_at')),
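The rewritten branch accepts both playlist payload shapes the legacy Brightcove endpoint serves: the classic 'videoList' and the tabbed 'playlistTabs' layout from issue #9965. Reduced to plain dictionaries (the payload below is invented):

def pick_playlist_dto(json_data):
    if 'videoList' in json_data:
        return json_data['videoList']['mediaCollectionDTO']
    elif 'playlistTabs' in json_data:
        return json_data['playlistTabs']['lineupListDTO']['playlistDTOs'][0]
    raise ValueError('Empty playlist')

tabbed = {'playlistTabs': {'lineupListDTO': {'playlistDTOs': [
    {'displayName': 'Lesson 08', 'videoDTOs': []}]}}}
print(pick_playlist_dto(tabbed)['displayName'])  # -> Lesson 08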


@@ -5,6 +5,7 @@ import json
 import re

 from .common import InfoExtractor
+from .facebook import FacebookIE


 class BuzzFeedIE(InfoExtractor):
@@ -20,11 +21,11 @@ class BuzzFeedIE(InfoExtractor):
             'info_dict': {
                 'id': 'aVCR29aE_OQ',
                 'ext': 'mp4',
+                'title': 'Angry Ram destroys a punching bag..',
+                'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
                 'upload_date': '20141024',
                 'uploader_id': 'Buddhanz1',
-                'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl',
-                'uploader': 'Buddhanz',
-                'title': 'Angry Ram destroys a punching bag',
+                'uploader': 'Angry Ram',
             }
         }]
     }, {
@@ -41,13 +42,30 @@ class BuzzFeedIE(InfoExtractor):
             'info_dict': {
                 'id': 'mVmBL8B-In0',
                 'ext': 'mp4',
+                'title': 're:Munchkin the Teddy Bear gets her exercise',
+                'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
                 'upload_date': '20141124',
                 'uploader_id': 'CindysMunchkin',
-                'description': 're:© 2014 Munchkin the',
                 'uploader': 're:^Munchkin the',
-                'title': 're:Munchkin the Teddy Bear gets her exercise',
             },
         }]
+    }, {
+        'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
+        'info_dict': {
+            'id': 'the-most-adorable-crash-landing-ever',
+            'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
+            'description': 'This gosling knows how to stick a landing.',
+        },
+        'playlist': [{
+            'md5': '763ca415512f91ca62e4621086900a23',
+            'info_dict': {
+                'id': '971793786185728',
+                'ext': 'mp4',
+                'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
+                'uploader': 'Calgary Outdoor Centre-University of Calgary',
+            },
+        }],
+        'add_ie': ['Facebook'],
     }]

     def _real_extract(self, url):
@@ -66,6 +84,10 @@ class BuzzFeedIE(InfoExtractor):
                 continue
             entries.append(self.url_result(video['url']))

+        facebook_url = FacebookIE._extract_url(webpage)
+        if facebook_url:
+            entries.append(self.url_result(facebook_url))
+
         return {
             '_type': 'playlist',
             'id': playlist_id,
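`FacebookIE._extract_url` lets BuzzFeed posts that embed a Facebook-hosted video yield it as an extra playlist entry. The detection amounts to an iframe scan along these lines (a simplified regex and invented markup, not the real implementation):

import re

webpage = '<iframe src="https://www.facebook.com/video/embed?video_id=971793786185728"></iframe>'
m = re.search(
    r'<iframe[^>]+src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1',
    webpage)
print(m.group('url') if m else None)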


@@ -4,11 +4,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
unified_strdate, unified_strdate,
url_basename,
qualities, qualities,
int_or_none, int_or_none,
) )
@@ -16,24 +16,38 @@ from ..utils import (
class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
-_VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
+_VALID_URL = r'''(?x)
+https?://
+(?:
+(?:
+(?:(?:www|m)\.)?canalplus\.fr|
+(?:www\.)?piwiplus\.fr|
+(?:www\.)?d8\.tv|
+(?:www\.)?d17\.tv|
+(?:www\.)?itele\.fr
+)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
+player\.canalplus\.fr/#/(?P<id>\d+)
+)
+'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
-'canalplus.fr': 'cplus',
-'piwiplus.fr': 'teletoon',
-'d8.tv': 'd8',
-'itele.fr': 'itele',
+'canalplus': 'cplus',
+'piwiplus': 'teletoon',
+'d8': 'd8',
+'d17': 'd17',
+'itele': 'itele',
}
_TESTS = [{
-'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
-'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
+'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
+'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': {
-'id': '1263092',
+'id': '1192814',
'ext': 'mp4',
-'title': 'Le Zapping - 13/05/15',
-'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
-'upload_date': '20150513',
+'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014",
+'description': "Toute l'année 2014 dans un Zapping exceptionnel !",
+'upload_date': '20150105',
},
}, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
@@ -46,35 +60,45 @@ class CanalplusIE(InfoExtractor):
},
'skip': 'Only works from France',
}, {
-'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
+'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231',
'info_dict': {
-'id': '966289',
+'id': '1390231',
-'ext': 'flv',
-'title': 'Campagne intime - Documentaire exceptionnel',
-'description': 'md5:d2643b799fb190846ae09c61e59a859f',
-'upload_date': '20131108',
-},
-'skip': 'videos get deleted after a while',
-}, {
-'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
-'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
-'info_dict': {
-'id': '1213714',
'ext': 'mp4',
-'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
-'description': 'md5:8216206ec53426ea6321321f3b3c16db',
-'upload_date': '20150211',
+'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité",
+'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6',
+'upload_date': '20160512',
},
+'params': {
+'skip_download': True,
+},
+}, {
+'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224',
+'info_dict': {
+'id': '1398334',
+'ext': 'mp4',
+'title': "L'invité de Bruce Toussaint du 07/06/2016 - ",
+'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324',
+'upload_date': '20160607',
+},
+'params': {
+'skip_download': True,
+},
+}, {
+'url': 'http://m.canalplus.fr/?vid=1398231',
+'only_matching': True,
+}, {
+'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
+'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
-video_id = mobj.groupdict().get('id')
+video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
-site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal']
+site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group
-display_id = url_basename(mobj.group('path'))
+display_id = mobj.group('display_id') or video_id
if video_id is None:
webpage = self._download_webpage(url, display_id)
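
Note on the new site-id lookup: instead of a regex group, the site key is now derived from the hostname with a plain rsplit. A standalone sketch using the stdlib rather than youtube-dl's compat layer, checked against the test URLs above:

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

_SITE_ID_MAP = {'canalplus': 'cplus', 'piwiplus': 'teletoon', 'd8': 'd8', 'd17': 'd17', 'itele': 'itele'}

for url in ('http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
            'http://m.canalplus.fr/?vid=1398231',
            'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061'):
    # netloc like 'www.canalplus.fr' -> ['www', 'canalplus', 'fr'] -> 'canalplus'
    site = urlparse(url).netloc.rsplit('.', 2)[-2]
    print(site, '->', _SITE_ID_MAP[site])

This works for any subdomain (www, m, player) because rsplit with a maxsplit of 2 always leaves the second-level label at index -2.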


@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
try_get,
)
class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://video1.carambatv.ru/v/191910501',
'md5': '2f4a81b7cfd5ab866ee2d7270cb34a2a',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2678.31,
},
}, {
'url': 'carambatv:191910501',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'http://video1.carambatv.ru/v/%s/videoinfo.js' % video_id,
video_id)
title = video['title']
base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
formats = [{
'url': base_url + f['fn'],
'height': int_or_none(f.get('height')),
'format_id': '%sp' % f['height'] if f.get('height') else None,
} for f in video['qualities'] if f.get('fn')]
self._sort_formats(formats)
thumbnail = video.get('splash')
duration = float_or_none(try_get(
video, lambda x: x['annotations'][0]['end_time'], compat_str))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': '',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2678.31,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url:
video_id = self._search_regex(
r'(?:video_id|crmb_vuid)\s*[:=]\s*["\']?(\d+)',
webpage, 'video id')
video_url = 'carambatv:%s' % video_id
return self.url_result(video_url, CarambaTVIE.ie_key())
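
The duration above is pulled with try_get, which walks a nested JSON structure without raising. A rough standalone approximation of that helper (the real one lives in youtube_dl/utils.py and is typed against compat_str):

def try_get(src, getter, expected_type=None):
    # approximation of youtube_dl.utils.try_get: swallow lookup errors,
    # return None unless the value exists and has the expected type
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v

video = {'annotations': [{'end_time': '2678.31'}]}  # made-up videoinfo.js shape
print(try_get(video, lambda x: x['annotations'][0]['end_time'], str))  # '2678.31'
print(try_get({}, lambda x: x['annotations'][0]['end_time'], str))     # None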


@@ -1,17 +1,13 @@
from __future__ import unicode_literals
-import re
-from .theplatform import ThePlatformIE
+from .theplatform import ThePlatformFeedIE
from ..utils import (
-xpath_text,
-xpath_element,
int_or_none,
find_xpath_attr,
)
-class CBSBaseIE(ThePlatformIE):
+class CBSBaseIE(ThePlatformFeedIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
@@ -21,9 +17,22 @@ class CBSBaseIE(ThePlatformIE):
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
+def _extract_video_info(self, filter_query, video_id):
+return self._extract_feed_info(
+'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
+'series': entry.get('cbs$SeriesTitle'),
+'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
+'episode': entry.get('cbs$EpisodeTitle'),
+'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
+}, {
+'StreamPack': {
+'manifest': 'm3u',
+}
+})
class CBSIE(CBSBaseIE):
-_VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'
+_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -38,25 +47,7 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
-'params': {
-# rtmp download
-'skip_download': True,
-},
-'_skip': 'Blocked outside the US',
-}, {
-'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
-'info_dict': {
-'id': 'WWF_5KqY3PK1',
-'display_id': 'st-vincent',
-'ext': 'flv',
-'title': 'Live on Letterman - St. Vincent',
-'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
-'duration': 3221,
-},
-'params': {
-# rtmp download
-'skip_download': True,
-},
+'expected_warnings': ['Failed to download m3u8 information'],
'_skip': 'Blocked outside the US',
}, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -68,44 +59,5 @@ class CBSIE(CBSBaseIE):
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url):
-content_id, display_id = re.match(self._VALID_URL, url).groups()
-if not content_id:
-webpage = self._download_webpage(url, display_id)
-content_id = self._search_regex(
-[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
-webpage, 'content id')
-items_data = self._download_xml(
-'http://can.cbs.com/thunder/player/videoPlayerService.php',
-content_id, query={'partner': 'cbs', 'contentId': content_id})
-video_data = xpath_element(items_data, './/item')
-title = xpath_text(video_data, 'videoTitle', 'title', True)
-subtitles = {}
-formats = []
-for item in items_data.findall('.//item'):
-pid = xpath_text(item, 'pid')
-if not pid:
-continue
-tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
-if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
-tp_release_url += '&manifest=m3u'
-tp_formats, tp_subtitles = self._extract_theplatform_smil(
-tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
-formats.extend(tp_formats)
-subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-self._sort_formats(formats)
-info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
-info.update({
-'id': content_id,
-'display_id': display_id,
-'title': title,
-'series': xpath_text(video_data, 'seriesTitle'),
-'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
-'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
-'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
-'thumbnail': xpath_text(video_data, 'previewImageURL'),
-'formats': formats,
-'subtitles': subtitles,
-})
-return info
+content_id = self._match_id(url)
+return self._extract_video_info('byGuid=%s' % content_id, content_id)


@@ -80,9 +80,6 @@ class CBSInteractiveIE(ThePlatformIE):
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {}
-if site == 'cnet':
-formats, subtitles = self._extract_theplatform_smil(
-self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
@@ -94,7 +91,7 @@ class CBSInteractiveIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
-info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
+info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id,
'display_id': display_id,


@@ -30,9 +30,12 @@ class CBSNewsIE(CBSBaseIE):
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
-'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
+'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
+'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
+'upload_date': '19700101',
+'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
'subtitles': {
@@ -58,30 +61,8 @@ class CBSNewsIE(CBSBaseIE):
webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info
-title = item.get('articleTitle') or item.get('hed')
-duration = item.get('duration')
-thumbnail = item.get('mediaImage') or item.get('thumbnail')
-subtitles = {}
-formats = []
-for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
-pid = item.get('media' + format_id)
-if not pid:
-continue
-release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
-tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
-formats.extend(tp_formats)
-subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-self._sort_formats(formats)
-return {
-'id': video_id,
-'title': title,
-'thumbnail': thumbnail,
-'duration': duration,
-'formats': formats,
-'subtitles': subtitles,
-}
+guid = item['mpxRefId']
+return self._extract_video_info('byGuid=%s' % guid, guid)
class CBSNewsLiveVideoIE(InfoExtractor):


@@ -1,30 +1,28 @@
from __future__ import unicode_literals
-import re
-from .common import InfoExtractor
+from .cbs import CBSBaseIE
-class CBSSportsIE(InfoExtractor):
+class CBSSportsIE(CBSBaseIE):
-_VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
-_TEST = {
+_TESTS = [{
-'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
'info_dict': {
-'id': '_d5_GbO8p1sT',
+'id': '708337219968',
-'ext': 'flv',
+'ext': 'mp4',
-'title': 'US Open flashbacks: 1990s',
+'title': 'Ben Simmons the next LeBron? Not so fast',
-'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+'description': 'md5:854294f627921baba1f4b9a990d87197',
+'timestamp': 1466293740,
+'upload_date': '20160618',
+'uploader': 'CBSI-NEW',
},
-}
+'params': {
+# m3u8 download
+'skip_download': True,
+}
+}]
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-section = mobj.group('section')
-video_id = mobj.group('id')
-all_videos = self._download_json(
-'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
-video_id)
-# The json file contains the info of all the videos in the section
-video_info = next(v for v in all_videos if v['pcid'] == video_id)
-return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
+video_id = self._match_id(url)
+return self._extract_video_info('byId=%s' % video_id, video_id)


@@ -58,7 +58,8 @@ class CDAIE(InfoExtractor):
def extract_format(page, version):
unpacked = decode_packed_codes(page)
format_url = self._search_regex(
-r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
+r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1", unpacked,
+'%s url' % version, fatal=False, group='url')
if not format_url:
return
f = {
@@ -75,7 +76,8 @@ class CDAIE(InfoExtractor):
info_dict['formats'].append(f)
if not info_dict['duration']:
info_dict['duration'] = parse_duration(self._search_regex(
-r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
+r"duration\s*:\s*(\\?[\"'])(?P<duration>.+?)\1",
+unpacked, 'duration', fatal=False, group='duration'))
extract_format(webpage, 'default')
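
The old pattern only matched url:\'...\'; the new one accepts both the file and url keys, optional whitespace, and either escaped or plain quotes, with a backreference keeping the quoting consistent. A quick standalone check with both quoting styles (the sample inputs are made up):

import re

PATTERN = r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1"
for sample in ("url:\\'http://example.com/old.mp4\\'",
               'file: "http://example.com/new.mp4"'):
    print(re.search(PATTERN, sample).group('url'))
# -> http://example.com/old.mp4
# -> http://example.com/new.mp4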


@@ -20,54 +20,64 @@ class Channel9IE(InfoExtractor):
'''
IE_DESC = 'Channel 9'
IE_NAME = 'channel9'
-_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
+_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
-_TESTS = [
-{
+_TESTS = [{
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'info_dict': {
'id': 'Events/TechEd/Australia/2013/KOS002',
'ext': 'mp4',
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'duration': 4576,
'thumbnail': 're:http://.*\.jpg',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen'],
},
-},
-{
+}, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540,
'thumbnail': 're:http://.*\.jpg',
'authors': ['Mike Wilmot'],
},
-},
-{
+}, {
# low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'ext': 'mp4',
'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'duration': 5646,
'thumbnail': 're:http://.*\.jpg',
},
'params': {
'skip_download': True,
},
-}
-]
+}, {
+'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
+'info_dict': {
+'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
+'title': 'Channel 9',
+},
+'playlist_count': 2,
+}, {
+'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
+'only_matching': True,
+}, {
+'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
+'only_matching': True,
+}]
_RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@@ -254,22 +264,30 @@ class Channel9IE(InfoExtractor):
return self.playlist_result(contents)
-def _extract_list(self, content_path):
-rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
+def _extract_list(self, video_id, rss_url=None):
+if not rss_url:
+rss_url = self._RSS_URL % video_id
+rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text
-return self.playlist_result(entries, content_path, title_text)
+return self.playlist_result(entries, video_id, title_text)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath')
+rss = mobj.group('rss')
+if rss:
+return self._extract_list(content_path, url)
-webpage = self._download_webpage(url, content_path, 'Downloading web page')
+webpage = self._download_webpage(
+url, content_path, 'Downloading web page')
-page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage)
-if page_type_m is not None:
-page_type = page_type_m.group('pagetype')
+page_type = self._search_regex(
+r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
+webpage, 'page type', default=None, group='pagetype')
+if page_type:
if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content
return self._extract_entry_item(webpage, content_path)
elif page_type == 'Session': # Event session page, may contain downloadable content
@@ -278,6 +296,5 @@ class Channel9IE(InfoExtractor):
return self._extract_list(content_path)
else:
raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
else: # Assuming list
return self._extract_list(content_path)
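
The tightened _VALID_URL captures an optional /RSS suffix so feed URLs short-circuit straight to the playlist path. Matching behaviour, checked against the test URLs from the diff above:

import re

VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
for url in ('http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
            'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS'):
    m = re.match(VALID_URL, url)
    print(m.group('contentpath'), m.group('rss'))
# -> Events/TechEd/Australia/2013/KOS002 None
# -> Events/DEVintersection/DEVintersection-2016 /RSS

The lazy (?P<contentpath>.+?) is what keeps /RSS out of the content path: the group grows only until the optional suffix and end anchor can match.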


@@ -1,16 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
-from .common import InfoExtractor
-from ..utils import (
-ExtractorError,
-float_or_none,
-int_or_none,
-parse_iso8601,
-)
+from .onet import OnetBaseIE
-class ClipRsIE(InfoExtractor):
+class ClipRsIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
@@ -27,64 +21,13 @@ class ClipRsIE(InfoExtractor):
}
def _real_extract(self, url):
-video_id = self._match_id(url)
-webpage = self._download_webpage(url, video_id)
+display_id = self._match_id(url)
+webpage = self._download_webpage(url, display_id)
-video_id = self._search_regex(
-r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
-response = self._download_json(
-'http://qi.ckm.onetapi.pl/', video_id,
-query={
-'body[id]': video_id,
-'body[jsonrpc]': '2.0',
-'body[method]': 'get_asset_detail',
-'body[params][ID_Publikacji]': video_id,
-'body[params][Service]': 'www.onet.pl',
-'content-type': 'application/jsonp',
-'x-onet-app': 'player.front.onetapi.pl',
-})
-error = response.get('error')
-if error:
-raise ExtractorError(
-'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
-video = response['result'].get('0')
-formats = []
-for _, formats_dict in video['formats'].items():
-if not isinstance(formats_dict, dict):
-continue
-for format_id, format_list in formats_dict.items():
-if not isinstance(format_list, list):
-continue
-for f in format_list:
-if not f.get('url'):
-continue
-formats.append({
-'url': f['url'],
-'format_id': format_id,
-'height': int_or_none(f.get('vertical_resolution')),
-'width': int_or_none(f.get('horizontal_resolution')),
-'abr': float_or_none(f.get('audio_bitrate')),
-'vbr': float_or_none(f.get('video_bitrate')),
-})
-self._sort_formats(formats)
-meta = video.get('meta', {})
-title = self._og_search_title(webpage, default=None) or meta['title']
-description = self._og_search_description(webpage, default=None) or meta.get('description')
-duration = meta.get('length') or meta.get('lenght')
-timestamp = parse_iso8601(meta.get('addDate'), ' ')
-return {
-'id': video_id,
-'title': title,
-'description': description,
-'duration': duration,
-'timestamp': timestamp,
-'formats': formats,
-}
+mvp_id = self._search_mvp_id(webpage)
+info_dict = self._extract_from_id(mvp_id, webpage)
+info_dict['display_id'] = display_id
+return info_dict


@@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CloserToTruthIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?closertotruth\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://closertotruth.com/series/solutions-the-mind-body-problem#video-3688',
'info_dict': {
'id': '0_zof1ktre',
'display_id': 'solutions-the-mind-body-problem',
'ext': 'mov',
'title': 'Solutions to the Mind-Body Problem?',
'upload_date': '20140221',
'timestamp': 1392956007,
'uploader_id': 'CTTXML'
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://closertotruth.com/episodes/how-do-brains-work',
'info_dict': {
'id': '0_iuxai6g6',
'display_id': 'how-do-brains-work',
'ext': 'mov',
'title': 'How do Brains Work?',
'upload_date': '20140221',
'timestamp': 1392956024,
'uploader_id': 'CTTXML'
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://closertotruth.com/interviews/1725',
'info_dict': {
'id': '1725',
'title': 'AyaFr-002',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
partner_id = self._search_regex(
r'<script[^>]+src=["\'].*?\b(?:partner_id|p)/(\d+)',
webpage, 'kaltura partner_id')
title = self._search_regex(
r'<title>(.+?)\s*\|\s*.+?</title>', webpage, 'video title')
select = self._search_regex(
r'(?s)<select[^>]+id="select-version"[^>]*>(.+?)</select>',
webpage, 'select version', default=None)
if select:
entry_ids = set()
entries = []
for mobj in re.finditer(
r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)',
webpage):
entry_id = mobj.group('id')
if entry_id in entry_ids:
continue
entry_ids.add(entry_id)
entries.append({
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': 'Kaltura',
'title': mobj.group('title'),
})
if entries:
return self.playlist_result(entries, display_id, title)
entry_id = self._search_regex(
r'<a[^>]+id=(["\'])embed-kaltura\1[^>]+data-kaltura=(["\'])(?P<id>[0-9a-z_]+)\2',
webpage, 'kaltura entry_id', group='id')
return {
'_type': 'url_transparent',
'display_id': display_id,
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': 'Kaltura',
'title': title
}


@@ -44,7 +44,9 @@ from ..utils import (
sanitized_Request,
unescapeHTML,
unified_strdate,
+unified_timestamp,
url_basename,
+xpath_element,
xpath_text,
xpath_with_ns,
determine_protocol,
@@ -52,6 +54,9 @@ from ..utils import (
mimetype2ext,
update_Request,
update_url_query,
+parse_m3u8_attributes,
+extract_attributes,
+parse_codecs,
)
@@ -159,6 +164,7 @@ class InfoExtractor(object):
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
deprecated)
+* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
description: Full video description.
uploader: Full name of the video uploader.
@@ -747,10 +753,12 @@ class InfoExtractor(object):
return self._og_search_property('url', html, **kargs)
def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
+if not isinstance(name, (list, tuple)):
+name = [name]
if display_name is None:
-display_name = name
+display_name = name[0]
return self._html_search_regex(
-self._meta_regex(name),
+[self._meta_regex(n) for n in name],
html, display_name, fatal=fatal, group='content', **kwargs)
def _dc_search_uploader(self, html):
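
_html_search_meta now accepts a list of names and tries each pattern in turn, first match wins. A self-contained approximation (meta_regex below is a simplified stand-in for the real self._meta_regex):

import re

def meta_regex(prop):
    # simplified stand-in for youtube-dl's self._meta_regex (assumption)
    return r'<meta[^>]+(?:name|property)=["\']%s["\'][^>]+content=["\']([^"\']+)' % re.escape(prop)

def html_search_meta(names, html):
    # mirrors the new list behaviour: first name whose pattern matches wins
    if not isinstance(names, (list, tuple)):
        names = [names]
    for name in names:
        m = re.search(meta_regex(name), html)
        if m:
            return m.group(1)

html = '<meta property="og:title" content="Example clip">'
print(html_search_meta(['title', 'og:title'], html))  # -> Example clip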
@@ -799,15 +807,17 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html,
'twitter card player')
-def _search_json_ld(self, html, video_id, **kwargs):
+def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
if not json_ld:
return {}
-return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
+return self._json_ld(
+json_ld, video_id, fatal=kwargs.get('fatal', True),
+expected_type=expected_type)
-def _json_ld(self, json_ld, video_id, fatal=True):
+def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
if not json_ld:
@@ -815,6 +825,8 @@ class InfoExtractor(object):
info = {}
if json_ld.get('@context') == 'http://schema.org':
item_type = json_ld.get('@type')
+if expected_type is not None and expected_type != item_type:
+return info
if item_type == 'TVEpisode':
info.update({
'episode': unescapeHTML(json_ld.get('name')),
@@ -833,6 +845,19 @@ class InfoExtractor(object):
'title': unescapeHTML(json_ld.get('headline')),
'description': unescapeHTML(json_ld.get('articleBody')),
})
+elif item_type == 'VideoObject':
+info.update({
+'url': json_ld.get('contentUrl'),
+'title': unescapeHTML(json_ld.get('name')),
+'description': unescapeHTML(json_ld.get('description')),
+'thumbnail': json_ld.get('thumbnailUrl'),
+'duration': parse_duration(json_ld.get('duration')),
+'timestamp': unified_timestamp(json_ld.get('uploadDate')),
+'filesize': float_or_none(json_ld.get('contentSize')),
+'tbr': int_or_none(json_ld.get('bitrate')),
+'width': int_or_none(json_ld.get('width')),
+'height': int_or_none(json_ld.get('height')),
+})
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
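
The new expected_type parameter lets a caller reject JSON-LD blocks of the wrong @type before any field is read. A trimmed-down sketch of the gate and the VideoObject branch (duration and uploadDate are left as raw ISO 8601 strings here; the real code runs them through parse_duration and unified_timestamp):

def json_ld_info(json_ld, expected_type=None):
    info = {}
    if json_ld.get('@context') != 'http://schema.org':
        return info
    item_type = json_ld.get('@type')
    if expected_type is not None and expected_type != item_type:
        return info  # caller asked for a different type: report nothing
    if item_type == 'VideoObject':
        info.update({
            'url': json_ld.get('contentUrl'),
            'title': json_ld.get('name'),
            'duration': json_ld.get('duration'),      # e.g. 'PT1M30S'
            'timestamp': json_ld.get('uploadDate'),   # e.g. '2016-07-01T12:00:00Z'
        })
    return dict((k, v) for k, v in info.items() if v is not None)

video_obj = {'@context': 'http://schema.org', '@type': 'VideoObject',
             'name': 'Sample clip', 'contentUrl': 'http://example.com/clip.mp4'}
print(json_ld_info(video_obj, expected_type='TVEpisode'))   # -> {}
print(json_ld_info(video_obj, expected_type='VideoObject'))
# -> {'url': 'http://example.com/clip.mp4', 'title': 'Sample clip'}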
@@ -874,7 +899,11 @@ class InfoExtractor(object):
f['ext'] = determine_ext(f['url'])
if isinstance(field_preference, (list, tuple)):
-return tuple(f.get(field) if f.get(field) is not None else -1 for field in field_preference)
+return tuple(
+f.get(field)
+if f.get(field) is not None
+else ('' if field == 'format_id' else -1)
+for field in field_preference)
preference = f.get('preference')
if preference is None:
@@ -1030,7 +1059,7 @@ class InfoExtractor(object):
if base_url:
base_url = base_url.strip()
-bootstrap_info = xpath_text(
+bootstrap_info = xpath_element(
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
'bootstrap info', default=None)
@@ -1085,7 +1114,7 @@ class InfoExtractor(object):
formats.append({
'format_id': format_id,
'url': manifest_url,
-'ext': 'flv' if bootstrap_info else None,
+'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr,
'width': width,
'height': height,
@@ -1149,23 +1178,11 @@ class InfoExtractor(object):
}]
last_info = None
last_media = None
-kv_rex = re.compile(
-r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'):
-last_info = {}
-for m in kv_rex.finditer(line):
-v = m.group('val')
-if v.startswith('"'):
-v = v[1:-1]
-last_info[m.group('key')] = v
+last_info = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'):
-last_media = {}
-for m in kv_rex.finditer(line):
-v = m.group('val')
-if v.startswith('"'):
-v = v[1:-1]
-last_media[m.group('key')] = v
+last_media = parse_m3u8_attributes(line)
elif line.startswith('#') or not line.strip():
continue
else:
@@ -1190,6 +1207,7 @@ class InfoExtractor(object):
'url': format_url(line.strip()),
'tbr': tbr,
'ext': ext,
+'fps': float_or_none(last_info.get('FRAME-RATE')),
'protocol': entry_protocol,
'preference': preference,
}
@@ -1198,24 +1216,17 @@ class InfoExtractor(object):
width_str, height_str = resolution.split('x')
f['width'] = int(width_str)
f['height'] = int(height_str)
-codecs = last_info.get('CODECS')
-if codecs:
-vcodec, acodec = [None] * 2
-va_codecs = codecs.split(',')
-if len(va_codecs) == 1:
-# Audio only entries usually come with single codec and
-# no resolution. For more robustness we also check it to
-# be mp4 audio.
-if not resolution and va_codecs[0].startswith('mp4a'):
-vcodec, acodec = 'none', va_codecs[0]
-else:
-vcodec = va_codecs[0]
-else:
-vcodec, acodec = va_codecs[:2]
+# Unified Streaming Platform
+mobj = re.search(
+r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', f['url'])
+if mobj:
+abr, vbr = mobj.groups()
+abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
f.update({
-'acodec': acodec,
-'vcodec': vcodec,
+'vbr': vbr,
+'abr': abr,
})
+f.update(parse_codecs(last_info.get('CODECS')))
if last_media is not None:
f['m3u8_media'] = last_media
last_media = None
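
Unified Streaming Platform encodes the audio and video bitrates straight into the variant URL, which is what the new regex pulls out. A standalone check (the URL is a made-up USP-style example):

import re

def usp_bitrates(format_url):
    mobj = re.search(
        r'audio.*?(?:%3D|=)(\d+)(?:-video.*?(?:%3D|=)(\d+))?', format_url)
    if not mobj:
        return None, None
    abr, vbr = mobj.groups()
    # float_or_none(x, 1000) in the real code: bits/s -> kbit/s
    return int(abr) / 1000.0, int(vbr) / 1000.0 if vbr else None

print(usp_bitrates('http://example.com/clip.ism/clip-audio_eng=128000-video=1200000.m3u8'))
# -> (128.0, 1200.0)

The %3D alternative covers URL-encoded '=' signs, and the -video part is optional so audio-only variants still yield an abr.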
@@ -1620,6 +1631,62 @@ class InfoExtractor(object):
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
def _parse_html5_media_entries(self, base_url, webpage):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
return {}
ctr = re.search(r'(?P<mimetype>[^/]+/[^;]+)(?:;\s*codecs="?(?P<codecs>[^"]+))?', content_type)
if ctr:
mimetype, codecs = ctr.groups()
f = parse_codecs(codecs)
f['ext'] = mimetype2ext(mimetype)
return f
return {}
entries = []
for media_tag, media_type, media_content in re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage):
media_info = {
'formats': [],
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
media_info['formats'].append({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
src = source_attributes.get('src')
if not src:
continue
f = parse_content_type(source_attributes.get('type'))
f.update({
'url': absolute_url(src),
'vcodec': 'none' if media_type == 'audio' else None,
})
media_info['formats'].append(f)
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
src = track_attributes.get('src')
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
entries.append(media_info)
return entries
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
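
The new _parse_html5_media_entries helper is plain regex scraping: find <video>/<audio> blocks, then read src from the tag itself and from nested <source> and <track> tags. A minimal standalone run of the same patterns over a made-up page:

import re

page = ('<video poster="/thumb.jpg">'
        '<source src="/media/clip.mp4" type="video/mp4">'
        '<track kind="subtitles" src="/media/clip.en.vtt" srclang="en">'
        '</video>')

for media_tag, media_type, media_content in re.findall(
        r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', page):
    print(media_type, re.findall(r'<source[^>]+>', media_content))
    print(media_type, re.findall(r'<track[^>]+>', media_content))
# -> video ['<source src="/media/clip.mp4" type="video/mp4">']
# -> video ['<track kind="subtitles" src="/media/clip.en.vtt" srclang="en">']

In the real helper, extract_attributes then turns each matched tag into a dict, and compat_urlparse.urljoin resolves relative src values against base_url.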
@@ -1733,6 +1800,13 @@ class InfoExtractor(object):
def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses')
+def geo_verification_headers(self):
+headers = {}
+geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
+if geo_verification_proxy:
+headers['Ytdl-request-proxy'] = geo_verification_proxy
+return headers
class SearchInfoExtractor(InfoExtractor):
"""


@@ -0,0 +1,30 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/video/player\?vid=(?P<id>[0-9.]+)'
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
},
'expected_warnings': ['HTTP Error 404'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:ctv_web:%s' % video_id,
'ie_key': 'NineCNineMedia',
}


@@ -0,0 +1,65 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import orderedSet
class CTVNewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
_TESTS = [{
'url': 'http://www.ctvnews.ca/video?clipId=901995',
'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
'info_dict': {
'id': '901995',
'ext': 'mp4',
'title': 'Extended: \'That person cannot be me\' Johnson says',
'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285',
'timestamp': 1467286284,
'upload_date': '20160630',
}
}, {
'url': 'http://www.ctvnews.ca/video?playlistId=1.2966224',
'info_dict':
{
'id': '1.2966224',
},
'playlist_mincount': 19,
}, {
'url': 'http://www.ctvnews.ca/video?binId=1.2876780',
'info_dict':
{
'id': '1.2876780',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.ctvnews.ca/1.810401',
'only_matching': True,
}, {
'url': 'http://www.ctvnews.ca/canadiens-send-p-k-subban-to-nashville-in-blockbuster-trade-1.2967231',
'only_matching': True,
}]
def _real_extract(self, url):
page_id = self._match_id(url)
def ninecninemedia_url_result(clip_id):
return {
'_type': 'url_transparent',
'id': clip_id,
'url': '9c9media:ctvnews_web:%s' % clip_id,
'ie_key': 'NineCNineMedia',
}
if page_id.isdigit():
return ninecninemedia_url_result(page_id)
else:
webpage = self._download_webpage('http://www.ctvnews.ca/%s' % page_id, page_id, query={
'ot': 'example.AjaxPageLayout.ot',
'maxItemsPerPage': 1000000,
})
entries = [ninecninemedia_url_result(clip_id) for clip_id in orderedSet(
re.findall(r'clip\.id\s*=\s*(\d+);', webpage))]
return self.playlist_result(entries, page_id)
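
For article pages, the clip IDs are scraped with re.findall and de-duplicated while preserving order. A standalone sketch (ordered_set below approximates youtube_dl.utils.orderedSet; the page snippet is fabricated):

import re

def ordered_set(iterable):
    # approximates youtube_dl.utils.orderedSet: dedupe, keep first occurrence
    seen = []
    for x in iterable:
        if x not in seen:
            seen.append(x)
    return seen

page = 'clip.id = 901995; ... clip.id = 901996; ... clip.id = 901995;'
print(ordered_set(re.findall(r'clip\.id\s*=\s*(\d+);', page)))
# -> ['901995', '901996']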


@@ -16,6 +16,7 @@ from ..utils import (
sanitized_Request,
str_to_int,
unescapeHTML,
+mimetype2ext,
)
@@ -111,6 +112,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}
]
+@staticmethod
+def _extract_urls(webpage):
+# Look for embedded Dailymotion player
+matches = re.findall(
+r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
+return list(map(lambda m: unescapeHTML(m[1]), matches))
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -153,18 +161,19 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
-ext = determine_ext(media_url)
+ext = mimetype2ext(type_) or determine_ext(media_url)
-if type_ == 'application/x-mpegURL' or ext == 'm3u8':
+if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False))
-elif type_ == 'application/f4m' or ext == 'f4m':
+elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
else:
f = {
'url': media_url,
'format_id': 'http-%s' % quality,
+'ext': ext,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
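
The new _extract_urls hook lets the generic extractor find embedded Dailymotion players. A standalone run of the same pattern (the video id in the sample iframe is hypothetical):

import re
from html import unescape  # stand-in for youtube-dl's unescapeHTML

EMBED_RE = (r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)'
            r'(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1')

page = '<iframe src="//www.dailymotion.com/embed/video/x4abc12"></iframe>'
print([unescape(m[1]) for m in re.findall(EMBED_RE, page)])
# -> ['//www.dailymotion.com/embed/video/x4abc12']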


@@ -66,22 +66,32 @@ class DaumIE(InfoExtractor):
'view_count': int,
'comment_count': int,
},
+}, {
+# Requires dte_type=WEB (#9972)
+'url': 'http://tvpot.daum.net/v/s3794Uf1NZeZ1qMpGpeqeRU',
+'md5': 'a8917742069a4dd442516b86e7d66529',
+'info_dict': {
+'id': 's3794Uf1NZeZ1qMpGpeqeRU',
+'ext': 'mp4',
+'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
+'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
+'upload_date': '20160611',
+},
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
-query = compat_urllib_parse_urlencode({'vid': video_id})
movie_data = self._download_json(
-'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
-video_id, 'Downloading video formats info')
+'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
+video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
# For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
info = self._download_xml(
-'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
-'Downloading video info')
+'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
+'Downloading video info', query={'vid': video_id})
formats = []
for format_el in movie_data['output_list']['output_list']:
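
The manual urlencode is gone; _download_json's query parameter now builds the query string, including the dte_type=WEB flag that the #9972 fix requires. Assuming the query items are appended as a standard urlencoded string, the request is roughly equivalent to:

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

base = 'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json'
print(base + '?' + urlencode([('vid', 's3794Uf1NZeZ1qMpGpeqeRU'), ('dte_type', 'WEB')]))
# -> ...IntegratedMovieData.json?vid=s3794Uf1NZeZ1qMpGpeqeRU&dte_type=WEB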


@@ -4,78 +4,47 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-float_or_none,
-int_or_none,
-clean_html,
-)
class DBTVIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
-'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
+'md5': '2e24f67936517b143a234b4cadf792ec',
'info_dict': {
-'id': '33100',
+'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
-'thumbnail': 're:https?://.*\.jpg$',
+'thumbnail': 're:https?://.*\.jpg',
-'timestamp': 1404039863.438,
+'timestamp': 1404039863,
'upload_date': '20140629',
'duration': 69.544,
-'view_count': int,
-'categories': list,
-}
+'uploader_id': '1027729757001',
+},
+'add_ie': ['BrightcoveNew']
}, {
'url': 'http://dbtv.no/3649835190001',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
+}, {
+'url': 'http://dbtv.no/vice/5000634109001',
+'only_matching': True,
+}, {
+'url': 'http://dbtv.no/filmtrailer/3359293614001',
+'only_matching': True,
}]
def _real_extract(self, url):
-mobj = re.match(self._VALID_URL, url)
-video_id = mobj.group('id')
-display_id = mobj.group('display_id') or video_id
+video_id, display_id = re.match(self._VALID_URL, url).groups()
-data = self._download_json(
-'http://api.dbtv.no/discovery/%s' % video_id, display_id)
-video = data['playlist'][0]
-formats = [{
-'url': f['URL'],
-'vcodec': f.get('container'),
-'width': int_or_none(f.get('width')),
-'height': int_or_none(f.get('height')),
-'vbr': float_or_none(f.get('rate'), 1000),
-'filesize': int_or_none(f.get('size')),
-} for f in video['renditions'] if 'URL' in f]
-if not formats:
-for url_key, format_id in [('URL', 'mp4'), ('HLSURL', 'hls')]:
-if url_key in video:
-formats.append({
-'url': video[url_key],
-'format_id': format_id,
-})
-self._sort_formats(formats)
return {
-'id': compat_str(video['id']),
+'_type': 'url_transparent',
+'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
+'id': video_id,
'display_id': display_id,
-'title': video['title'],
+'ie_key': 'BrightcoveNew',
-'description': clean_html(video['desc']),
-'thumbnail': video.get('splash') or video.get('thumb'),
-'timestamp': float_or_none(video.get('publishedAt'), 1000),
-'duration': float_or_none(video.get('length'), 1000),
-'view_count': int_or_none(video.get('views')),
-'categories': video.get('tags'),
-'formats': formats,
}


@@ -20,7 +20,7 @@ from ..utils import (
class DCNIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
+_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
@@ -55,30 +55,32 @@ class DCNBaseIE(InfoExtractor):
'is_live': is_live,
}
-def _extract_video_formats(self, webpage, video_id, entry_protocol):
+def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
formats = []
-m3u8_url = self._html_search_regex(
-r'file\s*:\s*"([^"]+)', webpage, 'm3u8 url', fatal=False)
-if m3u8_url:
-formats.extend(self._extract_m3u8_formats(
-m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=None))
-rtsp_url = self._search_regex(
-r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
-if rtsp_url:
-formats.append({
-'url': rtsp_url,
-'format_id': 'rtsp',
-})
+format_url_base = 'http' + self._html_search_regex(
+[
+r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
+r'<a[^>]+href="rtsp(://[^"]+)"'
+], webpage, 'format url')
+# TODO: Current DASH formats are broken - $Time$ pattern in
+# <SegmentTemplate> not implemented yet
+# formats.extend(self._extract_mpd_formats(
+# format_url_base + '/manifest.mpd',
+# video_id, mpd_id='dash', fatal=False))
+formats.extend(self._extract_m3u8_formats(
+format_url_base + '/playlist.m3u8', video_id, 'mp4',
+m3u8_entry_protocol, m3u8_id='hls', fatal=False))
+formats.extend(self._extract_f4m_formats(
+format_url_base + '/manifest.f4m',
+video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return formats
class DCNVideoIE(DCNBaseIE):
IE_NAME = 'dcn:video'
-_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/[^/]+|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
+_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
-_TEST = {
+_TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'info_dict':
{
@@ -94,7 +96,10 @@ class DCNVideoIE(DCNBaseIE):
# m3u8 download
'skip_download': True,
},
-}
+}, {
+'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
+'only_matching': True,
+}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -120,7 +125,7 @@ class DCNVideoIE(DCNBaseIE):
class DCNLiveIE(DCNBaseIE):
IE_NAME = 'dcn:live'
-_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?live/(?P<id>\d+)'
+_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
def _real_extract(self, url):
channel_id = self._match_id(url)
@@ -147,7 +152,7 @@ class DCNLiveIE(DCNBaseIE):
class DCNSeasonIE(InfoExtractor):
IE_NAME = 'dcn:season'
-_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
+_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
'info_dict':


@@ -35,6 +35,7 @@ class DWIE(InfoExtractor):
'upload_date': '20160311',
}
}, {
+# DW documentaries, only last for one or two weeks
'url': 'http://www.dw.com/en/documentaries-welcome-to-the-90s-2016-05-21/e-19220158-9798',
'md5': '56b6214ef463bfb9a3b71aeb886f3cf1',
'info_dict': {
@@ -44,6 +45,7 @@ class DWIE(InfoExtractor):
'description': 'Welcome to the 90s - The Golden Decade of Hip Hop',
'upload_date': '20160521',
},
+'skip': 'Video removed',
}]
def _real_extract(self, url):


@@ -50,6 +50,14 @@ class EaglePlatformIE(InfoExtractor):
'skip': 'Georestricted',
}]
+@staticmethod
+def _extract_url(webpage):
+mobj = re.search(
+r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
+webpage)
+if mobj is not None:
+return mobj.group('url')
@staticmethod
def _handle_error(response):
status = int_or_none(response.get('status', 200))
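
The new _extract_url hook recognizes EaglePlatform player iframes on third-party pages. A standalone run of the same pattern (the account subdomain below is hypothetical):

import re

page = ('<iframe src="http://myaccount.media.eagleplatform.com/index/player?'
        'record_id=12345"></iframe>')
m = re.search(
    r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
    page)
print(m.group('url') if m else None)
# -> http://myaccount.media.eagleplatform.com/index/player?record_id=12345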


@@ -6,12 +6,13 @@ import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
+NO_DEFAULT,
)
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
-_TEST = {
+_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
@@ -22,24 +23,47 @@ class EllenTVIE(InfoExtractor):
'timestamp': 1428035648, 'timestamp': 1428035648,
'upload_date': '20150403', 'upload_date': '20150403',
'uploader_id': 'batchUser', 'uploader_id': 'batchUser',
} },
} }, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
'http://widgets.ellentube.com/videos/%s' % video_id,
video_id)
partner_id = self._search_regex( for num, url_ in enumerate(URLS, 1):
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id') webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
kaltura_id = self._search_regex( default = NO_DEFAULT if num == len(URLS) else None
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)", partner_id = self._search_regex(
r'data-kaltura-entry-id="([^"]+)'], r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
webpage, 'kaltura id') default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura') return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')
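Editor's aside: the loop above leans on the NO_DEFAULT sentinel — passing default=None makes _search_regex non-fatal, while passing NO_DEFAULT restores the fatal behaviour, so only the last candidate URL is allowed to fail hard. A self-contained sketch of the idiom (the search helper and pages are stand-ins, not youtube-dl internals):

import re

NO_DEFAULT = object()  # sentinel meaning "no default was supplied"

def search(pattern, text, default=NO_DEFAULT):
    m = re.search(pattern, text)
    if m:
        return m.group(1)
    if default is NO_DEFAULT:
        raise ValueError('unable to extract')  # fatal on the last candidate
    return default

pages = ['<html>no ids here</html>', "<html>var partnerId = '1234'</html>"]
for num, page in enumerate(pages, 1):
    # Only the last candidate page may fail fatally
    default = NO_DEFAULT if num == len(pages) else None
    partner_id = search(r"partnerId = '([^']+)'", page, default=default)
    if partner_id:
        break
print(partner_id)  # -> 1234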

View File

@@ -20,7 +20,11 @@ from .adobetv import (
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
-from .aenetworks import AENetworksIE
+from .aenetworks import (
+AENetworksIE,
+HistoryTopicIE,
+)
+from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
@@ -43,7 +47,6 @@ from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
ARDMediathekIE,
-SportschauIE,
)
from .arte import (
ArteTvIE,
@@ -56,6 +59,7 @@ from .arte import (
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
+ArteTVPlaylistIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
@@ -69,6 +73,8 @@ from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
+BBCCoUkIPlayerPlaylistIE,
+BBCCoUkPlaylistIE,
BBCIE,
)
from .beeg import BeegIE
@@ -106,6 +112,10 @@ from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
+from .carambatv import (
+CarambaTVIE,
+CarambaTVPageIE,
+)
from .cbc import (
CBCIE,
CBCPlayerIE,
@@ -129,10 +139,11 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
-from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
+from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
+from .closertotruth import CloserToTruthIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
@@ -160,6 +171,8 @@ from .crunchyroll import (
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
+from .ctv import CTVIE
+from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE
from .dailymail import DailyMailIE
@@ -243,6 +256,7 @@ from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE
+from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
@@ -268,6 +282,7 @@ from .freespeech import FreespeechIE
from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
+from .fusion import FusionIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
@@ -277,7 +292,6 @@ from .gameone import (
from .gamersyde import GamersydeIE
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
@@ -290,6 +304,7 @@ from .globo import (
GloboArticleIE,
)
from .godtube import GodTubeIE
+from .godtv import GodTVIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
@@ -312,6 +327,10 @@ from .hotnewhiphop import HotNewHipHopIE
from .hotstar import HotStarIE
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
+from .hrti import (
+HRTiIE,
+HRTiPlaylistIE,
+)
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
@@ -350,6 +369,7 @@ from .jove import JoveIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
+from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
@@ -381,6 +401,7 @@ from .leeco import (
LePlaylistIE,
LetvCloudIE,
)
+from .libraryofcongress import LibraryOfCongressIE
from .libsyn import LibsynIE
from .lifenews import (
LifeNewsIE,
@@ -413,6 +434,7 @@ from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
+from .meta import METAIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
@@ -445,6 +467,7 @@ from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
+from .msn import MSNIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
@@ -471,7 +494,6 @@ from .nbc import (
NBCNewsIE,
NBCSportsIE,
NBCSportsVPlayerIE,
-MSNBCIE,
)
from .ndr import (
NDRIE,
@@ -508,8 +530,12 @@ from .nhl import (
NHLVideocenterCategoryIE,
NHLIE,
)
-from .nick import NickIE
+from .nick import (
+NickIE,
+NickDeIE,
+)
from .niconico import NiconicoIE, NiconicoPlaylistIE
+from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
@@ -557,6 +583,10 @@ from .nytimes import (
from .nuvid import NuvidIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
+from .onet import (
+OnetIE,
+OnetChannelIE,
+)
from .onionstudios import OnionStudiosIE
from .ooyala import (
OoyalaIE,
@@ -595,6 +625,7 @@ from .pluralsight import (
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
+from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE
from .pornhub import (
@@ -618,7 +649,10 @@ from .qqmusic import (
QQMusicToplistIE,
QQMusicPlaylistIE,
)
-from .r7 import R7IE
+from .r7 import (
+R7IE,
+R7ArticleIE,
+)
from .radiocanada import (
RadioCanadaIE,
RadioCanadaAudioVideoIE,
@@ -638,10 +672,15 @@ from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reuters import ReutersIE
from .reverbnation import ReverbNationIE
-from .revision3 import Revision3IE
+from .revision3 import (
+Revision3EmbedIE,
+Revision3IE,
+)
from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
+from .rockstargames import RockstarGamesIE
+from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
@@ -652,6 +691,7 @@ from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
+from .rudo import RudoIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
@@ -677,6 +717,7 @@ from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
+from .seeker import SeekerIE
from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
@@ -685,10 +726,12 @@ from .shahid import ShahidIE
from .shared import SharedIE
from .sharesix import ShareSixIE
from .sina import SinaIE
+from .sixplay import SixPlayIE
from .skynewsarabia import (
SkyNewsArabiaIE,
SkyNewsArabiaArticleIE,
)
+from .skysports import SkySportsIE
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
@@ -729,6 +772,7 @@ from .sportbox import (
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
+from .sportschau import SportschauIE
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
@@ -769,6 +813,7 @@ from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
+from .telewebion import TelewebionIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
@@ -853,6 +898,7 @@ from .twitch import (
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchStreamIE,
+TwitchClipsIE,
)
from .twitter import (
TwitterCardIE,
@@ -867,6 +913,7 @@ from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .urort import UrortIE
+from .urplay import URPlayIE
from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import (
@@ -893,6 +940,7 @@ from .vice import (
ViceIE,
ViceShowIE,
)
+from .vidbit import VidbitIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
@@ -904,6 +952,7 @@ from .videomore import (
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
+from .vidio import VidioIE
from .vidme import (
VidmeIE,
VidmeUserIE,
@@ -940,6 +989,7 @@ from .viki import (
from .vk import (
VKIE,
VKUserVideosIE,
+VKWallPostIE,
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
@@ -949,7 +999,6 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
-from .vulture import VultureIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,
@@ -960,18 +1009,19 @@ from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
WDRMobileIE,
-WDRMausIE,
)
from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
-from .weibo import WeiboIE
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
-from .wrzuta import WrzutaIE
+from .wrzuta import (
+WrzutaIE,
+WrzutaPlaylistIE,
+)
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
@@ -1007,7 +1057,10 @@ from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
-from .youku import YoukuIE
+from .youku import (
+YoukuIE,
+YoukuShowIE,
+)
from .youporn import YouPornIE
from .yourupload import YourUploadIE
from .youtube import (
@@ -1022,6 +1075,7 @@ from .youtube import (
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
+YoutubeSharedVideoIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,

View File

@@ -129,6 +129,21 @@ class FacebookIE(InfoExtractor):
'only_matching': True,
}]
+@staticmethod
+def _extract_url(webpage):
+mobj = re.search(
+r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
+if mobj is not None:
+return mobj.group('url')
+# Facebook API embed
+# see https://developers.facebook.com/docs/plugins/embedded-video-player
+mobj = re.search(r'''(?x)<div[^>]+
+class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
+data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
+if mobj is not None:
+return mobj.group('url')
def _login(self):
(useremail, password) = self._get_login_info()
if useremail is None:
@@ -204,12 +219,25 @@ class FacebookIE(InfoExtractor):
BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
-m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
-if m:
-swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
+PATTERN = re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER)
+for m in re.findall(PATTERN, webpage):
+swf_params = m.replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params'])
-video_data = json.loads(params_raw)['video_data']
+video_data_candidate = json.loads(params_raw)['video_data']
+for _, f in video_data_candidate.items():
+if not f:
+continue
+if isinstance(f, dict):
+f = [f]
+if not isinstance(f, list):
+continue
+if f[0].get('video_id') == video_id:
+video_data = video_data_candidate
+break
+if video_data:
+break
def video_data_list2dict(video_data):
ret = {}
@@ -239,6 +267,8 @@ class FacebookIE(InfoExtractor):
formats = []
for format_id, f in video_data.items():
+if f and isinstance(f, dict):
+f = [f]
if not f or not isinstance(f, list):
continue
for quality in ('sd', 'hd'):
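Editor's aside: the candidate scan added above copes with video_data entries that arrive either as a single dict or as a list of dicts, coercing to a list before inspecting. A small sketch of that normalization with an invented payload:

def pick_video_data(candidates, video_id):
    # Mirror of the loop in the diff: find the candidate whose first
    # format entry carries the wanted video_id.
    for candidate in candidates:
        for _, f in candidate.items():
            if not f:
                continue
            if isinstance(f, dict):
                f = [f]  # normalize dict-or-list to list
            if not isinstance(f, list):
                continue
            if f[0].get('video_id') == video_id:
                return candidate
    return None

payload = [{'sd_src': {'video_id': '599637780109885', 'src': 'http://example.invalid/v.mp4'}}]
print(pick_video_data(payload, '599637780109885'))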

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
try_get,
unified_timestamp,
)
class FlipagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://flipagram.com/f/nyvTSJMKId',
'md5': '888dcf08b7ea671381f00fab74692755',
'info_dict': {
'id': 'nyvTSJMKId',
'ext': 'mp4',
'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
'duration': 35.571,
'timestamp': 1461244995,
'upload_date': '20160421',
'uploader': 'kitty juria',
'uploader_id': 'sjuria101',
'creator': 'kitty juria',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'comments': list,
'formats': 'mincount:2',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
video_id)
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default=False)
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')
formats = [{
'url': video['url'],
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': int_or_none(video_data.get('size')),
}]
preview_url = try_get(
flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
if preview_url:
formats.append({
'url': preview_url,
'ext': 'm4a',
'vcodec': 'none',
})
self._sort_formats(formats)
counts = flipagram.get('counts', {})
user = flipagram.get('user', {})
video_data = flipagram.get('video', {})
thumbnails = [{
'url': self._proto_relative_url(cover['url']),
'width': int_or_none(cover.get('width')),
'height': int_or_none(cover.get('height')),
'filesize': int_or_none(cover.get('size')),
} for cover in flipagram.get('covers', []) if cover.get('url')]
# Note that this only retrieves comments that are initially loaded.
# For videos with large amounts of comments, most won't be retrieved.
comments = []
for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
text = comment.get('comment')
if not text or not isinstance(text, list):
continue
comments.append({
'author': comment.get('user', {}).get('name'),
'author_id': comment.get('user', {}).get('username'),
'id': comment.get('id'),
'text': text[0],
'timestamp': unified_timestamp(comment.get('created')),
})
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(flipagram.get('duration'), 1000),
'thumbnails': thumbnails,
'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
'uploader': user.get('name'),
'uploader_id': user.get('username'),
'creator': user.get('name'),
'view_count': int_or_none(counts.get('plays')),
'like_count': int_or_none(counts.get('likes')),
'repost_count': int_or_none(counts.get('reflips')),
'comment_count': int_or_none(counts.get('comments')),
'comments': comments,
'formats': formats,
}
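Editor's aside: the extractor leans on try_get for optional nested fields. A sketch mirroring the behaviour of youtube_dl.utils.try_get as used above (simplified; str stands in for compat_str):

def try_get(src, getter, expected_type=None):
    # Walk a nested structure; return None instead of raising when any
    # step fails or the result has the wrong type.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    return v if expected_type is None or isinstance(v, expected_type) else None

flipagram = {'music': {'track': {'previewUrl': 'http://example.invalid/a.m4a'}}}
print(try_get(flipagram, lambda x: x['music']['track']['previewUrl'], str))  # the URL
print(try_get({}, lambda x: x['music']['track']['previewUrl'], str))         # None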

View File

@@ -1,7 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+smuggle_url,
+update_url_query,
+)
class FoxSportsIE(InfoExtractor):
@@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):
_TEST = {
'url': 'http://www.foxsports.com/video?vid=432609859715',
-'md5': 'b49050e955bebe32c301972e4012ac17',
'info_dict': {
-'id': 'gA0bHB3Ladz3',
-'ext': 'flv',
+'id': 'i0qKWsk3qJaM',
+'ext': 'mp4',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.',
+'upload_date': '20150423',
+'timestamp': 1429761109,
+'uploader': 'NEWA-FNG-FOXSPORTS',
},
'add_ie': ['ThePlatform'],
}
@@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
r"data-player-config='([^']+)'", webpage, 'data player config'),
video_id)
-return self.url_result(smuggle_url(
-config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
+return self.url_result(smuggle_url(update_url_query(
+config['releaseURL'], {
+'mbr': 'true',
+'switch': 'http',
+}), {'force_smil_url': True}))
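Editor's aside: update_url_query merges extra parameters into the releaseURL instead of string concatenation. A simplified stand-in for youtube_dl.utils.update_url_query, exercised on an invented URL:

try:
    from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode
except ImportError:  # Python 2
    from urlparse import urlsplit, urlunsplit, parse_qs
    from urllib import urlencode

def update_url_query(url, query):
    # Parse the existing query string, overlay the new parameters,
    # and rebuild the URL.
    parts = urlsplit(url)
    qs = parse_qs(parts.query)
    qs.update({k: [v] for k, v in query.items()})
    return urlunsplit(parts._replace(query=urlencode(qs, doseq=True)))

print(update_url_query('http://link.theplatform.com/s/x?fmt=f4m',
                       {'mbr': 'true', 'switch': 'http'}))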

View File

@@ -14,7 +14,10 @@ from ..utils import (
parse_duration,
determine_ext,
)
-from .dailymotion import DailymotionCloudIE
+from .dailymotion import (
+DailymotionIE,
+DailymotionCloudIE,
+)
class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -188,6 +191,21 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'params': {
'skip_download': True,
},
+}, {
+# Dailymotion embed
+'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
+'md5': 'ee7f1828f25a648addc90cb2687b1f12',
+'info_dict': {
+'id': 'x4iiko0',
+'ext': 'mp4',
+'title': 'NDDL, référendum, Brexit : Cécile Duflot répond à Patrick Cohen',
+'description': 'Au lendemain de la victoire du "oui" au référendum sur l\'aéroport de Notre-Dame-des-Landes, l\'ancienne ministre écologiste est l\'invitée de Patrick Cohen. Plus d\'info : https://www.franceinter.fr/emissions/le-7-9/le-7-9-27-juin-2016',
+'timestamp': 1467011958,
+'upload_date': '20160627',
+'uploader': 'France Inter',
+'uploader_id': 'x2q2ez',
+},
+'add_ie': ['Dailymotion'],
}]
def _real_extract(self, url):
@@ -197,7 +215,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
-return self.url_result(dmcloud_url, 'DailymotionCloud')
+return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
+dailymotion_urls = DailymotionIE._extract_urls(webpage)
+if dailymotion_urls:
+return self.playlist_result([
+self.url_result(dailymotion_url, DailymotionIE.ie_key())
+for dailymotion_url in dailymotion_urls])
video_id, catalogue = self._search_regex(
(r'id-video=([^@]+@[^"]+)',
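Editor's aside: the new branch returns a playlist of url-type entries, one per Dailymotion embed found on the page. A sketch of the resulting info-dict shapes, following youtube-dl's result conventions (values illustrative):

def playlist_from_urls(urls, ie_key='Dailymotion'):
    # Each entry defers extraction to the named extractor; the outer
    # dict marks the whole result as a playlist.
    entries = [{'_type': 'url', 'url': u, 'ie_key': ie_key} for u in urls]
    return {'_type': 'playlist', 'entries': entries}

print(playlist_from_urls(['https://www.dailymotion.com/video/x4iiko0']))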

View File

@@ -0,0 +1,35 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
class FusionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fusion\.net/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://fusion.net/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'info_dict': {
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
'ext': 'mp4',
'title': 'U.S. and Panamanian forces work together to stop a vessel smuggling drugs',
'description': 'md5:0cc84a9943c064c0f46b128b41b1b0d7',
'duration': 140.0,
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://fusion.net/video/201781',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_code = self._search_regex(
r'data-video-id=(["\'])(?P<code>.+?)\1',
webpage, 'ooyala code', group='code')
return OoyalaIE._build_url_result(ooyala_code)
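Editor's aside: the scrape above captures the quote character in group 1, backreferences it to tolerate either quoting style, and returns the named code group. A runnable sketch with invented markup:

import re

def extract_ooyala_code(webpage):
    # (["\']) captures the opening quote; \1 requires the same closing quote
    m = re.search(r'data-video-id=(["\'])(?P<code>.+?)\1', webpage)
    return m.group('code') if m else None

print(extract_ooyala_code('<div data-video-id="ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P">'))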

View File

@@ -1,19 +1,19 @@
from __future__ import unicode_literals
import re
-import json
-from .common import InfoExtractor
+from .once import OnceIE
from ..compat import (
compat_urllib_parse_unquote,
-compat_urlparse,
)
from ..utils import (
unescapeHTML,
+url_basename,
+dict_get,
)
-class GameSpotIE(InfoExtractor):
+class GameSpotIE(OnceIE):
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
@@ -39,29 +39,73 @@ class GameSpotIE(InfoExtractor):
webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex(
r'data-video=["\'](.*?)["\']', webpage, 'data video')
-data_video = json.loads(unescapeHTML(data_video_json))
+data_video = self._parse_json(unescapeHTML(data_video_json), page_id)
streams = data_video['videoStreams']
+manifest_url = None
formats = []
f4m_url = streams.get('f4m_stream')
-if f4m_url is not None:
-# Transform the manifest url to a link to the mp4 files
-# they are used in mobile devices.
-f4m_path = compat_urlparse.urlparse(f4m_url).path
-QUALITIES_RE = r'((,\d+)+,?)'
-qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
-http_path = f4m_path[1:].split('/', 1)[1]
-http_template = re.sub(QUALITIES_RE, r'%s', http_path)
-http_template = http_template.replace('.csmil/manifest.f4m', '')
-http_template = compat_urlparse.urljoin(
-'http://video.gamespotcdn.com/', http_template)
-for q in qualities:
-formats.append({
-'url': http_template % q,
-'ext': 'mp4',
-'format_id': q,
-})
-else:
+if f4m_url:
+manifest_url = f4m_url
+formats.extend(self._extract_f4m_formats(
+f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
+m3u8_url = streams.get('m3u8_stream')
+if m3u8_url:
+manifest_url = m3u8_url
+m3u8_formats = self._extract_m3u8_formats(
+m3u8_url, page_id, 'mp4', 'm3u8_native',
+m3u8_id='hls', fatal=False)
+formats.extend(m3u8_formats)
+progressive_url = dict_get(
+streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
+if progressive_url and manifest_url:
+qualities_basename = self._search_regex(
+'/([^/]+)\.csmil/',
+manifest_url, 'qualities basename', default=None)
+if qualities_basename:
+QUALITIES_RE = r'((,\d+)+,?)'
+qualities = self._search_regex(
+QUALITIES_RE, qualities_basename,
+'qualities', default=None)
+if qualities:
+qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
+qualities.sort()
+http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
+http_url_basename = url_basename(progressive_url)
+if m3u8_formats:
+self._sort_formats(m3u8_formats)
+m3u8_formats = list(filter(
+lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+m3u8_formats))
+if len(qualities) == len(m3u8_formats):
+for q, m3u8_format in zip(qualities, m3u8_formats):
+f = m3u8_format.copy()
+f.update({
+'url': progressive_url.replace(
+http_url_basename, http_template % q),
+'format_id': f['format_id'].replace('hls', 'http'),
+'protocol': 'http',
+})
+formats.append(f)
+else:
+for q in qualities:
+formats.append({
+'url': progressive_url.replace(
+http_url_basename, http_template % q),
+'ext': 'mp4',
+'format_id': 'http-%d' % q,
+'tbr': q,
+})
+onceux_json = self._search_regex(
+r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None)
+if onceux_json:
+onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
+if onceux_url:
+formats.extend(self._extract_once_formats(re.sub(
+r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', '')))
+if not formats:
for quality in ['sd', 'hd']:
# It's actually a link to a flv file
flv_url = streams.get('f4m_{0}'.format(quality))
@@ -71,6 +115,7 @@ class GameSpotIE(InfoExtractor):
'ext': 'flv',
'format_id': quality,
})
+self._sort_formats(formats)
return {
'id': data_video['guid'],
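Editor's aside: the progressive-format derivation above substitutes each quality embedded in the .csmil manifest basename into the progressive URL's basename. A worked sketch of that arithmetic with invented URLs:

import re

manifest_url = 'http://example.invalid/vid/,300,1000,3500,.mp4.csmil/manifest.f4m'
progressive_url = 'http://example.invalid/progressive/vid_3500.mp4'

# ',300,1000,3500,.mp4' — the basename carrying the quality list
basename = re.search(r'/([^/]+)\.csmil/', manifest_url).group(1)
QUALITIES_RE = r'((,\d+)+,?)'
qualities = sorted(int(q) for q in
                   re.search(QUALITIES_RE, basename).group(1).strip(',').split(','))
http_template = re.sub(QUALITIES_RE, r'%d', basename)  # '%d.mp4'
http_url_basename = progressive_url.split('/')[-1]

for q in qualities:
    # -> .../progressive/300.mp4, .../1000.mp4, .../3500.mp4
    print(progressive_url.replace(http_url_basename, http_template % q))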

View File

@@ -1,62 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_age_limit,
url_basename,
)
class GametrailersIE(InfoExtractor):
_VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
_TEST = {
'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
'info_dict': {
'id': '2983958',
'ext': 'mp4',
'display_id': '116437-Just-Cause-3-Review',
'title': 'Just Cause 3 - Review',
'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<title>(.+?)\|', webpage, 'title').strip()
embed_url = self._proto_relative_url(
self._search_regex(
r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
'embed url'),
scheme='http:')
video_id = url_basename(embed_url)
embed_page = self._download_webpage(embed_url, video_id)
embed_vars_json = self._search_regex(
r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
'embed vars')
info = self._parse_json(embed_vars_json, video_id)
formats = []
for media in info['media']:
if media['mediaPurpose'] == 'play':
formats.append({
'url': media['uri'],
'height': media['height'],
'width:': media['width'],
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': info.get('thumbUri'),
'description': self._og_search_description(webpage),
'duration': int_or_none(info.get('videoLengthInSeconds')),
'age_limit': parse_age_limit(info.get('audienceRating')),
}

View File

@@ -49,7 +49,10 @@ from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE
-from .dailymotion import DailymotionCloudIE
+from .dailymotion import (
+DailymotionIE,
+DailymotionCloudIE,
+)
from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE
@@ -63,6 +66,10 @@ from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
+from .vessel import VesselIE
+from .kaltura import KalturaIE
+from .eagleplatform import EaglePlatformIE
+from .facebook import FacebookIE
class GenericIE(InfoExtractor):
@@ -626,13 +633,13 @@ class GenericIE(InfoExtractor):
},
# MTVSercices embed
{
-'url': 'http://www.gametrailers.com/news-post/76093/north-america-europe-is-getting-that-mario-kart-8-mercedes-dlc-too',
-'md5': '35727f82f58c76d996fc188f9755b0d5',
+'url': 'http://www.vulture.com/2016/06/new-key-peele-sketches-released.html',
+'md5': 'ca1aef97695ef2c1d6973256a57e5252',
'info_dict': {
-'id': '0306a69b-8adf-4fb5-aace-75f8e8cbfca9',
+'id': '769f7ec0-0692-4d62-9b45-0d88074bffc1',
'ext': 'mp4',
-'title': 'Review',
-'description': 'Mario\'s life in the fast lane has never looked so good.',
+'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored',
+'description': 'Two valets share their love for movie star Liam Neesons.',
},
},
# YouTube embed via <data-embed-url="">
@@ -881,18 +888,6 @@ class GenericIE(InfoExtractor):
'title': 'EP3S5 - Bon Appétit - Baqueira Mi Corazon !',
}
},
-# Kaltura embed
-{
-'url': 'http://www.monumentalnetwork.com/videos/john-carlson-postgame-2-25-15',
-'info_dict': {
-'id': '1_eergr3h1',
-'ext': 'mp4',
-'upload_date': '20150226',
-'uploader_id': 'MonumentalSports-Kaltura@perfectsensedigital.com',
-'timestamp': int,
-'title': 'John Carlson Postgame 2/25/15',
-},
-},
# Kaltura embed (different embed code)
{
'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
@@ -918,6 +913,37 @@ class GenericIE(InfoExtractor):
'uploader_id': 'echojecka',
},
},
+# Kaltura embed with single quotes
+{
+'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
+'info_dict': {
+'id': '0_izeg5utt',
+'ext': 'mp4',
+'title': '35871',
+'timestamp': 1355743100,
+'upload_date': '20121217',
+'uploader_id': 'batchUser',
+},
+'add_ie': ['Kaltura'],
+},
+{
+# Kaltura embedded via quoted entry_id
+'url': 'https://www.oreilly.com/ideas/my-cloud-makes-pretty-pictures',
+'info_dict': {
+'id': '0_utuok90b',
+'ext': 'mp4',
+'title': '06_matthew_brender_raj_dutt',
+'timestamp': 1466638791,
+'upload_date': '20160622',
+},
+'add_ie': ['Kaltura'],
+'expected_warnings': [
+'Could not send HEAD request'
+],
+'params': {
+'skip_download': True,
+}
+},
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -1030,16 +1056,31 @@ class GenericIE(InfoExtractor):
'timestamp': 1389118457,
},
},
+# NBC News embed
+{
+'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html',
+'md5': '1aa589c675898ae6d37a17913cf68d66',
+'info_dict': {
+'id': '701714499682',
+'ext': 'mp4',
+'title': 'PREVIEW: On Assignment: David Letterman',
+'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.',
+},
+},
# UDN embed
{
-'url': 'http://www.udn.com/news/story/7314/822787',
+'url': 'https://video.udn.com/news/300346',
'md5': 'fd2060e988c326991037b9aff9df21a6',
'info_dict': {
'id': '300346',
'ext': 'mp4',
'title': '中一中男師變性 全校師生力挺',
'thumbnail': 're:^https?://.*\.jpg$',
-}
+},
+'params': {
+# m3u8 download
+'skip_download': True,
+},
},
# Ooyala embed
{
@@ -1056,20 +1097,6 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
-# Contains a SMIL manifest
-{
-'url': 'http://www.telewebion.com/fa/1263668/%D9%82%D8%B1%D8%B9%D9%87%E2%80%8C%DA%A9%D8%B4%DB%8C-%D9%84%DB%8C%DA%AF-%D9%82%D9%87%D8%B1%D9%85%D8%A7%D9%86%D8%A7%D9%86-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7/%2B-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84.html',
-'info_dict': {
-'id': 'file',
-'ext': 'flv',
-'title': '+ Football: Lottery Champions League Europe',
-'uploader': 'www.telewebion.com',
-},
-'params': {
-# rtmpe downloads
-'skip_download': True,
-}
-},
# Brightcove URL in single quotes
{
'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1088,12 +1115,17 @@ class GenericIE(InfoExtractor):
# Dailymotion Cloud video
{
'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
-'md5': '49444254273501a64675a7e68c502681',
+'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
'info_dict': {
-'id': '5585de919473990de4bee11b',
+'id': 'x2uy8t3',
'ext': 'mp4',
-'title': 'Le débat',
+'title': 'Sauvons les abeilles ! - Le débat',
+'description': 'md5:d9082128b1c5277987825d684939ca26',
'thumbnail': 're:^https?://.*\.jpe?g$',
+'timestamp': 1434970506,
+'upload_date': '20150622',
+'uploader': 'Public Sénat',
+'uploader_id': 'xa9gza',
}
},
# OnionStudios embed
@@ -1217,6 +1249,102 @@
'uploader': 'www.hudl.com',
},
},
+# twitter:player embed
+{
+'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/',
+'md5': 'a3e0df96369831de324f0778e126653c',
+'info_dict': {
+'id': '4909620399001',
+'ext': 'mp4',
+'title': 'What Do Black Holes Sound Like?',
+'description': 'what do black holes sound like',
+'upload_date': '20160524',
+'uploader_id': '29913724001',
+'timestamp': 1464107587,
+'uploader': 'TheAtlantic',
+},
+'add_ie': ['BrightcoveLegacy'],
+},
+# Facebook <iframe> embed
+{
+'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
+'md5': 'fbcde74f534176ecb015849146dd3aee',
+'info_dict': {
+'id': '599637780109885',
+'ext': 'mp4',
+'title': 'Facebook video #599637780109885',
+},
+},
+# Facebook API embed
+{
+'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
+'md5': 'a47372ee61b39a7b90287094d447d94e',
+'info_dict': {
+'id': '10153467542406923',
+'ext': 'mp4',
+'title': 'Facebook video #10153467542406923',
+},
+},
+# Wordpress "YouTube Video Importer" plugin
+{
+'url': 'http://www.lothype.com/blue-devils-drumline-stanford-lot-2016/',
+'md5': 'd16797741b560b485194eddda8121b48',
+'info_dict': {
+'id': 'HNTXWDXV9Is',
+'ext': 'mp4',
+'title': 'Blue Devils Drumline Stanford lot 2016',
+'upload_date': '20160627',
+'uploader_id': 'GENOCIDE8GENERAL10',
+'uploader': 'cylus cyrus',
+},
+},
+{
+# video stored on custom kaltura server
+'url': 'http://www.expansion.com/multimedia/videos.html?media=EQcM30NHIPv',
+'md5': '537617d06e64dfed891fa1593c4b30cc',
+'info_dict': {
+'id': '0_1iotm5bh',
+'ext': 'mp4',
+'title': 'Elecciones británicas: 5 lecciones para Rajoy',
+'description': 'md5:435a89d68b9760b92ce67ed227055f16',
+'uploader_id': 'videos.expansion@el-mundo.net',
+'upload_date': '20150429',
+'timestamp': 1430303472,
+},
+'add_ie': ['Kaltura'],
+},
+{
+# Non-standard Vimeo embed
+'url': 'https://openclassrooms.com/courses/understanding-the-web',
+'md5': '64d86f1c7d369afd9a78b38cbb88d80a',
+'info_dict': {
+'id': '148867247',
+'ext': 'mp4',
+'title': 'Understanding the web - Teaser',
+'description': 'This is "Understanding the web - Teaser" by openclassrooms on Vimeo, the home for high quality videos and the people who love them.',
+'upload_date': '20151214',
+'uploader': 'OpenClassrooms',
+'uploader_id': 'openclassrooms',
+},
+'add_ie': ['Vimeo'],
+},
+# {
+# # TODO: find another test
+# # http://schema.org/VideoObject
+# 'url': 'https://flipagram.com/f/nyvTSJMKId',
+# 'md5': '888dcf08b7ea671381f00fab74692755',
+# 'info_dict': {
+# 'id': 'nyvTSJMKId',
+# 'ext': 'mp4',
+# 'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
+# 'description': '#love for cats.',
+# 'timestamp': 1461244995,
+# 'upload_date': '20160421',
+# },
+# 'params': {
+# 'force_generic_extractor': True,
+# },
+# }
]
def report_following_redirect(self, new_url):
@@ -1528,6 +1656,11 @@ class GenericIE(InfoExtractor):
if tp_urls:
return _playlist_from_matches(tp_urls, ie='ThePlatform')
+# Look for Vessel embeds
+vessel_urls = VesselIE._extract_urls(webpage)
+if vessel_urls:
+return _playlist_from_matches(vessel_urls, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@@ -1568,12 +1701,16 @@ class GenericIE(InfoExtractor):
if matches:
return _playlist_from_matches(matches, lambda m: unescapeHTML(m))
-# Look for embedded Dailymotion player
-matches = re.findall(
-r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
+# Look for Wordpress "YouTube Video Importer" plugin
+matches = re.findall(r'''(?x)<div[^>]+
+class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
+data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
if matches:
-return _playlist_from_matches(
-matches, lambda m: unescapeHTML(m[1]))
+return _playlist_from_matches(matches, lambda m: m[-1])
+matches = DailymotionIE._extract_urls(webpage)
+if matches:
+return _playlist_from_matches(matches)
# Look for embedded Dailymotion playlist player (#3822)
m = re.search(
@@ -1710,10 +1847,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
-mobj = re.search(
-r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
-if mobj is not None:
-return self.url_result(mobj.group('url'), 'Facebook')
+facebook_url = FacebookIE._extract_url(webpage)
+if facebook_url is not None:
+return self.url_result(facebook_url, 'Facebook')
# Look for embedded VK player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
@@ -1835,14 +1971,6 @@ class GenericIE(InfoExtractor):
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
-# Look for embedded vulture.com player
-mobj = re.search(
-r'<iframe src="(?P<url>https?://video\.vulture\.com/[^"]+)"',
-webpage)
-if mobj is not None:
-url = unescapeHTML(mobj.group('url'))
-return self.url_result(url, ie='Vulture')
# Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
if mtvservices_url:
@@ -1903,18 +2031,14 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
-mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
-re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
-if mobj is not None:
-return self.url_result(smuggle_url(
-'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
-{'source_url': url}), 'Kaltura')
+kaltura_url = KalturaIE._extract_url(webpage)
+if kaltura_url:
+return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
# Look for Eagle.Platform embeds
-mobj = re.search(
-r'<iframe[^>]+src="(?P<url>https?://.+?\.media\.eagleplatform\.com/index/player\?.+?)"', webpage)
-if mobj is not None:
-return self.url_result(mobj.group('url'), 'EaglePlatform')
+eagleplatform_url = EaglePlatformIE._extract_url(webpage)
+if eagleplatform_url:
+return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())
# Look for ClipYou (uses Eagle.Platform) embeds
mobj = re.search(
@@ -1955,6 +2079,12 @@ class GenericIE(InfoExtractor):
if nbc_sports_url:
return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
+# Look for NBC News embeds
+nbc_news_embed_url = re.search(
+r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//www\.nbcnews\.com/widget/video-embed/[^"\']+)\1', webpage)
+if nbc_news_embed_url:
+return self.url_result(nbc_news_embed_url.group('url'), 'NBCNews')
# Look for Google Drive embeds
google_drive_url = GoogleDriveIE._extract_url(webpage)
if google_drive_url:
@@ -2054,6 +2184,24 @@ class GenericIE(InfoExtractor):
'uploader': video_uploader,
}
+# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser
+embed_url = self._html_search_meta('twitter:player', webpage, default=None)
+if embed_url:
+return self.url_result(embed_url)
+# Looking for http://schema.org/VideoObject
+json_ld = self._search_json_ld(
+webpage, video_id, default=None, expected_type='VideoObject')
+if json_ld and json_ld.get('url'):
+info_dict.update({
+'title': video_title or info_dict['title'],
+'description': video_description,
+'thumbnail': video_thumbnail,
+'age_limit': age_limit
+})
+info_dict.update(json_ld)
+return info_dict
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
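Editor's aside: the new schema.org/VideoObject fallback above is the last resort after all embed checks. A simplified sketch of pulling a JSON-LD VideoObject out of a page (field mapping reduced; the markup is invented):

import json
import re

def search_json_ld(webpage):
    # Grab the first JSON-LD block and keep it only if it describes a video
    m = re.search(
        r'<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json>.+?)</script>',
        webpage, re.DOTALL)
    if not m:
        return None
    data = json.loads(m.group('json'))
    if data.get('@type') != 'VideoObject':
        return None
    return {'url': data.get('contentUrl'), 'title': data.get('name'),
            'description': data.get('description')}

page = ('<script type="application/ld+json">{"@context": "http://schema.org",'
        '"@type": "VideoObject", "name": "Demo", "contentUrl":'
        ' "http://example.invalid/v.mp4"}</script>')
print(search_json_ld(page))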

View File

@@ -0,0 +1,66 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import js_to_json
class GodTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?god\.tv(?:/[^/]+)*/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://god.tv/jesus-image/video/jesus-conference-2016/randy-needham',
'info_dict': {
'id': 'lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3',
'ext': 'mp4',
'title': 'Randy Needham',
'duration': 3615.08,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://god.tv/playlist/bible-study',
'info_dict': {
'id': 'bible-study',
},
'playlist_mincount': 37,
}, {
'url': 'http://god.tv/node/15097',
'only_matching': True,
}, {
'url': 'http://god.tv/live/africa',
'only_matching': True,
}, {
'url': 'http://god.tv/liveevents',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'settings', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
ooyala_id = None
if settings:
playlist = settings.get('playlist')
if playlist and isinstance(playlist, list):
entries = [
OoyalaIE._build_url_result(video['content_id'])
for video in playlist if video.get('content_id')]
if entries:
return self.playlist_result(entries, display_id)
ooyala_id = settings.get('ooyala', {}).get('content_id')
if not ooyala_id:
ooyala_id = self._search_regex(
r'["\']content_id["\']\s*:\s*(["\'])(?P<id>[\w-]+)\1',
webpage, 'ooyala id', group='id')
return OoyalaIE._build_url_result(ooyala_id)
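Editor's aside: the settings scrape above reads the object passed to jQuery.extend(Drupal.settings, ...). A sketch against an invented page blob that happens to be strict JSON (real pages may need js_to_json first, as the extractor does):

import json
import re

page = "jQuery.extend(Drupal.settings, {\"ooyala\": {\"content_id\": \"lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3\"}});"
# Capture the settings object handed to jQuery.extend, then parse it
settings = json.loads(re.search(
    r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', page).group(1))
print(settings.get('ooyala', {}).get('content_id'))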

View File

@@ -0,0 +1,202 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
parse_age_limit,
sanitized_Request,
try_get,
)
class HRTiBaseIE(InfoExtractor):
"""
Base Information Extractor for Croatian Radiotelevision
video on demand site https://hrti.hrt.hr
Reverse engineered from the JavaScript app in app.min.js
"""
_NETRC_MACHINE = 'hrti'
_APP_LANGUAGE = 'hr'
_APP_VERSION = '1.1'
_APP_PUBLICATION_ID = 'all_in_one'
_API_URL = 'http://clientapi.hrt.hr/client_api.php/config/identify/format/json'
def _initialize_api(self):
init_data = {
'application_publication_id': self._APP_PUBLICATION_ID
}
uuid = self._download_json(
self._API_URL, None, note='Downloading uuid',
errnote='Unable to download uuid',
data=json.dumps(init_data).encode('utf-8'))['uuid']
app_data = {
'uuid': uuid,
'application_publication_id': self._APP_PUBLICATION_ID,
'application_version': self._APP_VERSION
}
req = sanitized_Request(self._API_URL, data=json.dumps(app_data).encode('utf-8'))
req.get_method = lambda: 'PUT'
resources = self._download_json(
req, None, note='Downloading session information',
errnote='Unable to download session information')
self._session_id = resources['session_id']
modules = resources['modules']
self._search_url = modules['vod_catalog']['resources']['search']['uri'].format(
language=self._APP_LANGUAGE,
application_id=self._APP_PUBLICATION_ID)
self._login_url = (modules['user']['resources']['login']['uri'] +
'/format/json').format(session_id=self._session_id)
self._logout_url = modules['user']['resources']['logout']['uri']
def _login(self):
(username, password) = self._get_login_info()
# TODO: figure out authentication with cookies
if username is None or password is None:
self.raise_login_required()
auth_data = {
'username': username,
'password': password,
}
try:
auth_info = self._download_json(
self._login_url, None, note='Logging in', errnote='Unable to log in',
data=json.dumps(auth_data).encode('utf-8'))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 406:
auth_info = self._parse_json(e.cause.read().encode('utf-8'), None)
else:
raise
error_message = auth_info.get('error', {}).get('message')
if error_message:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error_message),
expected=True)
self._token = auth_info['secure_streaming_token']
def _real_initialize(self):
self._initialize_api()
self._login()
class HRTiIE(HRTiBaseIE):
_VALID_URL = r'''(?x)
(?:
hrti:(?P<short_id>[0-9]+)|
https?://
hrti\.hrt\.hr/\#/video/show/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?
)
'''
_TESTS = [{
'url': 'https://hrti.hrt.hr/#/video/show/2181385/republika-dokumentarna-serija-16-hd',
'info_dict': {
'id': '2181385',
'display_id': 'republika-dokumentarna-serija-16-hd',
'ext': 'mp4',
'title': 'REPUBLIKA, dokumentarna serija (1/6) (HD)',
'description': 'md5:48af85f620e8e0e1df4096270568544f',
'duration': 2922,
'view_count': int,
'average_rating': int,
'episode_number': int,
'season_number': int,
'age_limit': 12,
},
'skip': 'Requires account credentials',
}, {
'url': 'https://hrti.hrt.hr/#/video/show/2181385/',
'only_matching': True,
}, {
'url': 'hrti:2181385',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('short_id') or mobj.group('id')
display_id = mobj.group('display_id') or video_id
video = self._download_json(
'%s/video_id/%s/format/json' % (self._search_url, video_id),
display_id, 'Downloading video metadata JSON')['video'][0]
title_info = video['title']
title = title_info['title_long']
movie = video['video_assets']['movie'][0]
m3u8_url = movie['url'].format(TOKEN=self._token)
formats = self._extract_m3u8_formats(
m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
description = clean_html(title_info.get('summary_long'))
age_limit = parse_age_limit(video.get('parental_control', {}).get('rating'))
view_count = int_or_none(video.get('views'))
average_rating = int_or_none(video.get('user_rating'))
duration = int_or_none(movie.get('duration'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'average_rating': average_rating,
'age_limit': age_limit,
'formats': formats,
}
class HRTiPlaylistIE(HRTiBaseIE):
_VALID_URL = r'https?://hrti.hrt.hr/#/video/list/category/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?'
_TESTS = [{
'url': 'https://hrti.hrt.hr/#/video/list/category/212/ekumena',
'info_dict': {
'id': '212',
'title': 'ekumena',
},
'playlist_mincount': 8,
'skip': 'Requires account credentials',
}, {
'url': 'https://hrti.hrt.hr/#/video/list/category/212/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
category_id = mobj.group('id')
display_id = mobj.group('display_id') or category_id
response = self._download_json(
'%s/category_id/%s/format/json' % (self._search_url, category_id),
display_id, 'Downloading video metadata JSON')
video_ids = try_get(
response, lambda x: x['video_listings'][0]['alternatives'][0]['list'],
list) or [video['id'] for video in response.get('videos', []) if video.get('id')]
entries = [self.url_result('hrti:%s' % video_id) for video_id in video_ids]
return self.playlist_result(entries, category_id, display_id)
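Editor's aside: _initialize_api above performs a two-step handshake — POST the publication id to obtain a uuid, then PUT uuid plus app version back to receive session resources. A stdlib-only sketch of the same exchange (no error handling; endpoint as hard-coded in the extractor; untested against the live service):

import json
try:
    from urllib.request import Request, urlopen
except ImportError:  # Python 2
    from urllib2 import Request, urlopen

API_URL = 'http://clientapi.hrt.hr/client_api.php/config/identify/format/json'

def identify(app_publication_id='all_in_one', app_version='1.1'):
    # Step 1: POST the publication id, receive a uuid
    uuid = json.loads(urlopen(Request(
        API_URL, data=json.dumps(
            {'application_publication_id': app_publication_id}).encode('utf-8')
    )).read().decode('utf-8'))['uuid']
    # Step 2: PUT uuid + app version back, receive session resources
    req = Request(API_URL, data=json.dumps({
        'uuid': uuid,
        'application_publication_id': app_publication_id,
        'application_version': app_version,
    }).encode('utf-8'))
    req.get_method = lambda: 'PUT'  # same method override sanitized_Request allows
    return json.loads(urlopen(req).read().decode('utf-8'))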

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
-_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/[^/]+/vi(?P<id>\d+)'
+_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -25,6 +25,12 @@ class ImdbIE(InfoExtractor):
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
'only_matching': True,
+}, {
+'url': 'http://www.imdb.com/title/tt1667889/?ref_=ext_shr_eml_vi#lb-vi2524815897',
+'only_matching': True,
+}, {
+'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
+'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -60,7 +60,8 @@ class IndavideoEmbedIE(InfoExtractor):
formats = [{
'url': video_url,
-'height': self._search_regex(r'\.(\d{3,4})\.mp4$', video_url, 'height', default=None),
+'height': int_or_none(self._search_regex(
+r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
} for video_url in video_urls]
self._sort_formats(formats)

View File

@@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
limit_length,
lowercase_escape,
+try_get,
)
@@ -19,10 +20,16 @@ class InstagramIE(InfoExtractor):
         'info_dict': {
             'id': 'aye83DjauH',
             'ext': 'mp4',
-            'uploader_id': 'naomipq',
             'title': 'Video by naomipq',
             'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
-        }
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1371748545,
+            'upload_date': '20130620',
+            'uploader_id': 'naomipq',
+            'uploader': 'Naomi Leonor Phan-Quang',
+            'like_count': int,
+            'comment_count': int,
+        },
     }, {
         # missing description
         'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
@@ -31,6 +38,13 @@ class InstagramIE(InfoExtractor):
             'ext': 'mp4',
-            'uploader_id': 'britneyspears',
             'title': 'Video by britneyspears',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1453760977,
+            'upload_date': '20160125',
+            'uploader_id': 'britneyspears',
+            'uploader': 'Britney Spears',
+            'like_count': int,
+            'comment_count': int,
         },
         'params': {
             'skip_download': True,
@@ -67,21 +81,57 @@ class InstagramIE(InfoExtractor):
         url = mobj.group('url')

         webpage = self._download_webpage(url, video_id)

-        uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
-                                         webpage, 'uploader id', fatal=False)
-
-        desc = self._search_regex(
-            r'"caption":"(.+?)"', webpage, 'description', default=None)
-        if desc is not None:
-            desc = lowercase_escape(desc)
+        (video_url, description, thumbnail, timestamp, uploader,
+         uploader_id, like_count, comment_count) = [None] * 8
+
+        shared_data = self._parse_json(
+            self._search_regex(
+                r'window\._sharedData\s*=\s*({.+?});',
+                webpage, 'shared data', default='{}'),
+            video_id, fatal=False)
+        if shared_data:
+            media = try_get(
+                shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
+            if media:
+                video_url = media.get('video_url')
+                description = media.get('caption')
+                thumbnail = media.get('display_src')
+                timestamp = int_or_none(media.get('date'))
+                uploader = media.get('owner', {}).get('full_name')
+                uploader_id = media.get('owner', {}).get('username')
+                like_count = int_or_none(media.get('likes', {}).get('count'))
+                comment_count = int_or_none(media.get('comments', {}).get('count'))
+
+        if not video_url:
+            video_url = self._og_search_video_url(webpage, secure=False)
+
+        if not uploader_id:
+            uploader_id = self._search_regex(
+                r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
+                webpage, 'uploader id', fatal=False)
+
+        if not description:
+            description = self._search_regex(
+                r'"caption"\s*:\s*"(.+?)"', webpage, 'description', default=None)
+            if description is not None:
+                description = lowercase_escape(description)
+
+        if not thumbnail:
+            thumbnail = self._og_search_thumbnail(webpage)

         return {
             'id': video_id,
-            'url': self._og_search_video_url(webpage, secure=False),
+            'url': video_url,
             'ext': 'mp4',
             'title': 'Video by %s' % uploader_id,
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
             'uploader_id': uploader_id,
-            'description': desc,
+            'uploader': uploader,
+            'like_count': like_count,
+            'comment_count': comment_count,
         }
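The rewrite above prefers the JSON blob Instagram embeds as window._sharedData over one-off regexes. A rough standalone sketch of that parsing step against a fabricated page snippet (real pages embed a much larger structure):

    import json
    import re

    html = ('<script>window._sharedData = {"entry_data": {"PostPage": '
            '[{"media": {"video_url": "https://example.com/v.mp4", '
            '"owner": {"username": "naomipq"}, "likes": {"count": 42}}}]}};</script>')

    mobj = re.search(r'window\._sharedData\s*=\s*({.+?});', html, re.DOTALL)
    shared_data = json.loads(mobj.group(1)) if mobj else {}
    media = shared_data.get('entry_data', {}).get('PostPage', [{}])[0].get('media', {})
    print(media.get('video_url'), media.get('owner', {}).get('username'))
    # https://example.com/v.mp4 naomipq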

View File

@@ -3,28 +3,22 @@ from __future__ import unicode_literals
 import hashlib
 import itertools
-import math
-import os
-import random
 import re
 import time
-import uuid

 from .common import InfoExtractor
 from ..compat import (
-    compat_parse_qs,
     compat_str,
     compat_urllib_parse_urlencode,
-    compat_urllib_parse_urlparse,
 )
 from ..utils import (
+    clean_html,
     decode_packed_codes,
+    get_element_by_id,
+    get_element_by_attribute,
     ExtractorError,
     ohdave_rsa_encrypt,
     remove_start,
-    sanitized_Request,
-    urlencode_postdata,
-    url_basename,
 )
@@ -171,70 +165,21 @@ class IqiyiIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
-        'md5': '2cb594dc2781e6c941a110d8f358118b',
+        # MD5 checksum differs on my machine and Travis CI
         'info_dict': {
             'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
+            'ext': 'mp4',
             'title': '美国德州空中惊现奇异云团 酷似UFO',
-            'ext': 'f4v',
         }
     }, {
         'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
+        'md5': '667171934041350c5de3f5015f7f1152',
         'info_dict': {
             'id': 'e3f585b550a280af23c98b6cb2be19fb',
-            'title': '名侦探柯南第752集',
-        },
-        'playlist': [{
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part1',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part2',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part3',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part4',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part5',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part6',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part7',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part8',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }],
-        'params': {
-            'skip_download': True,
+            'ext': 'mp4',
+            'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇',
         },
+        'skip': 'Geo-restricted to China',
     }, {
         'url': 'http://www.iqiyi.com/w_19rt6o8t9p.html',
         'only_matching': True,
@@ -250,22 +195,10 @@ class IqiyiIE(InfoExtractor):
         'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
         'info_dict': {
             'id': 'f3cf468b39dddb30d676f89a91200dc1',
+            'ext': 'mp4',
             'title': '泰坦尼克号',
         },
-        'playlist': [{
-            'info_dict': {
-                'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
-                'ext': 'f4v',
-                'title': '泰坦尼克号',
-            },
-        }, {
-            'info_dict': {
-                'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
-                'ext': 'f4v',
-                'title': '泰坦尼克号',
-            },
-        }],
-        'expected_warnings': ['Needs a VIP account for full video'],
+        'skip': 'Geo-restricted to China',
     }, {
         'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
         'info_dict': {
@@ -278,20 +211,15 @@ class IqiyiIE(InfoExtractor):
         'only_matching': True,
     }]

-    _FORMATS_MAP = [
-        ('1', 'h6'),
-        ('2', 'h5'),
-        ('3', 'h4'),
-        ('4', 'h3'),
-        ('5', 'h2'),
-        ('10', 'h1'),
-    ]
-
-    AUTH_API_ERRORS = {
-        # No preview available (不允许试看鉴权失败)
-        'Q00505': 'This video requires a VIP account',
-        # End of preview time (试看结束鉴权失败)
-        'Q00506': 'Needs a VIP account for full video',
-    }
+    _FORMATS_MAP = {
+        '96': 1,    # 216p, 240p
+        '1': 2,     # 336p, 360p
+        '2': 3,     # 480p, 504p
+        '21': 4,    # 504p
+        '4': 5,     # 720p
+        '17': 5,    # 720p
+        '5': 6,     # 1072p, 1080p
+        '18': 7,    # 1080p
+    }

     def _real_initialize(self):
@@ -352,177 +280,23 @@ class IqiyiIE(InfoExtractor):
         return True

-    def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
-        auth_params = {
-            # version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
-            'version': '2.0',
-            'platform': 'b6c13e26323c537d',
-            'aid': tvid,
-            'tvid': tvid,
-            'uid': '',
-            'deviceId': _uuid,
-            'playType': 'main',  # XXX: always main?
-            'filename': os.path.splitext(url_basename(api_video_url))[0],
-        }
-
-        qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
-        for key, val in qd_items.items():
-            auth_params[key] = val[0]
-
-        auth_req = sanitized_Request(
-            'http://api.vip.iqiyi.com/services/ckn.action',
-            urlencode_postdata(auth_params))
-        # iQiyi server throws HTTP 405 error without the following header
-        auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        auth_result = self._download_json(
-            auth_req, video_id,
-            note='Downloading video authentication JSON',
-            errnote='Unable to download video authentication JSON')
-
-        code = auth_result.get('code')
-        msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
-        if code == 'Q00506':
-            if do_report_warning:
-                self.report_warning(msg)
-            return False
-        if 'data' not in auth_result:
-            if msg is not None:
-                raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
-            raise ExtractorError('Unexpected error from Iqiyi auth API')
-
-        return auth_result['data']
-
-    def construct_video_urls(self, data, video_id, _uuid, tvid):
-        def do_xor(x, y):
-            a = y % 3
-            if a == 1:
-                return x ^ 121
-            if a == 2:
-                return x ^ 72
-            return x ^ 103
-
-        def get_encode_code(l):
-            a = 0
-            b = l.split('-')
-            c = len(b)
-            s = ''
-            for i in range(c - 1, -1, -1):
-                a = do_xor(int(b[c - i - 1], 16), i)
-                s += chr(a)
-            return s[::-1]
-
-        def get_path_key(x, format_id, segment_index):
-            mg = ')(*&^flash@#$%a'
-            tm = self._download_json(
-                'http://data.video.qiyi.com/t?tn=' + str(random.random()), video_id,
-                note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
-            )['t']
-            t = str(int(math.floor(int(tm) / (600.0))))
-            return md5_text(t + mg + x)
-
-        video_urls_dict = {}
-        need_vip_warning_report = True
-        for format_item in data['vp']['tkl'][0]['vs']:
-            if 0 < int(format_item['bid']) <= 10:
-                format_id = self.get_format(format_item['bid'])
-            else:
-                continue
-
-            video_urls = []
-
-            video_urls_info = format_item['fs']
-            if not format_item['fs'][0]['l'].startswith('/'):
-                t = get_encode_code(format_item['fs'][0]['l'])
-                if t.endswith('mp4'):
-                    video_urls_info = format_item['flvs']
-
-            for segment_index, segment in enumerate(video_urls_info):
-                vl = segment['l']
-                if not vl.startswith('/'):
-                    vl = get_encode_code(vl)
-                is_vip_video = '/vip/' in vl
-                filesize = segment['b']
-                base_url = data['vp']['du'].split('/')
-                if not is_vip_video:
-                    key = get_path_key(
-                        vl.split('/')[-1].split('.')[0], format_id, segment_index)
-                    base_url.insert(-1, key)
-                base_url = '/'.join(base_url)
-                param = {
-                    'su': _uuid,
-                    'qyid': uuid.uuid4().hex,
-                    'client': '',
-                    'z': '',
-                    'bt': '',
-                    'ct': '',
-                    'tn': str(int(time.time()))
-                }
-                api_video_url = base_url + vl
-                if is_vip_video:
-                    api_video_url = api_video_url.replace('.f4v', '.hml')
-                    auth_result = self._authenticate_vip_video(
-                        api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
-                    if auth_result is False:
-                        need_vip_warning_report = False
-                        break
-                    param.update({
-                        't': auth_result['t'],
-                        # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
-                        'cid': 'afbe8fd3d73448c9',
-                        'vid': video_id,
-                        'QY00001': auth_result['u'],
-                    })
-                api_video_url += '?' if '?' not in api_video_url else '&'
-                api_video_url += compat_urllib_parse_urlencode(param)
-                js = self._download_json(
-                    api_video_url, video_id,
-                    note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
-                video_url = js['l']
-                video_urls.append(
-                    (video_url, filesize))
-
-            video_urls_dict[format_id] = video_urls
-        return video_urls_dict
-
-    def get_format(self, bid):
-        matched_format_ids = [_format_id for _bid, _format_id in self._FORMATS_MAP if _bid == str(bid)]
-        return matched_format_ids[0] if len(matched_format_ids) else None
-
-    def get_bid(self, format_id):
-        matched_bids = [_bid for _bid, _format_id in self._FORMATS_MAP if _format_id == format_id]
-        return matched_bids[0] if len(matched_bids) else None
-
-    def get_raw_data(self, tvid, video_id, enc_key, _uuid):
-        tm = str(int(time.time()))
-        tail = tm + tvid
-        param = {
-            'key': 'fvip',
-            'src': md5_text('youtube-dl'),
-            'tvId': tvid,
-            'vid': video_id,
-            'vinfo': 1,
-            'tm': tm,
-            'enc': md5_text(enc_key + tail),
-            'qyid': _uuid,
-            'tn': random.random(),
-            # In iQiyi's flash player, um is set to 1 if there's a logged user
-            # Some 1080P formats are only available with a logged user.
-            # Here force um=1 to trick the iQiyi server
-            'um': 1,
-            'authkey': md5_text(md5_text('') + tail),
-            'k_tag': 1,
-        }
-
-        api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
-            compat_urllib_parse_urlencode(param)
-
-        raw_data = self._download_json(api_url, video_id)
-        return raw_data
-
-    def get_enc_key(self, video_id):
-        # TODO: automatic key extraction
-        # last update at 2016-01-22 for Zombie::bite
-        enc_key = '4a1caba4b4465345366f28da7c117d20'
-        return enc_key
+    def get_raw_data(self, tvid, video_id):
+        tm = int(time.time() * 1000)
+
+        key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
+        sc = md5_text(compat_str(tm) + key + tvid)
+        params = {
+            'tvid': tvid,
+            'vid': video_id,
+            'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
+            'sc': sc,
+            't': tm,
+        }
+
+        return self._download_json(
+            'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
+            video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
+            query=params, headers=self.geo_verification_headers())

     def _extract_playlist(self, webpage):
         PAGE_SIZE = 50
@@ -571,58 +345,41 @@ class IqiyiIE(InfoExtractor):
             r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
         video_id = self._search_regex(
             r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
-        _uuid = uuid.uuid4().hex
-
-        enc_key = self.get_enc_key(video_id)
-
-        raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid)
-
-        if raw_data['code'] != 'A000000':
-            raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
-
-        data = raw_data['data']
-
-        title = data['vi']['vn']
-
-        # generate video_urls_dict
-        video_urls_dict = self.construct_video_urls(
-            data, video_id, _uuid, tvid)
-
-        # construct info
-        entries = []
-        for format_id in video_urls_dict:
-            video_urls = video_urls_dict[format_id]
-            for i, video_url_info in enumerate(video_urls):
-                if len(entries) < i + 1:
-                    entries.append({'formats': []})
-                entries[i]['formats'].append(
-                    {
-                        'url': video_url_info[0],
-                        'filesize': video_url_info[-1],
-                        'format_id': format_id,
-                        'preference': int(self.get_bid(format_id))
-                    }
-                )
-
-        for i in range(len(entries)):
-            self._sort_formats(entries[i]['formats'])
-            entries[i].update(
-                {
-                    'id': '%s_part%d' % (video_id, i + 1),
-                    'title': title,
-                }
-            )
-
-        if len(entries) > 1:
-            info = {
-                '_type': 'multi_video',
-                'id': video_id,
-                'title': title,
-                'entries': entries,
-            }
-        else:
-            info = entries[0]
-            info['id'] = video_id
-            info['title'] = title
-
-        return info
+
+        formats = []
+        for _ in range(5):
+            raw_data = self.get_raw_data(tvid, video_id)
+
+            if raw_data['code'] != 'A00000':
+                if raw_data['code'] == 'A00111':
+                    self.raise_geo_restricted()
+                raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
+
+            data = raw_data['data']
+
+            for stream in data['vidl']:
+                if 'm3utx' not in stream:
+                    continue
+                vd = compat_str(stream['vd'])
+                formats.append({
+                    'url': stream['m3utx'],
+                    'format_id': vd,
+                    'ext': 'mp4',
+                    'preference': self._FORMATS_MAP.get(vd, -1),
+                    'protocol': 'm3u8_native',
+                })
+
+            if formats:
+                break
+
+            self._sleep(5, video_id)
+
+        self._sort_formats(formats)
+        title = (get_element_by_id('widget-videotitle', webpage) or
+                 clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
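The new get_raw_data() signs each tmts request with an MD5 over a millisecond timestamp, a constant key and the tvid. A small sketch of just the signing arithmetic; the key and src constants are the ones visible in the diff, while the tvid and vid values are placeholders:

    import hashlib
    import time

    def md5_text(text):
        return hashlib.md5(text.encode('utf-8')).hexdigest()

    tvid = '6272058'  # hypothetical tvid, for illustration only
    tm = int(time.time() * 1000)
    key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
    sc = md5_text(str(tm) + key + tvid)
    params = {
        'tvid': tvid,
        'vid': '9c1fb1b99d192b21c559e5a1a2cb3c73',
        'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
        'sc': sc,
        't': tm,
    }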

View File

@@ -12,9 +12,35 @@ from ..utils import (
 class JWPlatformBaseIE(InfoExtractor):
+    @staticmethod
+    def _find_jwplayer_data(webpage):
+        # TODO: Merge this with JWPlayer-related codes in generic.py
+
+        mobj = re.search(
+            'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
+            webpage)
+        if mobj:
+            return mobj.group('options')
+
+    def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
+        jwplayer_data = self._parse_json(
+            self._find_jwplayer_data(webpage), video_id)
+
+        return self._parse_jwplayer_data(
+            jwplayer_data, video_id, *args, **kwargs)
+
     def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
+        # JWPlayer backward compatibility: flattened playlists
+        # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
+        if 'playlist' not in jwplayer_data:
+            jwplayer_data = {'playlist': [jwplayer_data]}
+
         video_data = jwplayer_data['playlist'][0]
+
+        # JWPlayer backward compatibility: flattened sources
+        # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
+        if 'sources' not in video_data:
+            video_data['sources'] = [video_data]
+
         formats = []
         for source in video_data['sources']:
             source_url = self._proto_relative_url(source['file'])
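The two backward-compatibility branches added above normalize old JWPlayer configs into one canonical shape before parsing. A sketch of that normalization in isolation (a shallow copy stands in for the self-reference the real code creates, purely so the result prints cleanly):

    def normalize_jwplayer_config(jwplayer_data):
        # Flattened playlist: a bare item becomes a one-element playlist.
        if 'playlist' not in jwplayer_data:
            jwplayer_data = {'playlist': [jwplayer_data]}
        video_data = jwplayer_data['playlist'][0]
        # Flattened sources: the item itself doubles as its only source.
        if 'sources' not in video_data:
            video_data['sources'] = [dict(video_data)]
        return jwplayer_data

    print(normalize_jwplayer_config({'file': 'http://example.com/v.mp4'}))
    # {'playlist': [{'file': '...', 'sources': [{'file': '...'}]}]}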

View File

@@ -6,7 +6,6 @@ import base64
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse_urlencode,
     compat_urlparse,
     compat_parse_qs,
 )
@@ -15,6 +14,7 @@ from ..utils import (
     ExtractorError,
     int_or_none,
     unsmuggle_url,
+    smuggle_url,
 )
@@ -34,7 +34,8 @@ class KalturaIE(InfoExtractor):
                 )(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
                 )
     '''
-    _API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
+    _SERVICE_URL = 'http://cdnapi.kaltura.com'
+    _SERVICE_BASE = '/api_v3/index.php'
     _TESTS = [
         {
             'url': 'kaltura:269692:1_1jc2y3e4',
@@ -64,16 +65,50 @@ class KalturaIE(InfoExtractor):
         }
     ]

+    @staticmethod
+    def _extract_url(webpage):
+        mobj = (
+            re.search(
+                r"""(?xs)
+                    kWidget\.(?:thumb)?[Ee]mbed\(
+                    \{.*?
+                        (?P<q1>['\"])wid(?P=q1)\s*:\s*
+                        (?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
+                        (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
+                        (?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
+                """, webpage) or
+            re.search(
+                r'''(?xs)
+                    (?P<q1>["\'])
+                        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
+                    (?P=q1).*?
+                    (?:
+                        entry_?[Ii]d|
+                        (?P<q2>["\'])entry_?[Ii]d(?P=q2)
+                    )\s*:\s*
+                    (?P<q3>["\'])(?P<id>.+?)(?P=q3)
+                ''', webpage))
+        if mobj:
+            embed_info = mobj.groupdict()
+            url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
+            escaped_pid = re.escape(embed_info['partner_id'])
+            service_url = re.search(
+                r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
+                webpage)
+            if service_url:
+                url = smuggle_url(url, {'service_url': service_url.group(1)})
+            return url
+
-    def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
+    def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
         params = actions[0]
         if len(actions) > 1:
             for i, a in enumerate(actions[1:], start=1):
                 for k, v in a.items():
                     params['%d:%s' % (i, k)] = v

-        query = compat_urllib_parse_urlencode(params)
-        url = self._API_BASE + query
-        data = self._download_json(url, video_id, *args, **kwargs)
+        data = self._download_json(
+            (service_url or self._SERVICE_URL) + self._SERVICE_BASE,
+            video_id, query=params, *args, **kwargs)

         status = data if len(actions) == 1 else data[0]
         if status.get('objectType') == 'KalturaAPIException':
@@ -82,7 +117,7 @@ class KalturaIE(InfoExtractor):
         return data

-    def _get_kaltura_signature(self, video_id, partner_id):
+    def _get_kaltura_signature(self, video_id, partner_id, service_url=None):
         actions = [{
             'apiVersion': '3.1',
             'expiry': 86400,
@@ -92,10 +127,10 @@ class KalturaIE(InfoExtractor):
             'widgetId': '_%s' % partner_id,
         }]
         return self._kaltura_api_call(
-            video_id, actions, note='Downloading Kaltura signature')['ks']
+            video_id, actions, service_url, note='Downloading Kaltura signature')['ks']

-    def _get_video_info(self, video_id, partner_id):
-        signature = self._get_kaltura_signature(video_id, partner_id)
+    def _get_video_info(self, video_id, partner_id, service_url=None):
+        signature = self._get_kaltura_signature(video_id, partner_id, service_url)
         actions = [
             {
                 'action': 'null',
@@ -118,7 +153,7 @@ class KalturaIE(InfoExtractor):
             },
         ]
         return self._kaltura_api_call(
-            video_id, actions, note='Downloading video info JSON')
+            video_id, actions, service_url, note='Downloading video info JSON')

     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
@@ -127,7 +162,7 @@ class KalturaIE(InfoExtractor):
         partner_id, entry_id = mobj.group('partner_id', 'id')
         ks = None
         if partner_id and entry_id:
-            info, flavor_assets = self._get_video_info(entry_id, partner_id)
+            info, flavor_assets = self._get_video_info(entry_id, partner_id, smuggled_data.get('service_url'))
         else:
             path, query = mobj.group('path', 'query')
             if not path and not query:
@@ -175,12 +210,17 @@ class KalturaIE(InfoExtractor):
                 unsigned_url += '?referrer=%s' % referrer
             return unsigned_url

+        data_url = info['dataUrl']
+        if '/flvclipper/' in data_url:
+            data_url = re.sub(r'/flvclipper/.*', '/serveFlavor', data_url)
+
         formats = []
         for f in flavor_assets:
             # Continue if asset is not ready
             if f['status'] != 2:
                 continue
-            video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
+            video_url = sign_url(
+                '%s/flavorId/%s' % (data_url, f['id']))
             formats.append({
                 'format_id': '%(fileExt)s-%(bitrate)s' % f,
                 'ext': f.get('fileExt'),
@@ -193,9 +233,12 @@ class KalturaIE(InfoExtractor):
                 'width': int_or_none(f.get('width')),
                 'url': video_url,
             })
-        m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
-        formats.extend(self._extract_m3u8_formats(
-            m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        if '/playManifest/' in data_url:
+            m3u8_url = sign_url(data_url.replace(
+                'format/url', 'format/applehttp'))
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, entry_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))

         self._check_formats(formats, entry_id)
         self._sort_formats(formats)
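The service_url found on the embedding page travels into the kaltura: URL itself via smuggle_url/unsmuggle_url. A simplified re-implementation of that round-trip (youtube-dl's real helpers pack the payload as a URL-encoded fragment; only the idea is shown here):

    import json
    from urllib.parse import quote, unquote

    def smuggle_url(url, data):
        return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

    def unsmuggle_url(smug_url, default=None):
        if '#__youtubedl_smuggle' not in smug_url:
            return smug_url, default
        url, _, sdata = smug_url.rpartition('#__youtubedl_smuggle=')
        return url, json.loads(unquote(sdata))

    url = smuggle_url('kaltura:269692:1_1jc2y3e4', {'service_url': 'http://cdnapi.kaltura.com'})
    print(unsmuggle_url(url))
    # ('kaltura:269692:1_1jc2y3e4', {'service_url': 'http://cdnapi.kaltura.com'})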

View File

@@ -0,0 +1,71 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    qualities,
+)
+
+
+class KamcordIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
+        'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
+        'info_dict': {
+            'id': 'hNYRduDgWb4',
+            'ext': 'mp4',
+            'title': 'Drinking Madness',
+            'uploader': 'jacksfilms',
+            'uploader_id': '3044562',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video = self._parse_json(
+            self._search_regex(
+                r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
+                webpage, 'video'),
+            video_id)['video']
+
+        title = video['title']
+
+        formats = self._extract_m3u8_formats(
+            video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
+        self._sort_formats(formats)
+
+        uploader = video.get('user', {}).get('username')
+        uploader_id = video.get('user', {}).get('id')
+
+        view_count = int_or_none(video.get('viewCount'))
+        like_count = int_or_none(video.get('heartCount'))
+        comment_count = int_or_none(video.get('messageCount'))
+
+        preference_key = qualities(('small', 'medium', 'large'))
+
+        thumbnails = [{
+            'url': thumbnail_url,
+            'id': thumbnail_id,
+            'preference': preference_key(thumbnail_id),
+        } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
+            if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'view_count': view_count,
+            'like_count': like_count,
+            'comment_count': comment_count,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
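The qualities() helper above turns an ordered tuple of quality names into a numeric preference, so the 'large' thumbnail sorts ahead of 'small'. Roughly as implemented in youtube_dl/utils.py at the time:

    def qualities(quality_ids):
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    preference_key = qualities(('small', 'medium', 'large'))
    thumbs = {'large': 'http://example.com/l.jpg', 'small': 'http://example.com/s.jpg'}
    print(sorted(thumbs, key=preference_key, reverse=True))  # ['large', 'small']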

View File

@@ -26,11 +26,6 @@ class KuwoBaseIE(InfoExtractor):
     def _get_formats(self, song_id, tolerate_ip_deny=False):
         formats = []
         for file_format in self._FORMATS:
-            headers = {}
-            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-            if cn_verification_proxy:
-                headers['Ytdl-request-proxy'] = cn_verification_proxy
-
             query = {
                 'format': file_format['ext'],
                 'br': file_format.get('br', ''),
@@ -42,7 +37,7 @@ class KuwoBaseIE(InfoExtractor):
             song_url = self._download_webpage(
                 'http://antiserver.kuwo.cn/anti.s',
                 song_id, note='Download %s url info' % file_format['format'],
-                query=query, headers=headers,
+                query=query, headers=self.geo_verification_headers(),
             )

             if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -148,8 +143,8 @@ class KuwoAlbumIE(InfoExtractor):
         'url': 'http://www.kuwo.cn/album/502294/',
         'info_dict': {
             'id': '502294',
-            'title': 'M',
-            'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
+            'title': 'Made\xa0Series\xa0《M》',
+            'description': 'md5:d463f0d8a0ff3c3ea3d6ed7452a9483f',
         },
         'playlist_count': 2,
     }
@@ -209,7 +204,7 @@ class KuwoSingerIE(InfoExtractor):
         'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
         'info_dict': {
             'id': 'bruno+mars',
-            'title': 'Bruno Mars',
+            'title': 'Bruno\xa0Mars',
         },
         'playlist_mincount': 329,
     }, {
@@ -306,7 +301,7 @@ class KuwoMvIE(KuwoBaseIE):
             'id': '6480076',
             'ext': 'mp4',
             'title': 'My HouseMV',
-            'creator': '2PM',
+            'creator': 'PM02:00',
         },
         # In this video, music URLs (anti.s) are blocked outside China and
         # USA, while the MV URL (mvurl) is available globally, so force the MV

View File

@@ -1,60 +1,65 @@
+# coding: utf-8
 from __future__ import unicode_literals

 from .common import InfoExtractor
 from ..utils import (
-    parse_duration,
+    js_to_json,
+    smuggle_url,
 )


 class LA7IE(InfoExtractor):
-    IE_NAME = 'la7.tv'
-    _VALID_URL = r'''(?x)
-        https?://(?:www\.)?la7\.tv/
-        (?:
-            richplayer/\?assetid=|
-            \?contentId=
-        )
-        (?P<id>[0-9]+)'''
+    IE_NAME = 'la7.it'
+    _VALID_URL = r'''(?x)(https?://)?(?:
+        (?:www\.)?la7\.it/([^/]+)/(?:rivedila7|video)/|
+        tg\.la7\.it/repliche-tgla7\?id=
+    )(?P<id>.+)'''

-    _TEST = {
-        'url': 'http://www.la7.tv/richplayer/?assetid=50355319',
-        'md5': 'ec7d1f0224d20ba293ab56cf2259651f',
+    _TESTS = [{
+        # 'src' is a plain URL
+        'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
+        'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
         'info_dict': {
-            'id': '50355319',
+            'id': 'inccool8-02-10-2015-163722',
             'ext': 'mp4',
-            'title': 'IL DIVO',
-            'description': 'Un film di Paolo Sorrentino con Toni Servillo, Anna Bonaiuto, Giulio Bosetti e Flavio Bucci',
-            'duration': 6254,
+            'title': 'Inc.Cool8',
+            'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
+            'thumbnail': 're:^https?://.*',
+            'uploader_id': 'kdla7pillole@iltrovatore.it',
+            'timestamp': 1443814869,
+            'upload_date': '20151002',
         },
-        'skip': 'Blocked in the US',
-    }
+    }, {
+        # 'src' is a dictionary
+        'url': 'http://tg.la7.it/repliche-tgla7?id=189080',
+        'md5': '6b0d8888d286e39870208dfeceaf456b',
+        'info_dict': {
+            'id': '189080',
+            'ext': 'mp4',
+            'title': 'TG LA7',
+        },
+    }, {
+        'url': 'http://www.la7.it/omnibus/rivedila7/omnibus-news-02-07-2016-189077',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
-        xml_url = 'http://www.la7.tv/repliche/content/index.php?contentId=%s' % video_id
-        doc = self._download_xml(xml_url, video_id)

-        video_title = doc.find('title').text
-        description = doc.find('description').text
-        duration = parse_duration(doc.find('duration').text)
-        thumbnail = doc.find('img').text
-        view_count = int(doc.find('views').text)
+        webpage = self._download_webpage(url, video_id)

-        prefix = doc.find('.//fqdn').text.strip().replace('auto:', 'http:')
+        player_data = self._parse_json(
+            self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
+            video_id, transform_source=js_to_json)

-        formats = [{
-            'format': vnode.find('quality').text,
-            'tbr': int(vnode.find('quality').text),
-            'url': vnode.find('fms').text.strip().replace('mp4:', prefix),
-        } for vnode in doc.findall('.//videos/video')]
-        self._sort_formats(formats)
-
         return {
+            '_type': 'url_transparent',
+            'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
+                'service_url': 'http://kdam.iltrovatore.it',
+            }),
             'id': video_id,
-            'title': video_title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'view_count': view_count,
+            'title': player_data['title'],
+            'description': self._og_search_description(webpage, default=None),
+            'thumbnail': player_data.get('poster'),
+            'ie_key': 'Kaltura',
         }
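The new extractor pulls a JavaScript object literal out of videoLa7(...) and feeds it through js_to_json. A deliberately tiny stand-in for that helper, handling only the two JS-isms in this fabricated sample (single quotes and bare keys); the real js_to_json covers many more cases:

    import json
    import re

    def js_to_json_lite(code):
        code = re.sub(r"'([^']*)'", r'"\1"', code)  # single -> double quotes
        code = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', code)  # quote bare keys
        return code

    page = "videoLa7({vid: '0_abc123', title: 'TG LA7', poster: '/img/p.jpg'});"
    raw = re.search(r'videoLa7\(({[^;]+})\);', page).group(1)
    player_data = json.loads(js_to_json_lite(raw))
    print(player_data['vid'], player_data['title'])  # 0_abc123 TG LA7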

View File

@@ -20,15 +20,16 @@ from ..utils import (
     int_or_none,
     orderedSet,
     parse_iso8601,
-    sanitized_Request,
     str_or_none,
     url_basename,
+    urshift,
+    update_url_query,
 )


 class LeIE(InfoExtractor):
     IE_DESC = '乐视网'
-    _VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'

     _URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@@ -69,17 +70,16 @@ class LeIE(InfoExtractor):
             'hls_prefer_native': True,
         },
         'skip': 'Only available in China',
+    }, {
+        'url': 'http://sports.le.com/video/25737697.html',
+        'only_matching': True,
     }]

-    @staticmethod
-    def urshift(val, n):
-        return val >> n if val >= 0 else (val + 0x100000000) >> n
-
     # ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
     def ror(self, param1, param2):
         _loc3_ = 0
         while _loc3_ < param2:
-            param1 = self.urshift(param1, 1) + ((param1 & 1) << 31)
+            param1 = urshift(param1, 1) + ((param1 & 1) << 31)
             _loc3_ += 1
         return param1
@@ -90,6 +90,10 @@ class LeIE(InfoExtractor):
             _loc3_ = self.ror(_loc3_, _loc2_ % 17)
         return _loc3_

+    # reversed from http://jstatic.letvcdn.com/sdk/player.js
+    def get_mms_key(self, time):
+        return self.ror(time, 8) ^ 185025305
+
     # see M3U8Encryption class in KLetvPlayer.swf
     @staticmethod
     def decrypt_m3u8(encrypted_data):
@@ -110,28 +114,7 @@ class LeIE(InfoExtractor):
         return bytes(_loc7_)

-    def _real_extract(self, url):
-        media_id = self._match_id(url)
-        page = self._download_webpage(url, media_id)
-        params = {
-            'id': media_id,
-            'platid': 1,
-            'splatid': 101,
-            'format': 1,
-            'tkey': self.calc_time_key(int(time.time())),
-            'domain': 'www.le.com'
-        }
-        play_json_req = sanitized_Request(
-            'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
-        )
-        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-        if cn_verification_proxy:
-            play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
-
-        play_json = self._download_json(
-            play_json_req,
-            media_id, 'Downloading playJson data')
-
+    def _check_errors(self, play_json):
         # Check for errors
         playstatus = play_json['playstatus']
         if playstatus['status'] == 0:
@@ -142,43 +125,99 @@ class LeIE(InfoExtractor):
             msg = 'Generic error. flag = %d' % flag
             raise ExtractorError(msg, expected=True)

-        playurl = play_json['playurl']
-
-        formats = ['350', '1000', '1300', '720p', '1080p']
-        dispatch = playurl['dispatch']
-
-        urls = []
-        for format_id in formats:
-            if format_id in dispatch:
-                media_url = playurl['domain'][0] + dispatch[format_id][0]
-                media_url += '&' + compat_urllib_parse_urlencode({
-                    'm3v': 1,
-                    'format': 1,
-                    'expect': 3,
-                    'rateid': format_id,
-                })
-
-                nodes_data = self._download_json(
-                    media_url, media_id,
-                    'Download JSON metadata for format %s' % format_id)
-
-                req = self._request_webpage(
-                    nodes_data['nodelist'][0]['location'], media_id,
-                    note='Downloading m3u8 information for format %s' % format_id)
-
-                m3u8_data = self.decrypt_m3u8(req.read())
-
-                url_info_dict = {
-                    'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
-                    'ext': determine_ext(dispatch[format_id][1]),
-                    'format_id': format_id,
-                    'protocol': 'm3u8',
-                }
-
-                if format_id[-1:] == 'p':
-                    url_info_dict['height'] = int_or_none(format_id[:-1])
-
-                urls.append(url_info_dict)
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        page = self._download_webpage(url, media_id)
+
+        play_json_h5 = self._download_json(
+            'http://api.le.com/mms/out/video/playJsonH5',
+            media_id, 'Downloading html5 playJson data', query={
+                'id': media_id,
+                'platid': 3,
+                'splatid': 304,
+                'format': 1,
+                'tkey': self.get_mms_key(int(time.time())),
+                'domain': 'www.le.com',
+                'tss': 'no',
+            },
+            headers=self.geo_verification_headers())
+        self._check_errors(play_json_h5)
+
+        play_json_flash = self._download_json(
+            'http://api.le.com/mms/out/video/playJson',
+            media_id, 'Downloading flash playJson data', query={
+                'id': media_id,
+                'platid': 1,
+                'splatid': 101,
+                'format': 1,
+                'tkey': self.calc_time_key(int(time.time())),
+                'domain': 'www.le.com',
+            },
+            headers=self.geo_verification_headers())
+        self._check_errors(play_json_flash)
+
+        def get_h5_urls(media_url, format_id):
+            location = self._download_json(
+                media_url, media_id,
+                'Download JSON metadata for format %s' % format_id, query={
+                    'format': 1,
+                    'expect': 3,
+                    'tss': 'no',
+                })['location']
+
+            return {
+                'http': update_url_query(location, {'tss': 'no'}),
+                'hls': update_url_query(location, {'tss': 'ios'}),
+            }
+
+        def get_flash_urls(media_url, format_id):
+            media_url += '&' + compat_urllib_parse_urlencode({
+                'm3v': 1,
+                'format': 1,
+                'expect': 3,
+                'rateid': format_id,
+            })
+
+            nodes_data = self._download_json(
+                media_url, media_id,
+                'Download JSON metadata for format %s' % format_id)
+
+            req = self._request_webpage(
+                nodes_data['nodelist'][0]['location'], media_id,
+                note='Downloading m3u8 information for format %s' % format_id)
+
+            m3u8_data = self.decrypt_m3u8(req.read())
+
+            return {
+                'hls': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
+            }
+
+        extracted_formats = []
+        formats = []
+        for play_json, get_urls in ((play_json_h5, get_h5_urls), (play_json_flash, get_flash_urls)):
+            playurl = play_json['playurl']
+            play_domain = playurl['domain'][0]
+
+            for format_id, format_data in playurl.get('dispatch', []).items():
+                if format_id in extracted_formats:
+                    continue
+                extracted_formats.append(format_id)
+
+                media_url = play_domain + format_data[0]
+                for protocol, format_url in get_urls(media_url, format_id).items():
+                    f = {
+                        'url': format_url,
+                        'ext': determine_ext(format_data[1]),
+                        'format_id': '%s-%s' % (protocol, format_id),
+                        'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+                        'quality': int_or_none(format_id),
+                    }
+
+                    if format_id[-1:] == 'p':
+                        f['height'] = int_or_none(format_id[:-1])
+
+                    formats.append(f)
+        self._sort_formats(formats, ('height', 'quality', 'format_id'))

         publish_time = parse_iso8601(self._html_search_regex(
             r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
@@ -187,7 +226,7 @@ class LeIE(InfoExtractor):
         return {
             'id': media_id,
-            'formats': urls,
+            'formats': formats,
             'title': playurl['title'],
             'thumbnail': playurl['pic'],
             'description': description,
@@ -196,7 +235,7 @@ class LeIE(InfoExtractor):
 class LePlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
+    _VALID_URL = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'

     _TESTS = [{
         'url': 'http://www.le.com/tv/46177.html',
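get_mms_key above is plain bit-twiddling: urshift emulates JavaScript's unsigned right shift on 32-bit values, ror builds a right-rotate from it, and the key is the rotated timestamp XORed with a constant. The same logic as a standalone sketch:

    def urshift(val, n):
        # Unsigned right shift on a 32-bit value.
        return val >> n if val >= 0 else (val + 0x100000000) >> n

    def ror(param1, param2):
        # Rotate right by param2 bits within 32 bits.
        for _ in range(param2):
            param1 = urshift(param1, 1) + ((param1 & 1) << 31)
        return param1

    def get_mms_key(ts):
        return ror(ts, 8) ^ 185025305

    print(get_mms_key(1468000000))  # deterministic for a fixed timestamp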

View File

@@ -0,0 +1,143 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    int_or_none,
+    parse_filesize,
+)
+
+
+class LibraryOfCongressIE(InfoExtractor):
+    IE_NAME = 'loc'
+    IE_DESC = 'Library of Congress'
+    _VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
+    _TESTS = [{
+        # embedded via <div class="media-player"
+        'url': 'http://loc.gov/item/90716351/',
+        'md5': '353917ff7f0255aa6d4b80a034833de8',
+        'info_dict': {
+            'id': '90716351',
+            'ext': 'mp4',
+            'title': "Pa's trip to Mars",
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 0,
+            'view_count': int,
+        },
+    }, {
+        # webcast embedded via mediaObjectId
+        'url': 'https://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=5578',
+        'info_dict': {
+            'id': '5578',
+            'ext': 'mp4',
+            'title': 'Help! Preservation Training Needs Here, There & Everywhere',
+            'duration': 3765,
+            'view_count': int,
+            'subtitles': 'mincount:1',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # with direct download links
+        'url': 'https://www.loc.gov/item/78710669/',
+        'info_dict': {
+            'id': '78710669',
+            'ext': 'mp4',
+            'title': 'La vie et la passion de Jesus-Christ',
+            'duration': 0,
+            'view_count': int,
+            'formats': 'mincount:4',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        media_id = self._search_regex(
+            (r'id=(["\'])media-player-(?P<id>.+?)\1',
+             r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
+             r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
+             r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
+            webpage, 'media id', group='id')
+
+        data = self._download_json(
+            'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
+            video_id)['mediaObject']
+
+        derivative = data['derivatives'][0]
+        media_url = derivative['derivativeUrl']
+
+        title = derivative.get('shortName') or data.get('shortName') or self._og_search_title(
+            webpage)
+
+        # Following algorithm was extracted from setAVSource js function
+        # found in webpage
+        media_url = media_url.replace('rtmp', 'https')
+
+        is_video = data.get('mediaType', 'v').lower() == 'v'
+        ext = determine_ext(media_url)
+        if ext not in ('mp4', 'mp3'):
+            media_url += '.mp4' if is_video else '.mp3'
+
+        if 'vod/mp4:' in media_url:
+            formats = [{
+                'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
+                'format_id': 'hls',
+                'ext': 'mp4',
+                'protocol': 'm3u8_native',
+                'quality': 1,
+            }]
+        elif 'vod/mp3:' in media_url:
+            formats = [{
+                'url': media_url.replace('vod/mp3:', ''),
+                'vcodec': 'none',
+            }]
+
+        download_urls = set()
+        for m in re.finditer(
+                r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
+            format_id = m.group('id').lower()
+            if format_id == 'gif':
+                continue
+            download_url = m.group('url')
+            if download_url in download_urls:
+                continue
+            download_urls.add(download_url)
+            formats.append({
+                'url': download_url,
+                'format_id': format_id,
+                'filesize_approx': parse_filesize(m.group('size')),
+            })
+
+        self._sort_formats(formats)
+
+        duration = float_or_none(data.get('duration'))
+        view_count = int_or_none(data.get('viewCount'))
+
+        subtitles = {}
+        cc_url = data.get('ccUrl')
+        if cc_url:
+            subtitles.setdefault('en', []).append({
+                'url': cc_url,
+                'ext': 'ttml',
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'duration': duration,
+            'view_count': view_count,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
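The URL rewriting in the middle of _real_extract (per the setAVSource comment) is easier to follow on a concrete value; the sample URL below is fabricated:

    media_url = 'rtmp://media.loc.gov/vod/mp4:90716351/pa_trip'
    media_url = media_url.replace('rtmp', 'https')
    is_video = True
    if not media_url.endswith(('.mp4', '.mp3')):
        media_url += '.mp4' if is_video else '.mp3'
    if 'vod/mp4:' in media_url:
        hls_url = media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8'
    print(hls_url)
    # https://media.loc.gov/hls-vod/media/90716351/pa_trip.mp4.m3u8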

View File

@@ -98,13 +98,19 @@ class LimelightBaseIE(InfoExtractor):
         } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]

         subtitles = {}
-        for caption in properties.get('captions', {}):
+        for caption in properties.get('captions', []):
             lang = caption.get('language_code')
             subtitles_url = caption.get('url')
             if lang and subtitles_url:
-                subtitles[lang] = [{
+                subtitles.setdefault(lang, []).append({
                     'url': subtitles_url,
-                }]
+                })
+
+        closed_captions_url = properties.get('closed_captions_url')
+        if closed_captions_url:
+            subtitles.setdefault('en', []).append({
+                'url': closed_captions_url,
+                'ext': 'ttml',
+            })

         return {
             'id': video_id,
@@ -123,7 +129,18 @@ class LimelightBaseIE(InfoExtractor):
 class LimelightMediaIE(LimelightBaseIE):
     IE_NAME = 'limelight'
-    _VALID_URL = r'(?:limelight:media:|https?://link\.videoplatform\.limelight\.com/media/\??\bmediaId=)(?P<id>[a-z0-9]{32})'
+    _VALID_URL = r'''(?x)
+                    (?:
+                        limelight:media:|
+                        https?://
+                            (?:
+                                link\.videoplatform\.limelight\.com/media/|
+                                assets\.delvenetworks\.com/player/loader\.swf
+                            )
+                            \?.*?\bmediaId=
+                    )
+                    (?P<id>[a-z0-9]{32})
+                '''
     _TESTS = [{
         'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
         'info_dict': {
@@ -158,6 +175,9 @@ class LimelightMediaIE(LimelightBaseIE):
             # rtmp download
             'skip_download': True,
         },
+    }, {
+        'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
+        'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'media'
     _API_PATH = 'media'
@@ -176,15 +196,29 @@ class LimelightMediaIE(LimelightBaseIE):
 class LimelightChannelIE(LimelightBaseIE):
     IE_NAME = 'limelight:channel'
-    _VALID_URL = r'(?:limelight:channel:|https?://link\.videoplatform\.limelight\.com/media/\??\bchannelId=)(?P<id>[a-z0-9]{32})'
-    _TEST = {
+    _VALID_URL = r'''(?x)
+                    (?:
+                        limelight:channel:|
+                        https?://
+                            (?:
+                                link\.videoplatform\.limelight\.com/media/|
+                                assets\.delvenetworks\.com/player/loader\.swf
+                            )
+                            \?.*?\bchannelId=
+                    )
+                    (?P<id>[a-z0-9]{32})
+                '''
+    _TESTS = [{
         'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
         'info_dict': {
             'id': 'ab6a524c379342f9b23642917020c082',
             'title': 'Javascript Sample Code',
         },
         'playlist_mincount': 3,
-    }
+    }, {
+        'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
+        'only_matching': True,
+    }]
     _PLAYLIST_SERVICE_PATH = 'channel'
     _API_PATH = 'channels'
@@ -207,15 +241,29 @@ class LimelightChannelIE(LimelightBaseIE):
 class LimelightChannelListIE(LimelightBaseIE):
     IE_NAME = 'limelight:channel_list'
-    _VALID_URL = r'(?:limelight:channel_list:|https?://link\.videoplatform\.limelight\.com/media/\?.*?\bchannelListId=)(?P<id>[a-z0-9]{32})'
-    _TEST = {
+    _VALID_URL = r'''(?x)
+                    (?:
+                        limelight:channel_list:|
+                        https?://
+                            (?:
+                                link\.videoplatform\.limelight\.com/media/|
+                                assets\.delvenetworks\.com/player/loader\.swf
+                            )
+                            \?.*?\bchannelListId=
+                    )
+                    (?P<id>[a-z0-9]{32})
+                '''
+    _TESTS = [{
         'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
         'info_dict': {
             'id': '301b117890c4465c8179ede21fd92e2b',
             'title': 'Website - Hero Player',
         },
         'playlist_mincount': 2,
-    }
+    }, {
+        'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
+        'only_matching': True,
+    }]
     _PLAYLIST_SERVICE_PATH = 'channel_list'

     def _real_extract(self, url):

View File

@@ -203,9 +203,10 @@ class LivestreamIE(InfoExtractor):
             if not videos_info:
                 break
             for v in videos_info:
+                v_id = compat_str(v['id'])
                 entries.append(self.url_result(
-                    'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v['id']),
-                    'Livestream', v['id'], v['caption']))
+                    'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v_id),
+                    'Livestream', v_id, v.get('caption')))
             last_video = videos_info[-1]['id']
         return self.playlist_result(entries, event_id, event_data['full_name'])

View File

@@ -1,106 +1,106 @@
 from __future__ import unicode_literals

 import re
-import json

 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+    compat_urlparse,
+)
 from ..utils import (
     ExtractorError,
-    clean_html,
     int_or_none,
-    sanitized_Request,
     urlencode_postdata,
 )


 class LyndaBaseIE(InfoExtractor):
-    _LOGIN_URL = 'https://www.lynda.com/login/login.aspx'
+    _SIGNIN_URL = 'https://www.lynda.com/signin'
+    _PASSWORD_URL = 'https://www.lynda.com/signin/password'
+    _USER_URL = 'https://www.lynda.com/signin/user'
     _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'
     _NETRC_MACHINE = 'lynda'

     def _real_initialize(self):
         self._login()

+    @staticmethod
+    def _check_error(json_string, key_or_keys):
+        keys = [key_or_keys] if isinstance(key_or_keys, compat_str) else key_or_keys
+        for key in keys:
+            error = json_string.get(key)
+            if error:
+                raise ExtractorError('Unable to login: %s' % error, expected=True)
+
+    def _login_step(self, form_html, fallback_action_url, extra_form_data, note, referrer_url):
+        action_url = self._search_regex(
+            r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_html,
+            'post url', default=fallback_action_url, group='url')
+
+        if not action_url.startswith('http'):
+            action_url = compat_urlparse.urljoin(self._SIGNIN_URL, action_url)
+
+        form_data = self._hidden_inputs(form_html)
+        form_data.update(extra_form_data)
+
+        try:
+            response = self._download_json(
+                action_url, None, note,
+                data=urlencode_postdata(form_data),
+                headers={
+                    'Referer': referrer_url,
+                    'X-Requested-With': 'XMLHttpRequest',
+                })
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
+                response = self._parse_json(e.cause.read().decode('utf-8'), None)
+                self._check_error(response, ('email', 'password'))
+            raise
+
+        self._check_error(response, 'ErrorMessage')
+
+        return response, action_url
+
     def _login(self):
         username, password = self._get_login_info()
         if username is None:
             return

-        login_form = {
-            'username': username,
-            'password': password,
-            'remember': 'false',
-            'stayPut': 'false'
-        }
-        request = sanitized_Request(
-            self._LOGIN_URL, urlencode_postdata(login_form))
-        login_page = self._download_webpage(
-            request, None, 'Logging in as %s' % username)
-
-        # Not (yet) logged in
-        m = re.search(r'loginResultJson\s*=\s*\'(?P<json>[^\']+)\';', login_page)
-        if m is not None:
-            response = m.group('json')
-            response_json = json.loads(response)
-            state = response_json['state']
-
-            if state == 'notlogged':
-                raise ExtractorError(
-                    'Unable to login, incorrect username and/or password',
-                    expected=True)
-
-            # This is when we get popup:
-            # > You're already logged in to lynda.com on two devices.
-            # > If you log in here, we'll log you out of another device.
-            # So, we need to confirm this.
-            if state == 'conflicted':
-                confirm_form = {
-                    'username': '',
-                    'password': '',
-                    'resolve': 'true',
-                    'remember': 'false',
-                    'stayPut': 'false',
-                }
-                request = sanitized_Request(
-                    self._LOGIN_URL, urlencode_postdata(confirm_form))
-                login_page = self._download_webpage(
-                    request, None,
-                    'Confirming log in and log out from another device')
-
-        if all(not re.search(p, login_page) for p in ('isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
-            if 'login error' in login_page:
-                mobj = re.search(
-                    r'(?s)<h1[^>]+class="topmost">(?P<title>[^<]+)</h1>\s*<div>(?P<description>.+?)</div>',
-                    login_page)
-                if mobj:
-                    raise ExtractorError(
-                        'lynda returned error: %s - %s'
-                        % (mobj.group('title'), clean_html(mobj.group('description'))),
-                        expected=True)
-            raise ExtractorError('Unable to log in')
-
-    def _logout(self):
-        username, _ = self._get_login_info()
-        if username is None:
-            return
-
-        self._download_webpage(
-            'http://www.lynda.com/ajax/logout.aspx', None,
-            'Logging out', 'Unable to log out', fatal=False)
+        # Step 1: download signin page
+        signin_page = self._download_webpage(
+            self._SIGNIN_URL, None, 'Downloading signin page')
+
+        # Already logged in
+        if any(re.search(p, signin_page) for p in (
+                'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
+            return
+
+        # Step 2: submit email
+        signin_form = self._search_regex(
+            r'(?s)(<form[^>]+data-form-name=["\']signin["\'][^>]*>.+?</form>)',
+            signin_page, 'signin form')
+        signin_page, signin_url = self._login_step(
+            signin_form, self._PASSWORD_URL, {'email': username},
+            'Submitting email', self._SIGNIN_URL)
+
+        # Step 3: submit password
+        password_form = signin_page['body']
+        self._login_step(
+            password_form, self._USER_URL, {'email': username, 'password': password},
+            'Submitting password', signin_url)


 class LyndaIE(LyndaBaseIE):
     IE_NAME = 'lynda'
     IE_DESC = 'lynda.com videos'
     _VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
-    _NETRC_MACHINE = 'lynda'

     _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'

     _TESTS = [{
         'url': 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
-        'md5': 'ecfc6862da89489161fb9cd5f5a6fac1',
+        # md5 is unstable
         'info_dict': {
             'id': '114408',
             'ext': 'mp4',
@@ -212,8 +212,6 @@ class LyndaCourseIE(LyndaBaseIE):
             'http://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
             course_id, 'Downloading course JSON')

-        self._logout()
-
         if course.get('Status') == 'NotFound':
             raise ExtractorError(
                 'Course %s does not exist' % course_id, expected=True)
@@ -246,5 +244,6 @@ class LyndaCourseIE(LyndaBaseIE):
                 % unaccessible_videos + self._ACCOUNT_CREDENTIALS_HINT)

         course_title = course.get('Title')
+        course_description = course.get('Description')

-        return self.playlist_result(entries, course_id, course_title)
+        return self.playlist_result(entries, course_id, course_title, course_description)
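The reworked login above walks three endpoints, re-submitting the hidden inputs of each returned form. A condensed sketch of that flow; http_get/http_post_json stand in for the extractor's download helpers, and hidden_inputs is a naive form scraper (the real _hidden_inputs is considerably more robust):

    import re

    def hidden_inputs(form_html):
        return dict(re.findall(
            r'<input[^>]*\bname=["\']([^"\']+)["\'][^>]*\bvalue=["\']([^"\']*)["\']',
            form_html))

    def login(username, password, http_get, http_post_json):
        # Step 1: fetch the signin page and isolate the signin <form>
        signin_page = http_get('https://www.lynda.com/signin')
        form = re.search(
            r'(?s)<form[^>]+data-form-name=["\']signin["\'].*?</form>',
            signin_page).group(0)
        # Step 2: submit the email; the JSON response carries the password form
        step2 = http_post_json(
            'https://www.lynda.com/signin/password',
            dict(hidden_inputs(form), email=username))
        # Step 3: submit email + password against the user endpoint
        return http_post_json(
            'https://www.lynda.com/signin/user',
            dict(hidden_inputs(step2['body']), email=username, password=password))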

View File

@@ -1,8 +1,6 @@
 # encoding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
@@ -23,34 +21,5 @@ class M6IE(InfoExtractor):
     }

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        rss = self._download_xml('http://ws.m6.fr/v1/video/info/m6/bonus/%s' % video_id, video_id,
-                                 'Downloading video RSS')
-
-        title = rss.find('./channel/item/title').text
-        description = rss.find('./channel/item/description').text
-        thumbnail = rss.find('./channel/item/visuel_clip_big').text
-        duration = int(rss.find('./channel/item/duration').text)
-        view_count = int(rss.find('./channel/item/nombre_vues').text)
-
-        formats = []
-        for format_id in ['lq', 'sd', 'hq', 'hd']:
-            video_url = rss.find('./channel/item/url_video_%s' % format_id)
-            if video_url is None:
-                continue
-            formats.append({
-                'url': video_url.text,
-                'format_id': format_id,
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
-            'formats': formats,
-        }
+        video_id = self._match_id(url)
+        return self.url_result('6play:%s' % video_id, 'SixPlay', video_id)

View File

@@ -4,16 +4,12 @@ from __future__ import unicode_literals
 import random

 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-from ..utils import (
-    sanitized_Request,
-    xpath_text,
-)
+from ..utils import xpath_text


 class MatchTVIE(InfoExtractor):
-    _VALID_URL = r'https?://matchtv\.ru/?#live-player'
-    _TEST = {
+    _VALID_URL = r'https?://matchtv\.ru(?:/on-air|/?#live-player)'
+    _TESTS = [{
         'url': 'http://matchtv.ru/#live-player',
         'info_dict': {
             'id': 'matchtv-live',
@@ -24,12 +20,16 @@ class MatchTVIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-    }
+    }, {
+        'url': 'http://matchtv.ru/on-air/',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = 'matchtv-live'
-        request = sanitized_Request(
-            'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse_urlencode({
+        video_url = self._download_json(
+            'http://player.matchtv.ntvplus.tv/player/smil', video_id,
+            query={
                 'ts': '',
                 'quality': 'SD',
                 'contentId': '561d2c0df7159b37178b4567',
@@ -40,11 +40,10 @@ class MatchTVIE(InfoExtractor):
                 'contentType': 'channel',
                 'timeShift': '0',
                 'platform': 'portal',
-            }),
+            },
             headers={
                 'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
-            })
-        video_url = self._download_json(request, video_id)['data']['videoUrl']
+            })['data']['videoUrl']
         f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
         formats = self._extract_f4m_formats(f4m_url, video_id)
         self._sort_formats(formats)

View File

@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .pladform import PladformIE
+from ..utils import (
+    unescapeHTML,
+    int_or_none,
+    ExtractorError,
+)
+
+
+class METAIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.meta\.ua/(?:iframe/)?(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://video.meta.ua/5502115.video',
+        'md5': '71b6f3ee274bef16f1ab410f7f56b476',
+        'info_dict': {
+            'id': '5502115',
+            'ext': 'mp4',
+            'title': 'Sony Xperia Z camera test [HQ]',
+            'description': 'Xperia Z shoots video in FullHD HDR.',
+            'uploader_id': 'nomobile',
+            'uploader': 'CHЁZA.TV',
+            'upload_date': '20130211',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        'url': 'http://video.meta.ua/iframe/5502115',
+        'only_matching': True,
+    }, {
+        # pladform embed
+        'url': 'http://video.meta.ua/7121015.video',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        st_html5 = self._search_regex(
+            r"st_html5\s*=\s*'#([^']+)'", webpage, 'uppod html5 st', default=None)
+
+        if st_html5:
+            # uppod st decryption algorithm is reverse engineered from function un(s) at uppod.js
+            json_str = ''
+            for i in range(0, len(st_html5), 3):
+                json_str += '&#x0%s;' % st_html5[i:i + 3]
+            uppod_data = self._parse_json(unescapeHTML(json_str), video_id)
+            error = uppod_data.get('customnotfound')
+            if error:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+            video_url = uppod_data['file']
+            info = {
+                'id': video_id,
+                'url': video_url,
+                'title': uppod_data.get('comment') or self._og_search_title(webpage),
+                'description': self._og_search_description(webpage, default=None),
+                'thumbnail': uppod_data.get('poster') or self._og_search_thumbnail(webpage),
+                'duration': int_or_none(self._og_search_property(
+                    'video:duration', webpage, default=None)),
+            }
+            if 'youtube.com/' in video_url:
+                info.update({
+                    '_type': 'url_transparent',
+                    'ie_key': 'Youtube',
+                })
+            return info
+
+        pladform_url = PladformIE._extract_url(webpage)
+        if pladform_url:
+            return self.url_result(pladform_url)

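The uppod 'st' decoding in METAIE is compact enough to miss: every 3 characters of the string after '#' are a hexadecimal character code, which the loop turns into HTML entities and unescapes into JSON. A worked example with a made-up sample string:

# '07b' -> '&#x007b;' -> '{', '022' -> '"', and so on; the decoded text is
# the player's JSON config. The sample below decodes to {"i":1}.
try:
    from html import unescape  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape

st_html5 = '07b02206902203a03107d'  # made-up sample, not real site data
json_str = ''
for i in range(0, len(st_html5), 3):
    json_str += '&#x0%s;' % st_html5[i:i + 3]
print(unescape(json_str))  # -> {"i":1}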

@@ -11,13 +11,14 @@ from ..utils import (
     determine_ext,
     ExtractorError,
     int_or_none,
-    sanitized_Request,
     urlencode_postdata,
+    get_element_by_attribute,
+    mimetype2ext,
 )


 class MetacafeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
+    _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/(?P<video_id>[^/]+)/(?P<display_id>[^/?#]+)'
     _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
     _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
     IE_NAME = 'metacafe'
@@ -47,6 +48,7 @@ class MetacafeIE(InfoExtractor):
             'uploader': 'ign',
             'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
         },
+        'skip': 'Page is temporarily unavailable.',
     },
     # AnyClip video
     {
@@ -55,8 +57,8 @@ class MetacafeIE(InfoExtractor):
             'id': 'an-dVVXnuY7Jh77J',
             'ext': 'mp4',
             'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3',
-            'uploader': 'anyclip',
-            'description': 'md5:38c711dd98f5bb87acf973d573442e67',
+            'uploader': 'AnyClip',
+            'description': 'md5:cbef0460d31e3807f6feb4e7a5952e5b',
         },
     },
     # age-restricted video
@@ -110,28 +112,25 @@ class MetacafeIE(InfoExtractor):
     def report_disclaimer(self):
         self.to_screen('Retrieving disclaimer')

-    def _real_initialize(self):
+    def _confirm_age(self):
         # Retrieve disclaimer
         self.report_disclaimer()
         self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')

         # Confirm age
-        disclaimer_form = {
-            'filters': '0',
-            'submit': "Continue - I'm over 18",
-        }
-        request = sanitized_Request(self._FILTER_POST, urlencode_postdata(disclaimer_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         self.report_age_confirmation()
-        self._download_webpage(request, None, False, 'Unable to confirm age')
+        self._download_webpage(
+            self._FILTER_POST, None, False, 'Unable to confirm age',
+            data=urlencode_postdata({
+                'filters': '0',
+                'submit': "Continue - I'm over 18",
+            }), headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+            })

     def _real_extract(self, url):
         # Extract id and simplified title from URL
-        mobj = re.match(self._VALID_URL, url)
-        if mobj is None:
-            raise ExtractorError('Invalid URL: %s' % url)
-
-        video_id = mobj.group(1)
+        video_id, display_id = re.match(self._VALID_URL, url).groups()

         # the video may come from an external site
         m_external = re.match('^(\w{2})-(.*)$', video_id)
@@ -144,15 +143,24 @@ class MetacafeIE(InfoExtractor):
         if prefix == 'cb':
             return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')

-        # Retrieve video webpage to extract further information
-        req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
+        # self._confirm_age()

         # AnyClip videos require the flashversion cookie so that we get the link
         # to the mp4 file
-        mobj_an = re.match(r'^an-(.*?)$', video_id)
-        if mobj_an:
-            req.headers['Cookie'] = 'flashVersion=0;'
-        webpage = self._download_webpage(req, video_id)
+        headers = {}
+        if video_id.startswith('an-'):
+            headers['Cookie'] = 'flashVersion=0;'
+
+        # Retrieve video webpage to extract further information
+        webpage = self._download_webpage(url, video_id, headers=headers)
+
+        error = get_element_by_attribute(
+            'class', 'notfound-page-title', webpage)
+        if error:
+            raise ExtractorError(error, expected=True)
+
+        video_title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage, 'title', default=None) or self._search_regex(r'<h1>(.*?)</h1>', webpage, 'title')

         # Extract URL, uploader and title from webpage
         self.report_extraction(video_id)
@@ -216,20 +224,40 @@ class MetacafeIE(InfoExtractor):
                 'player_url': player_url,
                 'ext': play_path.partition(':')[0],
             })

+        if video_url is None:
+            flashvars = self._parse_json(self._search_regex(
+                r'flashvars\s*=\s*({.*});', webpage, 'flashvars',
+                default=None), video_id, fatal=False)
+            if flashvars:
+                video_url = []
+                for source in flashvars.get('sources'):
+                    source_url = source.get('src')
+                    if not source_url:
+                        continue
+                    ext = mimetype2ext(source.get('type')) or determine_ext(source_url)
+                    if ext == 'm3u8':
+                        video_url.extend(self._extract_m3u8_formats(
+                            source_url, video_id, 'mp4',
+                            'm3u8_native', m3u8_id='hls', fatal=False))
+                    else:
+                        video_url.append({
+                            'url': source_url,
+                            'ext': ext,
+                        })
+
         if video_url is None:
             raise ExtractorError('Unsupported video type')

-        video_title = self._html_search_regex(
-            r'(?im)<title>(.*) - Video</title>', webpage, 'title')
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
+        description = self._html_search_meta(
+            ['og:description', 'twitter:description', 'description'],
+            webpage, 'title', fatal=False)
+        thumbnail = self._html_search_meta(
+            ['og:image', 'twitter:image'], webpage, 'title', fatal=False)
         video_uploader = self._html_search_regex(
             r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
             webpage, 'uploader nickname', fatal=False)
         duration = int_or_none(
-            self._html_search_meta('video:duration', webpage))
+            self._html_search_meta('video:duration', webpage, default=None))
         age_limit = (
             18
             if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage)
@@ -242,10 +270,11 @@ class MetacafeIE(InfoExtractor):
                 'url': video_url,
                 'ext': video_ext,
             }]

         self._sort_formats(formats)

         return {
             'id': video_id,
+            'display_id': display_id,
             'description': description,
             'uploader': video_uploader,
             'title': video_title,

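The new flashvars branch in MetacafeIE picks an extension from each source's MIME type first and only falls back to the URL when no type is given. A self-contained sketch of that fallback, with simplified stand-ins for the two helpers from ..utils:

# Simplified stand-ins for youtube-dl's mimetype2ext() and determine_ext();
# the real helpers know many more types. Sample sources are made up.
MIME_MAP = {'video/mp4': 'mp4', 'application/x-mpegURL': 'm3u8'}

def mimetype2ext(mime_type):
    return MIME_MAP.get(mime_type)

def determine_ext(url):
    return url.rpartition('.')[2] if '.' in url else None

sources = [
    {'src': 'http://example.com/v.m3u8', 'type': 'application/x-mpegURL'},
    {'src': 'http://example.com/v.mp4'},  # no MIME type: fall back to URL
]
for source in sources:
    ext = mimetype2ext(source.get('type')) or determine_ext(source['src'])
    print(source['src'], '->', ext)  # m3u8 sources go to the HLS extractor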

@@ -26,7 +26,8 @@ class MGTVIE(InfoExtractor):
         video_id = self._match_id(url)
         api_data = self._download_json(
             'http://v.api.mgtv.com/player/video', video_id,
-            query={'video_id': video_id})['data']
+            query={'video_id': video_id},
+            headers=self.geo_verification_headers())['data']
         info = api_data['info']

         formats = []

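geo_verification_headers() is a base-class helper; as far as I can tell it only adds an internal Ytdl-request-proxy header when the user supplied --geo-verification-proxy, so the availability check is routed through that proxy while the media download is not. A rough sketch, assuming that behaviour:

# Rough sketch of geo_verification_headers(); assumes the helper simply maps
# the --geo-verification-proxy option to the internal Ytdl-request-proxy
# header. The proxy URL below is a placeholder.
def geo_verification_headers(params):
    headers = {}
    proxy = params.get('geo_verification_proxy')
    if proxy:
        headers['Ytdl-request-proxy'] = proxy
    return headers

print(geo_verification_headers({'geo_verification_proxy': 'http://127.0.0.1:8080'}))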

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 import random

 from .common import InfoExtractor
+from ..compat import compat_urlparse
 from ..utils import (
     xpath_text,
     int_or_none,
@@ -18,13 +19,16 @@ class MioMioIE(InfoExtractor):
     _TESTS = [{
         # "type=video" in flashvars
         'url': 'http://www.miomio.tv/watch/cc88912/',
-        'md5': '317a5f7f6b544ce8419b784ca8edae65',
         'info_dict': {
             'id': '88912',
             'ext': 'flv',
             'title': '【SKY】字幕 铠武昭和VS平成 假面骑士大战FEAT战队 魔星字幕组 字幕',
             'duration': 5923,
         },
+        'params': {
+            # The server provides broken file
+            'skip_download': True,
+        }
     }, {
         'url': 'http://www.miomio.tv/watch/cc184024/',
         'info_dict': {
@@ -32,7 +36,7 @@ class MioMioIE(InfoExtractor):
             'title': '《动漫同人插画绘制》',
         },
         'playlist_mincount': 86,
-        'skip': 'This video takes time too long for retrieving the URL',
+        'skip': 'Unable to load videos',
     }, {
         'url': 'http://www.miomio.tv/watch/cc173113/',
         'info_dict': {
@@ -40,20 +44,23 @@ class MioMioIE(InfoExtractor):
             'title': 'The New Macbook 2015 上手试玩与简评'
         },
         'playlist_mincount': 2,
+        'skip': 'Unable to load videos',
+    }, {
+        # new 'h5' player
+        'url': 'http://www.miomio.tv/watch/cc273295/',
+        'md5': '',
+        'info_dict': {
+            'id': '273295',
+            'ext': 'mp4',
+            'title': 'アウト×デラックス 20160526',
+        },
+        'params': {
+            # intermittent HTTP 500
+            'skip_download': True,
+        },
     }]

-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_meta(
-            'description', webpage, 'title', fatal=True)
-
-        mioplayer_path = self._search_regex(
-            r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
-
-        http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
-
+    def _extract_mioplayer(self, webpage, video_id, title, http_headers):
         xml_config = self._search_regex(
             r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;',
             webpage, 'xml config')
@@ -92,10 +99,34 @@ class MioMioIE(InfoExtractor):
                 'http_headers': http_headers,
             })

+        return entries
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_meta(
+            'description', webpage, 'title', fatal=True)
+
+        mioplayer_path = self._search_regex(
+            r'src="(/mioplayer(?:_h5)?/[^"]+)"', webpage, 'ref_path')
+
+        if '_h5' in mioplayer_path:
+            player_url = compat_urlparse.urljoin(url, mioplayer_path)
+            player_webpage = self._download_webpage(
+                player_url, video_id,
+                note='Downloading player webpage', headers={'Referer': url})
+            entries = self._parse_html5_media_entries(player_url, player_webpage)
+            http_headers = {'Referer': player_url}
+        else:
+            http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
+            entries = self._extract_mioplayer(webpage, video_id, title, http_headers)
+
         if len(entries) == 1:
             segment = entries[0]
             segment['id'] = video_id
             segment['title'] = title
+            segment['http_headers'] = http_headers
             return segment

         return {

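For the new 'h5' player MioMioIE leans on _parse_html5_media_entries() from #8876 instead of the flashvars XML config. A rough, regex-only sketch of what that helper returns (the real implementation handles many more attributes; the sample HTML is made up):

import re

def parse_html5_media_entries(base_url, webpage):
    # One entry per <video>/<audio> tag, each with a formats list built
    # from its <source> src attributes (heavily simplified).
    entries = []
    for media in re.findall(r'(?s)<(?:video|audio)[^>]*>.*?</(?:video|audio)>', webpage):
        sources = re.findall(r'<source[^>]+src=["\']([^"\']+)', media)
        if sources:
            entries.append({'formats': [{'url': src} for src in sources]})
    return entries

html = '<video><source src="http://example.com/v.mp4"></video>'
print(parse_html5_media_entries('http://example.com/', html))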

@@ -1,5 +1,8 @@
+# coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_urlencode,
@@ -8,82 +11,137 @@ from ..compat import (
 from ..utils import (
     get_element_by_attribute,
     int_or_none,
+    remove_start,
+    extract_attributes,
+    determine_ext,
 )


-class MiTeleIE(InfoExtractor):
-    IE_DESC = 'mitele.es'
-    _VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
+class MiTeleBaseIE(InfoExtractor):
+    def _get_player_info(self, url, webpage):
+        player_data = extract_attributes(self._search_regex(
+            r'(?s)(<ms-video-player.+?</ms-video-player>)',
+            webpage, 'ms video player'))
+        video_id = player_data['data-media-id']
+        config_url = compat_urlparse.urljoin(url, player_data['data-config'])
+        config = self._download_json(
+            config_url, video_id, 'Downloading config JSON')
+        mmc_url = config['services']['mmc']

-    _TEST = {
+        duration = None
+        formats = []
+        for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
+            mmc = self._download_json(
+                m_url, video_id, 'Downloading mmc JSON')
+            if not duration:
+                duration = int_or_none(mmc.get('duration'))
+            for location in mmc['locations']:
+                gat = self._proto_relative_url(location.get('gat'), 'http:')
+                bas = location.get('bas')
+                loc = location.get('loc')
+                ogn = location.get('ogn')
+                if None in (gat, bas, loc, ogn):
+                    continue
+                token_data = {
+                    'bas': bas,
+                    'icd': loc,
+                    'ogn': ogn,
+                    'sta': '0',
+                }
+                media = self._download_json(
+                    '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
+                    video_id, 'Downloading %s JSON' % location['loc'])
+                file_ = media.get('file')
+                if not file_:
+                    continue
+                ext = determine_ext(file_)
+                if ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
+                        video_id, f4m_id='hds', fatal=False))
+                elif ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
+        return {
+            'id': video_id,
+            'formats': formats,
+            'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
+            'duration': duration,
+        }
+
+
+class MiTeleIE(MiTeleBaseIE):
+    IE_DESC = 'mitele.es'
+    _VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
+
+    _TESTS = [{
         'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
         # MD5 is unstable
         'info_dict': {
             'id': '0NF1jJnxS1Wu3pHrmvFyw2',
             'display_id': 'programa-144',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Tor, la web invisible',
             'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'series': 'Diario de',
+            'season': 'La redacción',
+            'episode': 'Programa 144',
             'thumbnail': 're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
         },
-    }
+    }, {
+        # no explicit title
+        'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/temporada-6/programa-226/',
+        'info_dict': {
+            'id': 'eLZSwoEd1S3pVyUm8lc6F',
+            'display_id': 'programa-226',
+            'ext': 'mp4',
+            'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
+            'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
+            'series': 'Cuarto Milenio',
+            'season': 'Temporada 6',
+            'episode': 'Programa 226',
+            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'duration': 7312,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]

     def _real_extract(self, url):
         display_id = self._match_id(url)

         webpage = self._download_webpage(url, display_id)

-        config_url = self._search_regex(
-            r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
-        config_url = compat_urlparse.urljoin(url, config_url)
-
-        config = self._download_json(
-            config_url, display_id, 'Downloading config JSON')
-
-        mmc = self._download_json(
-            config['services']['mmc'], display_id, 'Downloading mmc JSON')
-
-        formats = []
-        for location in mmc['locations']:
-            gat = self._proto_relative_url(location.get('gat'), 'http:')
-            bas = location.get('bas')
-            loc = location.get('loc')
-            ogn = location.get('ogn')
-            if None in (gat, bas, loc, ogn):
-                continue
-            token_data = {
-                'bas': bas,
-                'icd': loc,
-                'ogn': ogn,
-                'sta': '0',
-            }
-            media = self._download_json(
-                '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
-                display_id, 'Downloading %s JSON' % location['loc'])
-            file_ = media.get('file')
-            if not file_:
-                continue
-            formats.extend(self._extract_f4m_formats(
-                file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
-                display_id, f4m_id=loc))
-        self._sort_formats(formats)
+        info = self._get_player_info(url, webpage)

         title = self._search_regex(
-            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')
+            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
+            webpage, 'title', default=None)

-        video_id = self._search_regex(
-            r'data-media-id\s*=\s*"([^"]+)"', webpage,
-            'data media id', default=None) or display_id
-        thumbnail = config.get('poster', {}).get('imageUrl')
-        duration = int_or_none(mmc.get('duration'))
+        mobj = re.search(r'''(?sx)
+                            class="Destacado-text"[^>]*>.*?<h1>\s*
+                            <span>(?P<series>[^<]+)</span>\s*
+                            <span>(?P<season>[^<]+)</span>\s*
+                            <span>(?P<episode>[^<]+)</span>''', webpage)
+        series, season, episode = mobj.groups() if mobj else [None] * 3

-        return {
-            'id': video_id,
+        if not title:
+            if mobj:
+                title = '%s - %s - %s' % (series, season, episode)
+            else:
+                title = remove_start(self._search_regex(
+                    r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')
+
+        info.update({
             'display_id': display_id,
             'title': title,
             'description': get_element_by_attribute('class', 'text', webpage),
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-        }
+            'series': series,
+            'season': season,
+            'episode': episode,
+        })
+        return info

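MiTeleIE now builds the title in three steps: the Destacado-text <strong> if present, else 'series - season - episode' from the <h1> spans, else the page <title> with the 'Ver online ' prefix stripped. A sketch of that chain with a stand-in for remove_start() (sample values taken from the test metadata above):

def remove_start(s, start):
    # Stand-in for youtube-dl's remove_start() helper.
    return s[len(start):] if s and s.startswith(start) else s

title = None  # no explicit 'Destacado-text' <strong> on the page
series, season, episode = 'Cuarto Milenio', 'Temporada 6', 'Programa 226'
if not title:
    if series and season and episode:
        title = '%s - %s - %s' % (series, season, episode)
    else:
        title = remove_start('Ver online Tor, la web invisible', 'Ver online ')
print(title)  # -> Cuarto Milenio - Temporada 6 - Programa 226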

@@ -102,11 +102,11 @@ class MixcloudIE(InfoExtractor):
         description = self._og_search_description(webpage)
         like_count = parse_count(self._search_regex(
             r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', fatal=False))
+            webpage, 'like count', default=None))
         view_count = str_to_int(self._search_regex(
             [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
              r'/listeners/?">([0-9,.]+)</a>'],
-            webpage, 'play count', fatal=False))
+            webpage, 'play count', default=None))

         return {
             'id': track_id,

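The fatal=False to default=None swap above is not cosmetic: with fatal=False the regex helper still warns on every miss, while an explicit default makes a miss silent. A simplified sketch of that control flow (NO_DEFAULT mirrors the sentinel youtube-dl's utils use):

import re

NO_DEFAULT = object()

def search_regex(pattern, string, name, fatal=True, default=NO_DEFAULT):
    mobj = re.search(pattern, string)
    if mobj:
        return mobj.group(1)
    if default is not NO_DEFAULT:
        return default  # silent miss, no warning
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    print('WARNING: unable to extract %s' % name)
    return None

print(search_regex(r'(\d+) plays', 'no counts here', 'play count', default=None))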
youtube_dl/extractor/msn.py (new file, 122 lines)

@@ -0,0 +1,122 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    determine_ext,
    ExtractorError,
    int_or_none,
    unescapeHTML,
)


class MSNIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
    _TESTS = [{
        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE',
        'md5': '8442f66c116cbab1ff7098f986983458',
        'info_dict': {
            'id': 'BBqQYNE',
            'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message',
            'ext': 'mp4',
            'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message',
            'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25',
            'duration': 104,
            'uploader': 'CBS Entertainment',
            'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v',
        },
    }, {
        'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
        'only_matching': True,
    }, {
        'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH',
        'only_matching': True,
    }, {
        # geo restricted
        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
        'only_matching': True,
    }, {
        'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-raped-woman-comment/vi-AAhvzW6',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id, display_id = mobj.group('id', 'display_id')

        webpage = self._download_webpage(url, display_id)

        video = self._parse_json(
            self._search_regex(
                r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1',
                webpage, 'video data', default='{}', group='data'),
            display_id, transform_source=unescapeHTML)

        if not video:
            error = unescapeHTML(self._search_regex(
                r'data-error=(["\'])(?P<error>.+?)\1',
                webpage, 'error', group='error'))
            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)

        title = video['title']

        formats = []
        for file_ in video.get('videoFiles', []):
            format_url = file_.get('url')
            if not format_url:
                continue
            ext = determine_ext(format_url)
            # .ism is not yet supported (see
            # https://github.com/rg3/youtube-dl/issues/8118)
            if ext == 'ism':
                continue
            if 'm3u8' in format_url:
                # m3u8_native should not be used here until
                # https://github.com/rg3/youtube-dl/issues/9913 is fixed
                m3u8_formats = self._extract_m3u8_formats(
                    format_url, display_id, 'mp4',
                    m3u8_id='hls', fatal=False)
                # Despite metadata in m3u8 all video+audio formats are
                # actually video-only (no audio)
                for f in m3u8_formats:
                    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
                        f['acodec'] = 'none'
                formats.extend(m3u8_formats)
            else:
                formats.append({
                    'url': format_url,
                    'ext': 'mp4',
                    'format_id': 'http',
                    'width': int_or_none(file_.get('width')),
                    'height': int_or_none(file_.get('height')),
                })
        self._sort_formats(formats)

        subtitles = {}
        for file_ in video.get('files', []):
            format_url = file_.get('url')
            format_code = file_.get('formatCode')
            if not format_url or not format_code:
                continue
            if compat_str(format_code) == '3100':
                subtitles.setdefault(file_.get('culture', 'en'), []).append({
                    'ext': determine_ext(format_url, 'ttml'),
                    'url': format_url,
                })

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': video.get('description'),
            'thumbnail': video.get('headlineImage', {}).get('url'),
            'duration': int_or_none(video.get('durationSecs')),
            'uploader': video.get('sourceFriendly'),
            'uploader_id': video.get('providerId'),
            'creator': video.get('creator'),
            'subtitles': subtitles,
            'formats': formats,
        }

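The HLS workaround in MSNIE is worth spelling out: the manifests advertise muxed audio+video streams, but the media is video-only, so every format that claims both codecs gets its acodec forced to 'none'. A sketch on made-up format dicts:

# Made-up sample of what _extract_m3u8_formats() might return here.
m3u8_formats = [
    {'format_id': 'hls-1128', 'acodec': 'mp4a.40.2', 'vcodec': 'avc1.4d401f'},
    {'format_id': 'hls-audio', 'acodec': 'mp4a.40.2', 'vcodec': 'none'},
]
for f in m3u8_formats:
    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
        f['acodec'] = 'none'  # the advertised audio track is not actually there
print(m3u8_formats)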

@@ -6,6 +6,7 @@ from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_urlencode,
     compat_str,
+    compat_xpath,
 )
 from ..utils import (
     ExtractorError,
@@ -84,9 +85,10 @@ class MTVServicesInfoExtractor(InfoExtractor):
             rtmp_video_url = rendition.find('./src').text
             if rtmp_video_url.endswith('siteunavail.png'):
                 continue
+            new_url = self._transform_rtmp_url(rtmp_video_url)
             formats.append({
-                'ext': ext,
-                'url': self._transform_rtmp_url(rtmp_video_url),
+                'ext': 'flv' if new_url.startswith('rtmp') else ext,
+                'url': new_url,
                 'format_id': rendition.get('bitrate'),
                 'width': int(rendition.get('width')),
                 'height': int(rendition.get('height')),
@@ -139,9 +141,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
                 itemdoc, './/{http://search.yahoo.com/mrss/}category',
                 'scheme', 'urn:mtvn:video_title')
             if title_el is None:
-                title_el = itemdoc.find('.//{http://search.yahoo.com/mrss/}title')
+                title_el = itemdoc.find(compat_xpath('.//{http://search.yahoo.com/mrss/}title'))
             if title_el is None:
-                title_el = itemdoc.find('.//title') or itemdoc.find('./title')
+                title_el = itemdoc.find(compat_xpath('.//title'))
             if title_el.text is None:
                 title_el = None

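The first MTV hunk fixes the container tag: once _transform_rtmp_url() has produced the final URL, anything still fetched over RTMP is stored as FLV, because rtmpdump writes an FLV container whatever extension the manifest claimed. A two-line sketch (sample URLs made up):

def pick_ext(new_url, ext):
    # RTMP downloads land in an FLV container regardless of advertised ext.
    return 'flv' if new_url.startswith('rtmp') else ext

print(pick_ext('rtmp://cdn.example.com/ondemand/clip', 'mp4'))  # -> flv
print(pick_ext('http://cdn.example.com/clip.mp4', 'mp4'))       # -> mp4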

@@ -1,6 +1,7 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from .theplatform import ThePlatformIE
 from ..utils import (
     smuggle_url,
     url_basename,
@@ -61,7 +62,7 @@ class NationalGeographicIE(InfoExtractor):
     }

-class NationalGeographicChannelIE(InfoExtractor):
+class NationalGeographicChannelIE(ThePlatformIE):
     IE_NAME = 'natgeo:channel'
     _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)'
@@ -102,12 +103,22 @@ class NationalGeographicChannelIE(InfoExtractor):
         release_url = self._search_regex(
             r'video_auth_playlist_url\s*=\s*"([^"]+)"',
             webpage, 'release url')
+        query = {
+            'mbr': 'true',
+            'switch': 'http',
+        }
+        is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
+        if is_auth == 'auth':
+            auth_resource_id = self._search_regex(
+                r"video_auth_resourceId\s*=\s*'([^']+)'",
+                webpage, 'auth resource id')
+            query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id) or ''

         return {
             '_type': 'url_transparent',
             'ie_key': 'ThePlatform',
             'url': smuggle_url(
-                update_url_query(release_url, {'mbr': 'true', 'switch': 'http'}),
+                update_url_query(release_url, query),
                 {'force_smil_url': True}),
             'display_id': display_id,
         }

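Switching NationalGeographicChannelIE's base class to ThePlatformIE is what makes _extract_mvpd_auth() (the Adobe Pass flow) available; for 'auth'-flagged videos its token is appended to the ThePlatform release URL query. A sketch of the query assembly with a placeholder token:

query = {'mbr': 'true', 'switch': 'http'}
is_auth = 'auth'  # scraped from video_is_auth in the page
if is_auth == 'auth':
    # placeholder for _extract_mvpd_auth(url, display_id, 'natgeo', resource_id)
    auth_token = 'sample-adobe-pass-token'
    query['auth'] = auth_token or ''
print(query)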
Some files were not shown because too many files have changed in this diff.