Compare commits

568 Commits

Author SHA1 Message Date
Sergey M․
19e2d1cdea release 2016.06.20 2016-06-20 20:50:01 +07:00
Sergey M․
8369a4fe76 [downloader/hls] Simplify and carry long lines 2016-06-20 21:55:17 +07:00
Philipp Hagemeister
1f749b6658 Revert "[jsinterp] Avoid double key lookup for setting new key"
This reverts commit 7c05097633.
2016-06-20 13:29:13 +02:00
Remita Amine
819707920a [cbs] fix _VALID_URL 2016-06-19 23:55:19 +01:00
Remita Amine
43518503a6 [cbs,cbsnews,cbssports] reduce requests while extracting all formats 2016-06-19 23:40:00 +01:00
Remita Amine
5839d556e4 [theplatform] reduce requests for theplatform feed info extraction 2016-06-19 23:37:05 +01:00
Yen Chi Hsuan
6c83e583b3 [radiojavan] PEP8
E275 is added in pycodestyle 2.6

See https://github.com/PyCQA/pycodestyle/pull/491
2016-06-19 13:32:08 +08:00
Yen Chi Hsuan
6aeb64b673 Merge pull request #8201 from remitamine/hls-aes
[downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader
2016-06-19 13:25:08 +08:00
Remita Amine
6cd64b6806 [foxsports] extract http formats 2016-06-19 05:45:48 +01:00
remitamine
e154c65128 [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader 2016-06-19 01:01:40 +01:00
Sergey M․
a50fd6e026 release 2016.06.19.1 2016-06-19 03:57:14 +07:00
Sergey M․
6a55bb66ee [vimeo] Fix rented videos (Closes #9830) 2016-06-19 03:56:01 +07:00
Lucas Moura
7c05097633 [jsinterp] Avoid double key lookup for setting new key
In order to add a new key to both __objects and __functions dicts on jsinterp.py, it is
necessary to first verify if a key was present and if not, create the key and
assign it to a value.

However, this can be done with a single step using dict setdefault method.
2016-06-19 03:29:45 +07:00
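The single-step insertion described in the commit message above can be sketched with plain dictionaries. This is a minimal illustration, not the jsinterp.py code: `build_value`, `get_two_step`, `get_setdefault`, and `cache` are hypothetical names. One property of Python worth keeping in mind: `dict.setdefault` evaluates its default argument before the call, so the value is built even when the key already exists.

```python
# Hypothetical stand-in for an expensive construction step; counts its calls.
calls = {'count': 0}


def build_value(key):
    calls['count'] += 1
    return 'value-for-' + key


def get_two_step(cache, key):
    # Check first, then assign: build_value runs only on a miss.
    if key not in cache:
        cache[key] = build_value(key)
    return cache[key]


def get_setdefault(cache, key):
    # Single dict call, but build_value(key) is evaluated on every call,
    # even when the key is already present.
    return cache.setdefault(key, build_value(key))


cache = {}
get_two_step(cache, 'a')
get_two_step(cache, 'a')
two_step_calls = calls['count']

calls['count'] = 0
cache = {}
get_setdefault(cache, 'a')
get_setdefault(cache, 'a')
setdefault_calls = calls['count']
```

Both variants leave the same contents in `cache`; they differ only in how often the value is constructed.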
Sergey M․
589568789f release 2016.06.19 2016-06-19 02:30:29 +07:00
Sergey M․
7577d849a6 [r7] Fix extraction and add support for articles (Closes #9826) 2016-06-19 02:25:34 +07:00
Sergey M․
cb23192bc4 [closertotruth] Update and improve (Closes #8680) 2016-06-19 00:35:29 +07:00
Steven Gosseling
41c1023300 [closertotruth] Add extractor
Removed print statement from code.

Replaced two regex searches with the correct ones.

Removed some unnecessary semicolons

fixed title extraction

refactored everything to search_regex

processed comments on commit 5650b0d, fixed feedback from flake8

Improved regexes and returns info dict now.

Added support for closertotruth interview URL

Added support for episodes page
2016-06-18 23:19:56 +07:00
Sergey M․
90b6288cce [arte:+7] Simplify _VALID_URL 2016-06-18 22:23:48 +07:00
Sergey M․
c1823c8ad9 [README.md] Remove 'small' from description (#9814) 2016-06-18 22:08:48 +07:00
Sergey M․
d7c6c656c5 [arte:+7] Expand _VALID_URL (Closes #9820) 2016-06-18 21:42:17 +07:00
Yen Chi Hsuan
b0b128049a [extractors] Update references to sportschau (#9799) 2016-06-18 13:43:47 +08:00
Yen Chi Hsuan
e8f13f2637 [sportschau.de] Fix extraction and moved to its own file (closes #9799) 2016-06-18 13:42:58 +08:00
Yen Chi Hsuan
b5aad37f6b [ard] Remove SportschauIE, which is now based on WDR (#9799) 2016-06-18 13:42:39 +08:00
Yen Chi Hsuan
6d0d4fc26d [wdr] Add WDRBaseIE, for Sportschau (#9799) 2016-06-18 13:40:55 +08:00
Yen Chi Hsuan
0278aa443f [br] Skip invalid tests 2016-06-18 12:53:48 +08:00
Yen Chi Hsuan
1f35745758 [azubu] Don't fail on optional fields 2016-06-18 12:39:08 +08:00
Yen Chi Hsuan
573c35272f [bbc] Skip a geo-restricted test case 2016-06-18 12:35:55 +08:00
Yen Chi Hsuan
09e3f91e40 [arte] Update _TESTS and fix for pages with multiple YouTube videos
Some tests are from #6895 and #6613
2016-06-18 12:34:58 +08:00
Yen Chi Hsuan
1b6cf16be7 [aftonbladet] Fix extraction 2016-06-18 12:27:39 +08:00
Yen Chi Hsuan
26264cb056 [adobetv] Use embedded data in the webpage
Sometimes the HTML webpage is returned even with '?format=json'
2016-06-18 12:21:40 +08:00
Yen Chi Hsuan
a72df5f36f [mtvservices] Fix ext for RTMP streams 2016-06-18 12:19:06 +08:00
Yen Chi Hsuan
c878e635de [bet] Moved to MTVServices 2016-06-18 12:17:24 +08:00
Sergey M․
0f47cc2e92 release 2016.06.18.1 2016-06-18 06:20:34 +07:00
Sergey M․
5fc2757682 release 2016.06.18 2016-06-18 06:00:05 +07:00
Sergey M․
e3944c2621 [pornhd] Add working test 2016-06-18 05:50:17 +07:00
Sergey M․
667d96480b [pornhd] Detect removed videos and modernize 2016-06-18 05:42:20 +07:00
Sergey M․
e6fe993c31 [pornhd] Improve formats extraction 2016-06-18 05:37:53 +07:00
Sergey M․
d0d93f76ea [pornhd] Fix metadata extraction 2016-06-18 05:30:46 +07:00
Sergey M․
20a6a154fe [mtv] Use compat_xpath and fix FutureWarning 2016-06-18 04:46:26 +07:00
Sergey M․
f011876076 [nickde] Add extractor (Closes #9778) 2016-06-18 04:40:48 +07:00
Sergey M․
6929569403 [mitele] Extract series metadata and make title more robust (Closes #9758) 2016-06-18 04:06:19 +07:00
Sergey M․
eb451890da [carambatv] Add extractor (Closes #9815) 2016-06-18 03:04:14 +07:00
Sergey M․
ded7511a70 [bbccouk] Add support for playlists (Closes #9812) 2016-06-17 23:42:52 +07:00
Sergey M․
d2161cade5 release 2016.06.16 2016-06-16 22:40:55 +07:00
Sergey M․
27e5fa8198 [cda] Fix extraction (Closes #9803) 2016-06-16 22:33:12 +07:00
Yen Chi Hsuan
efbd1eb51a [wimp] Fix extraction and update _TESTS 2016-06-16 12:27:21 +08:00
Yen Chi Hsuan
369ff75081 [jwplatform] Improved JWPlayer support 2016-06-16 12:26:45 +08:00
Yen Chi Hsuan
47212f7bcb [utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
2016-06-16 11:00:54 +08:00
Sergey M․
4c93ee8d14 [imdb] Improve _VALID_URL (Closes #9788) 2016-06-15 22:34:55 +07:00
Yen Chi Hsuan
8bc4dbb1af [wrzuta.pl] Detect error and update _TESTS 2016-06-14 11:14:59 +08:00
Sergey M․
6c3760292c [pornhub] Improve title extraction (Closes #9777) 2016-06-14 04:57:59 +07:00
Sergey M․
4cef70db6c [devscripts/release.sh] Add flag for gpg-sign commits 2016-06-14 03:16:56 +07:00
Sergey M․
ff4af6ec59 [lynda] Remove superfluous _NETRC_MACHINE 2016-06-14 02:49:33 +07:00
Sergey M․
d01fb21d4c release 2016.06.14 2016-06-14 02:19:42 +07:00
Sergey M․
a4ea28eee6 Credit @venth for wrzuta:playlist (#9341) 2016-06-14 02:15:47 +07:00
Sergey M․
bc2a871f3e Credit @dracony for rockstargames (#9737) 2016-06-14 02:15:09 +07:00
Sergey M․
1759672eed [wrzuta:playlist] Improve and simplify (Closes #9341) 2016-06-14 02:13:54 +07:00
venth
fea55ef4a9 [wrzuta.pl:playlist] Added playlist extraction from wrzuta.pl 2016-06-14 02:10:48 +07:00
Sergey M․
16b6bd01d2 [rockstargames] Improve and add Youtube fallback (Closes #9737) 2016-06-14 01:11:24 +07:00
Dracony
14d0f4e0f3 Added extractor for rockstargames.com 2016-06-14 01:09:35 +07:00
Sergey M․
778f969447 [twitch:clips] Add extractor (Closes #9767) 2016-06-14 00:06:31 +07:00
Sergey M․
79cd8b3d8a [README.md] Suggest checking extractor code under all Python versions 2016-06-13 10:04:04 +07:00
Sergey M․
b4663f12b1 [README.md] Update links to info dict metafields 2016-06-13 07:16:35 +07:00
Sergey M․
b50e02c1e4 [README.md] Update links to options available for YoutubeDL 2016-06-13 07:05:32 +07:00
Sergey M․
33b72ce64e [xfileshare] Improve removed videos detection 2016-06-13 01:19:54 +07:00
Sergey M․
cf2bf840ba [xfileshare] Fix test 2016-06-13 01:11:14 +07:00
Sergey M․
bccdac6874 [xfileshare:xvidstage] Add support for videos with packed codes (Closes #4335) 2016-06-13 01:11:04 +07:00
Sergey M․
e69f9f5d68 [downloader/external] Decode error string before writing to stderr 2016-06-12 16:45:07 +07:00
Sergey M․
77a9a9c295 release 2016.06.12 2016-06-12 12:06:48 +07:00
Sergey M․
84dcd1c4e4 [streamcloud] Detect removed videos (Closes #3768) 2016-06-12 11:08:39 +07:00
Sergey M․
971e3b7520 [nrk:skole] Fix extraction 2016-06-12 07:20:37 +07:00
Sergey M․
4e79011729 [nrktv] Fix tests 2016-06-12 06:57:04 +07:00
Sergey M․
a936ac321c [README.md] Document using output template in batch files (Closes #9717) 2016-06-12 06:39:31 +07:00
Sergey M․
98960c911c [instagram] Extract metadata from JSON 2016-06-12 06:06:04 +07:00
Sergey M․
329ca3bef6 [utils] Add try_get
To reduce boilerplate when accessing JSON
2016-06-12 06:05:34 +07:00
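As a rough illustration of the boilerplate such a helper removes, here is a minimal sketch of a safe-getter function. It is written from the one-line description above rather than copied from utils; the name `try_get_sketch`, the set of caught exceptions, and the sample `media` dictionary are all assumptions.

```python
def try_get_sketch(src, getter, expected_type=None):
    """Apply getter to src; return None instead of raising on missing data."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Made-up JSON-like structure, for illustration only.
media = {'owner': {'username': 'someone'}, 'likes': {'count': 12}}

username = try_get_sketch(media, lambda x: x['owner']['username'], str)
caption = try_get_sketch(media, lambda x: x['caption']['text'], str)
like_count = try_get_sketch(media, lambda x: x['likes']['count'], int)
```

Without such a helper, each nested lookup would need its own chain of `.get()` calls or a `try`/`except` block.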
Sergey M․
2c3322e36e [youporn] Fix metadata extraction 2016-06-12 04:49:37 +07:00
Sergey M․
80ae228b34 [matchtv] Modernize 2016-06-12 01:57:23 +07:00
Yen Chi Hsuan
6d28c408cf [viki] Do not use a fallback language for title in the first try
In test_Viki_3, 'titles' gives a Hebrew title.
2016-06-11 23:00:44 +08:00
Yen Chi Hsuan
c83b35d4aa [viki] Update _TESTS 2016-06-11 22:39:13 +08:00
Yen Chi Hsuan
94e5d6aedb [viki] Skip a geo-restricted test 2016-06-11 21:49:01 +08:00
Yen Chi Hsuan
531a74968c [vimeo] Fix extraction for VimeoReview videos 2016-06-11 21:35:08 +08:00
Yen Chi Hsuan
c5edd147d1 [generic] Remove an invalid test
Now handled by telewebion.py
2016-06-11 18:39:58 +08:00
Yen Chi Hsuan
856150d056 [telewebion] Add new extractor (closes #5135) 2016-06-11 18:39:58 +08:00
Yen Chi Hsuan
03ebea89b0 Merge pull request #9755 from vxbinaca/patch-2
[utils] Change Firefox 44 to 47
2016-06-11 17:38:45 +08:00
Paul Henning
15d106787e [utils] Change Firefox 44 to 47
See commit title.
2016-06-11 05:36:31 -04:00
Yen Chi Hsuan
7aab3696dd [kuwo] Update _TESTS 2016-06-11 15:37:04 +08:00
Yen Chi Hsuan
47787efa2b [leeco] Recognize Le Sports URLs (fixes #9750) 2016-06-11 13:14:41 +08:00
Sergey M․
4a420119a6 release 2016.06.11.3 2016-06-11 08:34:30 +07:00
Sergey M․
33751818d3 release 2016.06.11.2 2016-06-11 08:28:51 +07:00
Sergey M․
698f127c1a [setup.py] Add python 3.5 classifier 2016-06-11 06:14:22 +07:00
Sergey M․
fe458b6596 [limelight] Extract ttml subtitles (Closes #9739) 2016-06-11 05:57:27 +07:00
Sergey M․
21ac1a8ac3 [limelight] Fix typo 2016-06-11 05:52:50 +07:00
Sergey M․
79027c0ea0 [limelight] Improve _VALID_URLs 2016-06-11 05:40:02 +07:00
Sergey M․
4cad2929cd [limelight] Fix _VALID_URLs 2016-06-11 05:30:44 +07:00
Sergey M․
62666af99f [indavideo] Fix formats' height (Closes #9744) 2016-06-11 05:13:05 +07:00
Sergey M․
9ddc289f88 [README.md] Document missing playlist fields in output template 2016-06-11 04:59:47 +07:00
Sergey M․
6626c214e1 release 2016.06.11.1 2016-06-11 03:00:08 +07:00
Sergey M․
d845622b2e release 2016.06.11 2016-06-11 02:41:48 +07:00
Sergey M
1058f56e96 Merge pull request #9747 from TRox1972/lynda
[Lynda] Extract course description
2016-06-11 02:34:58 +07:00
TRox1972
0434358823 [Lynda] Extract course description 2016-06-10 19:17:58 +02:00
Sergey M․
3841256c2c [lynda] Skip login if already logged in 2016-06-10 23:01:52 +07:00
Sergey M․
bdf16f8140 [lynda] Add support for new authentication (Closes #9740) 2016-06-10 22:40:18 +07:00
Yen Chi Hsuan
836ab0c554 [compat] Import html5 entities correctly 2016-06-10 18:12:57 +08:00
Yen Chi Hsuan
6c0376fe4f [dw] Skip an invalid test
DW documentaries only last for one or two weeks. See #9475
2016-06-10 16:53:40 +08:00
Yen Chi Hsuan
1fa309da40 [generic] Update test_Generic_40
The original link now redirects to a YouTube user channel.
2016-06-10 16:39:31 +08:00
Yen Chi Hsuan
daa0df9e8b [youtube:user] Support another URL form
Such a URL comes from http://www.gametrailers.com/. This was originally
a test case in GenericIE, but it now seems all GameTrailers videos are on
YouTube.
2016-06-10 16:37:12 +08:00
Yen Chi Hsuan
09728d5fbc [audiomack:album] Force video_id to be strings
Related: be6217b261
2016-06-10 16:11:28 +08:00
Yen Chi Hsuan
c16f8a4659 [voicerepublic] Force video_id to be strings
Related: be6217b261
2016-06-10 16:04:28 +08:00
Yen Chi Hsuan
a225238530 [vporn] Improve error detection and update _TESTS 2016-06-10 15:12:53 +08:00
Yen Chi Hsuan
55b2f099c0 [utils] Decode HTML5 entities
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:11:55 +08:00
Yen Chi Hsuan
9631a94fb5 [compat] Add compat_html_entities_html5
Used in test_Vporn_1. Also related to #9270
2016-06-10 15:05:24 +08:00
Yen Chi Hsuan
cc4444662c [generic] Remove Vulture embed detection
Vulture.com videos are now hosted on YouTube, Vimeo, MTV, NBC News or Hulu.
Here's an example of Hulu:
http://www.vulture.com/2016/06/kimmel-interviews-mariah-carey-in-a-bathtub.html
2016-06-10 13:40:57 +08:00
Yen Chi Hsuan
de3eb07ed6 [generic] Detect NBC News embeds 2016-06-10 13:32:59 +08:00
Yen Chi Hsuan
5de008e8c3 [nbcnews] Support embed widgets
Used in some Vulture videos
2016-06-10 13:31:55 +08:00
Yen Chi Hsuan
3e74b444e7 [vulture] Remove the extractor
The first 10 URLs in the Google search "site:http://video.vulture.com/video"
are dead. I guess Vulture does not host videos on their own anymore.
2016-06-10 13:13:59 +08:00
Yen Chi Hsuan
e1e0a10c56 [weibo] Remove the extractor
The Weibo weishipin (微視頻, tiny videos) service is dead and now all
videos are hosted on Sina videos, which is covered by sina.py
2016-06-10 13:01:22 +08:00
Yen Chi Hsuan
436214baf7 [xfileshare] Skip an invalid test 2016-06-10 12:31:06 +08:00
Yen Chi Hsuan
506d0e9693 [xuite] Skip the invalid test 2016-06-10 12:29:58 +08:00
Yen Chi Hsuan
55290788d3 [yahoo] Yahoo doesn't like region names in lowercase
Fix test_Yahoo_7
2016-06-10 12:28:56 +08:00
Yen Chi Hsuan
bc7e7adf51 [wdr] Subtitles are TTML 2016-06-10 00:22:41 +08:00
Sergey M․
b0aebe702c [godtv] Relax _VALID_URL 2016-06-09 21:34:47 +07:00
Sergey M․
416878f41f [godtv] Add more tests 2016-06-09 21:33:51 +07:00
Sergey M․
c0fed3bda5 [godtv] Improve and add support for playlists (Closes #9608) 2016-06-09 21:29:41 +07:00
TRox1972
bb1e44cc8e [godtv] Add extractor
[GodTV] Improvements
2016-06-09 21:27:27 +07:00
N1k145
21efee5f8b [openload] Relax _VALID_URL
[openload] added to _TESTS, removed escape
2016-06-09 20:46:54 +07:00
Yen Chi Hsuan
e2713d32f4 [openload] Fix extraction. Thanks @perron375 for the solution
Closes #9706
2016-06-09 19:00:13 +08:00
Yen Chi Hsuan
e21c26daf9 Merge pull request #9395 from pmrowla/afreecatv
[afreecatv] Add new extractor for afreecatv.com VODs
2016-06-09 17:20:16 +08:00
Yen Chi Hsuan
1594a4932f [wdr] Misc changes 2016-06-09 13:49:35 +08:00
Yen Chi Hsuan
6869d634c6 [wdr] Simplify extraction 2016-06-09 13:41:12 +08:00
Yen Chi Hsuan
50918c4ee0 [wdr] Support radio players (closes #6147) 2016-06-09 13:04:30 +08:00
Yen Chi Hsuan
6c33d24b46 [utils] Add audio/mpeg to mimetype2ext()
Used in WDR live radios (#6147)
2016-06-09 12:58:24 +08:00
Sergey M․
be6217b261 [YoutubeDL] Force string conversion on non string video ids 2016-06-09 05:34:19 +07:00
Sergey M․
9d51a0a9a1 [vessel] Make hls formats non fatal 2016-06-09 04:13:38 +07:00
Sergey M․
39da509f67 [vessel] Extract DASH formats 2016-06-09 04:12:48 +07:00
Sergey M․
a479b8f687 [vessel] Use native hls by default 2016-06-09 04:09:32 +07:00
Sergey M․
48a5eabc48 [extractor/generic] Add support vessel embeds (Closes #7083) 2016-06-09 04:02:27 +07:00
Sergey M․
11380753b5 [vessel] Add support for embed urls and improve extraction 2016-06-09 04:00:47 +07:00
Yen Chi Hsuan
411c590a1f [youku:show] Add new extractor 2016-06-08 23:45:46 +08:00
Yen Chi Hsuan
6da8d7de69 [twitter] Update _TESTS 2016-06-08 21:48:12 +08:00
Yen Chi Hsuan
c6308b3153 [twitter] Fix extraction for videos with HLS streams
Closes #9623
2016-06-08 21:28:10 +08:00
Yen Chi Hsuan
fc0a45fa41 [twitter] Detect suspended accounts and update _TESTS 2016-06-08 21:12:14 +08:00
Yen Chi Hsuan
e6e90515db [nbc] Add the test case from #9578
Closes #9578
2016-06-08 20:50:01 +08:00
Yen Chi Hsuan
22a0a95247 [theplatform] Some NBC videos require an additional cookie
Related: #9578
2016-06-08 20:47:39 +08:00
Yen Chi Hsuan
50ce1c331c [downloader/external] Add another env for proxies in ffmpeg/avconv
Related sources:
https://git.libav.org/?p=libav.git;a=blob;f=libavformat/http.c;h=8fe8d11e1edfdbb04a8726db2c49cfef3f572aac;hb=HEAD#l152
https://git.libav.org/?p=libav.git;a=blob;f=libavformat/tls.c;h=fab243e93e20034e88e619188c13a44a5d8ccdb9;hb=HEAD#l63
https://github.com/FFmpeg/FFmpeg/blob/f8e89d8/libavformat/http.c#L191
https://github.com/FFmpeg/FFmpeg/blob/f8e89d8/libavformat/tls.c#L92
2016-06-08 14:43:52 +08:00
Yen Chi Hsuan
7264e38591 [bilibili] Fix for videos without upload time (closes #9710) 2016-06-08 14:31:40 +08:00
Sergey M․
33d9f3707c [thesixtyone] Relax _VALID_URL (Closes #9714) 2016-06-08 02:22:04 +07:00
Sergey M․
a26a9d6239 [livestream:event] Ensure video id is string (Closes #9721) 2016-06-07 23:53:08 +07:00
Yen Chi Hsuan
a4a8201c02 [wdr] Update _TESTS 2016-06-08 00:25:51 +08:00
Yen Chi Hsuan
a6571f1073 [common] Fix <bootstrapInfo> detection in F4M manifests
Regression since 0a5685b26f
2016-06-08 00:19:33 +08:00
Sergey M․
57b6e9652e [canal+] Add support for d17.tv 2016-06-07 22:32:08 +07:00
Sergey M․
3d9b3605a3 [canal+] Update tests 2016-06-07 22:26:18 +07:00
Sergey M․
74193838f7 [canal+] Improve extraction (Closes #9718) 2016-06-07 22:12:20 +07:00
Sergey M
fb94e260b5 Merge pull request #9720 from Kagami/vlive-new-statuses
[vlive] Acknowledge vlive+ streams statuses
2016-06-07 21:22:53 +07:00
Kagami Hiiragi
345dec937f [vlive] Acknowledge vlive+ streams statuses
Same as common statuses just with "PRODUCT_" prefix:
PRODUCT_LIVE_END, PRODUCT_COMING_SOON, etc.
2016-06-07 17:12:13 +03:00
Philipp Hagemeister
4315f74fa8 Merge remote-tracking branch 'Boris-de/wdrmaus_fix#8562' 2016-06-07 12:29:18 +02:00
Jaime Marquínez Ferrándiz
e67f688025 [compat] Add 'compat_input' to __all__ 2016-06-05 23:16:08 +02:00
Sergey M․
db59b37d0b [devscripts/create-github-release] Make full published releases by default 2016-06-06 03:02:11 +07:00
Sergey M․
244fe977fe [options] Add --load-info-json alias for symmetry with --write-info-json 2016-06-06 02:52:58 +07:00
Sergey M․
7b0d1c2859 [__init__] Use write_string instead of compat_string (Closes #9689) 2016-06-05 21:01:20 +07:00
Yen Chi Hsuan
21d0a8e48b Merge pull request #9702 from Eun/patch-1
curl: follow redirect
2016-06-05 17:43:26 +08:00
Tobias Salzmann
47f12ad3e3 curl: follow redirect 2016-06-05 11:04:55 +02:00
Sergey M
8f1aaa97a1 [README.md] Update pypi instructions 2016-06-05 11:19:44 +07:00
Sergey M
9d78524cbe Merge pull request #9697 from ryandesign/ryandesign-README-MacPorts
Update README.md to mention MacPorts
2016-06-05 11:09:20 +07:00
Ryan Schmidt
bc270284b5 Update README.md to mention MacPorts 2016-06-04 21:30:22 -05:00
Philipp Hagemeister
c93b4eaceb Merge branch 'master' of github.com:rg3/youtube-dl 2016-06-04 22:55:21 +02:00
Philipp Hagemeister
71b9cb3107 extend FAQ (#9696) 2016-06-04 22:55:15 +02:00
Sergey M․
633b444fd2 [downloader/hls] Correct comment on twitch vods 2016-06-05 03:31:10 +07:00
Sergey M․
51c4d85ce7 [downloader/hls] PEP 8 2016-06-05 03:21:43 +07:00
Sergey M․
631d4c87ee [twitch:vod] Use native hls 2016-06-05 03:19:44 +07:00
Sergey M․
1e236d7e23 [downloader/hls] Do not rely on EXT-X-PLAYLIST-TYPE:EVENT 2016-06-05 03:16:05 +07:00
Sergey M․
2c34735267 [youtube] Add itags 256 and 258 2016-06-05 01:44:13 +07:00
Sergey M․
39b32571df [devscripts/release.sh] Release to GitHub 2016-06-05 00:48:33 +07:00
Sergey M․
db56f281d9 [devscripts/create-github-release] Add script for releasing on GitHub
Only Basic authentication is supported so far, either via .netrc or by manual input
2016-06-05 00:47:26 +07:00
Sergey M․
e92b552a10 [devscripts/buildserver] Use compat_input from compat 2016-06-05 00:44:51 +07:00
Sergey M․
1ae6c83bce [compat] Add compat_input 2016-06-05 00:43:55 +07:00
Sergey M․
0fc832e1b2 [vidio] Improve (Closes #9562) 2016-06-04 16:48:24 +07:00
TRox1972
7def35712a [vidio] Add extractor (Closes #7195)
[Vidio] fix fallback value and wrap duration in int_or_none

[Vidio] don't use video_id for _html_search_regex()
2016-06-04 16:48:24 +07:00
Philipp Hagemeister
cad88f96dc disable uploading to yt-dl.org for now 2016-06-04 11:42:52 +02:00
Sergey M․
762d44c956 [channel9] Add support for rss links (Closes #9673) 2016-06-04 04:57:16 +07:00
Sergey M․
4d8856d511 [loc] Extract direct download links 2016-06-04 00:26:03 +07:00
Sergey M․
c917106be4 [loc] Extract subtitles 2016-06-03 23:55:22 +07:00
Sergey M․
76e9cd7f24 [loc] Add support for another URL schema and simplify 2016-06-03 23:43:34 +07:00
Sergey M․
bf4c6a38e1 release 2016.06.03 2016-06-03 23:25:24 +07:00
Sergey M․
7f3c3dfa52 [loc] Improve (Closes #9521) 2016-06-03 23:19:11 +07:00
TRox1972
9c3c447eb3 [loc] Add extractor (Closes #3188)
Added an extractor for loc.gov, which closes #3188. I am not an experienced programmer, so I am sure I made a bunch of mistakes, but the extractor works (for me at least).

[LibraryOfCongress] don't use video_id for _search_regex()

[LibraryOfCongress] Improvements
2016-06-03 22:17:35 +07:00
Yen Chi Hsuan
ad73083ff0 [bilibili] Add _part%d suffixes back (closes #9660) 2016-06-02 19:29:27 +08:00
Yen Chi Hsuan
1e8b59243f Merge pull request #9669 from bzc6p/master
Added sanitization support for Hungarian letters Ő and Ű
2016-06-02 18:23:54 +08:00
bzc6p
c88270271e Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:51:48 +02:00
bzc6p
b96f007eeb Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:39:32 +02:00
Yen Chi Hsuan
9a4aec8b7e [utils] Use bytes-like objects as header values on Python 2 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
54fb199681 [test/test_http] Fix getsockname() on Jython 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
8c32e5dc32 [test/test_utils] Add test for #9588 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
0ea590076f [utils] Always decode Location header
escape_url is broken for bytes-like objects
2016-06-02 15:00:49 +08:00
Remita Amine
4a684895c0 [seeker] Add new extractor (closes #9619) 2016-06-01 21:20:25 +01:00
Remita Amine
f4e4aa9b6b [revision3:embed] Add new extractor 2016-06-01 21:20:25 +01:00
Sergey M․
5e3856a2c5 release 2016.06.02 2016-06-02 01:19:57 +07:00
Sergey M․
6e6b9f600f [arte] Add support for playlists and rework tests (Closes #9632) 2016-06-02 01:10:23 +07:00
Sergey M․
6a1df4fb5f [spankwire] Add support for new URL format (Closes #9657) 2016-06-01 21:23:58 +07:00
Yen Chi Hsuan
dde1ce7c06 [tf1] Fix a regular expression (closes #9656)
This is a Python bug fixed in 2.7.6 [1]

[1] https://github.com/rg3/youtube-dl/issues/9656#issuecomment-222968594
2016-06-01 20:04:43 +08:00
Yen Chi Hsuan
811586ebcf [generic] Update the UDNEmbed test case 2016-06-01 19:23:44 +08:00
Yen Chi Hsuan
0ff3749bfe [udn] Fix m3u8 and f4m extraction as well as improve 2016-06-01 19:23:09 +08:00
Yen Chi Hsuan
28bab13348 [generic,viewlift] Move a test case to the specialized extractor 2016-06-01 19:18:01 +08:00
Yen Chi Hsuan
877032314f [generic] Improve Kaltura detection
Closes #4004
2016-06-01 18:37:34 +08:00
Peter Rowlands
e7d85c4ef7 use /track/video/file to determine if video exists 2016-05-31 17:28:49 +09:00
Sergey M․
8ec2b2c41c [options] Add --limit-rate alias for rate limiting option
Closes #9644
In order to follow the regular --verb-noun pattern and conform better to wget and curl
2016-05-30 21:48:35 +07:00
Sergey M․
197a5da1d0 [yandexmusic] Improve captcha detection 2016-05-30 03:26:26 +07:00
Sergey M․
abbb2938fa release 2016.05.30.2 2016-05-30 03:12:12 +07:00
Sergey M․
f657b1a5f2 release 2016.05.30.1 2016-05-30 03:03:06 +07:00
Philipp Hagemeister
86a52881c6 [travis] unsubscribe @phihag 2016-05-29 21:29:38 +02:00
Sergey M․
8267423652 release 2016.05.30 2016-05-30 01:18:23 +07:00
Sergey M
917a3196f8 [README.md] Update c runtime dependency FAQ entry 2016-05-30 01:03:40 +07:00
Sergey M․
56bd028a0f [devscripts/buildserver] Listen on all interfaces 2016-05-30 00:21:18 +07:00
Sergey M․
681b923b5c [devscripts/release.sh] Allow passing buildserver address as cli option 2016-05-29 23:36:42 +07:00
Yen Chi Hsuan
9ed6d8c6c5 [youku] Extract resolution 2016-05-29 13:54:05 +08:00
Sergey M․
f3fb420b82 [devscripts/release.sh] Check for wheel 2016-05-29 11:49:14 +06:00
Sergey M․
165e3561e9 [devscripts/buildserver] Check Wow6432Node first when searching for python
This allows building releases from a 64-bit OS
2016-05-29 10:02:00 +06:00
Sergey M․
27f17c0eab [Makefile] Fix youtube-dl.1 target
Now it accepts output filename as argument
2016-05-29 09:11:16 +06:00
Sergey M․
44c8892369 [devscripts/prepare_manpage] Fix manpage generation on Windows 2016-05-29 09:06:10 +06:00
Sergey M․
f574103d7c [buildserver] Fix buildserver and make python2 compatible 2016-05-29 09:03:17 +06:00
Yen Chi Hsuan
6d138e98e3 Merge pull request #9621 from venth/feature/ignored_intellij
ignored intellij related files
2016-05-29 03:10:29 +08:00
venth
2a329110b9 ignored intellij related files 2016-05-28 20:27:18 +02:00
Yen Chi Hsuan
2bee7b25f3 [Makefile] Cleanup m4a files
[ci skip]
2016-05-29 01:59:09 +08:00
Yen Chi Hsuan
92cf872a48 [.gitignore] Ignore mp3 files
[ci skip]
2016-05-29 01:59:01 +08:00
Yen Chi Hsuan
6461f2b7ec [bilibili] Fix extraction, improve and cleanup 2016-05-29 01:26:00 +08:00
Sergey M․
807cf7b07f [udemy] Fix authentication for localized layout (Closes #9594) 2016-05-28 21:18:24 +06:00
Sergey M․
de7d76af52 [coub] Add another test 2016-05-27 23:38:17 +06:00
Sergey M․
11c70deba7 [coub] Add extractor (Closes #9609) 2016-05-27 23:34:58 +06:00
Sergey M․
f36532404d [vk] Remove superfluous code 2016-05-27 22:19:10 +06:00
Sergey M․
77b8b4e696 [extractor/common] Borrow quality metadata from parent set-level manifest for f4m 2016-05-27 01:47:44 +06:00
Sergey M․
2615fa7584 [downloader/f4m] Simply select format when it's the only one 2016-05-27 01:46:12 +06:00
Boris Wachtmeister
3a686853e1 [WDR] fixed parsing of playlists 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
949fc42e00 [WDR] the other wdrmaus.de pages also changed to the new player 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
33a1ff7113 [WDR] extract jsonp-url by parsing data-extension of mediaLink 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
bec2c14f2c [WDR] add special handling if alt-url is a m3u8 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
37f972954d [WDR] use _download_json with a strip_jsonp 2016-05-26 20:54:51 +02:00
Boris Wachtmeister
3874e6ea66 [WDR] use single quotes for strings 2016-05-26 20:54:51 +02:00
Yen Chi Hsuan
fac2af3c51 [common] Fix m3u8 extraction in f4m manifests 2016-05-27 01:41:27 +08:00
Sergey M․
6f8cb24219 [tvp] Expand _VALID_URL and improve naming (Closes #9602) 2016-05-26 22:21:55 +06:00
Yen Chi Hsuan
448bb5f333 [common] Fix non-bootstrapped support in f4m 2016-05-27 00:03:48 +08:00
Yen Chi Hsuan
293c255688 [utils] Remove debugging code 2016-05-26 22:54:16 +08:00
Yen Chi Hsuan
ac88d2316e [dw] Support documentaries (closes #9475) 2016-05-26 22:48:47 +08:00
Yen Chi Hsuan
5950cb1d6d [utils] Support a new form of date
Found in dw.com (#9475)
2016-05-26 22:44:00 +08:00
Yen Chi Hsuan
761052db92 [playwire] Add the test (closed #9531) 2016-05-26 21:57:06 +08:00
Yen Chi Hsuan
240b60453e [common] Support m3u8 in f4m manifests
Related: #9531
2016-05-26 21:55:43 +08:00
Yen Chi Hsuan
85b0fe7d64 [playwire] Use _extract_f4m_formats
Related: #9531
2016-05-26 21:43:35 +08:00
Yen Chi Hsuan
0a5685b26f [common] Support non-bootstrapped streams in f4m manifests
Related: #9531
2016-05-26 21:41:47 +08:00
Sergey M․
6f748df43f [eporner] Make test only_matching 2016-05-25 20:51:17 +06:00
Yen Chi Hsuan
b410cb83d4 Merge pull request #9595 from Kagami/vlive-site-update
[vlive] Address site update
2016-05-25 19:24:15 +08:00
Yen Chi Hsuan
da9d82840a Merge pull request #9600 from wankerer/master
[eporner] fix for the new URL layout
2016-05-25 18:52:55 +08:00
wankerer
4ee0b8afdb [eporner] fix for the new URL layout
Recently eporner slightly changed the URL layout: IDs that used to be
digits only are now digits and letters, so youtube-dl falls back to
the generic extractor, which doesn't work.

Fix the matching regex to allow letters in ID.

[v2: added a test case]
2016-05-24 15:57:36 -07:00
remitamine
1de32771e1 [eyedotv] Add new extractor (closes #9582) 2016-05-24 20:10:12 +01:00
remitamine
688c634b7d skip some tests to reduce test time 2016-05-24 16:44:11 +01:00
Sergey M․
0d6ee97508 Credit @TRox1972 for tosh.cc (#9566) and localnews8 (#9539) 2016-05-24 21:42:47 +06:00
Sergey M․
6b43132ce9 [xhamster] Update tests 2016-05-24 21:38:27 +06:00
mexican porn commits
a4690b3244 [xhamster] url regex fix for videos with empty title. 2016-05-24 21:35:43 +06:00
remitamine
444417edb5 [radiocanada] Add new extractor (#4020) 2016-05-24 15:58:27 +01:00
remitamine
277c7465f5 [ooyala] check manifest ext with determine_ext and update tests for related extractors 2016-05-24 11:24:29 +01:00
Kagami Hiiragi
25bcd3550e [vlive] Address site update
Changes:
* Fix video params extraction
* Don't make status request since status info now available on the page
* Remove unneeded code
* Fix test
2016-05-24 12:54:28 +03:00
remitamine
a4760d204f [ooyala] use api v2 to reduce requests for format extraction 2016-05-24 00:22:29 +01:00
remitamine
e8593f346a [ooyala] extract subtitles 2016-05-23 23:58:16 +01:00
remitamine
05b651e3a5 [washingtonpost] reduce requests for m3u8 manifests 2016-05-23 13:04:50 +01:00
remitamine
42a7439717 [cbs] allow passing content id to the extractor (closes #9589) 2016-05-23 09:31:37 +01:00
remitamine
b1e9ebd080 [washingtonpost] remove unnecessary code 2016-05-23 02:30:12 +01:00
remitamine
0c50eeb987 [reuters] Add new extractor 2016-05-23 02:27:31 +01:00
remitamine
4b464a6a78 [washingtonpost] improve format extraction and add support for video pages extraction 2016-05-23 00:48:11 +01:00
Sergey M․
5db9df622f [life:embed] Use native hls 2016-05-23 04:22:09 +06:00
Sergey M․
5181759c0d [life] Update _VALID_URL 2016-05-23 04:00:08 +06:00
Sergey M․
e54373204a [lifenews] Fix metadata extraction 2016-05-23 03:44:04 +06:00
remitamine
102810ef04 [voxmedia] fix volume embed extraction 2016-05-22 20:37:35 +01:00
Yen Chi Hsuan
78d3b3e213 [generic] Improve Livestream detection (closes #2234) 2016-05-23 01:40:11 +08:00
Yen Chi Hsuan
7a46542f97 [livestream] Video IDs should always be strings (#2234) 2016-05-23 01:40:11 +08:00
Yen Chi Hsuan
eb7941e3e6 [compat] Fix for XML with <!DOCTYPE> in Python 2.7 and 3.2
Such XML documents cause DeprecationWarning if python is run
with `-W error`
2016-05-23 01:40:11 +08:00
remitamine
db3b8b2103 [tf1] add support for more related web sites 2016-05-22 17:03:17 +01:00
remitamine
c5f5155100 [wat] extract all formats 2016-05-22 17:03:17 +01:00
Yen Chi Hsuan
4a12077855 [generic] Eliminate duplicated video URLs (closes #6562) 2016-05-22 22:23:20 +08:00
Sergey M
a4a7c44bd3 [README.md] Document solution for extremely slow start on Windows 2016-05-22 15:04:51 +06:00
Thor77
70346165fe [bandcamp] raise ExtractorError when track not streamable (#9465)
* [bandcamp] raise ExtractorError when track not streamable

* [bandcamp] update md5 for second test

* don't rely on json-data, but just check for 'file'

* don't rely on presence of 'file'
2016-05-22 14:15:39 +08:00
Sergey M
c776b99691 [README.md] Remove Windows updating trickery
Windows updating fixed in e9297256d4.
2016-05-22 10:14:02 +06:00
Sergey M․
e9297256d4 [update] Fix youtube-dl.exe updating from arbitrary directory (Closes #2718) 2016-05-22 10:06:45 +06:00
Sergey M
e5871c672b [README.md] Clarify location for youtube-dl.exe even more
%USERPROFILE% not in %PATH% by default.
2016-05-22 09:36:07 +06:00
Sergey M
9b06b0fb92 [README.md] Clarify updating on Windows 2016-05-22 09:26:06 +06:00
Sergey M
4f3a25c2b4 [README.md] Fix typo 2016-05-22 09:00:08 +06:00
Sergey M
21a19aa94d [README.md] Clarify location for youtube-dl.exe 2016-05-22 08:59:28 +06:00
Sergey M․
c6b9cf05e1 [utils] Do not fail on unknown date formats in unified_strdate 2016-05-22 08:28:41 +06:00
Sergey M․
4d8819d249 [extractor/generic] Add support for theplatform embeds (Closes #8636, closes #9476) 2016-05-22 06:52:39 +06:00
Sergey M․
898f4b49cc [theplatform] Add _extract_urls 2016-05-22 06:47:22 +06:00
Sergey M․
0150a00f33 [cc] Add test for tosh.cc (Closes #9566) 2016-05-22 02:58:41 +06:00
TRox1972
c8831015f4 [ComedyCentral] Add support for tosh.cc.com and cc.com/video-clips 2016-05-22 02:55:10 +06:00
Sergey M․
92d221ad48 [periscope] Update uploader_id (Closes #9565) 2016-05-22 02:39:15 +06:00
Sergey M․
0db9a05f88 [periscope:user] Adapt to layout changes (Closes #9563) 2016-05-22 02:15:56 +06:00
Philipp Hagemeister
e03b35b8f9 release 2016.05.21.2 2016-05-21 21:47:39 +02:00
Philipp Hagemeister
d2fee3c99e release.sh: also check for python3 rsa module 2016-05-21 21:47:22 +02:00
Philipp Hagemeister
598869afb1 release 2016.05.21.1 2016-05-21 21:27:00 +02:00
Philipp Hagemeister
7e642e4fd6 release: check for pandoc
Abort releasing if pandoc is missing.
(pandoc was not included in my essential app database, and thus missing on my new machine.)
2016-05-21 21:26:57 +02:00
Philipp Hagemeister
c8cc3745fb release 2016.05.21 2016-05-21 21:18:59 +02:00
Jaime Marquínez Ferrándiz
4c718d3c50 [rtve] Recognize 'filmoteca' URLs 2016-05-21 17:37:35 +02:00
Yen Chi Hsuan
115c65793a [jwplatform] Don't fail with RTMP URLs without mp4:, mp3: or flv: 2016-05-21 13:50:38 +08:00
Yen Chi Hsuan
661d46b28f [cbslocal] Add new extractor (closes #9522) 2016-05-21 13:40:45 +08:00
Yen Chi Hsuan
5ce3d5bd1b [sendtonews] Add new extractor
Used in CBSLocal. Part of #9522
2016-05-21 13:39:42 +08:00
Yen Chi Hsuan
612b5f403e [jwplatform] Improved m3u8 and rtmp support
Changes made for SendtoNewsIE. Part of #9522
2016-05-21 13:38:01 +08:00
Yen Chi Hsuan
9f54e692d2 [anvato] Add new extractor
Used in CBSLocal (#9522)
2016-05-21 13:18:29 +08:00
Yen Chi Hsuan
7b2fcbfd4e [common] Skip TYPE=CLOSED-CAPTIONS lines in m3u8 manifests
According to [1], valid values for TYPE are AUDIO, VIDEO, SUBTITLES
and CLOSED-CAPTIONS. Such a value is found in Anvato master playlists,
though I don't use _extract_m3u8_formats() in the end.

Part of #9522.

[1] https://tools.ietf.org/html/draft-pantos-http-live-streaming-19#section-4.3.4.1
2016-05-21 13:16:28 +08:00
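The filtering described in 7b2fcbfd4e can be sketched roughly as follows. This is a minimal illustration, not youtube-dl's actual code: the `media_entries` helper is hypothetical, and it only looks at the `TYPE` attribute of `#EXT-X-MEDIA` lines.

```python
import re


def media_entries(manifest):
    """Return (type, line) pairs for #EXT-X-MEDIA lines, skipping closed captions.

    Hypothetical helper for illustration only.
    """
    kept = []
    for line in manifest.splitlines():
        # The trailing colon keeps tags such as #EXT-X-MEDIA-SEQUENCE out.
        if not line.startswith('#EXT-X-MEDIA:'):
            continue
        match = re.search(r'TYPE=([A-Z-]+)', line)
        media_type = match.group(1) if match else None
        # Closed captions are carried inside the video stream, so there is
        # no separate rendition to download for them.
        if media_type == 'CLOSED-CAPTIONS':
            continue
        kept.append((media_type, line))
    return kept
```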
Yen Chi Hsuan
16da9bbc29 [common] Add _m3u8_meta_format() template
For extractors that handle m3u8 manifests themselves (e.g., AnvatoIE).

Part of #9522
2016-05-21 13:15:28 +08:00
Sergey M․
c8602b2f9b [nrk] Unquote subtitles' URLs 2016-05-21 05:09:16 +06:00
Sergey M․
b219f5e51b [brightcove:new] Improve error reporting 2016-05-21 00:59:06 +06:00
Sergey M․
1846e9ade0 [localnews8] Fix extractor (Closes #9539) 2016-05-20 22:31:08 +06:00
TRox1972
6756602be6 [LocalNews8] add extractor (Closes #9200) 2016-05-20 22:10:13 +06:00
Sergey M․
6c114b1210 [extractor/generic] Remove generic id and title from wistia extraction and update tests 2016-05-20 21:55:35 +06:00
Sergey M․
7ded6545ed [extractor/generic] Add test for wistia standard embed 2016-05-20 21:43:36 +06:00
Sergey M․
aa5957ac49 [extractor/generic] Add support for async wistia embeds (Closes #9549) 2016-05-20 21:33:31 +06:00
remitamine
64413f7563 [cbc] fix extraction for flv only videos(fixes #5309) 2016-05-20 16:21:23 +01:00
Sergey M․
45f160a43c [wistia] Improve hls support 2016-05-20 21:16:08 +06:00
Sergey M․
36ca2c55db [wistia] Skip storyboard and improve extraction 2016-05-20 21:04:01 +06:00
Sergey M․
f0c96af9cb [wistia] Add alias and modernize 2016-05-20 20:55:10 +06:00
Yen Chi Hsuan
31a70191e7 [cbc] Add the test case from #5156 2016-05-20 19:04:50 +08:00
Yen Chi Hsuan
ad96b4c8f5 [common] Extract audio formats in SMIL
Found in http://www.cbc.ca/player/play/2657631896

Closes #5156
2016-05-20 19:02:53 +08:00
Yen Chi Hsuan
043dc9d36f [cbc] Fix for old-styled URLs
The URL http://www.cbc.ca/player/News/ID/2672225049/ (#6342) redirects
to http://www.cbc.ca/player/play/2672224672, while youtube-dl wasn't
able to handle it correctly.
2016-05-20 18:39:54 +08:00
remitamine
52f7c75cff [cbc] extract http formats and update tests 2016-05-20 06:58:46 +01:00
Sergey M․
f6e588afc0 [24video] Fix description extraction 2016-05-20 08:53:04 +06:00
remitamine
a001296703 [learnr] Add new extractor(closes #4284) 2016-05-19 18:18:03 +01:00
Yen Chi Hsuan
2cbd8c6781 Merge pull request #9537 from TRox1972/p1
[Makefile] delete thumbnails
2016-05-19 16:58:44 +08:00
TRox1972
8585dc4cdc [Makefile] delete thumbnails 2016-05-19 01:21:38 +02:00
Sergey M․
dd81769c62 [ndtv] Fix extraction 2016-05-19 04:34:19 +06:00
Sergey M․
46bc9b7d7c [utils] Allow None in remove_{start,end} 2016-05-19 04:31:30 +06:00
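One way to read "Allow None in remove_{start,end}" is that the helpers pass `None` through instead of raising. The sketch below assumes that behaviour; it is written for illustration and is not necessarily the code in `youtube_dl/utils.py`.

```python
def remove_start(s, start):
    """Strip `start` from the beginning of `s`; return `s` unchanged if it is None."""
    if s is not None and s.startswith(start):
        return s[len(start):]
    return s


def remove_end(s, end):
    """Strip `end` from the end of `s`; return `s` unchanged if it is None."""
    if s is not None and s.endswith(end):
        return s[:-len(end)]
    return s
```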
remitamine
b78531a36a [formula1] Add new extractor(closes #3617) 2016-05-18 22:24:46 +01:00
Sergey M․
11e6a0b641 [nfb] Modernize and extract subtitles 2016-05-18 00:25:15 +06:00
Sergey M․
15cda1ef77 [nfb] Fix uploader extraction 2016-05-17 23:46:47 +06:00
Yen Chi Hsuan
055f0d3d06 [abcnews] Added a new extractor (closes #3992)
Related: #6108, #8664, #9459
2016-05-17 15:38:57 +08:00
Yen Chi Hsuan
cdd94c2eae [utils] Check for None values in SOCKS proxy
Originally reported at
https://github.com/rg3/youtube-dl/pull/9287#issuecomment-219617864
2016-05-17 14:38:15 +08:00
Philipp Hagemeister
36755d9d69 release 2016.05.16 2016-05-16 17:25:47 +02:00
Sergey M․
f7199423e5 [groupon] Add support for Youtube embeds (Closes #9508) 2016-05-16 00:30:13 +06:00
Sergey M․
a0a81918f1 [collegehumor] Remove extractor
It now uses brightcove
2016-05-15 22:07:51 +06:00
Yen Chi Hsuan
5572d598a5 [hearthisat] Update the first test 2016-05-15 15:44:04 +08:00
Yen Chi Hsuan
cec9727c7f [hearthisat] Detect invalid download links (fixes #9440) 2016-05-15 15:35:31 +08:00
Yen Chi Hsuan
79298173c5 [utils] Fix getheader in urlhandle_detect_ext
Fixes #7049, related to #9440
2016-05-15 15:34:50 +08:00
Sergey M․
69c9cc2716 [xvideos] Extract html5 player formats (Closes #9495) 2016-05-15 03:38:04 +06:00
Sergey M․
ed56f26039 [extractor/common] Improve name extraction for m3u8 formats 2016-05-15 03:34:35 +06:00
Sergey M․
6f41b2bcf1 [extractor/generic] Improve 3qsdn embeds support (Closes #9453) 2016-05-14 23:58:25 +06:00
Sergey M․
cda6d47aad [utils] Simplify integer conversion in js_to_json 2016-05-14 23:41:57 +06:00
Sergey M․
5d39176f6d [extractor/generic:3qsdn] Add support for embeds 2016-05-14 23:40:34 +06:00
Sergey M․
5c86bfe70f [3qsdn] Add extractor 2016-05-14 23:35:03 +06:00
Sergey M․
364cf465dd [test_utils] PEP 8 2016-05-14 20:46:33 +06:00
Sergey M․
ca950f49e9 [ora] Revert extraction to regexes
It's less fragile than using js_to_json with ora js
2016-05-14 20:45:18 +06:00
Sergey M․
89ac4a19e6 [utils] Process non-base 10 integers in js_to_json 2016-05-14 20:39:58 +06:00
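JSON only allows decimal numbers, so JavaScript integer literals in other bases have to be rewritten before parsing. A minimal sketch of that conversion, assuming only `0x` hexadecimal and legacy `0`-prefixed octal literals need handling (the `convert_js_int` helper is hypothetical and not taken from `js_to_json`):

```python
import re


def convert_js_int(literal):
    """Return a JS integer literal as a decimal string that JSON accepts."""
    if re.match(r'^0[xX][0-9a-fA-F]+$', literal):
        return str(int(literal, 16))
    if re.match(r'^0[0-7]+$', literal):
        return str(int(literal, 8))
    # Anything else is left untouched.
    return literal
```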
felix
640eea0a0c [ora] minimise fragile regex shenanigans; recognise unsafespeech.com URLs 2016-05-14 20:13:06 +06:00
felix
bd1e484448 [utils] js_to_json: various improvements
now JS object literals like { /* " */ 0: ",]\xaa<\/p>", } will be correctly converted to JSON.
2016-05-14 20:12:39 +06:00
Yen Chi Hsuan
a834622b89 Merge pull request #9492 from jwilk/teamcoco
[teamcoco] Fix base64 regexp
2016-05-14 20:02:40 +08:00
Yen Chi Hsuan
707bb426b1 Merge pull request #9493 from jwilk/errno
Don't hardcode errno constant
2016-05-14 20:00:11 +08:00
Jakub Wilk
66e7ace17a Don't hardcode errno constant
The value of ENOENT is architecture-dependent, so don't assume it's
always 2.
2016-05-14 13:41:41 +02:00
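The point of 66e7ace17a is to compare against the symbolic constant from the standard `errno` module rather than a literal number. A small sketch of the pattern (the `file_missing` helper is made up for this example):

```python
import errno


def file_missing(path):
    """Return True if `path` cannot be opened because it does not exist."""
    try:
        with open(path, 'rb'):
            return False
    except (IOError, OSError) as err:
        # Use errno.ENOENT instead of assuming the value is 2.
        if err.errno == errno.ENOENT:
            return True
        raise
```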
Jakub Wilk
791ff52f75 [teamcoco] Fix base64 regexp 2016-05-14 13:19:54 +02:00
Yen Chi Hsuan
98d560f205 [test/test_socks] Skip SOCKS tests
They occasionally trigger errors or blocks
(https://travis-ci.org/rg3/youtube-dl/jobs/130184883)
2016-05-14 18:48:36 +08:00
Yen Chi Hsuan
afcc317800 Merge pull request #9466 from TRox1972/patch-1
Update README.md
2016-05-14 17:03:04 +08:00
Sergey M․
b5abf86148 [cinemassacre] Remove extractor (Closes #9457)
It now uses jwplatform
2016-05-14 04:53:14 +06:00
Sergey M․
134c6ea856 [YoutubeDL] Sanitize url for url and url_transparent extraction results 2016-05-14 04:46:38 +06:00
remitamine
0730be9022 [sina] fix extraction(fixes #1146) 2016-05-13 20:25:01 +01:00
Sergey M․
96c2e3e909 [imdb] Improve extraction 2016-05-13 23:25:05 +06:00
Sergey M․
f196508f7b [imdb] Relax _VALID_URL (Closes #9481) 2016-05-13 22:19:00 +06:00
Yen Chi Hsuan
cc1028aa6d [openload] Fix extraction (closes #9472) 2016-05-13 18:11:08 +08:00
remitamine
ad55e10165 [brightcove] change the protocol for m3u8 formats to m3u8_native 2016-05-13 08:35:38 +01:00
remitamine
18cf6381f6 [nrk] extract m3u8 formats 2016-05-13 08:05:28 +01:00
remitamine
cdf32ff15d [extractors] add import for UstudioEmbedIE 2016-05-13 05:25:32 +01:00
remitamine
99d79b8692 [ustudio] add support ustudio app/embed urls 2016-05-13 05:21:45 +01:00
remitamine
b9e7bc55da [mgtv] extract http formats 2016-05-12 22:46:23 +01:00
Sergey M․
d8d540cf0d [nrk] Rework extractor (Closes #9470) 2016-05-13 02:07:12 +06:00
Sergey M․
0df79d552a [twitch:bookmarks] Remove extractor
Bookmarks no longer available
2016-05-13 00:14:30 +06:00
Sergey M․
0db3a66162 [twitch] Skip dead tests 2016-05-12 23:57:52 +06:00
Yen Chi Hsuan
7581bfc958 [utils] Unquote credentials passed to SOCKS proxies
Fixes #9450
2016-05-13 00:27:25 +08:00
TRox1972
f388f616c1 Update README.md 2016-05-12 16:48:12 +02:00
Yen Chi Hsuan
a3fa6024d6 [bloomberg] Fix test_Bloomberg
In this test case, sometimes HLS is the best format while sometimes HDS
is. To prevent occasional test failures, force HDS to be the best
format. In the past, testing against HDS formats causes the same error
as #9214, which is fixed as #9377 landed.
2016-05-12 20:08:42 +08:00
Yen Chi Hsuan
1b405bb47d [downloader/f4m] Tolerate truncated segments when testing
Replaces #9216

Fixes #9214 and test_Bloomberg partially
2016-05-12 20:02:36 +08:00
Yen Chi Hsuan
7e8ddca1bb [vevo] Delay the georestriction check to prevent false alerts
Fixes #9408
2016-05-12 19:56:58 +08:00
Yen Chi Hsuan
778a1ccca7 [utils] Add Œ and œ found in French to ACCENT_CHARS
Fixes #9463
2016-05-12 19:48:48 +08:00
Yen Chi Hsuan
4540515cb3 [iqiyi] Fix 1080P extraction (closes #9446) 2016-05-12 18:48:27 +08:00
Sergey M․
e0741fd449 [__init__] Simplify colon presence check 2016-05-11 22:03:30 +06:00
teemuy
e73b9c65e2 Bugfix: Allow colons in custom HTTP header values. 2016-05-11 21:59:24 +06:00
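Header values such as URLs contain colons themselves, so a `Name:value` option has to be split on the first colon only. A hedged sketch of that idea, with a hypothetical `parse_header` helper that is not the code in `youtube_dl/__init__.py`:

```python
def parse_header(option):
    """Split a 'Name:value' string on the first colon only."""
    if ':' not in option:
        raise ValueError('expected "Name:value", got %r' % option)
    # maxsplit=1 keeps any further colons inside the value.
    name, value = option.split(':', 1)
    return name.strip(), value.strip()
```

Stripping whitespace around both parts is a choice made for this example.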
Yen Chi Hsuan
702ccf2dc0 [compat] Rename shlex_quote and remove unused subprocess_check_output 2016-05-10 16:00:21 +08:00
Philipp Hagemeister
28b4f73620 release 2016.05.10 2016-05-10 09:08:08 +02:00
Yen Chi Hsuan
c2876afafe [test/test_socks] Use a different port range
Seems on Travis CI, ports in the original range are often used.
2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
6ddb4888d2 [options] Update --proxy description for SOCKS proxies 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
fa5cb8d021 [socks] Remove a superfluous clause 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
e21f17fc86 [test/test_socks] Test with local SOCKS servers 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
edaa23f822 [compat] Rename struct_(un)pack to compat_struct_(un)pack 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
d5ae6bb501 [utils] Add rationale for register_socks_protocols 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
51fb4995a5 [utils] Register SOCKS protocols in urllib and support SOCKS4A 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
9e9cd7248d [socks] Eliminate magic constants and improve 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
72f3289ac4 [test/test_socks] Add tests for SOCKS proxies 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
71aff18809 [socks] Support SOCKS proxies 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
dab0daeeb0 [utils,compat] Move struct_pack and struct_unpack to compat.py 2016-05-10 14:51:38 +08:00
Yen Chi Hsuan
4350b74545 [socks] Add socks.py from @bluec0re's public domain implementation
https://gist.github.com/bluec0re/cafd3764412967417fd3
2016-05-10 14:49:25 +08:00
Sergey M․
2937590e8b [downloader/hls] PEP 8 2016-05-09 22:16:33 +06:00
Sergey M․
fad7bbec3a [test_compat] Remove unused import 2016-05-09 22:15:55 +06:00
Sergey M․
e62d9c5caa [downloader/external] Call ffmpeg with HTTP_PROXY env variable set (#9437) 2016-05-09 22:05:12 +06:00
Sergey M․
20cfdcc910 [test_compat] Avoid None values for compat_setenv 2016-05-09 22:00:14 +06:00
Sergey M․
1292638754 [test_compat] Use compat_setenv 2016-05-09 21:58:38 +06:00
Sergey M․
fe40f9eef2 [compat] Add compat_setenv 2016-05-09 21:55:03 +06:00
Sergey M․
6104cc2985 [downloader/hls] Add event media playlists to unsupported features of hlsnative 2016-05-09 20:55:37 +06:00
Sergey M․
c15c47d19b [downloader/hls] Remove EXT-X-MEDIA-SEQUENCE from unsupported features for hlsnative 2016-05-09 20:45:03 +06:00
Sergey M․
965fefdcd8 Credit @sleep-walker for #9431 2016-05-09 20:38:33 +06:00
Sergey M․
3951e7eb93 [ceskatelevize] Simplify, restore bonus video test and skip georestricted test (Closes #9431) 2016-05-09 20:37:20 +06:00
Tomáš Čech
f1f6f5aa5e [ceskatelevize] Add support for live streams
Live streams have no playlist title, so use the title of the stream, which
contains the TV channel name. The internal m3u8 handler doesn't seem to
handle continuous streams well. Add a test for a live stream. Remove a test
that is no longer reachable.
2016-05-09 18:58:15 +06:00
Sergey M
eb785b856f Merge pull request #9358 from dstftw/hls-native-to-ffmpeg-delegation
[downloader/hls] Delegate extraction to ffmpeg when unsupported features detected
2016-05-08 22:07:55 +00:00
Sergey M․
c52f4efaee [mva] Improve _VALID_URLs 2016-05-08 20:10:20 +06:00
Sergey M․
f23a92a0ce [mva] Add extractor (Closes #6667) 2016-05-08 20:02:54 +06:00
Yen Chi Hsuan
3b01a9fbb6 [litv] Add new extractor
LiTV is a streaming platform providing free and paid legal contents in
Taiwan.
2016-05-08 14:34:38 +08:00
Peter Rowlands
93fdb14177 don't use selection by attribute 2016-05-08 10:33:17 +09:00
Peter Rowlands
370d4eb8ad use stricter file selector
in case of empty ./track/video/file entries
2016-05-08 10:02:48 +09:00
Peter Rowlands
3452c3a27c update tests 2016-05-08 10:02:19 +09:00
Sergey M․
9c072d38c6 [arte] Improve language preference (Closes #9401, closes #9162) 2016-05-08 06:52:42 +06:00
Peter Rowlands
81f35fee2f fix extractors.py import order 2016-05-08 08:57:16 +09:00
Peter Rowlands
0fdbe3146c use dict.get in case upload_date does not exist 2016-05-08 08:56:22 +09:00
Sergey M․
3e169233da Expanduser for more options with input files 2016-05-08 04:36:57 +06:00
Sergey M․
f5436c5d9e [downloader/external] Add temp fix ffmpeg m3u8 downloads (Closes #9394) 2016-05-08 02:29:26 +06:00
Sergey M․
5c24873a9e Credit @inondle for #9400 2016-05-08 02:04:34 +06:00
Sergey M․
00c21c225d Credit @kdeldycke for #9430 2016-05-08 00:11:44 +06:00
Sergey M
d013b26719 Merge pull request #9430 from kdeldycke/batch_file_home_expansion
Expand user's home in batch file path.
2016-05-07 18:09:51 +00:00
Kevin Deldycke
e2eca6f65e Expand user's home in batch file path. 2016-05-07 20:03:25 +02:00
Yen Chi Hsuan
a0904c5d80 [telegraaf] Fix extractor (closes #9318) 2016-05-08 00:56:31 +08:00
Sergey M․
cb1fa58813 [flickr] Extract uploader URL (Closes #9426) 2016-05-07 20:15:40 +06:00
remitamine
3fd6332c05 [flickr] extract license field(closes #9425) 2016-05-07 15:13:14 +01:00
Sergey M
401d147893 Merge pull request #9400 from inondle/master
[liveleak] Adds support for thumbnails and updates tests
2016-05-06 19:23:31 +00:00
inondle
e2ee97dcd5 [liveleak] Adds support for thumbnails, updates tests 2016-05-06 12:05:37 -07:00
Sergey M․
f745403b5b [vevo] Revert videoplayer.vevo.com to api.vevo.com 2016-05-06 23:37:17 +06:00
Sergey M․
3e80e6f40d [vevo] Allow request to api.vevo.com to fail (Closes #9417)
I don't know whether this is temporary or the API has just gone
2016-05-06 23:35:58 +06:00
Sergey M․
25cb7a0eeb [youtube] Allow empty attribute values in description regex 2016-05-06 22:11:18 +06:00
Sergey M․
abc97b5eda [utils] Allow empty attribute values in get_element_by_attribute (Closes #9415) 2016-05-06 22:07:30 +06:00
remitamine
04e88ca2ca [vk] improve extraction(fixes #7976) 2016-05-06 15:02:40 +01:00
Peter Rowlands
8d93c21466 add multi_video test case 2016-05-06 12:08:43 +09:00
Peter Rowlands
1dbfd78754 fix multi_video part naming, add upload_date field 2016-05-06 12:07:29 +09:00
Peter Rowlands
22e35adefd use url instead of single formats entry 2016-05-06 10:41:30 +09:00
Yen Chi Hsuan
6f59aa934b [periscope:user] Add new extractor for user pages
Closes #9388
2016-05-06 02:14:39 +08:00
Yen Chi Hsuan
109db8ea64 Merge pull request #9367 from codesparkle/master
Feature: --restrict-filenames: replace accented characters by their unaccented counterpart instead of "_"
2016-05-06 01:44:03 +08:00
Peter Rowlands
833b644fff use xpath_text 2016-05-06 01:24:02 +09:00
Sergey M․
915620fd68 [redtube] PEP 8 2016-05-05 21:34:06 +06:00
Sergey M․
ac12e888f9 [redtube] Extract all formats, duration, upload date and view count (Closes #9397) 2016-05-05 21:02:54 +06:00
Yen Chi Hsuan
b1c6a5bac8 [Makefile] Remove more media files in make clean 2016-05-05 20:50:39 +08:00
Yen Chi Hsuan
7d08f6073d [kuwo:category] Update test 2016-05-05 20:20:26 +08:00
remitamine
758a059241 [dailymail] Add new extractor(closes #2667) 2016-05-05 13:13:22 +01:00
Yen Chi Hsuan
4f8c56eb4e [fczenit] Fix extraction and update test
Closes #9359
2016-05-05 17:55:37 +08:00
Peter Rowlands
57cf9b7f06 [afreecatv] Add new extractor for afreecatv.com VODs 2016-05-05 03:59:23 +09:00
Sergey M․
9da526aae7 [yandexmusic:playlist] Update test 2016-05-04 23:18:48 +06:00
Sergey M․
75b81df3af [udemy] Modernize 2016-05-04 23:14:12 +06:00
Sergey M․
aabdc83d6e [udemy] Fix course enroll (Closes #9393) 2016-05-04 23:03:44 +06:00
Sergey M․
2a48e6f01a [yandexmusic:playlist] Respect track order for long (>150) playlists 2016-05-04 22:45:01 +06:00
Sergey M․
203a3c0e6a [yandexmusic:playlist] Make title optional 2016-05-04 22:35:28 +06:00
Sergey M․
d36724cca4 [yandexmusic:playlist] Remove unused imports 2016-05-04 22:34:37 +06:00
Sergey M․
15fc0658f7 [yandexmusic:playlist] Modernize 2016-05-04 22:33:29 +06:00
Sergey M․
e960c3c223 [yandexmusic:playlist] Improve extraction (Closes #6801) 2016-05-04 22:25:39 +06:00
Sergey M․
bc7e77a04b [vevo] Use raise_geo_restricted 2016-05-03 23:18:36 +06:00
Sergey M․
964f49336f [aol] Improve _VALID_URL (Closes #9381) 2016-05-03 21:24:51 +06:00
Sergey M․
57d8e32a3e [xfileshare] Add support for streamin.to 2016-05-03 16:58:11 +06:00
Sergey M․
4174552391 [xfileshare] Refactor _VALID_URL and remove dead sites 2016-05-03 15:35:32 +06:00
Sergey M․
80bc4106af [xfileshare] Add support for thevideobee.to (Closes #9374) 2016-05-03 15:09:23 +06:00
Yen Chi Hsuan
7759be38da [xiami] Detect georestriction and skip tests 2016-05-03 16:19:43 +08:00
Yen Chi Hsuan
a0a309b973 [kuwo:category] Fix description and update test 2016-05-03 16:06:28 +08:00
Adam Thalhammer
c587cbb793 improved performance by extracting accented chars to top level 2016-05-03 10:40:30 +10:00
Sergey M․
6c52a86f54 [README.md] Update creator description 2016-05-02 21:32:57 +06:00
Sergey M․
8a92e51c60 [extractor/common] Relax wording for creator metafield 2016-05-02 21:31:35 +06:00
Sergey M․
f0e14fdd43 [YoutubeDL] Skip non-relevant field types when building output template 2016-05-02 20:05:06 +06:00
Sergey M․
df5f4e8888 [vevo] Remove superfluous code 2016-05-02 18:47:35 +06:00
Sergey M․
7960b0563b [YoutubeDL] Properly process unable-to-download-error on python2 2016-05-02 18:35:50 +06:00
Sergey M․
5c9ced9504 [vevo] Improve genre extraction 2016-05-02 18:19:00 +06:00
Adam Thalhammer
31c4448f6e Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:25:12 +10:00
Adam Thalhammer
79a2e94e79 Instead of replacing accented characters with an underscore when sanitizing file names in restricted mode, replace them with their non-accented equivalents fixes #9347 2016-05-02 13:21:39 +10:00
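To illustrate the behaviour these two commits describe, here is a sketch that uses Unicode decomposition from the standard `unicodedata` module. This is a swapped-in technique for illustration: the neighbouring commits (c587cbb793, 778a1ccca7) point to an explicit character table, `ACCENT_CHARS`, in the real code, and decomposition alone does not cover ligatures such as Œ.

```python
import unicodedata


def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFKD splits 'é' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks and keep the base characters.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
```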
Sergey M․
686cc89634 [discovery] Fix typo 2016-05-02 07:07:35 +06:00
Sergey M․
9508738f9a [vevo] Extract featured artist 2016-05-02 03:36:40 +06:00
Sergey M․
78a3ff33ab [vevo:playlist] Add fallback for playlist id 2016-05-02 03:29:48 +06:00
Sergey M․
881dbc86c4 [vevo] Extract track related metafields and add artists to title (Closes #1684) 2016-05-02 03:28:58 +06:00
Sergey M․
8e7d004888 [vevo] Add test for video only available via webpage 2016-05-02 03:06:48 +06:00
Sergey M․
9618c44824 [vevo] Extract video versions from webpage as a last resort (Closes #8426, closes #9366) 2016-05-02 02:58:20 +06:00
Sergey M․
516ea41a7d [vevo] Fix _call_api 2016-05-02 02:54:50 +06:00
Sergey M․
e2bd301ce7 [vevo:playlist] Fix genre playlists 2016-05-02 01:00:42 +06:00
Sergey M․
0c9d288ba0 [vevo:playlist] Remove debug params 2016-05-02 00:50:31 +06:00
Sergey M․
e0da32df6e [vevo:playlist] Add extractor (Closes #9334, closes #9364) 2016-05-02 00:48:26 +06:00
Philipp Hagemeister
174aba3223 release 2016.05.01 2016-05-01 10:19:14 +02:00
Sergey M․
0d66bd0eab [downloader/hls] Delegate extraction to ffmpeg when unsupported features detected 2016-05-01 13:56:51 +06:00
Sergey M․
4bd143a3a0 [postprocessor/ffmpeg] Simplify metadata preparation and add track related metafields (Closes #9357) 2016-05-01 10:56:54 +06:00
Sergey M․
6f27bf1c74 Credit @blahgeek for xiami (#9079) 2016-05-01 08:08:51 +06:00
Sergey M․
68bb2fef95 [tagesschau] Restrict playlist entry regex 2016-05-01 07:15:23 +06:00
Sergey M․
854cc54bc1 [tagesschau] Expand video id 2016-05-01 07:01:55 +06:00
Sergey M․
651ad35ce0 [tagesschau] Relax _VALID_URL 2016-05-01 06:57:19 +06:00
Sergey M․
6a0f9a24d0 [tagesschau] Separate player extractor 2016-05-01 06:45:44 +06:00
remitamine
9cf79e8f4b [ccc] improve extraction 2016-05-01 01:45:17 +01:00
Sergey M․
2844b09336 [tagesschau] Fix article media ids 2016-05-01 04:42:05 +06:00
Sergey M․
1a2b377cc2 [tagesschau] Fix audio support 2016-05-01 04:38:46 +06:00
Sergey M․
4c1b2e5c0e [tagesschau] Add support for playlists 2016-05-01 04:18:56 +06:00
Sergey M․
9e1b96ae40 [rtlnl] Match formats only by height 2016-05-01 03:20:36 +06:00
Sergey M․
fc35cd9e0c [tagesschau] Relax _VALID_URL 2016-05-01 02:56:32 +06:00
Sergey M․
339fe7228a [tagesschau] Update _FORMATS map 2016-05-01 02:56:32 +06:00
remitamine
ea7e7fecbd [discovery] remove unused imports 2016-04-30 21:55:28 +01:00
remitamine
d00b93d58c [discovery] extract more info using BrightcoveNewIE 2016-04-30 21:49:32 +01:00
remitamine
93f7a31bf3 [discovery] extract subtitle 2016-04-30 20:51:32 +01:00
remitamine
33a1ec950c [discovery] extract http formats 2016-04-30 20:51:32 +01:00
Sergey M․
4e0c0c1508 [xiami] Improve extraction (Closes #9079)
* Switch to JSON source
* Add abstract IE for playlists
* Extract more track related metadata
2016-04-30 21:50:23 +06:00
BlahGeek
89c0dc9a5f [xiami] Add xiami extractor 2016-04-30 21:48:40 +06:00
remitamine
f628d800fb [ted] add support for youtube embeds and update tests 2016-04-30 16:34:57 +01:00
remitamine
11fa3d7f99 [ted] extract all http formats 2016-04-30 15:44:30 +01:00
Sergey M․
d41ee7b774 [vlive] Pass Referer as bytestring (Closes #9352) 2016-04-30 19:22:42 +06:00
remitamine
e0e9bbb0e9 [pbs] extract srt and vtt subtitles 2016-04-30 14:02:17 +01:00
remitamine
7691184a31 [pbs] remove duplicate format 2016-04-30 12:57:30 +01:00
remitamine
35cd2f4c25 [pbs] extract only the formats that are known to be available as http formats
https://projects.pbs.org/confluence/display/coveapi/COVE+Video+Specifications
2016-04-30 11:32:13 +01:00
remitamine
350d7963db [pbs] fix the least bitrate http url construction 2016-04-30 11:12:11 +01:00
remitamine
cbc032c8b7 [pbs] extract all http formats 2016-04-30 01:24:36 +01:00
remitamine
69c4cde4ba [wsj] improve extraction 2016-04-29 21:37:05 +01:00
Sergey M․
ca278a182b [rtlnl] Replace test 2016-04-30 02:07:29 +06:00
Sergey M․
373e1230e4 [rtlnl] Clarify tests 2016-04-30 01:50:26 +06:00
Sergey M․
cd63d091ce [rtlnl] Fix tests 2016-04-30 01:48:14 +06:00
Sergey M․
0571ffda7d [rtlnl] Improve extraction (Closes #9329)
* Make hls extraction non fatal and revert ext
* Extract progressive formats' metadata from corresponding hls formats
2016-04-30 01:43:39 +06:00
Reino17
5556047465 [rtlnl] Update 720p PG_URL_TEMPLATE
- Fixed the format_id for the 720p progressive videostream and added the video's resolution.
- The adaptive videostreams have the m3u8-extension, so I removed the confusing mp4-extension in order to make a better distinction between these and the progressive videostreams.
2016-04-30 01:43:13 +06:00
remitamine
65a3bfb379 [dfb] extract m3u8 formats 2016-04-29 19:21:17 +01:00
Yen Chi Hsuan
cef3f3011f [funimation] Detect blocking and support CloudFlare cookies 2016-04-30 00:17:09 +08:00
Yen Chi Hsuan
e9c6cdf4a1 [common] Fix format_id construction for HLS 2016-04-29 22:50:16 +08:00
Sergey M․
00a17a9e12 [crunchyroll] Sort formats 2016-04-29 19:44:10 +06:00
Sergey M․
8312b1a3d1 [crunchyroll] Add even more relaxed fmt fallback 2016-04-29 19:43:53 +06:00
Sergey M․
6ff4469528 [crunchyroll] Relax fmt regex 2016-04-29 19:39:27 +06:00
Yen Chi Hsuan
68835d687a Merge branch 'Kagami-vlive-hls' 2016-04-29 19:30:51 +08:00
Yen Chi Hsuan
9d186afac8 [vlive] Coding style and PEP8 2016-04-29 19:29:50 +08:00
Yen Chi Hsuan
151d98130b Merge branch 'vlive-hls' of https://github.com/Kagami/youtube-dl into Kagami-vlive-hls 2016-04-29 19:26:39 +08:00
Kagami Hiiragi
b24d6336a7 [vlive] Add support for live videos 2016-04-29 14:22:50 +03:00
remitamine
065216d94f [crunchyroll] reduce requests for formats extraction 2016-04-29 11:46:42 +01:00
remitamine
67167920db [viewlift] replace SnagFilms extractors
- add support for other sites that use the same logic
- improve format extraction and sorting
2016-04-29 11:24:10 +01:00
Yen Chi Hsuan
14638e2915 [sexykarma] Rename to WatchIndianPornIE and fix extraction 2016-04-29 18:17:08 +08:00
Yen Chi Hsuan
1910077ed7 Revert "[sexykarma] Remove the extractor"
This reverts commit 31ff3c074e.
2016-04-29 17:59:23 +08:00
Yen Chi Hsuan
5819edef03 [ooyala] Skip an invalid test
Ooyala is used by lots of extractors and its correctness can be verified
by these websites.
2016-04-29 14:27:15 +08:00
Yen Chi Hsuan
f5535ed0e3 [orf] Skip the expired test 2016-04-29 14:24:07 +08:00
Yen Chi Hsuan
31ff3c074e [sexykarma] Remove the extractor
Its domain name is on sale.

Closes #9317
2016-04-29 13:36:52 +08:00
Sergey M․
72670c39de [arte:+7] Fix typo in _VALID_URL 2016-04-29 04:46:23 +06:00
Sergey M․
683d892bf9 [viewster] Remove unused import 2016-04-29 01:30:53 +06:00
Sergey M․
497971cd4a [yandexmusic] Clarify blockage even more 2016-04-29 01:28:07 +06:00
remitamine
e757fb3d05 [crunchyroll] improve extraction
- extract more metadata(series, episode, episode_number)
- reduce duplicate requests for extracting formats
- remove duplicate formats
2016-04-28 18:42:20 +01:00
remitamine
0ba9e3ca22 [viewster] extract formats for videos with multiple audios/subtitles 2016-04-28 17:45:09 +01:00
Sergey M․
4b53762914 [yandexmusic] Clarify blockage 2016-04-28 21:45:33 +06:00
Sergey M․
eebe6b382e [yandexmusic] Improve error handling 2016-04-28 21:37:34 +06:00
Yen Chi Hsuan
0cbcbdd89d [nuvid] Fix extraction
Closes #7620
2016-04-28 17:51:20 +08:00
Yen Chi Hsuan
7f776fa4b5 [yandexmusic] Skip tests as Travis CI blocked 2016-04-28 17:08:41 +08:00
Yen Chi Hsuan
eb5ad31ce1 Merge branch 'pmrowla-mwave-meetgreet' 2016-04-28 16:03:43 +08:00
Yen Chi Hsuan
a5941305b6 [mwave] Coding style 2016-04-28 16:03:08 +08:00
Yen Chi Hsuan
f8dddaf456 Merge branch 'mwave-meetgreet' of https://github.com/pmrowla/youtube-dl into pmrowla-mwave-meetgreet 2016-04-28 15:56:32 +08:00
Yen Chi Hsuan
618c71dc64 [cloudy] New domain name for the test_cloudy_1
I'm not sure whether videoraj.ch still works or not, so keep it.
2016-04-28 15:46:00 +08:00
Sergey M․
52af8f222b [cwtv] Relax _VALID_URL (Closes #9327) 2016-04-28 04:01:21 +06:00
Yen Chi Hsuan
3cc8649c9d [20min] Detect embedded YouTube videos
Fixes #9331
2016-04-28 02:58:11 +08:00
Yen Chi Hsuan
dcf094d626 [theplatform] Fix for Python 3.2
test_AENetworks{,_1} fail because in Python < 3.3, binascii.a2b_* functions
accept only bytes-like objects
2016-04-27 18:35:33 +08:00
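A portable way to deal with the restriction named in dcf094d626 is to encode text to bytes before handing it to `binascii`. The helper below is a hypothetical sketch, not the change made in the theplatform extractor:

```python
import binascii


def hex_to_bytes(value):
    """Decode a hex string to bytes, accepting both text and bytes input."""
    if not isinstance(value, bytes):
        # Hex digits are plain ASCII, so this encoding is lossless.
        value = value.encode('ascii')
    return binascii.a2b_hex(value)
```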
Peter Rowlands
5b5d7cc11e [mwave] Add Mwave Meet & Greet extractor 2016-04-27 15:57:17 +09:00
Yen Chi Hsuan
2ac2cbc0a3 [malemotion] Remove the extractor
Announcement from their homepage:

```
MaleMotion is closed

After another system crash, I'm forced to close the site

This week all content will be erased

Don't forget to cancel your subscription if any !
```

Closes #9311.
2016-04-27 13:55:32 +08:00
Yen Chi Hsuan
a7e03861e8 [scivee] Skip the test
Not accessible from either Travis CI or my machine.

Closes #9315
2016-04-27 13:52:04 +08:00
Sergey M
046ea04a7d [README.md] Mention mpv 2016-04-27 00:22:08 +06:00
Sergey M
7464360379 [README.md] Add FAQ entry on output template conflicts 2016-04-27 00:16:48 +06:00
Sergey M․
175c2e9ec3 [youtube:search_url] Reimplement in terms of youtube:playlistbase 2016-04-26 22:29:29 +06:00
remitamine
f1f879098a [viewster] extract more metadata for http formats 2016-04-26 13:40:40 +01:00
Sergey M․
c9fd530670 [ok] Extract start time 2016-04-25 22:15:15 +06:00
Sergey M․
749b0046a8 [ok] Allow embeds without title (Closes #9303) 2016-04-25 22:05:47 +06:00
Yen Chi Hsuan
e3de3d6f2f [normalboots] Fix extraction
Now it's using ScreenwaveMedia
2016-04-25 23:49:12 +08:00
Yen Chi Hsuan
ad58942d57 [muzu] Remove extractor
MUZU is shutting down in October 2015. [1]

[1] http://www.musicbusinessworldwide.com/youtube-rival-muzu-is-heading-into-liquidation/
2016-04-25 23:35:05 +08:00
Yen Chi Hsuan
4645432d7a [eagleplatform] Checking direct HTTP links
Sometimes they fail with 404
2016-04-25 22:48:17 +08:00
Yen Chi Hsuan
6bdc2d5358 [mitele] Comment out unstable MD5
Also Akamai f4f fragments
2016-04-25 22:27:25 +08:00
Yen Chi Hsuan
2beff95da5 [nrk] Comment out unstable MD5 checksums
Both are Akamai f4f fragments.
2016-04-25 22:26:19 +08:00
Yen Chi Hsuan
abc1723edd [unistra] Sort formats
Originally URLs are passed to set() and not sorted, so the result is not
deterministic, causing occasional FAILs on Travis CI.
2016-04-25 22:24:40 +08:00
Yen Chi Hsuan
b248e6485b Merge branch 'remitamine-akamai_pv' 2016-04-25 21:02:30 +08:00
Yen Chi Hsuan
d6712378e7 Merge branch 'akamai_pv' of https://github.com/remitamine/youtube-dl into remitamine-akamai_pv 2016-04-25 21:02:02 +08:00
remitamine
fb72ec58ae [extractor/common] do not process f4m manifest that contain akamai playerVerificationChallenge 2016-04-25 13:37:03 +01:00
Sergey M․
c83a352227 [openload] Make thumbnail optional 2016-04-25 00:26:06 +06:00
Sergey M․
e9063b5de9 [openload] Add test 2016-04-25 00:22:55 +06:00
Sergey M․
594b0c4c69 [openload] Fix ext extraction 2016-04-25 00:03:29 +06:00
Sergey M․
eb9ee19422 [utils] Allow None mimetypes in mimetype2ext 2016-04-25 00:03:12 +06:00
Sergey M․
a1394b820d [openload] Fix title extraction (Closes #9298) 2016-04-25 00:01:37 +06:00
Yen Chi Hsuan
aa9dc24f5a [douyutv] Improve extraction and update tests
The JSON API sometimes returns HTML pages with errors
2016-04-24 23:52:17 +08:00
Yen Chi Hsuan
51762e1a31 [xminus] Fix extraction (closes #9228) 2016-04-24 23:21:45 +08:00
Boris Wachtmeister
14f7a2b8af [WDRMaus] switch current show to new WDR extractor (fixes #8562)
It seems that the "current show" already uses the new WDR video-player,
while all the others videos still use the old player.

I just added the current show URL to the normal WDR-extractor, which
works fine. This commit needs my changes from PR #8842 that fix the
support for WDR.
2016-04-23 11:53:22 +02:00
Boris Wachtmeister
c0837a12c8 [WDR] complete overhaul after relaunch of the site
The WDR relaunched their site on 2016-02-23 which not only changed the
URL-schema completely but also the layout of their pages.

Apparently the whole "mediathek" now runs on the wdr-domain, so no
separate URL for funkhauseuropa anymore.
There seems to be no explicit handling of video-sizes on the page or in
the URLs anymore. There seems to be only one size for HTML5, but still
several sizes for flash. The extractor adds all to the list of formats.

There is no metadata for the HTML5-stream, so that the best flash-stream
will always be considered as the "best" format. At least in my tests
this seemed to be true anyway.
2016-04-23 11:42:18 +02:00
206 changed files with 11084 additions and 3606 deletions


@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.20*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.20**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.24
[debug] youtube-dl version 2016.06.20
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

.gitignore

@@ -28,10 +28,16 @@ updates_key.pem
*.mp4
*.m4a
*.m4v
*.mp3
*.part
*.swp
test/testdata
test/local_parameters.json
.tox
youtube-dl.zsh
# IntelliJ related files
.idea
.idea/*
*.iml
tmp/


@@ -7,11 +7,13 @@ python:
- "3.4"
- "3.5"
sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose
notifications:
email:
- filippo.valsorda@gmail.com
- phihag@phihag.de
- yasoob.khld@gmail.com
# irc:
# channels:


@@ -168,3 +168,10 @@ José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
Philip Huppert
blahgeek
Kevin Deldycke
inondle
Tomáš Čech
Déstin Reed
Roman Tsiupa
Artur Krysiak


@@ -142,9 +142,9 @@ After you have ensured this site is distributing it's content legally, you can f
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
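
The advice in step 8 (treat everything except `id`, `title` and `url`/`formats` as optional) can be sketched as follows. This is only an illustration, not code from youtube-dl: `build_info`, the layout of `meta` and the view-count pattern are hypothetical, and a plain `re.search` stands in for the effect of passing `fatal=False` to `_search_regex`.

```python
import re


def build_info(meta, webpage):
    """Build an info dict from a hypothetical metadata dict and page HTML."""
    # Mandatory fields: a KeyError here is appropriate, since extraction
    # makes no sense without them.
    info = {
        'id': meta['id'],
        'title': meta['title'],
        'url': meta['url'],
    }

    # Optional field: .get() returns None instead of raising when the
    # source key is missing.
    info['description'] = meta.get('summary')

    # Optional field scraped from the page: a failed match yields None
    # rather than aborting (the same idea as fatal=False).
    match = re.search(r'data-views="(\d+)"', webpage)
    info['view_count'] = int(match.group(1)) if match else None

    return info


complete = build_info(
    {'id': '42', 'title': 'Demo', 'url': 'http://example.com/v.mp4',
     'summary': 'A short clip'},
    '<div data-views="1500"></div>')
sparse = build_info(
    {'id': '43', 'title': 'Bare', 'url': 'http://example.com/w.mp4'},
    '<div></div>')
```

In the `sparse` case the mandatory fields are still returned and the optional ones are simply `None`, so a site dropping its summary or view counter would not break extraction.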


@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -37,7 +37,7 @@ test:
ot: offlinetest
offlinetest: codetest
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py
tar: youtube-dl.tar.gz
@@ -69,7 +69,7 @@ README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
$(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
$(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md


@@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
@@ -25,20 +25,26 @@ If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
brew install youtube-dl
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
You can also use pip:
sudo pip install youtube-dl
sudo pip install --upgrade youtube-dl
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
brew install youtube-dl
Or with [MacPorts](https://www.macports.org/):
sudo port install youtube-dl
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION
**youtube-dl** is a small command-line program to download videos from
**youtube-dl** is a command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version
2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
your Unix box, on Windows or on Mac OS X. It is released to the public domain,
@@ -73,8 +79,8 @@ which means you can modify it, redistribute it or use it however you like.
repairs broken URLs, but emits an error if
this is not possible instead of searching.
--ignore-config Do not read configuration files. When given
in the global configuration file /etc
/youtube-dl.conf: Do not read the user
in the global configuration file
/etc/youtube-dl.conf: Do not read the user
configuration in ~/.config/youtube-
dl/config (%APPDATA%/youtube-dl/config.txt
on Windows)
@@ -85,9 +91,11 @@ which means you can modify it, redistribute it or use it however you like.
--no-color Do not emit color codes in output
## Network Options:
--proxy URL Use the specified HTTP/HTTPS proxy. Pass in
an empty string (--proxy "") for direct
connection
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
To enable experimental SOCKS proxy, specify
a proper scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to
(experimental)
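
When youtube-dl is driven from Python rather than the command line, the same proxy URL is normally supplied through the `proxy` key of the options dict. The sketch below only builds and checks such a dict. The helper `proxy_options` and its list of accepted schemes are assumptions made for this illustration and are not part of youtube-dl.

```python
from urllib.parse import urlsplit

# Schemes accepted by this sketch only. The help text names HTTP, HTTPS and
# SOCKS; the exact set youtube-dl accepts is not asserted here.
ASSUMED_SCHEMES = ('http', 'https', 'socks5')


def proxy_options(proxy_url):
    """Return an options dict carrying the proxy, or raise ValueError."""
    if proxy_url == '':
        # An empty string requests a direct connection (--proxy "").
        return {'proxy': ''}
    parts = urlsplit(proxy_url)
    if parts.scheme not in ASSUMED_SCHEMES:
        raise ValueError('proxy URL needs an explicit scheme: %r' % proxy_url)
    if not parts.hostname:
        raise ValueError('proxy URL has no host: %r' % proxy_url)
    return {'proxy': proxy_url}


opts = proxy_options('socks5://127.0.0.1:1080/')
```

The resulting dict could then be passed to `youtube_dl.YoutubeDL(opts)`. A bare `127.0.0.1:1080` is rejected here because, as the help text says, SOCKS support is selected by the scheme.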
@@ -160,7 +168,7 @@ which means you can modify it, redistribute it or use it however you like.
(experimental)
## Download Options:
-r, --rate-limit LIMIT Maximum download rate in bytes per second
-r, --limit-rate RATE Maximum download rate in bytes per second
(e.g. 50K or 4.2M)
-R, --retries RETRIES Number of retries (default is 10), or
"infinite".
@@ -247,18 +255,19 @@ which means you can modify it, redistribute it or use it however you like.
--write-info-json Write video metadata to a .info.json file
--write-annotations Write video annotations to a
.annotations.xml file
--load-info FILE JSON file containing the video information
--load-info-json FILE JSON file containing the video information
(created with the "--write-info-json"
option)
--cookies FILE File to read cookies from and dump cookie
jar in
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information
permanently. By default $XDG_CACHE_HOME
/youtube-dl or ~/.cache/youtube-dl . At the
moment, only YouTube player files (for
videos with obfuscated signatures) are
cached, but that may change.
permanently. By default
$XDG_CACHE_HOME/youtube-dl or
~/.cache/youtube-dl . At the moment, only
YouTube player files (for videos with
obfuscated signatures) are cached, but that
may change.
--no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files
@@ -415,7 +424,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
```
@@ -431,7 +440,7 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a`.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
@@ -465,7 +474,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `display_id`: An alternative identifier for the video
- `uploader`: Full name of the video uploader
- `license`: License name the video is licensed under
- `creator`: The main artist who created the video
- `creator`: The creator of the video
- `release_date`: The date (YYYYMMDD) when the video was released
- `timestamp`: UNIX timestamp of the moment the video became available
- `upload_date`: Video upload date (YYYYMMDD)
@@ -502,6 +511,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id`: Playlist identifier
- `playlist_title`: Playlist title
Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
@@ -541,6 +553,10 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
#### Output template and Windows batch files
If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling them, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
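
If batch files are generated by a script, the doubling rule can be applied mechanically. The following is a minimal sketch under that assumption. The helper names `escape_for_batch` and `batch_output_arg` are hypothetical and not part of youtube-dl.

```python
def escape_for_batch(template):
    """Double every '%' so a batch file passes it through literally."""
    return template.replace('%', '%%')


def batch_output_arg(template, prefix=''):
    """Build the -o argument for a batch file.

    'prefix' is inserted untouched, so environment variable references
    such as %HOMEPATH% in it are still expanded by the batch interpreter.
    """
    return '-o "%s%s"' % (prefix, escape_for_batch(template))


escaped = escape_for_batch('%(title)s-%(id)s.%(ext)s')
arg = batch_output_arg('%(title)s.%(ext)s',
                       prefix='C:\\%HOMEPATH%\\Desktop\\')
```

Only the template part is escaped. The directory prefix is kept separate precisely because its `%` characters are not plain characters.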
#### Output template examples
Note on Windows you may need to use double quotes instead of single.
@@ -691,12 +707,20 @@ hash -r
Again, from then on you'll be able to update with `sudo youtube-dl -U`.
### youtube-dl is extremely slow to start on Windows
Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
### I'm getting an error `Unable to extract OpenGraph title` on YouTube playlists
YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
Make sure you are not using `-o` with any of these options `-t`, `--title`, `--id`, `-A` or `--auto-number` set in command line or in a configuration file. Remove the latter if any.
### Do I always have to pass `-citw`?
By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`.
@@ -717,7 +741,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
@@ -774,9 +798,9 @@ means you're using an outdated version of Python. Please update to Python 2.6 or
Since June 2012 ([#342](https://github.com/rg3/youtube-dl/issues/342)) youtube-dl is packed as an executable zipfile, simply unzip it (might need renaming to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.
### The exe throws a *Runtime error from Visual C++*
### The exe throws an error due to missing `MSVCR100.dll`
To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
To run the exe you need to install first the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).
### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
@@ -831,6 +855,12 @@ It is *not* possible to detect whether a URL is supported or not. That's because
If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.
# Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
# DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
@@ -905,9 +935,9 @@ After you have ensured this site is distributing it's content legally, you can f
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
@@ -934,7 +964,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
```
Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:


@@ -1,17 +1,38 @@
#!/usr/bin/python3
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import argparse
import ctypes
import functools
import shutil
import subprocess
import sys
import tempfile
import threading
import traceback
import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from youtube_dl.compat import (
compat_input,
compat_http_server,
compat_str,
compat_urlparse,
)
class BuildHTTPServer(ThreadingMixIn, HTTPServer):
# These are not used outside of buildserver.py thus not in compat.py
try:
import winreg as compat_winreg
except ImportError: # Python 2
import _winreg as compat_winreg
try:
import socketserver as compat_socketserver
except ImportError: # Python 2
import SocketServer as compat_socketserver
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True
@@ -191,7 +212,7 @@ def main(args=None):
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='localhost:8142',
action='store', default='0.0.0.0:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
@@ -216,7 +237,7 @@ def main(args=None):
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
input('Press ENTER to shut down')
compat_input('Press ENTER to shut down')
srv.shutdown()
thr.join()
@@ -231,8 +252,6 @@ def rmtree(path):
os.remove(fname)
os.rmdir(path)
#==============================================================================
class BuildError(Exception):
def __init__(self, output, code=500):
@@ -249,15 +268,25 @@ class HTTPError(BuildError):
class PythonBuilder(object):
def __init__(self, **kwargs):
pythonVersion = kwargs.pop('python', '2.7')
try:
key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
python_version = kwargs.pop('python', '3.4')
python_path = None
for node in ('Wow6432Node\\', ''):
try:
self.pythonPath, _ = _winreg.QueryValueEx(key, '')
finally:
_winreg.CloseKey(key)
except Exception:
raise BuildError('No such Python version: %s' % pythonVersion)
key = compat_winreg.OpenKey(
compat_winreg.HKEY_LOCAL_MACHINE,
r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
try:
python_path, _ = compat_winreg.QueryValueEx(key, '')
finally:
compat_winreg.CloseKey(key)
break
except Exception:
pass
if not python_path:
raise BuildError('No such Python version: %s' % python_version)
self.pythonPath = python_path
super(PythonBuilder, self).__init__(**kwargs)
@@ -305,8 +334,10 @@ class YoutubeDLBuilder(object):
def build(self):
try:
subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
cwd=self.buildPath)
proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
proc.wait()
#subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
# cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
@@ -369,12 +400,12 @@ class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, Clea
pass
class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
actionDict = {'build': Builder, 'download': Builder} # They're the same, no more caching.
def do_GET(self):
path = urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
path = compat_urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
@@ -388,7 +419,7 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = unicode(e).encode('UTF-8')
msg = compat_str(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
@@ -400,7 +431,5 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
else:
self.send_response(500, 'Malformed URL')
#==============================================================================
if __name__ == '__main__':
main()


@@ -0,0 +1,111 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import base64
import json
import mimetypes
import netrc
import optparse
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_basestring,
compat_input,
compat_getpass,
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import (
make_HTTPS_handler,
sanitized_Request,
)
class GitHubReleaser(object):
_API_URL = 'https://api.github.com/repos/rg3/youtube-dl/releases'
_UPLOADS_URL = 'https://uploads.github.com/repos/rg3/youtube-dl/releases/%s/assets?name=%s'
_NETRC_MACHINE = 'github.com'
def __init__(self, debuglevel=0):
self._init_github_account()
https_handler = make_HTTPS_handler({}, debuglevel=debuglevel)
self._opener = compat_urllib_request.build_opener(https_handler)
def _init_github_account(self):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
self._username = info[0]
self._password = info[2]
compat_print('Using GitHub credentials found in .netrc...')
return
else:
compat_print('No GitHub credentials found in .netrc')
except (IOError, netrc.NetrcParseError):
compat_print('Unable to parse .netrc')
self._username = compat_input(
'Type your GitHub username or email address and press [Return]: ')
self._password = compat_getpass(
'Type your GitHub password and press [Return]: ')
def _call(self, req):
if isinstance(req, compat_basestring):
req = sanitized_Request(req)
# Authorizing manually since GitHub does not respond with 401 with
# WWW-Authenticate header set (see
# https://developer.github.com/v3/#basic-authentication)
b64 = base64.b64encode(
('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
req.add_header('Authorization', 'Basic %s' % b64)
response = self._opener.open(req).read().decode('utf-8')
return json.loads(response)
def list_releases(self):
return self._call(self._API_URL)
def create_release(self, tag_name, name=None, body='', draft=False, prerelease=False):
data = {
'tag_name': tag_name,
'target_commitish': 'master',
'name': name,
'body': body,
'draft': draft,
'prerelease': prerelease,
}
req = sanitized_Request(self._API_URL, json.dumps(data).encode('utf-8'))
return self._call(req)
def create_asset(self, release_id, asset):
asset_name = os.path.basename(asset)
url = self._UPLOADS_URL % (release_id, asset_name)
# Our files are small enough to be loaded directly into memory.
data = open(asset, 'rb').read()
req = sanitized_Request(url, data)
mime_type, _ = mimetypes.guess_type(asset_name)
req.add_header('Content-Type', mime_type or 'application/octet-stream')
return self._call(req)
def main():
parser = optparse.OptionParser(usage='%prog VERSION BUILDPATH')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected a version and a build directory')
version, build_path = args
releaser = GitHubReleaser()
new_release = releaser.create_release(version, name='youtube-dl %s' % version)
release_id = new_release['id']
for asset in os.listdir(build_path):
compat_print('Uploading %s...' % asset)
releaser.create_asset(release_id, os.path.join(build_path, asset))
if __name__ == '__main__':
main()
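
The `_call` method above attaches credentials by hand rather than waiting for a 401 challenge. A minimal sketch of just that header construction, assuming a hypothetical helper name `basic_auth_header` (it is not part of the script) and placeholder credentials:

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header.

    Hypothetical helper for illustration; the script above inlines this logic.
    """
    # Encode "user:password" as UTF-8, base64-encode it, then decode the
    # result to ASCII text so it can be used directly as a header value.
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic %s' % base64.b64encode(credentials).decode('ascii')


# Placeholder credentials, not real ones.
print(basic_auth_header('user', 'secret'))
```

Sending the header on every request avoids depending on a `WWW-Authenticate` challenge that, according to the comment in `_call`, GitHub does not send.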

devscripts/install_srelay.sh Executable file

@@ -0,0 +1,8 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make


@@ -1,13 +1,46 @@
from __future__ import unicode_literals
import io
import optparse
import os.path
import sys
import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
PREFIX = '''%YOUTUBE-DL(1)
# NAME
youtube\-dl \- download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** \[OPTIONS\] URL [URL...]
'''
def main():
parser = optparse.OptionParser(usage='%prog OUTFILE.md')
options, args = parser.parse_args()
if len(args) != 1:
parser.error('Expected an output filename')
outfile, = args
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
readme = PREFIX + readme
readme = filter_options(readme)
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(readme)
def filter_options(readme):
ret = ''
@@ -37,27 +70,5 @@ def filter_options(readme):
return ret
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
PREFIX = '''%YOUTUBE-DL(1)
# NAME
youtube\-dl \- download videos from youtube.com or other video platforms
# SYNOPSIS
**youtube-dl** \[OPTIONS\] URL [URL...]
'''
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
readme = PREFIX + readme
readme = filter_options(readme)
if sys.version_info < (3, 0):
print(readme.encode('utf-8'))
else:
print(readme)
if __name__ == '__main__':
main()
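
The two `re.sub` calls in `main()` can be tried in isolation. The sketch below assumes a shortened stand-in for `PREFIX`, a made-up README fragment, and a hypothetical function name `to_manpage_source`; it only illustrates what each pattern removes.

```python
import re

# Shortened stand-in for the PREFIX constant in the script above.
PREFIX = '%YOUTUBE-DL(1)\n\n# NAME\n\n'


def to_manpage_source(readme):
    # (?s) lets '.' match newlines; the non-greedy match plus lookahead drops
    # everything that precedes the first "# DESCRIPTION" heading.
    readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
    # Remove the usage line (and the whitespace before it), since the
    # synopsis is already provided by PREFIX.
    readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
    return PREFIX + readme


sample = 'intro text\n\n# DESCRIPTION\n    youtube-dl [OPTIONS] URL [URL...]\nbody\n'
print(to_manpage_source(sample))
```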


@@ -6,7 +6,7 @@
# * the git config user.signingkey is properly set
# You will need
# pip install coverage nose rsa
# pip install coverage nose rsa wheel
# TODO
# release notes
@@ -15,10 +15,33 @@
set -e
skip_tests=true
if [ "$1" = '--run-tests' ]; then
skip_tests=false
shift
fi
gpg_sign_commits=""
buildserver='localhost:8142'
while true
do
case "$1" in
--run-tests)
skip_tests=false
shift
;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver)
buildserver="$2"
shift 2
;;
--*)
echo "ERROR: unknown option $1"
exit 1
;;
*)
break
;;
esac
done
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
@@ -33,6 +56,9 @@ if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: th
useless_files=$(find youtube_dl -type f -not -name '*.py')
if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean
@@ -48,7 +74,7 @@ sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version"
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version"
@@ -64,7 +90,7 @@ git push origin "$version"
REV=$(git rev-parse HEAD)
make youtube-dl youtube-dl.tar.gz
read -p "VM running? (y/n) " -n 1
wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
wget "http://$buildserver/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
mkdir -p "build/$version"
mv youtube-dl youtube-dl.exe "build/$version"
mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@@ -74,15 +100,16 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
ROOT=$(pwd)
python devscripts/create-github-release.py $version "$ROOT/build/$version"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
ROOT=$(pwd)
(
set -e
ORIGIN_URL=$(git config --get remote.origin.url)
@@ -94,7 +121,7 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)


@@ -6,6 +6,7 @@
- **22tracks:genre**
- **22tracks:track**
- **24video**
- **3qsdn**: 3Q SDN
- **3sat**
- **4tube**
- **56.com**
@@ -15,6 +16,8 @@
- **9gag**
- **abc.net.au**
- **Abc7News**
- **abcnews**
- **abcnews:video**
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
@@ -25,6 +28,7 @@
- **AdobeTVVideo**
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
@@ -52,6 +56,7 @@
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **arte.tv:playlist**
- **AtresPlayer**
- **ATTTechChannel**
- **AudiMedia**
@@ -69,6 +74,8 @@
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BeatportPro**
- **Beeg**
- **BehindKink**
@@ -77,6 +84,7 @@
- **Bild**: Bild.de
- **BiliBili**
- **BioBioChileTV**
- **BIQLE**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
@@ -98,10 +106,13 @@
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CarambaTV**
- **CarambaTVPage**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSInteractive**
- **CBSLocal**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
@@ -113,11 +124,11 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
- **CloserToTruth**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
@@ -127,12 +138,12 @@
- **CNN**
- **CNNArticle**
- **CNNBlogs**
- **CollegeHumor**
- **CollegeRama**
- **ComCarCoff**
- **ComedyCentral**
- **ComedyCentralShows**: The Daily Show / The Colbert Report
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub**
- **Cracked**
- **Crackle**
- **Criterion**
@@ -145,6 +156,7 @@
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **CWTV**
- **DailyMail**
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
@@ -201,6 +213,7 @@
- **exfm**: ex.fm
- **ExpoTV**
- **ExtremeTube**
- **EyedoTV**
- **facebook**
- **faz.net**
- **fc2**
@@ -212,6 +225,7 @@
- **Flickr**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
- **FOX**
- **Foxgay**
- **FoxNews**: Fox News and Fox Business Video
@@ -245,6 +259,7 @@
- **Globo**
- **GloboArticle**
- **GodTube**
- **GodTV**
- **GoldenMoustache**
- **Golem**
- **GoogleDrive**
@@ -315,20 +330,24 @@
- **la7.tv**
- **Laola1Tv**
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **Lemonde**
- **LePlaylist**
- **LetvCloud**: 乐视云
- **Libsyn**
- **life**: Life.ru
- **life:embed**
- **lifenews**: LIFE | NEWS
- **limelight**
- **limelight:channel**
- **limelight:channel_list**
- **LiTV**
- **LiveLeak**
- **livestream**
- **livestream:original**
- **LnkGo**
- **loc**: Library of Congress
- **LocalNews8**
- **LoveHomePorn**
- **lrt.lt**
- **lynda**: lynda.com videos
@@ -338,7 +357,6 @@
- **mailru**: Видео@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **Malemotion**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
@@ -375,8 +393,10 @@
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
- **muzu.tv**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
- **MwaveMeetGreet**
- **MySpace**
- **MySpace:album**
- **MySpass**
@@ -417,6 +437,7 @@
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **nick.de**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
- **njoy**: N-JOY
@@ -464,7 +485,8 @@
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **People**
- **Periscope**: Periscope
- **periscope**: Periscope
- **periscope:user**: Periscope user videos
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
@@ -500,8 +522,11 @@
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **R7**
- **R7Article**
- **radio.de**
- **radiobremen**
- **radiocanada**
- **RadioCanadaAudioVideo**
- **radiofrance**
- **RadioJavan**
- **Rai**
@@ -511,10 +536,13 @@
- **RedTube**
- **RegioTV**
- **Restudy**
- **Reuters**
- **ReverbNation**
- **Revision3**
- **revision**
- **revision3:embed**
- **RICE**
- **RingTV**
- **RockstarGames**
- **RottenTomatoes**
- **Roxwel**
- **RTBF**
@@ -551,10 +579,11 @@
- **ScreencastOMatic**
- **ScreenJunkies**
- **ScreenwaveMedia**
- **Seeker**
- **SenateISVP**
- **SendtoNews**
- **ServingSys**
- **Sexu**
- **SexyKarma**: Sexy Karma and Watch Indian Porn
- **Shahid**
- **Shared**: shared.sx and vivo.sx
- **ShareSix**
@@ -567,8 +596,6 @@
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr**
- **Sohu**
- **soundcloud**
@@ -610,6 +637,7 @@
- **Syfy**
- **SztvHu**
- **Tagesschau**
- **tagesschau:player**
- **Tapely**
- **Tass**
- **TDSLifeway**
@@ -627,6 +655,7 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **Telewebion**
- **TF1**
- **TheIntercept**
- **ThePlatform**
@@ -673,12 +702,12 @@
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **tvp.pl**
- **tvp.pl:Series**
- **tvp**: Telewizja Polska
- **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
- **twitch:bookmarks**
- **twitch:chapter**
- **twitch:clips**
- **twitch:past_broadcasts**
- **twitch:profile**
- **twitch:stream**
@@ -695,7 +724,8 @@
- **USAToday**
- **ustream**
- **ustream:channel**
- **Ustudio**
- **ustudio**
- **ustudio:embed**
- **Varzesh3**
- **Vbox7**
- **VeeHD**
@@ -703,6 +733,7 @@
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **Vice**
@@ -719,12 +750,15 @@
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
- **Vidio**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
- **Vidzi**
- **vier**
- **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **Viewster**
- **Viidea**
- **viki**
@@ -752,16 +786,15 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
- **vulture.com**
- **Walla**
- **WashingtonPost**
- **washingtonpost**
- **washingtonpost:article**
- **wat.tv**
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
@@ -769,12 +802,17 @@
- **WNL**
- **WorldStarHipHop**
- **wrzuta.pl**
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE
- **XHamster**
- **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
- **XMinus**
- **XNXX**
- **Xstream**
@@ -793,6 +831,7 @@
- **Ynet**
- **YouJizz**
- **youku**: 优酷
- **youku:show**
- **YouPorn**
- **YourUpload**
- **youtube**: YouTube.com


@@ -122,6 +122,7 @@ setup(
"Programming Language :: Python :: 3.2",
"Programming Language :: Python :: 3.3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},


@@ -24,8 +24,13 @@ from youtube_dl.utils import (
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
LOCAL_PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"local_parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if os.path.exists(LOCAL_PARAMETERS_FILE):
with io.open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
parameters.update(json.load(pf))
if override:
parameters.update(override)
return parameters
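
The hunk above layers three sources of test parameters: the checked-in file, an optional local file, and a per-call override. A self-contained sketch of the same precedence, assuming a hypothetical function `load_params` that takes the two paths as arguments instead of deriving them from `__file__`:

```python
import io
import json
import os
import tempfile


def load_params(base_file, local_file, override=None):
    """Merge parameters; later sources win: base < local file < override."""
    with io.open(base_file, encoding='utf-8') as f:
        params = json.load(f)
    # The local file is optional, so it is only read when present.
    if os.path.exists(local_file):
        with io.open(local_file, encoding='utf-8') as f:
            params.update(json.load(f))
    if override:
        params.update(override)
    return params


# Demonstration with throwaway files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
base_path = os.path.join(tmp_dir, 'parameters.json')
local_path = os.path.join(tmp_dir, 'local_parameters.json')
with io.open(base_path, 'w', encoding='utf-8') as f:
    f.write(json.dumps({'a': 1, 'b': 2}))
print(load_params(base_path, local_path))
```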


@@ -10,13 +10,14 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.utils import get_filesystem_encoding
from youtube_dl.compat import (
compat_getenv,
compat_setenv,
compat_etree_fromstring,
compat_expanduser,
compat_shlex_split,
compat_str,
compat_struct_unpack,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
@@ -26,19 +27,22 @@ from youtube_dl.compat import (
class TestCompat(unittest.TestCase):
def test_compat_getenv(self):
test_str = 'тест'
os.environ['YOUTUBE-DL-TEST'] = (
test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
compat_setenv('YOUTUBE-DL-TEST', test_str)
self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str)
def test_compat_setenv(self):
test_var = 'YOUTUBE-DL-TEST'
test_str = 'тест'
compat_setenv(test_var, test_str)
compat_getenv(test_var)
self.assertEqual(compat_getenv(test_var), test_str)
def test_compat_expanduser(self):
old_home = os.environ.get('HOME')
test_str = 'C:\Documents and Settings\тест\Application Data'
os.environ['HOME'] = (
test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
compat_setenv('HOME', test_str)
self.assertEqual(compat_expanduser('~'), test_str)
os.environ['HOME'] = old_home
compat_setenv('HOME', old_home or '')
def test_all_present(self):
import youtube_dl.compat
@@ -99,5 +103,15 @@ class TestCompat(unittest.TestCase):
self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
def test_compat_etree_fromstring_doctype(self):
xml = '''<?xml version="1.0"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language"></smil>'''
compat_etree_fromstring(xml)
def test_struct_unpack(self):
self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))
if __name__ == '__main__':
unittest.main()


@@ -16,6 +16,15 @@ import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
@@ -31,6 +40,22 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
self.send_header('Content-Type', 'video/mp4')
self.end_headers()
self.wfile.write(b'\x00\x00\x00\x00\x20\x66\x74[video]')
elif self.path == '/302':
if sys.version_info[0] == 3:
# XXX: Python 3 http server does not allow non-ASCII header values
self.send_response(404)
self.end_headers()
return
new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
self.send_response(302)
self.send_header(b'Location', new_url.encode('utf-8'))
self.end_headers()
elif self.path == '/%E4%B8%AD%E6%96%87.html':
self.send_response(200)
self.send_header('Content-Type', 'text/html; charset=utf-8')
self.end_headers()
self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
else:
assert False
@@ -47,18 +72,32 @@ class FakeLogger(object):
class TestHTTP(unittest.TestCase):
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
('localhost', 0), HTTPTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
def test_unicode_path_redirection(self):
# XXX: Python 3 http server does not allow non-ASCII header values
if sys.version_info[0] == 3:
return
ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase):
def setUp(self):
certfn = os.path.join(TEST_DIR, 'testcert.pem')
self.httpd = compat_http_server.HTTPServer(
('localhost', 0), HTTPTestRequestHandler)
self.httpd.socket = ssl.wrap_socket(
self.httpd.socket, certfile=certfn, server_side=True)
if os.name == 'java':
# In Jython SSLSocket is not a subclass of socket.socket
sock = self.httpd.socket.sock
else:
sock = self.httpd.socket
self.port = sock.getsockname()[1]
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
@@ -94,14 +133,14 @@ class TestProxy(unittest.TestCase):
def setUp(self):
self.proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('normal'))
self.port = self.proxy.socket.getsockname()[1]
self.port = http_server_port(self.proxy)
self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
self.proxy_thread.daemon = True
self.proxy_thread.start()
self.cn_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('cn'))
self.cn_port = self.cn_proxy.socket.getsockname()[1]
self.cn_port = http_server_port(self.cn_proxy)
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
self.cn_proxy_thread.daemon = True
self.cn_proxy_thread.start()
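
The test servers in this file bind to port 0 and then ask the socket which port the OS assigned. A minimal sketch of that pattern, assuming Python 3's `http.server` in place of `compat_http_server` and hypothetical names `QuietHandler` and `start_server`; it leaves out the Jython special case that `http_server_port` handles:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        # Silence per-request logging, as the test handler above does.
        pass

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()


def start_server():
    # Port 0 asks the OS for any free port, which avoids collisions
    # between tests that run on the same machine.
    httpd = HTTPServer(('127.0.0.1', 0), QuietHandler)
    port = httpd.socket.getsockname()[1]
    thread = threading.Thread(target=httpd.serve_forever)
    thread.daemon = True  # do not keep the interpreter alive
    thread.start()
    return httpd, port


httpd, port = start_server()
print('listening on port %d' % port)
httpd.shutdown()
httpd.server_close()
```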

test/test_socks.py Normal file

@@ -0,0 +1,118 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import random
import subprocess
from test.helper import (
FakeYDL,
get_params,
)
from youtube_dl.compat import (
compat_str,
compat_urllib_request,
)
class TestMultipleSocks(unittest.TestCase):
@staticmethod
def _check_params(attrs):
params = get_params()
for attr in attrs:
if attr not in params:
print('Missing %s. Skipping.' % attr)
return
return params
def test_proxy_http(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_proxy_https(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('https://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_secondary_proxy_http(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('http://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
def test_secondary_proxy_https(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('https://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
class TestSocks(unittest.TestCase):
_SKIP_SOCKS_TEST = True
def setUp(self):
if self._SKIP_SOCKS_TEST:
return
self.port = random.randint(20000, 30000)
self.server_process = subprocess.Popen([
'srelay', '-f', '-i', '127.0.0.1:%d' % self.port],
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def tearDown(self):
if self._SKIP_SOCKS_TEST:
return
self.server_process.terminate()
self.server_process.communicate()
def _get_ip(self, protocol):
if self._SKIP_SOCKS_TEST:
return '127.0.0.1'
ydl = FakeYDL({
'proxy': '%s://127.0.0.1:%d' % (protocol, self.port),
})
return ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8')
def test_socks4(self):
self.assertTrue(isinstance(self._get_ip('socks4'), compat_str))
def test_socks4a(self):
self.assertTrue(isinstance(self._get_ip('socks4a'), compat_str))
def test_socks5(self):
self.assertTrue(isinstance(self._get_ip('socks5'), compat_str))
if __name__ == '__main__':
unittest.main()


@@ -50,12 +50,13 @@ from youtube_dl.utils import (
sanitize_path,
prepend_extension,
replace_extension,
remove_start,
remove_end,
remove_quotes,
shell_quote,
smuggle_url,
str_to_int,
strip_jsonp,
struct_unpack,
timeconvert,
unescapeHTML,
unified_strdate,
@@ -139,8 +140,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual('yes_no', sanitize_filename('yes? no', restricted=True))
self.assertEqual('this_-_that', sanitize_filename('this: that', restricted=True))
tests = 'a\xe4b\u4e2d\u56fd\u7684c'
self.assertEqual(sanitize_filename(tests, restricted=True), 'a_b_c')
tests = 'aäb\u4e2d\u56fd\u7684c'
self.assertEqual(sanitize_filename(tests, restricted=True), 'aab_c')
self.assertTrue(sanitize_filename('\xf6', restricted=True) != '') # No empty filename
forbidden = '"\0\\/&!: \'\t\n()[]{}$;`^,#'
@@ -155,6 +156,10 @@ class TestUtil(unittest.TestCase):
self.assertTrue(sanitize_filename('-', restricted=True) != '')
self.assertTrue(sanitize_filename(':', restricted=True) != '')
self.assertEqual(sanitize_filename(
'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ', restricted=True),
'AAAAAAAECEEEEIIIIDNOOOOOOOOEUUUUUYPssaaaaaaaeceeeeiiiionooooooooeuuuuuypy')
def test_sanitize_ids(self):
self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')
self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
@@ -212,6 +217,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_remove_start(self):
self.assertEqual(remove_start(None, 'A - '), None)
self.assertEqual(remove_start('A - B', 'A - '), 'B')
self.assertEqual(remove_start('B - A', 'A - '), 'B - A')
def test_remove_end(self):
self.assertEqual(remove_end(None, ' - B'), None)
self.assertEqual(remove_end('A - B', ' - B'), 'A')
self.assertEqual(remove_end('B - A', ' - B'), 'B - A')
def test_remove_quotes(self):
self.assertEqual(remove_quotes(None), None)
self.assertEqual(remove_quotes('"'), '"')
@@ -234,6 +249,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&#47;'), '/')
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
# HTML5 entities
self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')
def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
@@ -453,9 +470,6 @@ class TestUtil(unittest.TestCase):
testPL(5, 2, (2, 99), [2, 3, 4])
testPL(5, 2, (20, 99), [])
def test_struct_unpack(self):
self.assertEqual(struct_unpack('!B', b'\x00'), (0,))
def test_read_batch_urls(self):
f = io.StringIO('''\xef\xbb\xbf foo
bar\r
@@ -617,6 +631,18 @@ class TestUtil(unittest.TestCase):
json_code = js_to_json(inp)
self.assertEqual(json.loads(json_code), json.loads(inp))
inp = '''{
0:{src:'skipped', type: 'application/dash+xml'},
1:{src:'skipped', type: 'application/vnd.apple.mpegURL'},
}'''
self.assertEqual(js_to_json(inp), '''{
"0":{"src":"skipped", "type": "application/dash+xml"},
"1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
}''')
inp = '''{"foo":101}'''
self.assertEqual(js_to_json(inp), '''{"foo":101}''')
def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -640,6 +666,27 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{"abc": "def",}')
self.assertEqual(json.loads(on), {'abc': 'def'})
on = js_to_json('{ 0: /* " \n */ ",]" , }')
self.assertEqual(json.loads(on), {'0': ',]'})
on = js_to_json(r'["<p>x<\/p>"]')
self.assertEqual(json.loads(on), ['<p>x</p>'])
on = js_to_json(r'["\xaa"]')
self.assertEqual(json.loads(on), ['\u00aa'])
on = js_to_json("['a\\\nb']")
self.assertEqual(json.loads(on), ['ab'])
on = js_to_json('{0xff:0xff}')
self.assertEqual(json.loads(on), {'255': 255})
on = js_to_json('{077:077}')
self.assertEqual(json.loads(on), {'63': 63})
on = js_to_json('{42:42}')
self.assertEqual(json.loads(on), {'42': 42})
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
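
The `test_remove_start` and `test_remove_end` cases added in this file pin down the expected behaviour: `None` passes through, a matching prefix or suffix is stripped, and anything else is returned unchanged. The sketch below is one possible implementation that satisfies those assertions. It is an assumption for illustration, not the code in `youtube_dl/utils.py`, which this diff does not show.

```python
def remove_start(s, start):
    # None passes through untouched; otherwise strip the prefix if present.
    if s is not None and s.startswith(start):
        return s[len(start):]
    return s


def remove_end(s, end):
    # The extra 'end' check guards against s[:-0], which would return ''.
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s


print(remove_start('A - B', 'A - '))
print(remove_end('A - B', ' - B'))
```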


@@ -9,5 +9,6 @@ passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
--exclude test_socks.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

@@ -64,6 +64,7 @@ from .utils import (
PostProcessingError,
preferredencoding,
prepend_extension,
register_socks_protocols,
render_table,
replace_extension,
SameFileError,
@@ -325,7 +326,7 @@ class YoutubeDL(object):
['fribidi', '-c', 'UTF-8'] + width_args, **sp_kwargs)
self._output_channel = os.fdopen(master, 'rb')
except OSError as ose:
if ose.errno == 2:
if ose.errno == errno.ENOENT:
self.report_warning('Could not find fribidi executable, ignoring --bidi-workaround . Make sure that fribidi is an executable file in one of the directories in your $PATH.')
else:
raise
@@ -361,6 +362,8 @@ class YoutubeDL(object):
for ph in self.params.get('progress_hooks', []):
self.add_progress_hook(ph)
register_socks_protocols()
def warn_if_short_id(self, argv):
# short YouTube ID starting with dash?
idxs = [
@@ -580,7 +583,7 @@ class YoutubeDL(object):
is_id=(k == 'id'))
template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items()
if v is not None)
if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
@@ -717,6 +720,7 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video')
if result_type in ('url', 'url_transparent'):
ie_result['url'] = sanitize_url(ie_result['url'])
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info) or
extract_flat is True):
@@ -1219,6 +1223,10 @@ class YoutubeDL(object):
if 'title' not in info_dict:
raise ExtractorError('Missing "title" field in extractor result')
if not isinstance(info_dict['id'], compat_str):
self.report_warning('"id" field is not a string - forcing string conversion')
info_dict['id'] = compat_str(info_dict['id'])
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
@@ -1639,7 +1647,7 @@ class YoutubeDL(object):
# Just a single file
success = dl(filename, info_dict)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error('unable to download video data: %s' % str(err))
self.report_error('unable to download video data: %s' % error_to_compat_str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
@@ -2018,6 +2026,7 @@ class YoutubeDL(object):
if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar()
else:
opts_cookiefile = compat_expanduser(opts_cookiefile)
self.cookiejar = compat_cookiejar.MozillaCookieJar(
opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK):
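
The output-template hunk above drops list, tuple and dict values before the fields are substituted into the filename template, and missing fields fall back to 'NA'. A minimal sketch of that behaviour, assuming a hypothetical helper name `build_template_dict` and leaving out the `sanitize` step the real code applies:

```python
import collections


def build_template_dict(info):
    # Keep only scalar, non-None values, mirroring the condition in the hunk.
    filtered = dict(
        (k, v) for k, v in info.items()
        if v is not None and not isinstance(v, (list, tuple, dict)))
    # Any field the template asks for but the dict lacks renders as 'NA'.
    return collections.defaultdict(lambda: 'NA', filtered)


fields = build_template_dict({
    'id': 'abc',
    'title': 'Demo',
    'formats': [{'url': 'x'}],  # a list, so it is filtered out
    'uploader': None,           # None, so it is filtered out
})
filename = '%(title)s-%(id)s-%(uploader)s.%(ext)s' % fields
```

Here `uploader` and `ext` both render as 'NA' because neither survives into the filtered dict.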

@@ -18,7 +18,6 @@ from .options import (
from .compat import (
compat_expanduser,
compat_getpass,
compat_print,
compat_shlex_split,
workaround_optparse_bug9161,
)
@@ -67,16 +66,16 @@ def _real_main(argv=None):
# Custom HTTP headers
if opts.headers is not None:
for h in opts.headers:
if h.find(':', 1) < 0:
if ':' not in h:
parser.error('wrong header formatting, it should be key:value, not "%s"' % h)
key, value = h.split(':', 2)
key, value = h.split(':', 1)
if opts.verbose:
write_string('[debug] Adding header from command line option %s:%s\n' % (key, value))
std_headers[key] = value
# Dump user agent
if opts.dump_user_agent:
compat_print(std_headers['User-Agent'])
write_string(std_headers['User-Agent'] + '\n', out=sys.stdout)
sys.exit(0)
# Batch file verification
@@ -86,7 +85,9 @@ def _real_main(argv=None):
if opts.batchfile == '-':
batchfd = sys.stdin
else:
batchfd = io.open(opts.batchfile, 'r', encoding='utf-8', errors='ignore')
batchfd = io.open(
compat_expanduser(opts.batchfile),
'r', encoding='utf-8', errors='ignore')
batch_urls = read_batch_urls(batchfd)
if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
@@ -99,10 +100,10 @@ def _real_main(argv=None):
if opts.list_extractors:
for ie in list_extractors(opts.age_limit):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
write_string(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '') + '\n', out=sys.stdout)
matchedUrls = [url for url in all_urls if ie.suitable(url)]
for mu in matchedUrls:
compat_print(' ' + mu)
write_string(' ' + mu + '\n', out=sys.stdout)
sys.exit(0)
if opts.list_extractor_descriptions:
for ie in list_extractors(opts.age_limit):
@@ -115,7 +116,7 @@ def _real_main(argv=None):
_SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow')
_COUNTS = ('', '5', '10', 'all')
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc)
write_string(desc + '\n', out=sys.stdout)
sys.exit(0)
# Conflicting, missing and erroneous options
@@ -404,7 +405,7 @@ def _real_main(argv=None):
try:
if opts.load_info_filename is not None:
retcode = ydl.download_with_info_file(opts.load_info_filename)
retcode = ydl.download_with_info_file(compat_expanduser(opts.load_info_filename))
else:
retcode = ydl.download(all_urls)
except MaxDownloadsReached:
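
The custom-header hunk above changes `h.split(':', 2)` to `h.split(':', 1)`. With a limit of 2, a header whose value itself contains a colon (a URL, for instance) yields three parts and the two-name unpacking fails. A small sketch of the corrected parsing; `parse_header` is a hypothetical name, and the sample header is made up for illustration:

```python
def parse_header(h):
    # Reject strings without any separator, as the new check does.
    if ':' not in h:
        raise ValueError(
            'wrong header formatting, it should be key:value, not "%s"' % h)
    # Split on the first colon only, so colons in the value are preserved.
    key, value = h.split(':', 1)
    return key, value


sample = 'Referer:http://example.com:8080/'
old_parts = sample.split(':', 2)   # three parts: unpacking into two names fails
key, value = parse_header(sample)
```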

File diff suppressed because it is too large

@@ -6,6 +6,7 @@ import sys
import re
from .common import FileDownloader
from ..compat import compat_setenv
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
from ..utils import (
cli_option,
@@ -84,7 +85,7 @@ class ExternalFD(FileDownloader):
cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate()
if p.returncode != 0:
self.to_stderr(stderr)
self.to_stderr(stderr.decode('utf-8', 'replace'))
return p.returncode
@@ -198,6 +199,19 @@ class FFmpegFD(ExternalFD):
'-headers',
''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
env = None
proxy = self.params.get('proxy')
if proxy:
if not re.match(r'^[\da-zA-Z]+://', proxy):
proxy = 'http://%s' % proxy
# Since December 2015 ffmpeg supports -http_proxy option (see
# http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
# We could switch to the following code if we are able to detect version properly
# args += ['-http_proxy', proxy]
env = os.environ.copy()
compat_setenv('HTTP_PROXY', proxy, env=env)
compat_setenv('http_proxy', proxy, env=env)
protocol = info_dict.get('protocol')
if protocol == 'rtmp':
@@ -224,7 +238,7 @@ class FFmpegFD(ExternalFD):
args += ['-rtmp_live', 'live']
args += ['-i', url, '-c', 'copy']
if protocol == 'm3u8':
if protocol in ('m3u8', 'm3u8_native'):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
@@ -239,7 +253,7 @@ class FFmpegFD(ExternalFD):
self._debug_cmd(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE)
proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
try:
retval = proc.wait()
except KeyboardInterrupt:

@@ -12,37 +12,49 @@ from ..compat import (
compat_urlparse,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_struct_pack,
compat_struct_unpack,
)
from ..utils import (
encodeFilename,
fix_xml_ampersands,
sanitize_open,
struct_pack,
struct_unpack,
xpath_text,
)
class DataTruncatedError(Exception):
pass
class FlvReader(io.BytesIO):
"""
Reader for Flv files
The file format is documented in https://www.adobe.com/devnet/f4v.html
"""
def read_bytes(self, n):
data = self.read(n)
if len(data) < n:
raise DataTruncatedError(
'FlvReader error: need %d bytes while only %d bytes got' % (
n, len(data)))
return data
# Utility functions for reading numbers and strings
def read_unsigned_long_long(self):
return struct_unpack('!Q', self.read(8))[0]
return compat_struct_unpack('!Q', self.read_bytes(8))[0]
def read_unsigned_int(self):
return struct_unpack('!I', self.read(4))[0]
return compat_struct_unpack('!I', self.read_bytes(4))[0]
def read_unsigned_char(self):
return struct_unpack('!B', self.read(1))[0]
return compat_struct_unpack('!B', self.read_bytes(1))[0]
def read_string(self):
res = b''
while True:
char = self.read(1)
char = self.read_bytes(1)
if char == b'\x00':
break
res += char
@@ -53,18 +65,18 @@ class FlvReader(io.BytesIO):
Read a box and return the info as a tuple: (box_size, box_type, box_data)
"""
real_size = size = self.read_unsigned_int()
box_type = self.read(4)
box_type = self.read_bytes(4)
header_end = 8
if size == 1:
real_size = self.read_unsigned_long_long()
header_end = 16
return real_size, box_type, self.read(real_size - header_end)
return real_size, box_type, self.read_bytes(real_size - header_end)
def read_asrt(self):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
quality_entry_count = self.read_unsigned_char()
# QualityEntryCount
for i in range(quality_entry_count):
@@ -85,7 +97,7 @@ class FlvReader(io.BytesIO):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
# time scale
self.read_unsigned_int()
@@ -119,7 +131,7 @@ class FlvReader(io.BytesIO):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
self.read_unsigned_int() # BootstrapinfoVersion
# Profile,Live,Update,Reserved
@@ -194,11 +206,11 @@ def build_fragments_list(boot_info):
def write_unsigned_int(stream, val):
stream.write(struct_pack('!I', val))
stream.write(compat_struct_pack('!I', val))
def write_unsigned_int_24(stream, val):
stream.write(struct_pack('!I', val)[1:])
stream.write(compat_struct_pack('!I', val)[1:])
def write_flv_header(stream):
@@ -307,7 +319,7 @@ class F4mFD(FragmentFD):
doc = compat_etree_fromstring(manifest)
formats = [(int(f.attrib.get('bitrate', -1)), f)
for f in self._get_unencrypted_media(doc)]
if requested_bitrate is None:
if requested_bitrate is None or len(formats) == 1:
# get the best format
formats = sorted(formats, key=lambda f: f[0])
rate, media = formats[-1]
@@ -374,7 +386,17 @@ class F4mFD(FragmentFD):
down.close()
reader = FlvReader(down_data)
while True:
_, box_type, box_data = reader.read_box_info()
try:
_, box_type, box_data = reader.read_box_info()
except DataTruncatedError:
if test:
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/rg3/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise
if box_type == b'mdat':
dest_stream.write(box_data)
break
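
The f4m hunks above route every read through `read_bytes`, which raises instead of silently returning short data. A self-contained sketch of that pattern follows; the class names `StrictReader` and `TruncatedError` are hypothetical stand-ins for `FlvReader` and `DataTruncatedError`, and plain `struct.unpack` is used in place of the compat wrapper:

```python
import io
import struct


class TruncatedError(Exception):
    pass


class StrictReader(io.BytesIO):
    def read_bytes(self, n):
        # BytesIO.read returns fewer bytes at end of stream; treat that as an error.
        data = self.read(n)
        if len(data) < n:
            raise TruncatedError(
                'need %d bytes while only %d bytes got' % (n, len(data)))
        return data

    def read_unsigned_int(self):
        # Network byte order, 4-byte unsigned integer.
        return struct.unpack('!I', self.read_bytes(4))[0]


value = StrictReader(b'\x00\x00\x00\x2a').read_unsigned_int()
```

A caller can then catch the exception and decide what to do with the partial segment, which is what the last f4m hunk does in test mode.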

@@ -2,13 +2,24 @@ from __future__ import unicode_literals
import os.path
import re
import binascii
try:
from Crypto.Cipher import AES
can_decrypt_frag = True
except ImportError:
can_decrypt_frag = False
from .fragment import FragmentFD
from .external import FFmpegFD
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_struct_pack,
)
from ..utils import (
encodeFilename,
sanitize_open,
parse_m3u8_attributes,
)
@@ -17,42 +28,101 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest):
UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
# Live streams heuristic does not always work (e.g. geo restricted to Germany
# http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
# r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3]
# This heuristic also is not correct since segments may not be appended as well.
# Twitch vods of finished streams have EXT-X-PLAYLIST-TYPE:EVENT despite
# no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
return all(check_results)
def real_download(self, filename, info_dict):
man_url = info_dict['url']
self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
manifest = self.ydl.urlopen(man_url).read()
s = manifest.decode('utf-8', 'ignore')
fragment_urls = []
if not self.can_download(s):
self.report_warning(
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')
fd = FFmpegFD(self.ydl, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
total_frags = 0
for line in s.splitlines():
line = line.strip()
if line and not line.startswith('#'):
segment_url = (
line
if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line))
fragment_urls.append(segment_url)
# We only download the first fragment during the test
if self.params.get('test', False):
break
total_frags += 1
ctx = {
'filename': filename,
'total_frags': len(fragment_urls),
'total_frags': total_frags,
}
self._prepare_and_start_frag_download(ctx)
i = 0
media_sequence = 0
decrypt_info = {'METHOD': 'NONE'}
frags_filenames = []
for i, frag_url in enumerate(fragment_urls):
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
frags_filenames.append(frag_sanitized)
for line in s.splitlines():
line = line.strip()
if line:
if not line.startswith('#'):
frag_url = (
line
if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line))
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
ctx['dest_stream'].write(frag_content)
frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test
if self.params.get('test', False):
break
i += 1
media_sequence += 1
elif line.startswith('#EXT-X-KEY'):
decrypt_info = parse_m3u8_attributes(line[11:])
if decrypt_info['METHOD'] == 'AES-128':
if 'IV' in decrypt_info:
decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:])
if not re.match(r'^https?://', decrypt_info['URI']):
decrypt_info['URI'] = compat_urlparse.urljoin(
man_url, decrypt_info['URI'])
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:])
self._finish_frag_download(ctx)
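
In the hlsnative hunk above, the AES-128 initialization vector is either taken from the `IV` attribute of `#EXT-X-KEY` (a hex string with a `0x` prefix) or derived from the media sequence number. A sketch of just the IV handling, using the standard `struct` and `binascii` modules; the function names are hypothetical and no decryption is performed here:

```python
import binascii
import struct


def default_iv(media_sequence):
    # '>8xq': 8 zero pad bytes followed by a big-endian 64-bit integer,
    # giving the 16-byte block size AES-CBC expects.
    return struct.pack('>8xq', media_sequence)


def explicit_iv(attr_value):
    # Strip the leading '0x' before decoding the hex digits.
    return binascii.unhexlify(attr_value[2:])


iv_from_sequence = default_iv(1)
iv_from_attribute = explicit_iv('0x' + '00' * 15 + '0a')
```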

@@ -0,0 +1,135 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import re
import time
from .amp import AMPIE
from .common import InfoExtractor
from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
'info_dict': {
'id': '20411932',
'ext': 'mp4',
'display_id': 'week-exclusive-irans-foreign-minister-zarif',
'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif',
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
info_dict = self._extract_feed_info(
'http://abcnews.go.com/video/itemfeed?id=%s' % video_id)
info_dict.update({
'id': video_id,
'display_id': display_id,
})
return info_dict
class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
'info_dict': {
'id': '10498713',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
},
'add_ie': ['AbcNewsVideo'],
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '39125818',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515',
'timestamp': 1463329500,
},
'params': {
# m3u8 download
'skip_download': True,
# The embedded YouTube video is blocked due to copyright issues
'playlist_items': '1',
},
'add_ie': ['AbcNewsVideo'],
}, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
youtube_url = self._html_search_regex(
r'<iframe[^>]+src="(https://www\.youtube\.com/embed/[^"]+)"',
webpage, 'YouTube URL', default=None)
timestamp = None
date_str = self._html_search_regex(
r'<span[^>]+class="timestamp">([^<]+)</span>',
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, 'Youtube')]
return self.playlist_result(entries)
return entry
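
The timestamp logic in `AbcNewsIE._real_extract` above can be tried in isolation. The sketch below assumes a hypothetical function name `parse_abc_timestamp` and made-up input strings chosen to match the two supported formats. Like the extractor, it applies a fixed -5 hour offset for the ' ET' suffix and does not account for daylight saving time; unlike the extractor, it returns as soon as one format matches.

```python
import calendar
import time


def parse_abc_timestamp(date_str):
    tz_offset = 0
    if date_str.endswith(' ET'):  # Eastern Time, fixed offset as in the extractor
        tz_offset = -5
        date_str = date_str[:-3]
    for date_format in ('%b. %d, %Y', '%b %d, %Y, %I:%M %p'):
        try:
            timestamp = calendar.timegm(
                time.strptime(date_str.strip(), date_format))
        except ValueError:
            continue
        # Convert local wall-clock time to UTC.
        return timestamp - tz_offset * 3600
    return None


date_only = parse_abc_timestamp('Apr. 28, 2010')
with_time = parse_abc_timestamp('May 15, 2016, 11:25 AM ET')
```

Month abbreviations and AM/PM parsing depend on the process locale; the sketch assumes the default C locale.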

@@ -156,7 +156,10 @@ class AdobeTVVideoIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(url + '?format=json', video_id)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
formats = [{
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),

@@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
xpath_element,
xpath_text,
)
class AfreecaTVIE(InfoExtractor):
IE_DESC = 'afreecatv.com'
_VALID_URL = r'''(?x)^
https?://(?:(live|afbbs|www)\.)?afreeca(?:tv)?\.com(?::\d+)?
(?:
/app/(?:index|read_ucc_bbs)\.cgi|
/player/[Pp]layer\.(?:swf|html))
\?.*?\bnTitleNo=(?P<id>\d+)'''
_TESTS = [{
'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
'info_dict': {
'id': '36164052',
'ext': 'mp4',
'title': '데일리 에이프릴 요정들의 시상식!',
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'dailyapril',
'uploader_id': 'dailyapril',
'upload_date': '20160503',
}
}, {
'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867',
'info_dict': {
'id': '36153164',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'dailyapril',
'uploader_id': 'dailyapril',
},
'playlist_count': 2,
'playlist': [{
'md5': 'd8b7c174568da61d774ef0203159bf97',
'info_dict': {
'id': '36153164_1',
'ext': 'mp4',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'upload_date': '20160502',
},
}, {
'md5': '58f2ce7f6044e34439ab2d50612ab02b',
'info_dict': {
'id': '36153164_2',
'ext': 'mp4',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'upload_date': '20160502',
},
}],
}, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True,
}]
@staticmethod
def parse_video_key(key):
video_key = {}
m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
if m:
video_key['upload_date'] = m.group('upload_date')
video_key['part'] = m.group('part')
return video_key
def _real_extract(self, url):
video_id = self._match_id(url)
parsed_url = compat_urllib_parse_urlparse(url)
info_url = compat_urlparse.urlunparse(parsed_url._replace(
netloc='afbbs.afreecatv.com:8080',
path='/api/video/get_video_info.php'))
video_xml = self._download_xml(info_url, video_id)
if xpath_element(video_xml, './track/video/file') is None:
raise ExtractorError('Specified AfreecaTV video does not exist',
expected=True)
title = xpath_text(video_xml, './track/title', 'title')
uploader = xpath_text(video_xml, './track/nickname', 'uploader')
uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
duration = int_or_none(xpath_text(video_xml, './track/duration',
'duration'))
thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
entries = []
for i, video_file in enumerate(video_xml.findall('./track/video/file')):
video_key = self.parse_video_key(video_file.get('key', ''))
if not video_key:
continue
entries.append({
'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
'title': title,
'upload_date': video_key.get('upload_date'),
'duration': int_or_none(video_file.get('duration')),
'url': video_file.text,
})
info = {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
}
if len(entries) > 1:
info['_type'] = 'multi_video'
info['entries'] = entries
elif len(entries) == 1:
info['url'] = entries[0]['url']
info['upload_date'] = entries[0].get('upload_date')
else:
raise ExtractorError(
'No files found for the specified AfreecaTV video, either'
' the URL is incorrect or the video has been made private.',
expected=True)
return info
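
`AfreecaTVIE.parse_video_key` above extracts the upload date and part number from each file's `key` attribute. A standalone sketch of the same regular expression; the sample key `20160502_ABCDEF_1` is an assumed shape for illustration, not a value taken from the service:

```python
import re


def parse_video_key(key):
    video_key = {}
    # Eight digits, an underscore-delimited middle section, then the part number.
    m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
    if m:
        video_key['upload_date'] = m.group('upload_date')
        video_key['part'] = m.group('part')
    return video_key


parsed = parse_video_key('20160502_ABCDEF_1')
unparsed = parse_video_key('')
```

An empty dict is falsy, which is why the extractor can skip such files with `if not video_key: continue`.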

@@ -24,10 +24,10 @@ class AftonbladetIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['videoId']
internal_meta_id = player_config['aptomaVideoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')

@@ -52,7 +52,7 @@ class AMPIE(InfoExtractor):
for media_data in media_content:
media = media_data['@attributes']
media_type = media['type']
if media_type == 'video/f4m':
if media_type in ('video/f4m', 'application/f4m+xml'):
formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False))
@@ -61,7 +61,7 @@ class AMPIE(InfoExtractor):
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': media_data['media-category']['@attributes']['label'],
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'],
'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')),

@@ -0,0 +1,224 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import hashlib
import json
import random
import time
from .common import InfoExtractor
from ..aes import aes_encrypt
from ..compat import compat_str
from ..utils import (
bytes_to_intlist,
determine_ext,
intlist_to_bytes,
int_or_none,
strip_jsonp,
)
def md5_text(s):
if not isinstance(s, compat_str):
s = compat_str(s)
return hashlib.md5(s.encode('utf-8')).hexdigest()
class AnvatoIE(InfoExtractor):
# Copied from anvplayer.min.js
_ANVACK_TABLE = {
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'nbcu_nbcd_desktop_web_qa_1a6f01bdd0dc45a439043b694c8a031d': 'eSxJUbA2UUKBTXryyQ2d6NuM8oEqaPySvaPzfKNA',
'nbcu_nbcd_desktop_web_acc_eb2ff240a5d4ae9a63d4c297c32716b6c523a129': '89JR3RtUGbvKuuJIiKOMK0SoarLb5MUx8v89RcbP',
'nbcu_nbcd_watchvod_web_prod_e61107507180976724ec8e8319fe24ba5b4b60e1': 'Uc7dFt7MJ9GsBWB5T7iPvLaMSOt8BBxv4hAXk5vv',
'nbcu_nbcd_watchvod_web_qa_42afedba88a36203db5a4c09a5ba29d045302232': 'T12oDYVFP2IaFvxkmYMy5dKxswpLHtGZa4ZAXEi7',
'nbcu_nbcd_watchvod_web_acc_9193214448e2e636b0ffb78abacfd9c4f937c6ca': 'MmobcxUxMedUpohNWwXaOnMjlbiyTOBLL6d46ZpR',
'nbcu_local_monitor_web_acc_f998ad54eaf26acd8ee033eb36f39a7b791c6335': 'QvfIoPYrwsjUCcASiw3AIkVtQob2LtJHfidp9iWg',
'nbcu_cable_monitor_web_acc_a413759603e8bedfcd3c61b14767796e17834077': 'uwVPJLShvJWSs6sWEIuVem7MTF8A4IknMMzIlFto',
'nbcu_nbcd_mcpstage_web_qa_4c43a8f6e95a88dbb40276c0630ba9f693a63a4e': 'PxVYZVwjhgd5TeoPRxL3whssb5OUPnM3zyAzq8GY',
'nbcu_comcast_comcast_web_prod_074080762ad4ce956b26b43fb22abf153443a8c4': 'afnaRZfDyg1Z3WZHdupKfy6xrbAG2MHqe3VfuSwh',
'nbcu_comcast_comcast_web_qa_706103bb93ead3ef70b1de12a0e95e3c4481ade0': 'DcjsVbX9b3uoPlhdriIiovgFQZVxpISZwz0cx1ZK',
'nbcu_comcast_comcastcable_web_prod_669f04817536743563d7331c9293e59fbdbe3d07': '0RwMN2cWy10qhAhOscq3eK7aEe0wqnKt3vJ0WS4D',
'nbcu_comcast_comcastcable_web_qa_3d9d2d66219094127f0f6b09cc3c7bb076e3e1ca': '2r8G9DEya7PCqBceKZgrn2XkXgASjwLMuaFE1Aad',
'hearst_hearst_demo_web_stage_960726dfef3337059a01a78816e43b29ec04dfc7': 'cuZBPXTR6kSdoTCVXwk5KGA8rk3NrgGn4H6e9Dsp',
'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922': 'IOaaLQ8ymqVyem14QuAvE5SndQynTcH5CrLkU2Ih',
'anvato_nextmedia_demo_web_stage_9787d56a02ff6b9f43e9a2b0920d8ca88beb5818': 'Pqu9zVzI1ApiIzbVA3VkGBEQHvdKSUuKpD6s2uaR',
'anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a': 'du1ccmn7RxzgizwbWU7hyUaGodNlJn7HtXI0WgXW',
'anvato_scripps_app_web_stage_360797e00fe2826be142155c4618cc52fce6c26c': '2PMrQ0BRoqCWl7nzphj0GouIMEh2mZYivAT0S1Su',
'fs2go_fs2go_go_all_prod_21934911ccfafc03a075894ead2260d11e2ddd24': 'RcuHlKikW2IJw6HvVoEkqq2UsuEJlbEl11pWXs4Q',
'fs2go_fs2go_go_web_prod_ead4b0eec7460c1a07783808db21b49cf1f2f9a7': '4K0HTT2u1zkQA2MaGaZmkLa1BthGSBdr7jllrhk5',
'fs2go_fs2go_go_web_stage_407585454a4400355d4391691c67f361': 'ftnc37VKRJBmHfoGGi3kT05bHyeJzilEzhKJCyl3',
'fs2go_fs2go_go_android_stage_44b714db6f8477f29afcba15a41e1d30': 'CtxpPvVpo6AbZGomYUhkKs7juHZwNml9b9J0J2gI',
'anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67': 'Pw0XX5KBDsyRnPS0R2JrSrXftsy8Jnz5pAjaYC8s',
'anvato_cbslocal_app_web_stage_547a5f096594cd3e00620c6f825cad1096d28c80': '37OBUhX2uwNyKhhrNzSSNHSRPZpApC3trdqDBpuz',
'fs2go_att_att_web_prod_1042dddd089a05438b6a08f972941176f699ffd8': 'JLcF20JwYvpv6uAGcLWIaV12jKwaL1R8us4b6Zkg',
'fs2go_att_att_web_stage_807c5001955fc114a3331fe027ddc76e': 'gbu1oO1y0JiOFh4SUipt86P288JHpyjSqolrrT1x',
'fs2go_fs2go_tudor_web_prod_a7dd8e5a7cdc830cae55eae6f3e9fee5ee49eb9b': 'ipcp87VCEZXPPe868j3orLqzc03oTy7DXsGkAXXH',
'anvato_mhz_app_web_prod_b808218b30de7fdf60340cbd9831512bc1bf6d37': 'Stlm5Gs6BEhJLRTZHcNquyzxGqr23EuFmE5DCgjX',
'fs2go_charter_charter_web_stage_c2c6e5a68375a1bf00fff213d3ff8f61a835a54c': 'Lz4hbJp1fwL6jlcz4M2PMzghM4jp4aAmybtT5dPc',
'fs2go_charter_charter_web_prod_ebfe3b10f1af215a7321cd3d629e0b81dfa6fa8c': 'vUJsK345A1bVmyYDRhZX0lqFIgVXuqhmuyp1EtPK',
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b': 'GDKq1ixvX3MoBNdU5IOYmYa2DTUXYOozPjrCJnW7',
'anvato_epfox_app_web_stage_a3c2ce60f8f83ef374a88b68ee73a950f8ab87ce': '2jz2NH4BsXMaDsoJ5qkHMbcczAfIReo2eFYuVC1C',
'fs2go_verizon_verizon_web_stage_08e6df0354a4803f1b1f2428b5a9a382e8dbcd62': 'rKTVapNaAcmnUbGL4ZcuOoY4SE7VmZSQsblPFr7e',
'fs2go_verizon_verizon_web_prod_f909564cb606eff1f731b5e22e0928676732c445': 'qLSUuHerM3u9eNPzaHyUK52obai5MvE4XDJfqYe1',
'fs2go_foxcom_synd_web_stage_f7b9091f00ea25a4fdaaae77fca5b54cdc7e7043': '96VKF2vLd24fFiDfwPFpzM5llFN4TiIGAlodE0Re',
'fs2go_foxcom_synd_web_prod_0f2cdd64d87e4ab6a1d54aada0ff7a7c8387a064': 'agiPjbXEyEZUkbuhcnmVPhe9NNVbDjCFq2xkcx51',
'anvato_own_app_web_stage_1214ade5d28422c4dae9d03c1243aba0563c4dba': 'mzhamNac3swG4WsJAiUTacnGIODi6SWeVWk5D7ho',
'anvato_own_app_web_prod_944e162ed927ec3e9ed13eb68ed2f1008ee7565e': '9TSxh6G2TXOLBoYm9ro3LdNjjvnXpKb8UR8KoIP9',
'anvato_scripps_app_ftv_prod_a10a10468edd5afb16fb48171c03b956176afad1': 'COJ2i2UIPK7xZqIWswxe7FaVBOVgRkP1F6O6qGoH',
'anvato_scripps_app_ftv_stage_77d3ad2bdb021ec37ca2e35eb09acd396a974c9a': 'Q7nnopNLe2PPfGLOTYBqxSaRpl209IhqaEuDZi1F',
'anvato_univision_app_web_stage_551236ef07a0e17718c3995c35586b5ed8cb5031': 'D92PoLS6UitwxDRA191HUGT9OYcOjV6mPMa5wNyo',
'anvato_univision_app_web_prod_039a5c0a6009e637ae8ac906718a79911e0e65e1': '5mVS5u4SQjtw6NGw2uhMbKEIONIiLqRKck5RwQLR',
'nbcu_cnbc_springfield_ios_prod_670207fae43d6e9a94c351688851a2ce': 'M7fqCCIP9lW53oJbHs19OlJlpDrVyc2OL8gNeuTa',
'nbcu_cnbc_springfieldvod_ios_prod_7a5f04b1ceceb0e9c9e2264a44aa236e08e034c2': 'Yia6QbJahW0S7K1I0drksimhZb4UFq92xLBmmMvk',
'anvato_cox_app_web_prod_ce45cda237969f93e7130f50ee8bb6280c1484ab': 'cc0miZexpFtdoqZGvdhfXsLy7FXjRAOgb9V0f5fZ',
'anvato_cox_app_web_stage_c23dbe016a8e9d8c7101d10172b92434f6088bf9': 'yivU3MYHd2eDZcOfmLbINVtqxyecKTOp8OjOuoGJ',
'anvato_chnzero_app_web_stage_b1164d1352b579e792e542fddf13ee34c0eeb46b': 'A76QkXMmVH8lTCfU15xva1mZnSVcqeY4Xb22Kp7m',
'anvato_chnzero_app_web_prod_253d358928dc08ec161eda2389d53707288a730c': 'OA5QI3ZWZZkdtUEDqh28AH8GedsF6FqzJI32596b',
'anvato_discovery_vodpoc_web_stage_9fa7077b5e8af1f8355f65d4fb8d2e0e9d54e2b7': 'q3oT191tTQ5g3JCP67PkjLASI9s16DuWZ6fYmry3',
'anvato_discovery_vodpoc_web_prod_688614983167a1af6cdf6d76343fda10a65223c1': 'qRvRQCTVHd0VVOHsMvvfidyWmlYVrTbjby7WqIuK',
'nbcu_cnbc_springfieldvod_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
'nbcu_cnbc_springfield_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
'nbcu_nbcd_capture_web_stage_4dd9d585bfb984ebf856dee35db027b2465cc4ae': '0j1Ov4Vopyi2HpBZJYdL2m8ERJVGYh3nNpzPiO8F',
'nbcu_nbcd_watch3_android_prod_7712ca5fcf1c22f19ec1870a9650f9c37db22dcf': '3LN2UB3rPUAMu7ZriWkHky9vpLMXYha8JbSnxBlx',
'nbcu_nbcd_watchvod3_android_prod_0910a3a4692d57c0b5ff4316075bc5d096be45b9': 'mJagcQ2II30vUOAauOXne7ERwbf5S9nlB3IP17lQ',
'anvato_scripps_app_atv_prod_790deda22e16e71e83df58f880cd389908a45d52': 'CB6trI1mpoDIM5o54DNTsji90NDBQPZ4z4RqBNSH',
'nbcu_nbcd_watchv4_android_prod_ff67cef9cb409158c6f8c3533edddadd0b750507': 'j8CHQCUWjlYERj4NFRmUYOND85QNbHViH09UwuKm',
'nbcu_nbcd_watchvodv4_android_prod_a814d781609989dea6a629d50ae4c7ad8cc8e907': 'rkVnUXxdA9rawVLUlDQtMue9Y4Q7lFEaIotcUhjt',
'rvVKpA50qlOPLFxMjrCGf5pdkdQDm7qn': '1J7ZkY5Qz5lMLi93QOH9IveE7EYB3rLl',
'nbcu_dtv_local_web_prod_b266cf49defe255fd4426a97e27c09e513e9f82f': 'HuLnJDqzLa4saCzYMJ79zDRSQpEduw1TzjMNQu2b',
'nbcu_att_local_web_prod_4cef038b2d969a6b7d700a56a599040b6a619f67': 'Q0Em5VDc2KpydUrVwzWRXAwoNBulWUxCq2faK0AV',
'nbcu_dish_local_web_prod_c56dcaf2da2e9157a4266c82a78195f1dd570f6b': 'bC1LWmRz9ayj2AlzizeJ1HuhTfIaJGsDBnZNgoRg',
'nbcu_verizon_local_web_prod_88bebd2ce006d4ed980de8133496f9a74cb9b3e1': 'wzhDKJZpgvUSS1EQvpCQP8Q59qVzcPixqDGJefSk',
'nbcu_charter_local_web_prod_9ad90f7fc4023643bb718f0fe0fd5beea2382a50': 'PyNbxNhEWLzy1ZvWEQelRuIQY88Eub7xbSVRMdfT',
'nbcu_suddenlink_local_web_prod_20fb711725cac224baa1c1cb0b1c324d25e97178': '0Rph41lPXZbb3fqeXtHjjbxfSrNbtZp1Ygq7Jypa',
'nbcu_wow_local_web_prod_652d9ce4f552d9c2e7b5b1ed37b8cb48155174ad': 'qayIBZ70w1dItm2zS42AptXnxW15mkjRrwnBjMPv',
'nbcu_centurylink_local_web_prod_2034402b029bf3e837ad46814d9e4b1d1345ccd5': 'StePcPMkjsX51PcizLdLRMzxMEl5k2FlsMLUNV4k',
'nbcu_atlanticbrd_local_web_prod_8d5f5ecbf7f7b2f5e6d908dd75d90ae3565f682e': 'NtYLb4TFUS0pRs3XTkyO5sbVGYjVf17bVbjaGscI',
'nbcu_nbcd_watchvod_web_dev_08bc05699be47c4f31d5080263a8cfadc16d0f7c': 'hwxi2dgDoSWgfmVVXOYZm14uuvku4QfopstXckhr',
'anvato_nextmedia_app_web_prod_a4fa8c7204aa65e71044b57aaf63711980cfe5a0': 'tQN1oGPYY1nM85rJYePWGcIb92TG0gSqoVpQTWOw',
'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749': 'GUXNf5ZDX2jFUpu4WT2Go4DJ5nhUCzpnwDRRUx1K',
'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa': 'bLDYF8JqfG42b7bwKEgQiU9E2LTIAtnKzSgYpFUH',
'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a': 'icgGoYGipQMMSEvhplZX1pwbN69srwKYWksz3xWK',
'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336': 'fA2iQdI7RDpynqzQYIpXALVS83NTPr8LLFK4LFsu',
'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99': 'P3uXJ0fXXditBPCGkfvlnVScpPEfKmc64Zv7ZgbK',
'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe': 'mGPvo5ZA5SgjOFAPEPXv7AnOpFUICX8hvFQVz69n',
'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582': 'qyT6PXXLjVNCrHaRVj0ugAhalNRS7Ee9BP7LUokD',
'nbcu_nbcd_watchvodv4_web_stage_4108362fba2d4ede21f262fea3c4162cbafd66c7': 'DhaU5lj0W2gEdcSSsnxURq8t7KIWtJfD966crVDk',
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
}
_AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
def __init__(self, *args, **kwargs):
super(AnvatoIE, self).__init__(*args, **kwargs)
self.__server_time = None
def _server_time(self, access_key, video_id):
if self.__server_time is not None:
return self.__server_time
self.__server_time = int(self._download_json(
self._api_prefix(access_key) + 'server_time?anvack=' + access_key, video_id,
note='Fetching server time')['server_time'])
return self.__server_time
def _api_prefix(self, access_key):
return 'https://tkx2-%s.anvato.net/rest/v2/' % ('prod' if 'prod' in access_key else 'stage')
def _get_video_json(self, access_key, video_id):
# See et() in anvplayer.min.js, which is an alias of getVideoJSON()
video_data_url = self._api_prefix(access_key) + 'mcp/video/%s?anvack=%s' % (video_id, access_key)
server_time = self._server_time(access_key, video_id)
input_data = '%d~%s~%s' % (server_time, md5_text(video_data_url), md5_text(server_time))
auth_secret = intlist_to_bytes(aes_encrypt(
bytes_to_intlist(input_data[:64]), bytes_to_intlist(self._AUTH_KEY)))
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = {
'api': {
'anvrid': anvrid,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time, self._ANVACK_TABLE[access_key])),
'anvts': server_time,
},
}
return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8'))
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
video_id = anvplayer_data['video']
access_key = anvplayer_data['accessKey']
video_data = self._get_video_json(access_key, video_id)
formats = []
for published_url in video_data['published_urls']:
video_url = published_url['embed_url']
ext = determine_ext(video_url)
if ext == 'smil':
formats.extend(self._extract_smil_formats(video_url, video_id))
continue
tbr = int_or_none(published_url.get('kbps'))
a_format = {
'url': video_url,
'format_id': ('-'.join(filter(None, ['http', published_url.get('cdn_name')]))).lower(),
'tbr': tbr if tbr != 0 else None,
}
if ext == 'm3u8':
# Not using _extract_m3u8_formats here as individual media
# playlists are also included in published_urls.
if tbr is None:
formats.append(self._m3u8_meta_format(video_url, ext='mp4', m3u8_id='hls'))
continue
else:
a_format.update({
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4',
})
elif ext == 'mp3':
a_format['vcodec'] = 'none'
else:
a_format.update({
'width': int_or_none(published_url.get('width')),
'height': int_or_none(published_url.get('height')),
})
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', []):
a_caption = {
'url': caption['url'],
'ext': 'tt' if caption.get('format') == 'SMPTE-TT' else None
}
subtitles.setdefault(caption['language'], []).append(a_caption)
return {
'id': video_id,
'formats': formats,
'title': video_data.get('def_title'),
'description': video_data.get('def_description'),
'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'),
'subtitles': subtitles,
}
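The request signing in `_get_video_json` above can be sketched in isolation. This is a minimal illustration, not the extractor itself: it assumes `md5_text` hashes the UTF-8 text form of its argument, it leaves out the AES-encrypted `X-Anvato-Adst-Auth` parameter, and the access key and secret below are made-up placeholders rather than entries from `_ANVACK_TABLE`.

```python
import hashlib
import random
import time


def md5_text(value):
    # Assumed stand-in for the md5_text helper: hex MD5 of the text form of value.
    return hashlib.md5(str(value).encode('utf-8')).hexdigest()


def api_prefix(access_key):
    # Same rule as _api_prefix(): keys containing 'prod' use the prod host.
    return 'https://tkx2-%s.anvato.net/rest/v2/' % (
        'prod' if 'prod' in access_key else 'stage')


def build_api_payload(access_key, secret, server_time, anvrid=None):
    # anvrid is a random 30-character request id unless the caller pins one.
    if anvrid is None:
        anvrid = md5_text(time.time() * 1000 * random.random())[:30]
    return {
        'api': {
            'anvrid': anvrid,
            # Token over "access_key|anvrid|server_time|secret", as in the diff.
            'anvstk': md5_text('%s|%s|%d|%s' % (
                access_key, anvrid, server_time, secret)),
            'anvts': server_time,
        },
    }


# Hypothetical key and secret, for illustration only.
payload = build_api_payload(
    'example_app_web_prod_0000', 'not-a-real-secret', 1466400000)
```

Pinning `anvrid` makes the token reproducible, which is convenient when comparing against a captured request.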

@@ -12,7 +12,7 @@ from ..utils import (
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
_TESTS = [{
# video with 5min ID
@@ -53,6 +53,12 @@ class AolIE(InfoExtractor):
}, {
'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
'only_matching': True,
}, {
'url': 'http://on.aol.com/video/519442220',
'only_matching': True,
}, {
'url': 'aol-video:5707d6b8e4b090497b04f706',
'only_matching': True,
}]
def _real_extract(self, url):
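The effect of the `_VALID_URL` change can be checked directly against the URLs listed in `_TESTS`. The sketch below copies both patterns from the hunk; `match_id` is a hypothetical helper introduced only for this comparison.

```python
import re

OLD_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'
NEW_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'


def match_id(pattern, url):
    # Return the captured id, or None when the pattern does not match.
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


urls = [
    'http://on.aol.com/video/519442220',
    'aol-video:5707d6b8e4b090497b04f706',
    'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7'
    '?context=SH:SHW518173474:PL4327:1460619712763',
]
for url in urls:
    print(match_id(OLD_VALID_URL, url), match_id(NEW_VALID_URL, url))
```

The old pattern required a hyphen somewhere before the id, so a bare numeric path such as `/video/519442220` did not match; the new pattern makes the hyphenated slug optional.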

@@ -8,7 +8,6 @@ from .generic import GenericIE
from ..utils import (
determine_ext,
ExtractorError,
get_element_by_attribute,
qualities,
int_or_none,
parse_duration,
@@ -274,41 +273,3 @@ class ARDIE(InfoExtractor):
'upload_date': upload_date,
'thumbnail': thumbnail,
}
class SportschauIE(ARDMediathekIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
_TESTS = [{
'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
'info_dict': {
'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
'ext': 'mp4',
'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
base_url = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_media_info(
base_url + '-mc_defaultQuality-h.json', webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info

@@ -61,10 +61,7 @@ class ArteTvIE(InfoExtractor):
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+])'
class ArteTVBaseIE(InfoExtractor):
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
@@ -78,6 +75,125 @@ class ArteTVPlus7IE(InfoExtractor):
video_id = mobj.group('id')
return video_id, lang
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = (player_info.get('VTI') or title or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip()
if subtitle:
title += ' - %s' % subtitle
info_dict = {
'id': player_info['VID'],
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
}
langcode = LANGS.get(lang, lang)
formats = []
for format_id, format_dict in player_info['VSR'].items():
f = dict(format_dict)
versionCode = f.get('versionCode')
l = re.escape(langcode)
# Language preference from most to least priority
# Reference: section 5.6.3 of
# http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
PREFERENCES = (
# original version in requested language, without subtitles
r'VO{0}$'.format(l),
# original version in requested language, with partial subtitles in requested language
r'VO{0}-ST{0}$'.format(l),
# original version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
r'VO{0}-STM{0}$'.format(l),
# non-original (dubbed) version in requested language, without subtitles
r'V{0}$'.format(l),
# non-original (dubbed) version in requested language, with partial subtitles in requested language
r'V{0}-ST{0}$'.format(l),
# non-original (dubbed) version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
r'V{0}-STM{0}$'.format(l),
# original version in requested language, with partial subtitles in different language
r'VO{0}-ST(?!{0}).+?$'.format(l),
# original version in requested language, with subtitles for the deaf and hard-of-hearing in different language
r'VO{0}-STM(?!{0}).+?$'.format(l),
# original version in different language, with partial subtitles in requested language
r'VO(?:(?!{0}).+?)?-ST{0}$'.format(l),
# original version in different language, with subtitles for the deaf and hard-of-hearing in requested language
r'VO(?:(?!{0}).+?)?-STM{0}$'.format(l),
# original version in different language, without subtitles
r'VO(?:(?!{0}))?$'.format(l),
# original version in different language, with partial subtitles in different language
r'VO(?:(?!{0}).+?)?-ST(?!{0}).+?$'.format(l),
# original version in different language, with subtitles for the deaf and hard-of-hearing in different language
r'VO(?:(?!{0}).+?)?-STM(?!{0}).+?$'.format(l),
)
for pref, p in enumerate(PREFERENCES):
if re.match(p, versionCode):
lang_pref = len(PREFERENCES) - pref
break
else:
lang_pref = -1
format = {
'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
'language_preference': lang_pref,
'format_note': '%s, %s' % (f.get('versionCode'), f.get('versionLibelle')),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'tbr': int_or_none(f.get('bitrate')),
'quality': qfunc(f.get('quality')),
}
if f.get('mediaType') == 'rtmp':
format['url'] = f['streamer']
format['play_path'] = 'mp4:' + f['url']
format['ext'] = 'flv'
else:
format['url'] = f['url']
formats.append(format)
self._check_formats(formats, video_id)
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
'only_matching': True,
}, {
'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
@@ -127,84 +243,10 @@ class ArteTVPlus7IE(InfoExtractor):
return self._extract_from_json_url(json_url, video_id, lang, title=title)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
embed_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'embed url', group='url')
return self.url_result(embed_url)
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = (player_info.get('VTI') or title or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip()
if subtitle:
title += ' - %s' % subtitle
info_dict = {
'id': player_info['VID'],
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
}
formats = []
for format_id, format_dict in player_info['VSR'].items():
f = dict(format_dict)
versionCode = f.get('versionCode')
langcode = LANGS.get(lang, lang)
lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
lang_pref = None
if versionCode:
matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)]
lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs)
source_pref = 0
if versionCode is not None:
# The original version with subtitles has lower relevance
if re.match(r'VO-ST(F|A|E)', versionCode):
source_pref -= 10
# The version with sourds/mal subtitles has also lower relevance
elif re.match(r'VO?(F|A|E)-STM\1', versionCode):
source_pref -= 9
format = {
'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
'language_preference': lang_pref,
'format_note': '%s, %s' % (f.get('versionCode'), f.get('versionLibelle')),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'tbr': int_or_none(f.get('bitrate')),
'quality': qfunc(f.get('quality')),
'source_preference': source_pref,
}
if f.get('mediaType') == 'rtmp':
format['url'] = f['streamer']
format['play_path'] = 'mp4:' + f['url']
format['ext'] = 'flv'
else:
format['url'] = f['url']
formats.append(format)
self._check_formats(formats, video_id)
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
entries = [
self.url_result(url)
for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
return self.playlist_result(entries)
# It also uses the arte_vp_url url from the webpage to extract the information
@@ -213,22 +255,17 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
'info_dict': {
'id': '72176',
'id': '057405-001-A',
'ext': 'mp4',
'title': 'Folge 2 - Corporate Design',
'upload_date': '20131004',
'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
'upload_date': '20150716',
},
}, {
'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
'info_dict': {
'id': '160676',
'ext': 'mp4',
'title': 'Monty Python live (mostly)',
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805',
}
'playlist_count': 11,
'add_ie': ['Youtube'],
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
@@ -239,7 +276,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
@@ -247,7 +284,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}
}]
class ArteTVFutureIE(ArteTVPlus7IE):
@@ -272,6 +309,8 @@ class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
_TESTS = []
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
if lang == 'folge':
@@ -290,7 +329,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
@@ -300,24 +339,23 @@ class ArteTVConcertIE(ArteTVPlus7IE):
'upload_date': '20140128',
'description': 'md5:486eb08f991552ade77439fe6d82c305',
},
}
}]
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TEST = {
'url': 'http://cinema.arte.tv/de/node/38291',
'md5': '6b275511a5107c60bacbeeda368c3aa1',
_TESTS = [{
'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
'info_dict': {
'id': '055876-000_PWA12025-D',
'id': '062494-000-A',
'ext': 'mp4',
'title': 'Tod auf dem Nil',
'upload_date': '20160122',
'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
'upload_date': '20150807',
},
}
}]
class ArteTVMagazineIE(ArteTVPlus7IE):
@@ -362,9 +400,41 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
)
'''
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang')
json_url = mobj.group('json_url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
'info_dict': {
'id': 'PL-013263',
'title': 'Areva & Uramin',
},
'playlist_mincount': 6,
}, {
'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id, lang = self._extract_url_info(url)
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)
title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText')
entries = [
self._extract_from_json_url(
video['jsonUrl'], video.get('programId') or playlist_id, lang)
for video in collection['videos'] if video.get('jsonUrl')]
return self.playlist_result(entries, playlist_id, title, description)
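The language ranking that `_extract_from_json_url` applies to each `versionCode` can be tried outside the extractor. The sketch below reuses the patterns from the `PREFERENCES` tuple in the diff; the function name `language_preference` and the sample version codes are assumptions made for illustration.

```python
import re

# Patterns copied from PREFERENCES in the diff, most preferred first.
PREFERENCE_TEMPLATES = (
    r'VO{0}$',
    r'VO{0}-ST{0}$',
    r'VO{0}-STM{0}$',
    r'V{0}$',
    r'V{0}-ST{0}$',
    r'V{0}-STM{0}$',
    r'VO{0}-ST(?!{0}).+?$',
    r'VO{0}-STM(?!{0}).+?$',
    r'VO(?:(?!{0}).+?)?-ST{0}$',
    r'VO(?:(?!{0}).+?)?-STM{0}$',
    r'VO(?:(?!{0}))?$',
    r'VO(?:(?!{0}).+?)?-ST(?!{0}).+?$',
    r'VO(?:(?!{0}).+?)?-STM(?!{0}).+?$',
)


def language_preference(version_code, langcode):
    # Earlier patterns win; the score shrinks as the match position grows.
    escaped = re.escape(langcode)
    for index, template in enumerate(PREFERENCE_TEMPLATES):
        if re.match(template.format(escaped), version_code):
            return len(PREFERENCE_TEMPLATES) - index
    # No pattern matched: same fallback value as the for/else in the diff.
    return -1


for code in ('VOF', 'VOF-STF', 'VF', 'VOA-STF', 'VA'):
    print(code, language_preference(code, 'F'))
```

Because the score is derived from the position of the first matching pattern, reordering the tuple is enough to change the ranking; no separate weight table is needed.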

@@ -6,6 +6,7 @@ import time
from .common import InfoExtractor
from .soundcloud import SoundcloudIE
from ..compat import compat_str
from ..utils import (
ExtractorError,
url_basename,
@@ -136,7 +137,7 @@ class AudiomackAlbumIE(InfoExtractor):
result[resultkey] = api_response[apikey]
song_id = url_basename(api_response['url']).rpartition('.')[0]
result['entries'].append({
'id': api_response.get('id', song_id),
'id': compat_str(api_response.get('id', song_id)),
'uploader': api_response.get('artist'),
'title': api_response.get('title', song_id),
'url': api_response['url'],

@@ -46,6 +46,7 @@ class AzubuIE(InfoExtractor):
'uploader_id': 272749,
'view_count': int,
},
'skip': 'Channel offline',
},
]
@@ -56,22 +57,26 @@ class AzubuIE(InfoExtractor):
'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
title = data['title'].strip()
description = data['description']
thumbnail = data['thumbnail']
view_count = data['view_count']
uploader = data['user']['username']
uploader_id = data['user']['id']
description = data.get('description')
thumbnail = data.get('thumbnail')
view_count = data.get('view_count')
user = data.get('user', {})
uploader = user.get('username')
uploader_id = user.get('id')
stream_params = json.loads(data['stream_params'])
timestamp = float_or_none(stream_params['creationDate'], 1000)
duration = float_or_none(stream_params['length'], 1000)
timestamp = float_or_none(stream_params.get('creationDate'), 1000)
duration = float_or_none(stream_params.get('length'), 1000)
renditions = stream_params.get('renditions') or []
video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
if video:
renditions.append(video)
if not renditions and not user.get('channel', {}).get('is_live', True):
raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
formats = [{
'url': fmt['url'],
'width': fmt['frameWidth'],

@@ -29,7 +29,7 @@ class BandcampIE(InfoExtractor):
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}, {
'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '2b68e5851514c20efdff2afc5603b8b4',
'md5': '73d0b3171568232574e45652f8720b5c',
'info_dict': {
'id': '2650410135',
'ext': 'mp3',
@@ -48,6 +48,10 @@ class BandcampIE(InfoExtractor):
if m_trackinfo:
json_code = m_trackinfo.group(1)
data = json.loads(json_code)[0]
track_id = compat_str(data['id'])
if not data.get('file'):
raise ExtractorError('Not streamable', video_id=track_id, expected=True)
formats = []
for format_id, format_url in data['file'].items():
@@ -64,7 +68,7 @@ class BandcampIE(InfoExtractor):
self._sort_formats(formats)
return {
'id': compat_str(data['id']),
'id': track_id,
'title': data['title'],
'formats': formats,
'duration': float_or_none(data.get('duration')),

@@ -31,7 +31,7 @@ class BBCCoUkIE(InfoExtractor):
music/clips[/#]|
radio/player/
)
(?P<id>%s)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
''' % _ID_REGEX
_MEDIASELECTOR_URLS = [
@@ -192,6 +192,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download
'skip_download': True,
},
'skip': 'Now it\'s really geo-restricted',
}, {
# compact player (https://github.com/rg3/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
@@ -698,7 +699,9 @@ class BBCIE(BBCCoUkIE):
@classmethod
def suitable(cls, url):
return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
def _extract_from_media_meta(self, media_meta, video_id):
# Direct links to media in media metadata (e.g.
@@ -975,3 +978,72 @@ class BBCCoUkArticleIE(InfoExtractor):
r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]
return self.playlist_result(entries, playlist_id, title, description)
class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
title, description = self._extract_title_and_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TEST = {
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 6,
}
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)/(?:episodes|broadcasts|clips)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/programmes/%s'
_VIDEO_ID_TEMPLATE = r'data-pid=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance - Clips - BBC Four',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 7,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
'only_matching': True,
}]
def _extract_title_and_description(self, webpage):
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return title, description
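The interplay between the negative lookahead added to the programme pattern and the `EXCLUDE_IE` check in `BBCIE.suitable` can be shown with three toy matchers. Everything here is a simplified sketch: the class names are hypothetical, `ID_REGEX` is an assumed shape for `BBCCoUkIE._ID_REGEX`, and the fallback pattern is invented for the example rather than taken from `BBCIE`.

```python
import re

ID_REGEX = r'[pb][\da-z]{7}'  # assumed shape of BBCCoUkIE._ID_REGEX


class Matcher(object):
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class ProgrammeMatcher(Matcher):
    # The lookahead rejects programme URLs that continue into a listing page.
    _VALID_URL = (r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)'
                  r'(?!/(?:episodes|broadcasts|clips))' % ID_REGEX)


class PlaylistMatcher(Matcher):
    _VALID_URL = (r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)'
                  r'/(?:episodes|broadcasts|clips)' % ID_REGEX)


class FallbackMatcher(Matcher):
    # Hypothetical catch-all pattern, broader than the two above.
    _VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
    EXCLUDE = (ProgrammeMatcher, PlaylistMatcher)

    @classmethod
    def suitable(cls, url):
        # Defer to the more specific matchers before claiming the URL.
        if any(m.suitable(url) for m in cls.EXCLUDE):
            return False
        return super(FallbackMatcher, cls).suitable(url)
```

Keeping the exclusions in a tuple, as the diff does with `EXCLUDE_IE`, means adding another specific extractor later is a one-line change.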

@@ -1,31 +1,27 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
xpath_text,
xpath_with_ns,
int_or_none,
parse_iso8601,
)
from .mtv import MTVServicesInfoExtractor
from ..utils import unified_strdate
from ..compat import compat_urllib_parse_urlencode
class BetIE(InfoExtractor):
class BetIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
_TESTS = [
{
'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
'info_dict': {
'id': 'news/national/2014/a-conversation-with-president-obama',
'id': '07e96bd3-8850-3051-b856-271b457f0ab8',
'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
'ext': 'flv',
'title': 'A Conversation With President Obama',
'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
'description': 'President Obama urges persistence in confronting racism and bias.',
'duration': 1534,
'timestamp': 1418075340,
'upload_date': '20141208',
'uploader': 'admin',
'thumbnail': 're:(?i)^https?://.*\.jpg$',
'subtitles': {
'en': 'mincount:2',
}
},
'params': {
# rtmp download
@@ -35,16 +31,17 @@ class BetIE(InfoExtractor):
{
'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
'info_dict': {
'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
'id': '9f516bf1-7543-39c4-8076-dd441b459ba9',
'display_id': 'justice-for-ferguson-a-community-reacts',
'ext': 'flv',
'title': 'Justice for Ferguson: A Community Reacts',
'description': 'A BET News special.',
'duration': 1696,
'timestamp': 1416942360,
'upload_date': '20141125',
'uploader': 'admin',
'thumbnail': 're:(?i)^https?://.*\.jpg$',
'subtitles': {
'en': 'mincount:2',
}
},
'params': {
# rtmp download
@@ -53,57 +50,32 @@ class BetIE(InfoExtractor):
}
]
_FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
def _get_feed_query(self, uri):
return compat_urllib_parse_urlencode({
'uuid': uri,
})
def _extract_mgid(self, webpage):
return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
mgid = self._extract_mgid(webpage)
videos_info = self._get_videos_info(mgid)
media_url = compat_urllib_parse_unquote(self._search_regex(
[r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
webpage, 'media URL'))
info_dict = videos_info['entries'][0]
video_id = self._search_regex(
r'/video/(.*)/_jcr_content/', media_url, 'video id')
upload_date = unified_strdate(self._html_search_meta('date', webpage))
description = self._html_search_meta('description', webpage)
mrss = self._download_xml(media_url, display_id)
item = mrss.find('./channel/item')
NS_MAP = {
'dc': 'http://purl.org/dc/elements/1.1/',
'media': 'http://search.yahoo.com/mrss/',
'ka': 'http://kickapps.com/karss',
}
title = xpath_text(item, './title', 'title')
description = xpath_text(
item, './description', 'description', fatal=False)
timestamp = parse_iso8601(xpath_text(
item, xpath_with_ns('./dc:date', NS_MAP),
'upload date', fatal=False))
uploader = xpath_text(
item, xpath_with_ns('./dc:creator', NS_MAP),
'uploader', fatal=False)
media_content = item.find(
xpath_with_ns('./media:content', NS_MAP))
duration = int_or_none(media_content.get('duration'))
smil_url = media_content.get('url')
thumbnail = media_content.find(
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,
info_dict.update({
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader': uploader,
'duration': duration,
'formats': formats,
}
'upload_date': upload_date,
})
return info_dict
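The two small hooks the rewritten `BetIE` provides, `_extract_mgid` and `_get_feed_query`, amount to a regex lookup and a URL-encoded query string. The sketch below shows them as plain functions under two assumptions: that the base class appends the query to `_FEED_URL` after a `?`, and that the mgid in the sample markup is a made-up placeholder. It uses Python 3's `urllib.parse` directly, where the extractor goes through `compat_urllib_parse_urlencode`.

```python
import re
from urllib.parse import urlencode

FEED_URL = 'http://feeds.mtvnservices.com/od/feed/bet-mrss-player'


def extract_mgid(webpage):
    # Same pattern as _extract_mgid(): first data-uri attribute on the page.
    mobj = re.search(r'data-uri="([^"]+)', webpage)
    return mobj.group(1) if mobj else None


def feed_url_for(mgid):
    # Assumption: the feed is requested as FEED_URL + '?' + encoded query.
    return '%s?%s' % (FEED_URL, urlencode({'uuid': mgid}))


# Hypothetical markup and mgid, for illustration only.
page = '<div class="player" data-uri="mgid:example:video:bet.com:1234"></div>'
mgid = extract_mgid(page)
print(feed_url_for(mgid))
```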

@@ -1,34 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_etree_fromstring,
compat_str,
compat_parse_qs,
compat_xml_parse_error,
)
from ..utils import (
int_or_none,
unescapeHTML,
ExtractorError,
int_or_none,
float_or_none,
xpath_text,
)
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '2c301e4dab317596e837c3e7633e7d86',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'info_dict': {
'id': '1554319',
'ext': 'flv',
'title': '【金坷垃】金泡沫',
'duration': 308313,
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'timestamp': 1398012660,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'timestamp': 1397983878,
'uploader': '菊子桑',
'uploader_id': '156160',
},
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
@@ -36,75 +44,186 @@ class BiliBiliIE(InfoExtractor):
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'uploader': '枫叶逝去',
'timestamp': 1396501299,
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'flv',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',
'uploader_id': '610729',
},
'params': {
# Just to test metadata extraction
'skip_download': True,
},
'expected_warnings': ['upload time'],
}]
# BiliBili blocks keys from time to time. The current key is extracted from
# the Android client
# TODO: find the sign algorithm used in the flash player
_APP_KEY = '86385cdc024c0f6c'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page_num = mobj.group('page_num') or '1'
view_data = self._download_json(
'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
video_id)
if 'error' in view_data:
raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
webpage = self._download_webpage(url, video_id)
cid = view_data['cid']
title = unescapeHTML(view_data['title'])
params = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))
cid = params['cid'][0]
doc = self._download_xml(
'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
cid,
'Downloading page %s/%s' % (page_num, view_data['pages'])
)
info_xml_str = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play',
cid, query={'appkey': self._APP_KEY, 'cid': cid},
note='Downloading video info page')
if xpath_text(doc, './result') == 'error':
raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
err_msg = None
durls = None
info_xml = None
try:
info_xml = compat_etree_fromstring(info_xml_str.encode('utf-8'))
except compat_xml_parse_error:
info_json = self._parse_json(info_xml_str, video_id, fatal=False)
err_msg = (info_json or {}).get('error_text')
else:
err_msg = xpath_text(info_xml, './message')
if info_xml is not None:
durls = info_xml.findall('./durl')
if not durls:
if err_msg:
raise ExtractorError('%s said: %s' % (self.IE_NAME, err_msg), expected=True)
else:
raise ExtractorError('No videos found!')
entries = []
for durl in doc.findall('./durl'):
for durl in durls:
size = xpath_text(durl, ['./filesize', './size'])
formats = [{
'url': durl.find('./url').text,
'filesize': int_or_none(size),
'ext': 'flv',
}]
backup_urls = durl.find('./backup_url')
if backup_urls is not None:
for backup_url in backup_urls.findall('./url'):
formats.append({'url': backup_url.text})
formats.reverse()
for backup_url in durl.findall('./backup_url/url'):
formats.append({
'url': backup_url.text,
# backup URLs have lower priorities
'preference': -2 if 'hd.mp4' in backup_url.text else -3,
})
self._sort_formats(formats)
entries.append({
'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
'title': title,
'duration': int_or_none(xpath_text(durl, './length'), 1000),
'formats': formats,
})
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
description = self._html_search_meta('description', webpage)
datetime_str = self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)
timestamp = None
if datetime_str:
timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
# TODO 'view_count' requires deobfuscating Javascript
info = {
'id': compat_str(cid),
'title': title,
'description': view_data.get('description'),
'thumbnail': view_data.get('pic'),
'uploader': view_data.get('author'),
'timestamp': int_or_none(view_data.get('created')),
'view_count': int_or_none(view_data.get('play')),
'duration': int_or_none(xpath_text(doc, './timelength')),
'description': description,
'timestamp': timestamp,
'thumbnail': self._html_search_meta('thumbnailUrl', webpage),
'duration': float_or_none(xpath_text(info_xml, './timelength'), scale=1000),
}
uploader_mobj = re.search(
r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader_id': uploader_mobj.group('id'),
})
for entry in entries:
entry.update(info)
if len(entries) == 1:
entries[0].update(info)
return entries[0]
else:
info.update({
for idx, entry in enumerate(entries):
entry['id'] = '%s_part%d' % (video_id, (idx + 1))
return {
'_type': 'multi_video',
'id': video_id,
'title': title,
'description': description,
'entries': entries,
})
return info
}
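
The new BiliBili code above parses the `v_cdn_play` response as XML first and only falls back to JSON to recover an error text. A minimal standalone sketch of that pattern follows. It assumes Python 3 and uses the stdlib `xml.etree.ElementTree` and `json` in place of youtube-dl's `compat_etree_fromstring` and `_parse_json`; the function name `parse_play_info` and any sample payloads are hypothetical, not part of the commit.

```python
import json
import xml.etree.ElementTree as ET


def parse_play_info(payload):
    """Return (durl_elements, error_message) for an XML-or-JSON payload.

    Hypothetical helper: mirrors the try/except/else flow in the diff,
    not the extractor's actual API.
    """
    err_msg = None
    durls = []
    try:
        doc = ET.fromstring(payload.encode('utf-8'))
    except ET.ParseError:
        # Not XML: assume the server answered with a JSON error object
        try:
            info = json.loads(payload)
        except ValueError:
            info = None
        if isinstance(info, dict):
            err_msg = info.get('error_text')
    else:
        # Valid XML: an error, if any, is carried in <message>
        msg = doc.find('./message')
        err_msg = msg.text if msg is not None else None
        durls = doc.findall('./durl')
    return durls, err_msg
```

A caller would raise an expected error when `durls` is empty and `err_msg` is set, and a generic "no videos found" error when both are empty, as the diff does.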

@@ -0,0 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class BIQLEIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?biqle\.(?:com|org|ru)/watch/(?P<id>-?\d+_\d+)'
_TESTS = [{
'url': 'http://www.biqle.ru/watch/847655_160197695',
'md5': 'ad5f746a874ccded7b8f211aeea96637',
'info_dict': {
'id': '160197695',
'ext': 'mp4',
'title': 'Foo Fighters - The Pretender (Live at Wembley Stadium)',
'uploader': 'Andrey Rogozin',
'upload_date': '20110605',
}
}, {
'url': 'https://biqle.org/watch/-44781847_168547604',
'md5': '7f24e72af1db0edf7c1aaba513174f97',
'info_dict': {
'id': '168547604',
'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embed_url = self._proto_relative_url(self._search_regex(
r'<iframe.+?src="((?:http:)?//daxab\.com/[^"]+)".*?></iframe>', webpage, 'embed url'))
return {
'_type': 'url_transparent',
'url': embed_url,
}

@@ -17,6 +17,9 @@ class BloombergIE(InfoExtractor):
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,

@@ -29,7 +29,8 @@ class BRIE(InfoExtractor):
'duration': 180,
'uploader': 'Reinhard Weber',
'upload_date': '20150422',
}
},
'skip': '404 not found',
},
{
'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@@ -40,7 +41,8 @@ class BRIE(InfoExtractor):
'title': 'Manfred Schreiber ist tot',
'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
'duration': 26,
}
},
'skip': '404 not found',
},
{
'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@@ -51,7 +53,8 @@ class BRIE(InfoExtractor):
'title': 'Kurzweilig und sehr bewegend',
'description': 'md5:0351996e3283d64adeb38ede91fac54e',
'duration': 296,
}
},
'skip': '404 not found',
},
{
'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',

@@ -307,9 +307,10 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': compat_str(video_info['id']),
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
@@ -331,7 +332,8 @@ class BrightcoveLegacyIE(InfoExtractor):
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(url, info['id'], 'mp4'))
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through
@@ -365,7 +367,7 @@ class BrightcoveLegacyIE(InfoExtractor):
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
'protocol': 'm3u8_native',
})
formats.append(a_format)
@@ -395,7 +397,7 @@ class BrightcoveLegacyIE(InfoExtractor):
return ad_info
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % info['id'])
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
@@ -442,6 +444,10 @@ class BrightcoveNewIE(InfoExtractor):
# non numeric ref: prefixed video id
'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
'only_matching': True,
}, {
# unavailable video without message but with error_code
'url': 'http://players.brightcove.net/1305187701/c832abfb-641b-44eb-9da0-2fe76786505f_default/index.html?videoId=4377407326001',
'only_matching': True,
}]
@staticmethod
@@ -512,8 +518,9 @@ class BrightcoveNewIE(InfoExtractor):
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)
raise ExtractorError(json_data[0]['message'], expected=True)
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
raise ExtractorError(
json_data.get('message') or json_data['error_code'], expected=True)
raise
title = json_data['name'].strip()
@@ -527,7 +534,7 @@ class BrightcoveNewIE(InfoExtractor):
if not src:
continue
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls', fatal=False))
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml':
if not src:
continue
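
The BrightcoveNewIE hunk above changes the 403 handling so that an error entry without `message` still yields a readable error via `error_code`. A small sketch of that fallback in isolation, assuming the response body is a JSON list of error objects as the diff implies; `error_text` and the sample bodies are hypothetical.

```python
import json


def error_text(body):
    """Pick a readable error string from a Brightcove-style 403 body."""
    first = json.loads(body)[0]
    # Prefer 'message'; fall back to 'error_code' when it is absent or empty
    return first.get('message') or first['error_code']
```

Indexing `error_code` directly (rather than using `.get`) keeps the failure loud if neither field is present, which matches the behavior in the diff.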

@@ -11,6 +11,7 @@ class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': {
'id': 'studio-c-season-5-episode-5',
'ext': 'mp4',
@@ -21,7 +22,8 @@ class BYUtvIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):

@@ -4,11 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import (
ExtractorError,
HEADRequest,
unified_strdate,
url_basename,
qualities,
int_or_none,
)
@@ -16,24 +16,38 @@ from ..utils import (
class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
_VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?d17\.tv|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
'canalplus.fr': 'cplus',
'piwiplus.fr': 'teletoon',
'd8.tv': 'd8',
'itele.fr': 'itele',
'canalplus': 'cplus',
'piwiplus': 'teletoon',
'd8': 'd8',
'd17': 'd17',
'itele': 'itele',
}
_TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': {
'id': '1263092',
'id': '1192814',
'ext': 'mp4',
'title': 'Le Zapping - 13/05/15',
'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
'upload_date': '20150513',
'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014",
'description': "Toute l'année 2014 dans un Zapping exceptionnel !",
'upload_date': '20150105',
},
}, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
@@ -46,35 +60,45 @@ class CanalplusIE(InfoExtractor):
},
'skip': 'Only works from France',
}, {
'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231',
'info_dict': {
'id': '966289',
'ext': 'flv',
'title': 'Campagne intime - Documentaire exceptionnel',
'description': 'md5:d2643b799fb190846ae09c61e59a859f',
'upload_date': '20131108',
},
'skip': 'videos get deleted after a while',
}, {
'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
'info_dict': {
'id': '1213714',
'id': '1390231',
'ext': 'mp4',
'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
'description': 'md5:8216206ec53426ea6321321f3b3c16db',
'upload_date': '20150211',
'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité",
'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6',
'upload_date': '20160512',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224',
'info_dict': {
'id': '1398334',
'ext': 'mp4',
'title': "L'invité de Bruce Toussaint du 07/06/2016 - ",
'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324',
'upload_date': '20160607',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True,
}, {
'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal']
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group
display_id = url_basename(mobj.group('path'))
display_id = mobj.group('display_id') or video_id
if video_id is None:
webpage = self._download_webpage(url, display_id)
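
The Canalplus change above drops the `site` regex group and derives the site key from the URL host with `netloc.rsplit('.', 2)[-2]`. The sketch below shows that lookup on its own. It assumes Python 3 (`urllib.parse.urlparse` instead of `compat_urllib_parse_urlparse`); `site_id_for` is a hypothetical name, and the map is copied from the new `_SITE_ID_MAP` in the diff.

```python
from urllib.parse import urlparse

# Copied from the new _SITE_ID_MAP in the diff
SITE_ID_MAP = {
    'canalplus': 'cplus',
    'piwiplus': 'teletoon',
    'd8': 'd8',
    'd17': 'd17',
    'itele': 'itele',
}


def site_id_for(url):
    """Map a URL to its site id via the second-level domain label."""
    netloc = urlparse(url).netloc
    # 'www.d8.tv' -> ['www', 'd8', 'tv']; [-2] is the label before the TLD
    return SITE_ID_MAP[netloc.rsplit('.', 2)[-2]]
```

Because only the label before the TLD is used, `www.`, `m.` and `player.` hosts all resolve to the same key without extra regex groups.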

@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
try_get,
)
class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://video1.carambatv.ru/v/191910501',
'md5': '2f4a81b7cfd5ab866ee2d7270cb34a2a',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2678.31,
},
}, {
'url': 'carambatv:191910501',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'http://video1.carambatv.ru/v/%s/videoinfo.js' % video_id,
video_id)
title = video['title']
base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
formats = [{
'url': base_url + f['fn'],
'height': int_or_none(f.get('height')),
'format_id': '%sp' % f['height'] if f.get('height') else None,
} for f in video['qualities'] if f.get('fn')]
self._sort_formats(formats)
thumbnail = video.get('splash')
duration = float_or_none(try_get(
video, lambda x: x['annotations'][0]['end_time'], compat_str))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': '',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2678.31,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url:
video_id = self._search_regex(
r'(?:video_id|crmb_vuid)\s*[:=]\s*["\']?(\d+)',
webpage, 'video id')
video_url = 'carambatv:%s' % video_id
return self.url_result(video_url, CarambaTVIE.ie_key())
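
CarambaTVIE above builds its format list from the `qualities` array of `videoinfo.js`, skipping entries without a file name and leaving `format_id` unset when no height is given. A standalone sketch of that comprehension, assuming a `video` dict shaped like the one the extractor reads; `build_formats` is a hypothetical name, and `height` is passed through as-is rather than through youtube-dl's `int_or_none`.

```python
def build_formats(video, video_id):
    """Build a youtube-dl style formats list from a videoinfo dict."""
    # Fall back to the default media directory when 'video' is missing
    base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
    return [{
        'url': base_url + f['fn'],
        'height': f.get('height'),
        # e.g. '720p'; None when the quality entry carries no height
        'format_id': '%sp' % f['height'] if f.get('height') else None,
    } for f in video['qualities'] if f.get('fn')]
```

Entries lacking `fn` are dropped rather than raising, so a partially filled `qualities` list still yields the usable formats.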

@@ -4,65 +4,66 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
from ..utils import (
js_to_json,
smuggle_url,
)
class CBCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# with mediaId
'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
'md5': '97e24d09672fc4cf56256d6faa6c25bc',
'info_dict': {
'id': '2682904050',
'ext': 'flv',
'ext': 'mp4',
'title': 'Don Cherry All-Stars',
'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guys got heart.',
'timestamp': 1454475540,
'timestamp': 1454463000,
'upload_date': '20160203',
},
'params': {
# rtmp download
'skip_download': True,
'uploader': 'CBCC-NEW',
},
}, {
# with clipId
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': {
'id': '2487345465',
'ext': 'flv',
'ext': 'mp4',
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
'upload_date': '19780210',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download
'skip_download': True,
'timestamp': 255977160,
},
}, {
# multiple iframes
'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
'playlist': [{
'md5': '377572d0b49c4ce0c9ad77470e0b96b4',
'info_dict': {
'id': '2680832926',
'ext': 'flv',
'ext': 'mp4',
'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
'upload_date': '19700101',
'upload_date': '20160201',
'timestamp': 1454342820,
'uploader': 'CBCC-NEW',
},
}, {
'md5': '415a0e3f586113894174dfb31aa5bb1a',
'info_dict': {
'id': '2658915080',
'ext': 'flv',
'ext': 'mp4',
'title': 'Fly like an eagle!',
'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
'upload_date': '19700101',
'upload_date': '20150315',
'timestamp': 1426443984,
'uploader': 'CBCC-NEW',
},
}],
'params': {
# rtmp download
'skip_download': True,
},
}]
@classmethod
@@ -91,24 +92,54 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor):
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193',
'md5': '64d25f841ddf4ddb28a235338af32e2c',
'info_dict': {
'id': '2683190193',
'ext': 'flv',
'ext': 'mp4',
'title': 'Gerry Runs a Sweat Shop',
'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
'timestamp': 1455067800,
'timestamp': 1455071400,
'upload_date': '20160210',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download
'skip_download': True,
}, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': {
'id': '2657631896',
'ext': 'mp3',
'title': 'CBC Montreal is organizing its first ever community hackathon!',
'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.',
'timestamp': 1425704400,
'upload_date': '20150307',
'uploader': 'CBCC-NEW',
},
}
}, {
# available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
'url': 'http://www.cbc.ca/player/play/2164402062',
'md5': '17a61eb813539abea40618d6323a7f82',
'info_dict': {
'id': '2164402062',
'ext': 'flv',
'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,
'upload_date': '20111104',
'uploader': 'CBCC-NEW',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
'ThePlatformFeed', video_id)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/%s?mbr=true&formats=MPEG4,FLV,MP3' % video_id, {
'force_smil_url': True
}),
'id': video_id,
}

@@ -1,15 +1,15 @@
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
import re
from .theplatform import ThePlatformFeedIE
from ..utils import (
xpath_text,
xpath_element,
int_or_none,
find_xpath_attr,
)
class CBSBaseIE(ThePlatformIE):
class CBSBaseIE(ThePlatformFeedIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
@@ -19,9 +19,22 @@ class CBSBaseIE(ThePlatformIE):
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info(
'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
'series': entry.get('cbs$SeriesTitle'),
'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
'episode': entry.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
}, {
'StreamPack': {
'manifest': 'm3u',
}
})
class CBSIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -36,25 +49,7 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
'params': {
# rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}, {
'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
'info_dict': {
'id': 'WWF_5KqY3PK1',
'display_id': 'st-vincent',
'ext': 'flv',
'title': 'Live on Letterman - St. Vincent',
'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
'duration': 3221,
},
'params': {
# rtmp download
'skip_download': True,
},
'expected_warnings': ['Failed to download m3u8 information'],
'_skip': 'Blocked outside the US',
}, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -66,43 +61,5 @@ class CBSIE(CBSBaseIE):
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
pid = xpath_text(item, 'pid')
if not pid:
continue
tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info
content_id = self._match_id(url)
return self._extract_video_info('byGuid=%s' % content_id, content_id)

@@ -0,0 +1,84 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse
class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{
# Anvato backend
'url': 'http://losangeles.cbslocal.com/2016/05/16/safety-advocates-say-fatal-car-seat-failures-are-public-health-crisis',
'md5': 'f0ee3081e3843f575fccef901199b212',
'info_dict': {
'id': '3401037',
'ext': 'mp4',
'title': 'Safety Advocates Say Fatal Car Seat Failures Are \'Public Health Crisis\'',
'description': 'Collapsing seats have been the focus of scrutiny for decades, though experts say remarkably little has been done to address the issue. Randy Paige reports.',
'thumbnail': 're:^https?://.*',
'timestamp': 1463440500,
'upload_date': '20160516',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\KCBSTV',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\AOL',
'Syndication\\Yahoo',
'Syndication\\Tribune',
'Syndication\\Curb.tv',
'Content\\News'
],
},
}, {
# SendtoNews embed
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url:
info_dict = {
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, sendtonews_url),
}
else:
info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
info_dict.update({
'display_id': display_id,
'timestamp': timestamp,
})
return info_dict
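
CBSLocalIE above (like the BiliBili extractor earlier in this diff) turns a page date string into a Unix timestamp with `datetime.strptime` followed by `calendar.timegm`, which treats the parsed naive time as UTC. A minimal sketch of that conversion; `to_utc_timestamp` is a hypothetical helper, and `%b`/`%p` parsing assumes an English (C) locale.

```python
import calendar
import datetime


def to_utc_timestamp(time_str, fmt):
    """Interpret a naive date string as UTC and return a Unix timestamp."""
    parsed = datetime.datetime.strptime(time_str, fmt)
    # timegm is the UTC counterpart of time.mktime: no local-timezone shift
    return calendar.timegm(parsed.timetuple())
```

Using `calendar.timegm` instead of `time.mktime` keeps the result independent of the machine's timezone, which matters for the fixed `timestamp` values asserted in the test dictionaries.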

@@ -30,9 +30,12 @@ class CBSNewsIE(CBSBaseIE):
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '19700101',
'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205,
'subtitles': {
@@ -58,30 +61,8 @@ class CBSNewsIE(CBSBaseIE):
webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
pid = item.get('media' + format_id)
if not pid:
continue
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
guid = item['mpxRefId']
return self._extract_video_info('byGuid=%s' % guid, guid)
class CBSNewsLiveVideoIE(InfoExtractor):

@@ -1,30 +1,28 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .cbs import CBSBaseIE
class CBSSportsIE(InfoExtractor):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
_TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
'info_dict': {
'id': '_d5_GbO8p1sT',
'ext': 'flv',
'title': 'US Open flashbacks: 1990s',
'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
'id': '708337219968',
'ext': 'mp4',
'title': 'Ben Simmons the next LeBron? Not so fast',
'description': 'md5:854294f627921baba1f4b9a990d87197',
'timestamp': 1466293740,
'upload_date': '20160618',
'uploader': 'CBSI-NEW',
},
}
'params': {
# m3u8 download
'skip_download': True,
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
section = mobj.group('section')
video_id = mobj.group('id')
all_videos = self._download_json(
'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
video_id)
# The json file contains the info of all the videos in the section
video_info = next(v for v in all_videos if v['pcid'] == video_id)
return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
video_id = self._match_id(url)
return self._extract_video_info('byId=%s' % video_id, video_id)

@@ -1,13 +1,9 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
qualities,
unified_strdate,
parse_iso8601,
)
@@ -19,14 +15,14 @@ class CCCIE(InfoExtractor):
'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
'md5': '3a1eda8f3a29515d27f5adb967d7e740',
'info_dict': {
'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor',
'id': '1839',
'ext': 'mp4',
'title': 'Introduction to Processor Design',
'description': 'md5:80be298773966f66d56cb11260b879af',
'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'upload_date': '20131228',
'duration': 3660,
'timestamp': 1388188800,
'duration': 3710,
}
}, {
'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
@@ -34,79 +30,48 @@ class CCCIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id')
event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
if self._downloader.params.get('prefer_free_formats'):
preference = qualities(['mp3', 'opus', 'mp4-lq', 'webm-lq', 'h264-sd', 'mp4-sd', 'webm-sd', 'mp4', 'webm', 'mp4-hd', 'h264-hd', 'webm-hd'])
else:
preference = qualities(['opus', 'mp3', 'webm-lq', 'mp4-lq', 'webm-sd', 'h264-sd', 'mp4-sd', 'webm', 'mp4', 'webm-hd', 'mp4-hd', 'h264-hd'])
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r'(?s)<h3>About</h3>(.+?)<h3>',
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
webpage, 'upload date', fatal=False))
view_count = int_or_none(self._html_search_regex(
r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
webpage, 'duration', fatal=False, group='duration'))
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
<(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
<a\s+download\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
)?''', webpage)
formats = []
for m in matches:
format = m.group('format')
format_id = self._search_regex(
r'.*/([a-z0-9_-]+)/[^/]*$',
m.group('http_url'), 'format id', default=None)
if format_id:
format_id = m.group('lang') + '-' + format_id
vcodec = 'h264' if 'h264' in format_id else (
'none' if format_id in ('mp3', 'opus') else None
for recording in event_data.get('recordings', []):
recording_url = recording.get('recording_url')
if not recording_url:
continue
language = recording.get('language')
folder = recording.get('folder')
format_id = None
if language:
format_id = language
if folder:
if language:
format_id += '-' + folder
else:
format_id = folder
vcodec = 'h264' if 'h264' in folder else (
'none' if folder in ('mp3', 'opus') else None
)
formats.append({
'format_id': format_id,
'format': format,
'language': m.group('lang'),
'url': m.group('http_url'),
'url': recording_url,
'width': int_or_none(recording.get('width')),
'height': int_or_none(recording.get('height')),
'filesize': int_or_none(recording.get('size'), invscale=1024 * 1024),
'language': language,
'vcodec': vcodec,
'preference': preference(format_id),
})
if m.group('torrent_url'):
formats.append({
'format_id': 'torrent-%s' % (format if format_id is None else format_id),
'format': '%s (torrent)' % format,
'proto': 'torrent',
'format_note': '(unsupported; will just download the .torrent file)',
'vcodec': vcodec,
'preference': -100 + preference(format_id),
'url': m.group('torrent_url'),
})
self._sort_formats(formats)
thumbnail = self._html_search_regex(
r"<video.*?poster='([^']+)'", webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'upload_date': upload_date,
'duration': duration,
'id': event_id,
'display_id': display_id,
'title': event_data['title'],
'description': event_data.get('description'),
'thumbnail': event_data.get('thumb_url'),
'timestamp': parse_iso8601(event_data.get('date')),
'duration': int_or_none(event_data.get('length')),
'tags': event_data.get('tags'),
'formats': formats,
}
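
The rewritten loop above derives `format_id` from the optional `language` and `folder` fields of each recording. A minimal standalone sketch of that step follows; the helper names `build_format_id` and `guess_vcodec` are hypothetical and not part of the extractor. The `guess_vcodec` guard for a missing `folder` is an addition of this sketch; the hunk itself tests `'h264' in folder` without such a guard.

```python
def build_format_id(language, folder):
    """Join whichever of language/folder is present with '-', else None."""
    parts = [p for p in (language, folder) if p]
    return '-'.join(parts) if parts else None


def guess_vcodec(folder):
    """Same heuristic as the hunk, but tolerating folder=None (added guard)."""
    if not folder:
        return None
    if 'h264' in folder:
        return 'h264'
    # Audio-only folders carry no video codec.
    return 'none' if folder in ('mp3', 'opus') else None
```

For example, `build_format_id('eng', 'h264-hd')` yields `'eng-h264-hd'`, while a recording with only a folder keeps the bare folder name.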

@@ -58,7 +58,8 @@ class CDAIE(InfoExtractor):
def extract_format(page, version):
unpacked = decode_packed_codes(page)
format_url = self._search_regex(
r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1", unpacked,
'%s url' % version, fatal=False, group='url')
if not format_url:
return
f = {
@@ -75,7 +76,8 @@ class CDAIE(InfoExtractor):
info_dict['formats'].append(f)
if not info_dict['duration']:
info_dict['duration'] = parse_duration(self._search_regex(
r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
r"duration\s*:\s*(\\?[\"'])(?P<duration>.+?)\1",
unpacked, 'duration', fatal=False, group='duration'))
extract_format(webpage, 'default')
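
The two patterns changed above accept either `file` or `url` as the key and allow the value to be wrapped in plain or backslash-escaped quotes, with `\1` requiring the same closing delimiter. A small sketch of how the URL pattern behaves; the sample strings are made up for illustration and are not taken from the site:

```python
import re

# Pattern copied from the hunk above.
URL_RE = r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1"


def find_url(unpacked):
    """Return the first player URL found in the unpacked script, or None."""
    m = re.search(URL_RE, unpacked)
    return m.group('url') if m else None


# Hypothetical inputs: one with backslash-escaped single quotes (the only
# form the old pattern handled), one with plain double quotes.
OLD_STYLE = "url:\\'http://example.com/a.mp4\\'"
NEW_STYLE = 'file: "http://example.com/b.mp4"'
```

Both `find_url(OLD_STYLE)` and `find_url(NEW_STYLE)` return the bare URL without the surrounding quotes.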

@@ -33,19 +33,33 @@ class CeskaTelevizeIE(InfoExtractor):
'skip_download': True,
},
}, {
'url': 'http://www.ceskatelevize.cz/ivysilani/10532695142-prvni-republika/bonus/14716-zpevacka-z-duparny-bobina',
'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
'info_dict': {
'id': '61924494876844374',
'id': '61924494877028507',
'ext': 'mp4',
'title': 'První republika: Zpěvačka z Dupárny Bobina',
'description': 'Sága mapující atmosféru první republiky od r. 1918 do r. 1945.',
'title': 'Hyde Park Civilizace: Bonus 01 - En',
'description': 'English Subtittles',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 88.4,
'duration': 81.3,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# live stream
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'info_dict': {
'id': 402,
'ext': 'mp4',
'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to Czech Republic',
}, {
# video with 18+ caution trailer
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
@@ -118,19 +132,21 @@ class CeskaTelevizeIE(InfoExtractor):
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage)
playlist_description = self._og_search_description(webpage)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8_native', fatal=False))
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']
@@ -145,14 +161,22 @@ class CeskaTelevizeIE(InfoExtractor):
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': playlist_title if playlist_len == 1 else '%s (%s)' % (playlist_title, title),
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
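
The title selection introduced in the last hunk can be read as a small pure function. The sketch below restates it outside the extractor; `pick_title` and `fake_live_title` are hypothetical names, and `fake_live_title` only stands in for the extractor's `_live_title` helper. Since `playlist_title` may now be `None`, the multi-item branch would format it as the string `None`; the sketch keeps that behaviour unchanged.

```python
def pick_title(playlist_title, title, playlist_len, is_live, live_title):
    """Mirror the hunk: single-item playlists reuse the page title."""
    if playlist_len == 1:
        final_title = playlist_title or title
        if is_live:
            final_title = live_title(final_title)
        return final_title
    return '%s (%s)' % (playlist_title, title)


def fake_live_title(name):
    # Hypothetical stand-in; the real helper is not reproduced here.
    return name + ' [live]'
```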

@@ -20,54 +20,64 @@ class Channel9IE(InfoExtractor):
'''
IE_DESC = 'Channel 9'
IE_NAME = 'channel9'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
_TESTS = [
{
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'info_dict': {
'id': 'Events/TechEd/Australia/2013/KOS002',
'ext': 'mp4',
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'duration': 4576,
'thumbnail': 're:http://.*\.jpg',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen'],
},
_TESTS = [{
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'info_dict': {
'id': 'Events/TechEd/Australia/2013/KOS002',
'ext': 'mp4',
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'duration': 4576,
'thumbnail': 're:http://.*\.jpg',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug',
'Mads Kristensen'],
},
{
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540,
'thumbnail': 're:http://.*\.jpg',
'authors': ['Mike Wilmot'],
},
}, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540,
'thumbnail': 're:http://.*\.jpg',
'authors': ['Mike Wilmot'],
},
{
# low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'ext': 'mp4',
'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'duration': 5646,
'thumbnail': 're:http://.*\.jpg',
},
'params': {
'skip_download': True,
},
}
]
}, {
# low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'ext': 'mp4',
'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'duration': 5646,
'thumbnail': 're:http://.*\.jpg',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_count': 2,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'only_matching': True,
}, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
'only_matching': True,
}]
_RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@@ -254,22 +264,30 @@ class Channel9IE(InfoExtractor):
return self.playlist_result(contents)
def _extract_list(self, content_path):
rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
def _extract_list(self, video_id, rss_url=None):
if not rss_url:
rss_url = self._RSS_URL % video_id
rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text
return self.playlist_result(entries, content_path, title_text)
return self.playlist_result(entries, video_id, title_text)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath')
rss = mobj.group('rss')
webpage = self._download_webpage(url, content_path, 'Downloading web page')
if rss:
return self._extract_list(content_path, url)
page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage)
if page_type_m is not None:
page_type = page_type_m.group('pagetype')
webpage = self._download_webpage(
url, content_path, 'Downloading web page')
page_type = self._search_regex(
r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
webpage, 'page type', default=None, group='pagetype')
if page_type:
if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content
return self._extract_entry_item(webpage, content_path)
elif page_type == 'Session': # Event session page, may contain downloadable content
@@ -278,6 +296,5 @@ class Channel9IE(InfoExtractor):
return self._extract_list(content_path)
else:
raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
else: # Assuming list
return self._extract_list(content_path)
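
The new `_VALID_URL` makes `contentpath` lazy so that an optional trailing `/RSS` lands in its own group. A quick check of that behaviour against URLs from the test list above; `split_url` is a hypothetical helper used only for this illustration:

```python
import re

# Pattern copied from the hunk above.
VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'


def split_url(url):
    """Return (contentpath, rss) for a matching URL, or None."""
    m = re.match(VALID_URL, url)
    return (m.group('contentpath'), m.group('rss')) if m else None
```

A session URL gives `rss` as `None`, while an `/RSS` URL (with or without a query string) gives `'/RSS'` and a `contentpath` without that suffix.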

@@ -1,119 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from .screenwavemedia import ScreenwaveMediaIE
class CinemassacreIE(InfoExtractor):
_VALID_URL = 'https?://(?:www\.)?cinemassacre\.com/(?P<date_y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/(?P<display_id>[^?#/]+)'
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'info_dict': {
'id': 'Cinemassacre-19911',
'ext': 'mp4',
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'info_dict': {
'id': 'Cinemassacre-521be8ef82b16',
'ext': 'mp4',
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
'info_dict': {
'id': 'OEVzPCY2T-g',
'ext': 'webm',
'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
'upload_date': '20061207',
'uploader': 'Cinemassacre',
'uploader_id': 'JamesNintendoNerd',
'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
}
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/09/01/mckids/',
'md5': '7393c4e0f54602ad110c793eb7a6513a',
'info_dict': {
'id': 'FnxsNhuikpo',
'ext': 'webm',
'upload_date': '20060901',
'uploader': 'Cinemassacre Extra',
'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
'uploader_id': 'Cinemassacre',
'title': 'AVGN: McKids',
}
},
{
'url': 'http://cinemassacre.com/2015/05/25/mario-kart-64-nintendo-64-james-mike-mondays/',
'md5': '1376908e49572389e7b06251a53cdd08',
'info_dict': {
'id': 'Cinemassacre-555779690c440',
'ext': 'mp4',
'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
'upload_date': '20150525',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_date = mobj.group('date_y') + mobj.group('date_m') + mobj.group('date_d')
webpage = self._download_webpage(url, display_id)
playerdata_url = self._search_regex(
[
ScreenwaveMediaIE.EMBED_PATTERN,
r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
],
webpage, 'player data URL', default=None, group='url')
if not playerdata_url:
raise ExtractorError('Unable to find player data')
video_title = self._html_search_regex(
r'<title>(?P<title>.+?)\|', webpage, 'title')
video_description = self._html_search_regex(
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
video_thumbnail = self._og_search_thumbnail(webpage)
return {
'_type': 'url_transparent',
'display_id': display_id,
'title': video_title,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
'url': playerdata_url,
}

@@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CloserToTruthIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?closertotruth\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://closertotruth.com/series/solutions-the-mind-body-problem#video-3688',
'info_dict': {
'id': '0_zof1ktre',
'display_id': 'solutions-the-mind-body-problem',
'ext': 'mov',
'title': 'Solutions to the Mind-Body Problem?',
'upload_date': '20140221',
'timestamp': 1392956007,
'uploader_id': 'CTTXML'
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://closertotruth.com/episodes/how-do-brains-work',
'info_dict': {
'id': '0_iuxai6g6',
'display_id': 'how-do-brains-work',
'ext': 'mov',
'title': 'How do Brains Work?',
'upload_date': '20140221',
'timestamp': 1392956024,
'uploader_id': 'CTTXML'
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://closertotruth.com/interviews/1725',
'info_dict': {
'id': '1725',
'title': 'AyaFr-002',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
partner_id = self._search_regex(
r'<script[^>]+src=["\'].*?\b(?:partner_id|p)/(\d+)',
webpage, 'kaltura partner_id')
title = self._search_regex(
r'<title>(.+?)\s*\|\s*.+?</title>', webpage, 'video title')
select = self._search_regex(
r'(?s)<select[^>]+id="select-version"[^>]*>(.+?)</select>',
webpage, 'select version', default=None)
if select:
entry_ids = set()
entries = []
for mobj in re.finditer(
r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)',
webpage):
entry_id = mobj.group('id')
if entry_id in entry_ids:
continue
entry_ids.add(entry_id)
entries.append({
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': 'Kaltura',
'title': mobj.group('title'),
})
if entries:
return self.playlist_result(entries, display_id, title)
entry_id = self._search_regex(
r'<a[^>]+id=(["\'])embed-kaltura\1[^>]+data-kaltura=(["\'])(?P<id>[0-9a-z_]+)\2',
webpage, 'kaltura entry_id', group='id')
return {
'_type': 'url_transparent',
'display_id': display_id,
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': 'Kaltura',
'title': title
}
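
The playlist branch above collects Kaltura entry ids from `<option>` elements and skips ids it has already seen. A standalone sketch of that de-duplication, using the same regular expression on made-up markup; `collect_entries` and the sample HTML are assumptions for illustration:

```python
import re

# Pattern copied from the new extractor above.
OPTION_RE = r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)'


def collect_entries(html):
    """Return (entry_id, title) pairs, keeping the first title per id."""
    seen = set()
    entries = []
    for mobj in re.finditer(OPTION_RE, html):
        entry_id = mobj.group('id')
        if entry_id in seen:
            continue
        seen.add(entry_id)
        entries.append((entry_id, mobj.group('title')))
    return entries


# Hypothetical markup: the first two options share one entry id.
SAMPLE_HTML = (
    '<option value="0_zof1ktre#video-3688">Part 1</option>'
    '<option value="0_zof1ktre">Part 1 again</option>'
    '<option value="0_iuxai6g6">Part 2</option>'
)
```

A list plus a set keeps the page order of the entries while making the membership test cheap.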

@@ -19,7 +19,7 @@ from ..utils import (
class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch'
_VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.ch)/
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/
(?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+)
'''
@@ -37,7 +37,7 @@ class CloudyIE(InfoExtractor):
}
},
{
'url': 'http://www.videoraj.ch/v/47f399fd8bb60',
'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': {
'id': '47f399fd8bb60',

@@ -1,101 +0,0 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class CollegeHumorIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
_TESTS = [
{
'url': 'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
'md5': 'dcc0f5c1c8be98dc33889a191f4c26bd',
'info_dict': {
'id': '6902724',
'ext': 'mp4',
'title': 'Comic-Con Cosplay Catastrophe',
'description': "Fans get creative this year at San Diego. Too creative. And yes, that's really Joss Whedon.",
'age_limit': 13,
'duration': 187,
},
}, {
'url': 'http://www.collegehumor.com/video/3505939/font-conference',
'md5': '72fa701d8ef38664a4dbb9e2ab721816',
'info_dict': {
'id': '3505939',
'ext': 'mp4',
'title': 'Font Conference',
'description': "This video wasn't long enough, so we made it double-spaced.",
'age_limit': 10,
'duration': 179,
},
}, {
# embedded youtube video
'url': 'http://www.collegehumor.com/embed/6950306',
'info_dict': {
'id': 'Z-bao9fg6Yc',
'ext': 'mp4',
'title': 'Young Americans Think President John F. Kennedy Died THIS MORNING IN A CAR ACCIDENT!!!',
'uploader': 'Mark Dice',
'uploader_id': 'MarkDice',
'description': 'md5:62c3dab9351fac7bb44b53b69511d87f',
'upload_date': '20140127',
},
'params': {
'skip_download': True,
},
'add_ie': ['Youtube'],
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
jsonUrl = 'http://www.collegehumor.com/moogaloop/video/' + video_id + '.json'
data = json.loads(self._download_webpage(
jsonUrl, video_id, 'Downloading info JSON'))
vdata = data['video']
if vdata.get('youtubeId') is not None:
return {
'_type': 'url',
'url': vdata['youtubeId'],
'ie_key': 'Youtube',
}
AGE_LIMITS = {'nc17': 18, 'r': 18, 'pg13': 13, 'pg': 10, 'g': 0}
rating = vdata.get('rating')
if rating:
age_limit = AGE_LIMITS.get(rating.lower())
else:
age_limit = None # None = No idea
PREFS = {'high_quality': 2, 'low_quality': 0}
formats = []
for format_key in ('mp4', 'webm'):
for qname, qurl in vdata.get(format_key, {}).items():
formats.append({
'format_id': format_key + '_' + qname,
'url': qurl,
'format': format_key,
'preference': PREFS.get(qname),
})
self._sort_formats(formats)
duration = int_or_none(vdata.get('duration'), 1000)
like_count = int_or_none(vdata.get('likes'))
return {
'id': video_id,
'title': vdata['title'],
'description': vdata.get('description'),
'thumbnail': vdata.get('thumbnail'),
'formats': formats,
'age_limit': age_limit,
'duration': duration,
'like_count': like_count,
}

@@ -44,10 +44,10 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
(?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos|video-playlists|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
(?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
@@ -129,6 +129,9 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
}, {
'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
'only_matching': True,
}, {
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'only_matching': True,
}]
_available_formats = ['3500', '2200', '1700', '1200', '750', '400']

@@ -45,6 +45,7 @@ from ..utils import (
unescapeHTML,
unified_strdate,
url_basename,
xpath_element,
xpath_text,
xpath_with_ns,
determine_protocol,
@@ -52,6 +53,7 @@ from ..utils import (
mimetype2ext,
update_Request,
update_url_query,
parse_m3u8_attributes,
)
@@ -163,7 +165,7 @@ class InfoExtractor(object):
description: Full video description.
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The main artist who created the video.
creator: The creator of the video.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD).
@@ -987,7 +989,7 @@ class InfoExtractor(object):
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True):
fatal=True, m3u8_id=None):
manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
@@ -1001,11 +1003,18 @@ class InfoExtractor(object):
return self._parse_f4m_formats(
manifest, manifest_url, video_id, preference=preference, f4m_id=f4m_id,
transform_source=transform_source, fatal=fatal)
transform_source=transform_source, fatal=fatal, m3u8_id=m3u8_id)
def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True):
fatal=True, m3u8_id=None):
# currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
if akamai_pv is not None and ';' in akamai_pv.text:
playerVerificationChallenge = akamai_pv.text.split(';')[0]
if playerVerificationChallenge.strip() != '':
return []
formats = []
manifest_version = '1.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
@@ -1022,9 +1031,26 @@ class InfoExtractor(object):
'base URL', default=None)
if base_url:
base_url = base_url.strip()
bootstrap_info = xpath_element(
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
'bootstrap info', default=None)
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
tbr = int_or_none(media_el.attrib.get('bitrate'))
width = int_or_none(media_el.attrib.get('width'))
height = int_or_none(media_el.attrib.get('height'))
format_id = '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)]))
# If <bootstrapInfo> is present, the specified f4m is a
# stream-level manifest, and only set-level manifests may refer to
# external resources. See section 11.4 and section 4 of F4M spec
if bootstrap_info is None:
media_url = None
# @href is introduced in 2.0, see section 11.6 of F4M spec
if manifest_version == '2.0':
media_url = media_el.attrib.get('href')
if media_url is None:
media_url = media_el.attrib.get('url')
if not media_url:
continue
manifest_url = (
@@ -1034,29 +1060,43 @@ class InfoExtractor(object):
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
# bitrate in f4m downloader
if determine_ext(manifest_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
ext = determine_ext(manifest_url)
if ext == 'f4m':
f4m_formats = self._extract_f4m_formats(
manifest_url, video_id, preference=preference, f4m_id=f4m_id,
transform_source=transform_source, fatal=fatal))
transform_source=transform_source, fatal=fatal)
# Sometimes stream-level manifest contains single media entry that
# does not contain any quality metadata (e.g. http://matchtv.ru/#live-player).
# At the same time parent's media entry in set-level manifest may
# contain it. We will copy it from parent in such cases.
if len(f4m_formats) == 1:
f = f4m_formats[0]
f.update({
'tbr': f.get('tbr') or tbr,
'width': f.get('width') or width,
'height': f.get('height') or height,
'format_id': f.get('format_id') if not tbr else format_id,
})
formats.extend(f4m_formats)
continue
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', preference=preference,
m3u8_id=m3u8_id, fatal=fatal))
continue
tbr = int_or_none(media_el.attrib.get('bitrate'))
formats.append({
'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])),
'format_id': format_id,
'url': manifest_url,
'ext': 'flv',
'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr,
'width': int_or_none(media_el.attrib.get('width')),
'height': int_or_none(media_el.attrib.get('height')),
'width': width,
'height': height,
'preference': preference,
})
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True):
formats = [{
def _m3u8_meta_format(self, m3u8_url, ext=None, preference=None, m3u8_id=None):
return {
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
'url': m3u8_url,
'ext': ext,
@@ -1064,7 +1104,14 @@ class InfoExtractor(object):
'preference': preference - 1 if preference else -1,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}]
}
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
@@ -1104,23 +1151,11 @@ class InfoExtractor(object):
}]
last_info = None
last_media = None
kv_rex = re.compile(
r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'):
last_info = {}
for m in kv_rex.finditer(line):
v = m.group('val')
if v.startswith('"'):
v = v[1:-1]
last_info[m.group('key')] = v
last_info = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'):
last_media = {}
for m in kv_rex.finditer(line):
v = m.group('val')
if v.startswith('"'):
v = v[1:-1]
last_media[m.group('key')] = v
last_media = parse_m3u8_attributes(line)
elif line.startswith('#') or not line.strip():
continue
else:
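
The hunk above replaces two copies of an inline key/value loop with a call to `parse_m3u8_attributes`. The sketch below reproduces only the removed inline logic as a standalone function so its behaviour can be checked; `parse_attributes` is a hypothetical name, and this is not claimed to be the implementation of the utility the hunk calls.

```python
import re

# Regex copied from the removed inline code.
KV_RE = re.compile(r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')


def parse_attributes(line):
    """Parse KEY=VALUE pairs from an #EXT-X-... tag line into a dict."""
    info = {}
    for m in KV_RE.finditer(line):
        val = m.group('val')
        if val.startswith('"'):
            val = val[1:-1]  # strip the surrounding double quotes
        info[m.group('key')] = val
    return info


# Illustrative playlist line (not taken from the diff).
SAMPLE_LINE = '#EXT-X-STREAM-INF:BANDWIDTH=1280000,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360'
```

Quoted values may contain commas, which is why the value alternative tries the quoted form first.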
@@ -1131,8 +1166,15 @@ class InfoExtractor(object):
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') not in ('SUBTITLES', 'CLOSED-CAPTIONS') else None
# Despite specification does not mention NAME attribute for
# EXT-X-STREAM-INF it still sometimes may be present
stream_name = last_info.get('NAME') or last_media_name
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
# format_id intact.
if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
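
With the new `live` flag, the per-variant suffix is only appended for non-live streams, so a live stream keeps the caller-supplied `m3u8_id` as its whole `format_id`. A sketch of that rule as a pure function; `build_m3u8_format_id` and its `index` parameter (standing in for `len(formats)`) are hypothetical:

```python
def build_m3u8_format_id(m3u8_id, stream_name, tbr, index, live=False):
    """Compose a format_id the way the hunk does, outside the extractor."""
    parts = []
    if m3u8_id:
        parts.append(m3u8_id)
    if not live:
        # Prefer an explicit NAME, then the bitrate, then the running index.
        parts.append(stream_name if stream_name else '%d' % (tbr if tbr else index))
    return '-'.join(parts)
```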
@@ -1264,21 +1306,21 @@ class InfoExtractor(object):
m3u8_count = 0
srcs = []
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
src = video.get('src')
media = smil.findall(self._xpath_ns('.//video', namespace)) + smil.findall(self._xpath_ns('.//audio', namespace))
for medium in media:
src = medium.get('src')
if not src or src in srcs:
continue
srcs.append(src)
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
width = int_or_none(video.get('width'))
height = int_or_none(video.get('height'))
proto = video.get('proto')
ext = video.get('ext')
bitrate = float_or_none(medium.get('system-bitrate') or medium.get('systemBitrate'), 1000)
filesize = int_or_none(medium.get('size') or medium.get('fileSize'))
width = int_or_none(medium.get('width'))
height = int_or_none(medium.get('height'))
proto = medium.get('proto')
ext = medium.get('ext')
src_ext = determine_ext(src)
streamer = video.get('streamer') or base
streamer = medium.get('streamer') or base
if proto == 'rtmp' or streamer.startswith('rtmp'):
rtmp_count += 1
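
The SMIL hunk above widens the scan from `<video>` to `<video>` plus `<audio>` elements while still skipping repeated `src` values. A minimal sketch with `xml.etree.ElementTree`, assuming a namespace-free document (the hunk goes through `self._xpath_ns`); `collect_media` and the sample SMIL are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL: one duplicated video source and one audio source.
SMIL = '''<smil><body><switch>
  <video src="a.mp4" system-bitrate="800000"/>
  <video src="a.mp4" system-bitrate="800000"/>
  <audio src="a.mp3" systemBitrate="128000"/>
</switch></body></smil>'''


def collect_media(smil_root):
    """Return (tag, src, bitrate_kbps) for each distinct media source."""
    media = smil_root.findall('.//video') + smil_root.findall('.//audio')
    seen, result = [], []
    for medium in media:
        src = medium.get('src')
        if not src or src in seen:
            continue
        seen.append(src)
        # Both attribute spellings appear in the hunk.
        bitrate = medium.get('system-bitrate') or medium.get('systemBitrate')
        result.append((medium.tag, src, float(bitrate) / 1000 if bitrate else None))
    return result
```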

@@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
qualities,
)
class CoubIE(InfoExtractor):
_VALID_URL = r'(?:coub:|https?://(?:coub\.com/(?:view|embed|coubs)/|c-cdn\.coub\.com/fb-player\.swf\?.*\bcoub(?:ID|id)=))(?P<id>[\da-z]+)'
_TESTS = [{
'url': 'http://coub.com/view/5u5n1',
'info_dict': {
'id': '5u5n1',
'ext': 'mp4',
'title': 'The Matrix Moonwalk',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 4.6,
'timestamp': 1428527772,
'upload_date': '20150408',
'uploader': 'Артём Лоскутников',
'uploader_id': 'artyom.loskutnikov',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'age_limit': 0,
},
}, {
'url': 'http://c-cdn.coub.com/fb-player.swf?bot_type=vk&coubID=7w5a4',
'only_matching': True,
}, {
'url': 'coub:5u5n1',
'only_matching': True,
}, {
# longer video id
'url': 'http://coub.com/view/237d5l5h',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
coub = self._download_json(
'http://coub.com/api/v2/coubs/%s.json' % video_id, video_id)
if coub.get('error'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, coub['error']), expected=True)
title = coub['title']
file_versions = coub['file_versions']
QUALITIES = ('low', 'med', 'high')
MOBILE = 'mobile'
IPHONE = 'iphone'
HTML5 = 'html5'
SOURCE_PREFERENCE = (MOBILE, IPHONE, HTML5)
quality_key = qualities(QUALITIES)
preference_key = qualities(SOURCE_PREFERENCE)
formats = []
for kind, items in file_versions.get(HTML5, {}).items():
if kind not in ('video', 'audio'):
continue
if not isinstance(items, dict):
continue
for quality, item in items.items():
if not isinstance(item, dict):
continue
item_url = item.get('url')
if not item_url:
continue
formats.append({
'url': item_url,
'format_id': '%s-%s-%s' % (HTML5, kind, quality),
'filesize': int_or_none(item.get('size')),
'vcodec': 'none' if kind == 'audio' else None,
'quality': quality_key(quality),
'preference': preference_key(HTML5),
})
iphone_url = file_versions.get(IPHONE, {}).get('url')
if iphone_url:
formats.append({
'url': iphone_url,
'format_id': IPHONE,
'preference': preference_key(IPHONE),
})
mobile_url = file_versions.get(MOBILE, {}).get('audio_url')
if mobile_url:
formats.append({
'url': mobile_url,
'format_id': '%s-audio' % MOBILE,
'preference': preference_key(MOBILE),
})
self._sort_formats(formats)
thumbnail = coub.get('picture')
duration = float_or_none(coub.get('duration'))
timestamp = parse_iso8601(coub.get('published_at') or coub.get('created_at'))
uploader = coub.get('channel', {}).get('title')
uploader_id = coub.get('channel', {}).get('permalink')
view_count = int_or_none(coub.get('views_count') or coub.get('views_increase_count'))
like_count = int_or_none(coub.get('likes_count'))
repost_count = int_or_none(coub.get('recoubs_count'))
comment_count = int_or_none(coub.get('comments_count'))
age_restricted = coub.get('age_restricted', coub.get('age_restricted_by_admin'))
if age_restricted is not None:
age_limit = 18 if age_restricted is True else 0
else:
age_limit = None
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'like_count': like_count,
'repost_count': repost_count,
'comment_count': comment_count,
'age_limit': age_limit,
'formats': formats,
}
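
Review note: the Coub extractor above ranks formats with two `qualities(...)` keys. The following is a minimal stand-alone sketch of that idea, assuming the helper returns the label's index in the tuple and -1 for unknown labels. The `qualities` function here is an illustrative re-implementation rather than the one imported from `..utils`, and the format entries are hypothetical.

```python
def qualities(quality_ids):
    """Return a function ranking a quality label by its position in quality_ids."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            # Unknown labels rank below every known one.
            return -1
    return rank


quality_key = qualities(('low', 'med', 'high'))
preference_key = qualities(('mobile', 'iphone', 'html5'))

# Hypothetical entries shaped like the dicts appended in the extractor's loops.
formats = [
    {'format_id': 'html5-video-high', 'quality': quality_key('high'),
     'preference': preference_key('html5')},
    {'format_id': 'iphone', 'quality': -1,
     'preference': preference_key('iphone')},
    {'format_id': 'html5-video-med', 'quality': quality_key('med'),
     'preference': preference_key('html5')},
]

# Simplified worst-to-best ordering; the real _sort_formats considers more fields.
formats.sort(key=lambda f: (f['preference'], f['quality']))
print([f['format_id'] for f in formats])
```

With this ordering the `html5` renditions end up last, i.e. they are treated as the preferred source, which matches the order of `SOURCE_PREFERENCE` in the extractor.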


@@ -11,7 +11,6 @@ from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
@@ -27,6 +26,7 @@ from ..utils import (
unified_strdate,
urlencode_postdata,
xpath_text,
extract_attributes,
)
from ..aes import (
aes_cbc_decrypt,
@@ -306,28 +306,36 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, 'stream_id')
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
attrs = extract_attributes(a)
href = attrs.get('href')
if href and '/freetrial' in href:
continue
available_fmts.append(fmt)
if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
video_encode_ids = []
formats = []
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
% (video_id, stream_format, stream_quality),
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata = self._download_xml(
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
stream_info = streamdata.find('./{default}preload/stream_info')
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_url = xpath_text(stream_info, './host')
video_play_path = xpath_text(stream_info, './file')
if not video_url or not video_play_path:
@@ -359,6 +367,14 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats)
metadata = self._download_xml(
'http://www.crunchyroll.com/xml', video_id,
note='Downloading media info', query={
'req': 'RpcApiVideoPlayer_GetMediaMetadata',
'media_id': video_id,
})
subtitles = self.extract_subtitles(video_id, webpage)
@@ -366,9 +382,12 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id,
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'thumbnail': xpath_text(metadata, 'episode_image_url'),
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': xpath_text(metadata, 'series_title'),
'episode': xpath_text(metadata, 'episode_title'),
'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
'subtitles': subtitles,
'formats': formats,
}
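
Review note: the Crunchyroll hunk above collects the available resolutions from the quality links and skips those whose `href` points to a free-trial page. A rough stand-alone sketch of that filtering follows. It uses a plain regex for the `href` instead of `extract_attributes`, collapses the two fallback patterns into one, and runs on a made-up HTML snippet, so treat it as an illustration under those assumptions and not as the extractor's exact behaviour.

```python
import re

ANCHOR_RE = re.compile(
    r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)')
HREF_RE = re.compile(r'href=(["\'])(.*?)\1')


def available_formats(webpage):
    """Return resolutions whose quality link does not lead to /freetrial."""
    fmts = []
    for tag, fmt in ANCHOR_RE.findall(webpage):
        href = HREF_RE.search(tag)
        if href and '/freetrial' in href.group(2):
            continue
        fmts.append(fmt)
    if not fmts:
        # Simplified fallback: any showmedia.<N>p token on the page.
        fmts = re.findall(r'showmedia\.([0-9]{3,4})p', webpage)
    return fmts


# Hypothetical markup, only meant to exercise both branches.
sample_page = '''
<a href="/show/episode-1?p360=1" token="showmedia.360p" title="360p">SD</a>
<a href="/freetrial/anime/?from=showmedia_noads" token="showmedia.1080p" title="1080p">HD</a>
'''
print(available_formats(sample_page))
```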


@@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': {
@@ -48,6 +48,9 @@ class CWTVIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
}]
def _real_extract(self, url):
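
Review note: the effect of the `_VALID_URL` change for CWTV can be checked directly with `re`. The sketch below copies both patterns from the hunk and tries them on the two test URLs, plus one URL with an extra query parameter that is invented here for illustration.

```python
import re

UUID = r'[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}'
OLD_VALID_URL = (r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/'
                 r'(?:[^/]+/){2}\?play=(?P<id>' + UUID + ')')
NEW_VALID_URL = (r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?'
                 r'(?:[^/]+/){2}\?.*\bplay=(?P<id>' + UUID + ')')

urls = [
    'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
    'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
    # Hypothetical URL with another parameter before play=
    'http://cwtv.com/shows/arrow/legends-of-yesterday/?foo=1&play=6b15e985-9345-4f60-baf8-56e96be57c63',
]

old_hits = [bool(re.match(OLD_VALID_URL, u)) for u in urls]
new_hits = [bool(re.match(NEW_VALID_URL, u)) for u in urls]
print(old_hits, new_hits)
```

The old pattern only accepts the first URL; the new one accepts all three because `shows/` became optional and `play=` may appear anywhere in the query string.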


@@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
determine_protocol,
)
class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.dailymail.co.uk/video/sciencetech/video-1288527/Turn-video-impressionist-masterpiece.html',
'md5': '2f639d446394f53f3a33658b518b6615',
'info_dict': {
'id': '1288527',
'ext': 'mp4',
'title': 'Turn any video into an impressionist masterpiece',
'description': 'md5:88ddbcb504367987b2708bb38677c9d2',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r"data-opts='({.+?})'", webpage, 'video data'), video_id)
title = video_data['title']
video_sources = self._download_json(video_data.get(
'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id)
formats = []
for rendition in video_sources['renditions']:
rendition_url = rendition.get('url')
if not rendition_url:
continue
tbr = int_or_none(rendition.get('encodingRate'), 1000)
container = rendition.get('videoContainer')
is_hls = container == 'M2TS'
protocol = 'm3u8_native' if is_hls else determine_protocol({'url': rendition_url})
formats.append({
'format_id': ('hls' if is_hls else protocol) + ('-%d' % tbr if tbr else ''),
'url': rendition_url,
'width': int_or_none(rendition.get('frameWidth')),
'height': int_or_none(rendition.get('frameHeight')),
'tbr': tbr,
'vcodec': rendition.get('videoCodec'),
'container': container,
'protocol': protocol,
'ext': 'mp4' if is_hls else None,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('descr'),
'thumbnail': video_data.get('poster') or video_data.get('thumbnail'),
'formats': formats,
}
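
Review note: the `format_id` naming in the DailyMail extractor can be sketched without the youtube-dl helpers. The version below assumes `determine_protocol` can be approximated by the URL scheme (the real helper handles more cases) and that `encodingRate` is given in bits per second; the rendition dicts are hypothetical.

```python
from urllib.parse import urlparse


def rendition_to_format(rendition):
    """Map one rendition dict to a format dict, or None if it has no URL."""
    url = rendition.get('url')
    if not url:
        return None
    rate = rendition.get('encodingRate')
    tbr = int(rate) // 1000 if rate is not None else None
    is_hls = rendition.get('videoContainer') == 'M2TS'
    # Simplification: use the URL scheme in place of determine_protocol().
    protocol = 'm3u8_native' if is_hls else urlparse(url).scheme
    return {
        'format_id': ('hls' if is_hls else protocol) + ('-%d' % tbr if tbr else ''),
        'url': url,
        'tbr': tbr,
        'protocol': protocol,
        'ext': 'mp4' if is_hls else None,
    }


http_format = rendition_to_format(
    {'url': 'http://example.com/video.mp4', 'encodingRate': 1500000,
     'videoContainer': 'MP4'})
hls_format = rendition_to_format(
    {'url': 'http://example.com/master.m3u8', 'encodingRate': 800000,
     'videoContainer': 'M2TS'})
print(http_format['format_id'], hls_format['format_id'])
```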


@@ -12,39 +12,46 @@ class DFBIE(InfoExtractor):
_TEST = {
'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
# The md5 is different each time
'md5': 'ac0f98a52a330f700b4b3034ad240649',
'info_dict': {
'id': '11633',
'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
'ext': 'flv',
'ext': 'mp4',
'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
'upload_date': '20150714',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
display_id)
video_info = player_info.find('video')
stream_access_url = self._proto_relative_url(video_info.find('url').text.strip())
f4m_info = self._download_xml(
self._proto_relative_url(video_info.find('url').text.strip()), display_id)
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
formats = []
# see http://tv.dfb.de/player/js/ajax.js for the method to extract m3u8 formats
for sa_url in (stream_access_url, stream_access_url + '&area=&format=iphone'):
stream_access_info = self._download_xml(sa_url, display_id)
token_el = stream_access_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth']
if '.f4m' in manifest_url:
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.2.0',
display_id, f4m_id='hds', fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
manifest_url, display_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': video_info.find('title').text,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': 'http://tv.dfb.de/images/%s_640x360.jpg' % video_id,
'upload_date': unified_strdate(video_info.find('time_date').text),
'formats': formats,
}
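
Review note: the new DFB loop builds a manifest URL from the `token` element and then decides between HDS and HLS by looking for `.f4m`. A small sketch of just that step, using `xml.etree.ElementTree` and made-up XML documents (the element layout is assumed from the attribute accesses in the hunk):

```python
import xml.etree.ElementTree as ET


def manifest_from_stream_access(xml_text):
    """Return (kind, manifest_url) for one stream access response."""
    token_el = ET.fromstring(xml_text).find('token')
    manifest_url = token_el.attrib['url'] + '?hdnea=' + token_el.attrib['auth']
    if '.f4m' in manifest_url:
        # HDS manifests additionally get the hdcore parameter.
        return 'hds', manifest_url + '&hdcore=3.2.0'
    return 'hls', manifest_url


# Hypothetical responses; real ones carry more elements.
hds = manifest_from_stream_access(
    '<data><token url="http://example.com/z/video.f4m" auth="abc"/></data>')
hls = manifest_from_stream_access(
    '<data><token url="http://example.com/i/video/master.m3u8" auth="abc"/></data>')
print(hds, hls)
```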


@@ -33,6 +33,7 @@ class DiscoveryIE(InfoExtractor):
'duration': 156,
'timestamp': 1302032462,
'upload_date': '20110405',
'uploader_id': '103207',
},
'params': {
'skip_download': True, # requires ffmpeg
@@ -54,7 +55,11 @@ class DiscoveryIE(InfoExtractor):
'upload_date': '20140725',
'timestamp': 1406246400,
'duration': 116,
'uploader_id': '103207',
},
'params': {
'skip_download': True, # requires ffmpeg
}
}]
def _real_extract(self, url):
@@ -66,13 +71,19 @@ class DiscoveryIE(InfoExtractor):
entries = []
for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats(
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1))
self._sort_formats(formats)
subtitles = {}
caption_url = video_info.get('captionsUrl')
if caption_url:
subtitles = {
'en': [{
'url': caption_url,
}]
}
entries.append({
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/103207/default_default/index.html?videoId=ref:%s' % video_info['referenceId'],
'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
@@ -80,6 +91,7 @@ class DiscoveryIE(InfoExtractor):
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
'subtitles': subtitles,
})
return self.playlist_result(entries, display_id, video_title)


@@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'Romm not found',
'skip': 'Room not found',
}, {
'url': 'http://www.douyutv.com/17732',
'info_dict': {
@@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': '17732',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id')
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
config = None
# Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
# Retry with different parameters - same parameters cause same errors
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
config = self._download_json(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
config_page = self._download_webpage(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: An error occurred. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')
data = config['data']
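
Review note: the Douyu retry logic signs each request with an MD5 over the path plus a fixed suffix and retries when the response is not valid JSON. The sketch below shows that pattern in isolation. `fetch` is a hypothetical callable standing in for the HTTP download, and `json.loads` replaces `_parse_json`, so a parse failure is an ordinary `ValueError` here.

```python
import hashlib
import json
import time


def build_api_path(room_id, now=None):
    """Build the signed API path the way the loop above does."""
    timestamp = int(time.time() if now is None else now)
    prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
        room_id, timestamp)
    auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
    return '%s&auth=%s' % (prefix, auth)


def fetch_config(fetch, room_id, attempts=5, delay=1, sleep=time.sleep):
    """Call fetch(path) until the body parses as JSON, or give up."""
    for _ in range(attempts):
        body = fetch(build_api_path(room_id))
        try:
            return json.loads(body)
        except ValueError:
            # A later time() value changes the signature, so wait first.
            sleep(delay)
    raise RuntimeError('Unable to fetch API result')


# Usage with canned responses instead of a network call.
responses = iter(['<html>error</html>', 'not json', '{"data": {"room_id": 1}}'])
slept = []
result = fetch_config(lambda path: next(responses), '431925', sleep=slept.append)
print(result, slept)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.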


@@ -2,13 +2,16 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
unified_strdate,
)
from ..compat import compat_urlparse
class DWIE(InfoExtractor):
IE_NAME = 'dw'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+av-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'
_TESTS = [{
# video
'url': 'http://www.dw.com/en/intelligent-light/av-19112290',
@@ -31,6 +34,18 @@ class DWIE(InfoExtractor):
'description': 'md5:bc9ca6e4e063361e21c920c53af12405',
'upload_date': '20160311',
}
}, {
# DW documentaries are only available for one or two weeks
'url': 'http://www.dw.com/en/documentaries-welcome-to-the-90s-2016-05-21/e-19220158-9798',
'md5': '56b6214ef463bfb9a3b71aeb886f3cf1',
'info_dict': {
'id': '19274438',
'ext': 'mp4',
'title': 'Welcome to the 90s Hip Hop',
'description': 'Welcome to the 90s - The Golden Decade of Hip Hop',
'upload_date': '20160521',
},
'skip': 'Video removed',
}]
def _real_extract(self, url):
@@ -38,6 +53,7 @@ class DWIE(InfoExtractor):
webpage = self._download_webpage(url, media_id)
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
media_id = hidden_inputs.get('media_id') or media_id
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
@@ -49,13 +65,20 @@ class DWIE(InfoExtractor):
else:
formats = [{'url': hidden_inputs['file_name']}]
upload_date = hidden_inputs.get('display_date')
if not upload_date:
upload_date = self._html_search_regex(
r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage,
'upload date', default=None)
upload_date = unified_strdate(upload_date)
return {
'id': media_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': hidden_inputs.get('preview_image'),
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': hidden_inputs.get('display_date'),
'upload_date': upload_date,
'formats': formats,
}
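
Review note: for the new `e-` URLs the number in the URL is not necessarily the final video id, which is why the hunk overrides `media_id` from the hidden inputs. A short sketch of both steps, with the pattern copied from the hunk; `resolve_media_id` is a hypothetical helper, and the hidden input value is taken from the expected `id` in the added test.

```python
import re

VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'


def resolve_media_id(url, hidden_inputs):
    """Prefer the page's media_id over the number found in the URL."""
    url_id = re.match(VALID_URL, url).group('id')
    return hidden_inputs.get('media_id') or url_id


av_id = resolve_media_id(
    'http://www.dw.com/en/intelligent-light/av-19112290', {})
doc_id = resolve_media_id(
    'http://www.dw.com/en/documentaries-welcome-to-the-90s-2016-05-21/e-19220158-9798',
    {'media_id': '19274438'})
print(av_id, doc_id)
```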


@@ -23,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '881ee8460e1b7735a8be938e2ffb362b',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '227304',
'ext': 'mp4',
@@ -109,8 +109,11 @@ class EaglePlatformIE(InfoExtractor):
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
video_url = mp4_url.replace(mp4_url_basename, mobj.group(1))
if not self._is_valid_url(video_url, video_id):
continue
http_format.update({
'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
'url': video_url,
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})


@@ -11,8 +11,8 @@ from ..utils import (
class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\d+)/(?P<display_id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)/(?P<display_id>[\w-]+)'
_TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8',
'info_dict': {
@@ -23,8 +23,12 @@ class EpornerIE(InfoExtractor):
'duration': 1838,
'view_count': int,
'age_limit': 18,
}
}
},
}, {
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)


@@ -8,6 +8,7 @@ class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'md5': '60e5d097a523e767d06479335d1bdc58',
'info_dict': {
'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
'ext': 'mp4',
@@ -15,21 +16,22 @@ class ESPNIE(InfoExtractor):
'description': None,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['OoyalaExternal'],
}, {
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
'url': 'http://espn.go.com/video/clip?id=2743663',
'md5': 'f4ac89b59afc7e2d7dbb049523df6768',
'info_dict': {
'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
'ext': 'mp4',
'title': 'Must-See Moments: Best of the MLS season',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['OoyalaExternal'],
}, {
'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
'only_matching': True,


@@ -3,6 +3,10 @@ from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
AbcNewsVideoIE,
)
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
@@ -17,6 +21,7 @@ from .adobetv import (
)
from .adultswim import AdultSwimIE
from .aenetworks import AENetworksIE
from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
@@ -39,7 +44,6 @@ from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
ARDMediathekIE,
SportschauIE,
)
from .arte import (
ArteTvIE,
@@ -52,6 +56,7 @@ from .arte import (
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
ArteTVPlaylistIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
@@ -65,6 +70,8 @@ from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCCoUkIPlayerPlaylistIE,
BBCCoUkPlaylistIE,
BBCIE,
)
from .beeg import BeegIE
@@ -75,6 +82,7 @@ from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE
from .biqle import BIQLEIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
@@ -101,11 +109,16 @@ from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .carambatv import (
CarambaTVIE,
CarambaTVPageIE,
)
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbslocal import CBSLocalIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsIE,
@@ -123,11 +136,11 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cinemassacre import CinemassacreIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
@@ -138,7 +151,7 @@ from .cnn import (
CNNBlogsIE,
CNNArticleIE,
)
from .collegehumor import CollegeHumorIE
from .coub import CoubIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
@@ -157,6 +170,7 @@ from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE
from .dailymail import DailyMailIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
@@ -226,6 +240,7 @@ from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .eyedotv import EyedoTVIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
@@ -238,6 +253,7 @@ from .fktv import FKTVIE
from .flickr import FlickrIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
@@ -282,6 +298,7 @@ from .globo import (
GloboArticleIE,
)
from .godtube import GodTubeIE
from .godtv import GodTVIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
@@ -365,6 +382,7 @@ from .kuwo import (
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
from .leeco import (
@@ -372,6 +390,7 @@ from .leeco import (
LePlaylistIE,
LetvCloudIE,
)
from .libraryofcongress import LibraryOfCongressIE
from .libsyn import LibsynIE
from .lifenews import (
LifeNewsIE,
@@ -382,6 +401,7 @@ from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
)
from .litv import LiTVIE
from .liveleak import LiveLeakIE
from .livestream import (
LivestreamIE,
@@ -389,6 +409,7 @@ from .livestream import (
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .localnews8 import LocalNews8IE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
@@ -400,13 +421,16 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
)
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
@@ -439,8 +463,7 @@ from .mtv import (
)
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .muzu import MuzuTVIE
from .mwave import MwaveIE
from .mwave import MwaveIE, MwaveMeetGreetIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import MyviIE
@@ -495,7 +518,10 @@ from .nhl import (
NHLVideocenterCategoryIE,
NHLIE,
)
from .nick import NickIE
from .nick import (
NickIE,
NickDeIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninegag import NineGagIE
from .noco import NocoIE
@@ -562,7 +588,10 @@ from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .people import PeopleIE
from .periscope import PeriscopeIE
from .periscope import (
PeriscopeIE,
PeriscopeUserIE,
)
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
@@ -602,7 +631,14 @@ from .qqmusic import (
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .r7 import R7IE
from .r7 import (
R7IE,
R7ArticleIE,
)
from .radiocanada import (
RadioCanadaIE,
RadioCanadaAudioVideoIE,
)
from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE
@@ -616,11 +652,16 @@ from .rds import RDSIE
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reuters import ReutersIE
from .reverbnation import ReverbNationIE
from .revision3 import Revision3IE
from .revision3 import (
Revision3EmbedIE,
Revision3IE,
)
from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
@@ -656,10 +697,11 @@ from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .seeker import SeekerIE
from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
from .sexykarma import SexyKarmaIE
from .shahid import ShahidIE
from .shared import SharedIE
from .sharesix import ShareSixIE
@@ -676,10 +718,6 @@ from .smotri import (
SmotriUserIE,
SmotriBroadcastIE,
)
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soundcloud import (
@@ -712,6 +750,7 @@ from .sportbox import (
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
from .sportschau import SportschauIE
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
@@ -731,7 +770,10 @@ from .svt import (
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .tagesschau import (
TagesschauPlayerIE,
TagesschauIE,
)
from .tapely import TapelyIE
from .tass import TassIE
from .tdslifeway import TDSLifewayIE
@@ -749,6 +791,7 @@ from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
@@ -761,6 +804,7 @@ from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .threeqsdn import ThreeQSDNIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
from .tmz import (
@@ -813,7 +857,10 @@ from .tvc import (
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import TvpIE, TvpSeriesIE
from .tvp import (
TVPIE,
TVPSeriesIE,
)
from .tvplay import TVPlayIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
@@ -828,8 +875,8 @@ from .twitch import (
TwitchVodIE,
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchBookmarksIE,
TwitchStreamIE,
TwitchClipsIE,
)
from .twitter import (
TwitterCardIE,
@@ -846,14 +893,20 @@ from .unistra import UnistraIE
from .urort import UrortIE
from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import UstudioIE
from .ustudio import (
UstudioIE,
UstudioEmbedIE,
)
from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vevo import (
VevoIE,
VevoPlaylistIE,
)
from .vgtv import (
BTArticleIE,
BTVestlendingenIE,
@@ -875,6 +928,7 @@ from .videomore import (
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .vidio import VidioIE
from .vidme import (
VidmeIE,
VidmeUserIE,
@@ -882,6 +936,10 @@ from .vidme import (
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewlift import (
ViewLiftIE,
ViewLiftEmbedIE,
)
from .viewster import ViewsterIE
from .viidea import ViideaIE
from .vimeo import (
@@ -916,25 +974,29 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vulture import VultureIE
from .walla import WallaIE
from .washingtonpost import WashingtonPostIE
from .washingtonpost import (
WashingtonPostIE,
WashingtonPostArticleIE,
)
from .wat import WatIE
from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
WDRMobileIE,
WDRMausIE,
)
from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE
from .wrzuta import (
WrzutaIE,
WrzutaPlaylistIE,
)
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
@@ -943,6 +1005,12 @@ from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
)
from .xiami import (
XiamiSongIE,
XiamiAlbumIE,
XiamiArtistIE,
XiamiCollectionIE
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
@@ -964,7 +1032,10 @@ from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
from .youku import YoukuIE
from .youku import (
YoukuIE,
YoukuShowIE,
)
from .youporn import YouPornIE
from .yourupload import YourUploadIE
from .youtube import (


@@ -0,0 +1,64 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_text,
parse_duration,
ExtractorError,
)
class EyedoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eyedo\.tv/[^/]+/(?:#!/)?Live/Detail/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://www.eyedo.tv/en-US/#!/Live/Detail/16301',
'md5': 'ba14f17995cdfc20c36ba40e21bf73f7',
'info_dict': {
'id': '16301',
'ext': 'mp4',
'title': 'Journée du conseil scientifique de l\'Afnic 2015',
'description': 'md5:4abe07293b2f73efc6e1c37028d58c98',
'uploader': 'Afnic Live',
'uploader_id': '8023',
}
}
_ROOT_URL = 'http://live.eyedo.net:1935/'
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_xml('http://eyedo.tv/api/live/GetLive/%s' % video_id, video_id)
def _add_ns(path):
return self._xpath_ns(path, 'http://schemas.datacontract.org/2004/07/EyeDo.Core.Implementation.Web.ViewModels.Api')
title = xpath_text(video_data, _add_ns('Titre'), 'title', True)
state_live_code = xpath_text(video_data, _add_ns('StateLiveCode'), 'state live code', True)
if state_live_code == 'avenir':
raise ExtractorError(
'%s said: We\'re sorry, but this video is not yet available.' % self.IE_NAME,
expected=True)
is_live = state_live_code == 'live'
m3u8_url = None
# http://eyedo.tv/Content/Html5/Scripts/html5view.js
if is_live:
if xpath_text(video_data, 'Cdn') == 'true':
m3u8_url = 'http://rrr.sz.xlcdn.com/?account=eyedo&file=A%s&type=live&service=wowza&protocol=http&output=playlist.m3u8' % video_id
else:
m3u8_url = self._ROOT_URL + 'w/%s/eyedo_720p/playlist.m3u8' % video_id
else:
m3u8_url = self._ROOT_URL + 'replay-w/%s/mp4:%s.mp4/playlist.m3u8' % (video_id, video_id)
return {
'id': video_id,
'title': title,
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8' if is_live else 'm3u8_native'),
'description': xpath_text(video_data, _add_ns('Description')),
'duration': parse_duration(xpath_text(video_data, _add_ns('Duration'))),
'uploader': xpath_text(video_data, _add_ns('Createur')),
'uploader_id': xpath_text(video_data, _add_ns('CreateurId')),
'chapter': xpath_text(video_data, _add_ns('ChapitreTitre')),
'chapter_id': xpath_text(video_data, _add_ns('ChapitreId')),
}
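Note on the new EyedoTV extractor above: the playlist URL depends only on the video ID, the state code, and the CDN flag. The sketch below isolates that branch logic so it can be checked without network access. It is an illustration under stated assumptions, not the extractor itself: `build_m3u8_url`, `NotYetAvailable`, and the `'replay'` state value are hypothetical names (the diff only distinguishes `'avenir'`, `'live'`, and everything else).

```python
ROOT_URL = 'http://live.eyedo.net:1935/'
CDN_URL = ('http://rrr.sz.xlcdn.com/?account=eyedo&file=A%s&type=live'
           '&service=wowza&protocol=http&output=playlist.m3u8')


class NotYetAvailable(Exception):
    """Hypothetical stand-in for ExtractorError(..., expected=True)."""


def build_m3u8_url(video_id, state_live_code, on_cdn=False):
    # 'avenir' means the broadcast has not started yet.
    if state_live_code == 'avenir':
        raise NotYetAvailable('video %s is not yet available' % video_id)
    if state_live_code == 'live':
        if on_cdn:
            return CDN_URL % video_id
        return ROOT_URL + 'w/%s/eyedo_720p/playlist.m3u8' % video_id
    # Any other state is treated as a replay, as in the diff.
    return ROOT_URL + 'replay-w/%s/mp4:%s.mp4/playlist.m3u8' % (video_id, video_id)
```

Keeping the URL construction free of I/O makes each of the three branches easy to exercise in isolation.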


@@ -1,20 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
class FczenitIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/gl(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://fc-zenit.ru/video/gl6785/',
'md5': '458bacc24549173fe5a5aa29174a5606',
'url': 'http://fc-zenit.ru/video/41044/',
'md5': '0e3fab421b455e970fa1aa3891e57df0',
'info_dict': {
'id': '6785',
'id': '41044',
'ext': 'mp4',
'title': '«Зенит-ТВ»: как Олег Шатов играл против «Урала»',
'title': 'Так пишется история: казанский разгром ЦСКА на «Зенит-ТВ»',
},
}
@@ -22,15 +21,23 @@ class FczenitIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<div class=\"photoalbum__title\">([^<]+)', webpage, 'title')
video_title = self._html_search_regex(
r'<[^>]+class=\"photoalbum__title\">([^<]+)', webpage, 'title')
bitrates_raw = self._html_search_regex(r'bitrates:.*\n(.*)\]', webpage, 'video URL')
bitrates = re.findall(r'url:.?\'(.+?)\'.*?bitrate:.?([0-9]{3}?)', bitrates_raw)
video_items = self._parse_json(self._search_regex(
r'arrPath\s*=\s*JSON\.parse\(\'(.+)\'\)', webpage, 'video items'),
video_id)
def merge_dicts(*dicts):
ret = {}
for a_dict in dicts:
ret.update(a_dict)
return ret
formats = [{
'url': furl,
'tbr': tbr,
} for furl, tbr in bitrates]
'url': compat_urlparse.urljoin(url, video_url),
'tbr': int(tbr),
} for tbr, video_url in merge_dicts(*video_items).items()]
self._sort_formats(formats)
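Note on the Fczenit hunk above: the new code expects `arrPath` to decode to a list of single-entry dicts mapping a bitrate to a relative path, merges them, and resolves each path against the page URL. A minimal sketch of that flow follows. It assumes Python 3's `urllib.parse.urljoin` in place of `compat_urlparse.urljoin`, and the sample paths are invented for illustration.

```python
from urllib.parse import urljoin


def merge_dicts(*dicts):
    # Later dicts win on key collisions, as with dict.update().
    merged = {}
    for a_dict in dicts:
        merged.update(a_dict)
    return merged


def build_formats(page_url, video_items):
    return [{
        'url': urljoin(page_url, video_url),
        'tbr': int(tbr),
    } for tbr, video_url in merge_dicts(*video_items).items()]


# Hypothetical payload shape; the real page content is not shown in the diff.
SAMPLE_ITEMS = [{'480': '/video/41044_480.mp4'}, {'720': '/video/41044_720.mp4'}]
formats = sorted(
    build_formats('http://fc-zenit.ru/video/41044/', SAMPLE_ITEMS),
    key=lambda f: f['tbr'])
```

Casting `tbr` to `int` matters here because JSON object keys are always strings, and sorting by bitrate should be numeric.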


@@ -24,13 +24,28 @@ class FlickrIE(InfoExtractor):
'upload_date': '20110423',
'uploader_id': '10922353@N03',
'uploader': 'Forest Wander',
'uploader_url': 'https://www.flickr.com/photos/forestwander-nature-pictures/',
'comment_count': int,
'view_count': int,
'tags': list,
'license': 'Attribution-ShareAlike',
}
}
_API_BASE_URL = 'https://api.flickr.com/services/rest?'
# https://help.yahoo.com/kb/flickr/SLN25525.html
_LICENSES = {
'0': 'All Rights Reserved',
'1': 'Attribution-NonCommercial-ShareAlike',
'2': 'Attribution-NonCommercial',
'3': 'Attribution-NonCommercial-NoDerivs',
'4': 'Attribution',
'5': 'Attribution-ShareAlike',
'6': 'Attribution-NoDerivs',
'7': 'No known copyright restrictions',
'8': 'United States government work',
'9': 'Public Domain Dedication (CC0)',
'10': 'Public Domain Work',
}
def _call_api(self, method, video_id, api_key, note, secret=None):
query = {
@@ -75,6 +90,9 @@ class FlickrIE(InfoExtractor):
self._sort_formats(formats)
owner = video_info.get('owner', {})
uploader_id = owner.get('nsid')
uploader_path = owner.get('path_alias') or uploader_id
uploader_url = 'https://www.flickr.com/photos/%s/' % uploader_path if uploader_path else None
return {
'id': video_id,
@@ -83,11 +101,13 @@ class FlickrIE(InfoExtractor):
'formats': formats,
'timestamp': int_or_none(video_info.get('dateuploaded')),
'duration': int_or_none(video_info.get('video', {}).get('duration')),
'uploader_id': owner.get('nsid'),
'uploader_id': uploader_id,
'uploader': owner.get('realname'),
'uploader_url': uploader_url,
'comment_count': int_or_none(video_info.get('comments', {}).get('_content')),
'view_count': int_or_none(video_info.get('views')),
'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])]
'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])],
'license': self._LICENSES.get(video_info.get('license')),
}
else:
raise ExtractorError('not a video', expected=True)
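Note on the Flickr hunks above: two small lookups are added, one for the uploader URL (preferring `path_alias`, falling back to `nsid`) and one for the license name. The sketch below reproduces both with plain dicts. `uploader_url` and `license_name` are hypothetical helper names, and only a subset of the license table is copied.

```python
# Subset of the _LICENSES table from the diff; keys are strings.
LICENSES = {
    '0': 'All Rights Reserved',
    '4': 'Attribution',
    '5': 'Attribution-ShareAlike',
}


def uploader_url(owner):
    # Prefer the human-readable alias; fall back to the NSID.
    path = owner.get('path_alias') or owner.get('nsid')
    return 'https://www.flickr.com/photos/%s/' % path if path else None


def license_name(video_info):
    # A missing or unknown license code yields None instead of raising.
    return LICENSES.get(video_info.get('license'))
```

Because the table is keyed by strings, an integer code such as `5` would not match; that only works if the API returns the code as a string, which the lookup in the diff appears to assume.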


@@ -0,0 +1,26 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/content/fom-website/en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TEST = {
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv',
'title': 'Race highlights - Spain 2016',
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_embed_code = self._search_regex(
r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
return self.url_result(
'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code)


@@ -1,7 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
)
class FoxSportsIE(InfoExtractor):
@@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):
_TEST = {
'url': 'http://www.foxsports.com/video?vid=432609859715',
'md5': 'b49050e955bebe32c301972e4012ac17',
'info_dict': {
'id': 'gA0bHB3Ladz3',
'ext': 'flv',
'id': 'i0qKWsk3qJaM',
'ext': 'mp4',
'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
'description': 'Courtney Lee talks about Memphis being focused.',
'upload_date': '20150423',
'timestamp': 1429761109,
'uploader': 'NEWA-FNG-FOXSPORTS',
},
'add_ie': ['ThePlatform'],
}
@@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
r"data-player-config='([^']+)'", webpage, 'data player config'),
video_id)
return self.url_result(smuggle_url(
config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
return self.url_result(smuggle_url(update_url_query(
config['releaseURL'], {
'mbr': 'true',
'switch': 'http',
}), {'force_smil_url': True}))
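Note on the FoxSports hunk above: the old code appended `'&manifest=f4m'` by string concatenation, which only works when `releaseURL` already carries a query string. The replacement goes through `update_url_query`. The function below is a stand-in written with Python 3's `urllib.parse` to show the idea; it is an approximation and not copied from the project's `utils` module, and the sample URLs are invented.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    # Parse the existing query, overlay the new parameters, re-encode.
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    params.update(query)
    return urlunparse(parsed._replace(query=urlencode(params, doseq=True)))


EXTRA = {'mbr': 'true', 'switch': 'http'}
with_query = update_url_query('http://link.example.com/s/abc/xyz?feed=1', EXTRA)
without_query = update_url_query('http://link.example.com/s/abc/xyz', EXTRA)
```

Both cases produce a well-formed URL, whereas naive concatenation would yield `...xyz&manifest=f4m` for the second one.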


@@ -2,6 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote_plus,
)
from ..utils import (
clean_html,
determine_ext,
@@ -27,6 +31,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:1769f43cd5fc130ace8fd87232207892',
'thumbnail': 're:https?://.*\.jpg',
},
'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
}, {
'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
'info_dict': {
@@ -37,6 +42,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
'thumbnail': 're:https?://.*\.jpg',
},
'skip': 'Access without user interaction is forbidden by CloudFlare',
}, {
'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview',
'info_dict': {
@@ -47,8 +53,36 @@ class FunimationIE(InfoExtractor):
'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
'thumbnail': 're:https?://.*\.(?:jpg|png)',
},
'skip': 'Access without user interaction is forbidden by CloudFlare',
}]
_LOGIN_URL = 'http://www.funimation.com/login'
def _download_webpage(self, *args, **kwargs):
try:
return super(FunimationIE, self)._download_webpage(*args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
response = ee.cause.read()
if b'>Please complete the security check to access<' in response:
raise ExtractorError(
'Access to funimation.com is blocked by CloudFlare. '
'Please browse to http://www.funimation.com/, solve '
'the reCAPTCHA, export browser cookies to a text file,'
' and then try again with --cookies YOUR_COOKIE_FILE.',
expected=True)
raise
def _extract_cloudflare_session_ua(self, url):
ci_session_cookie = self._get_cookies(url).get('ci_session')
if ci_session_cookie:
ci_session = compat_urllib_parse_unquote_plus(ci_session_cookie.value)
# ci_session is a string serialized by PHP function serialize()
# This case is simple enough to use regular expressions only
return self._search_regex(
r'"user_agent";s:\d+:"([^"]+)"', ci_session, 'user agent',
default=None)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
@@ -57,8 +91,11 @@ class FunimationIE(InfoExtractor):
'email_field': username,
'password_field': password,
})
login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
user_agent = self._extract_cloudflare_session_ua(self._LOGIN_URL)
if not user_agent:
user_agent = 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'
login_request = sanitized_Request(self._LOGIN_URL, data, headers={
'User-Agent': user_agent,
'Content-Type': 'application/x-www-form-urlencoded'
})
login_page = self._download_webpage(
@@ -103,11 +140,16 @@ class FunimationIE(InfoExtractor):
('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
)
user_agent = self._extract_cloudflare_session_ua(url)
if user_agent:
USER_AGENTS = ((None, user_agent),)
for kind, user_agent in USER_AGENTS:
request = sanitized_Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(
request, display_id, 'Downloading %s webpage' % kind)
request, display_id,
'Downloading %s webpage' % kind if kind else 'Downloading webpage')
playlist = self._parse_json(
self._search_regex(
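Note on `_extract_cloudflare_session_ua` above: the `ci_session` cookie is URL-encoded PHP `serialize()` output, and the diff reads the user agent out of it with a regular expression instead of a full parser. The sketch below shows the same approach with the standard library only. The function name and the sample cookie are hypothetical; the sample is built with `quote_plus` so the encoding is not hand-written.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def extract_session_user_agent(cookie_value):
    # Undo the URL encoding, then look for the serialized "user_agent" entry.
    session = unquote_plus(cookie_value)
    match = re.search(r'"user_agent";s:\d+:"([^"]+)"', session)
    return match.group(1) if match else None


# Hypothetical cookie payload in PHP serialize() format.
SAMPLE_COOKIE = quote_plus(
    'a:2:{s:10:"session_id";s:3:"abc";s:10:"user_agent";s:11:"Mozilla/5.0";}')
```

The regex ignores the declared string length (`s:11`), so it would stop early on a user agent containing a double quote; the comment in the diff accepts that trade-off for this simple case.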


@@ -51,7 +51,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE
from .snagfilms import SnagFilmsEmbedIE
from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE
from .mtv import MTVServicesEmbeddedIE
from .pladform import PladformIE
@@ -61,6 +61,9 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .vessel import VesselIE
class GenericIE(InfoExtractor):
@@ -624,13 +627,13 @@ class GenericIE(InfoExtractor):
},
# MTVServices embed
{
'url': 'http://www.gametrailers.com/news-post/76093/north-america-europe-is-getting-that-mario-kart-8-mercedes-dlc-too',
'md5': '35727f82f58c76d996fc188f9755b0d5',
'url': 'http://www.vulture.com/2016/06/new-key-peele-sketches-released.html',
'md5': 'ca1aef97695ef2c1d6973256a57e5252',
'info_dict': {
'id': '0306a69b-8adf-4fb5-aace-75f8e8cbfca9',
'id': '769f7ec0-0692-4d62-9b45-0d88074bffc1',
'ext': 'mp4',
'title': 'Review',
'description': 'Mario\'s life in the fast lane has never looked so good.',
'title': 'Key and Peele|October 10, 2012|2|203|Liam Neesons - Uncensored',
'description': 'Two valets share their love for movie star Liam Neesons.',
},
},
# YouTube embed via <data-embed-url="">
@@ -716,15 +719,18 @@ class GenericIE(InfoExtractor):
},
# Wistia embed
{
'url': 'http://education-portal.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
'md5': '8788b683c777a5cf25621eaf286d0c23',
'url': 'http://study.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
'md5': '1953f3a698ab51cfc948ed3992a0b7ff',
'info_dict': {
'id': '1cfaf6b7ea',
'id': '6e2wtrbdaf',
'ext': 'mov',
'title': 'md5:51364a8d3d009997ba99656004b5e20d',
'duration': 643.0,
'filesize': 182808282,
'uploader': 'education-portal.com',
'title': 'paywall_north-american-exploration-failed-colonies-of-spain-france-england',
'description': 'a Paywall Videos video from Remilon',
'duration': 644.072,
'uploader': 'study.com',
'timestamp': 1459678540,
'upload_date': '20160403',
'filesize': 24687186,
},
},
{
@@ -733,14 +739,30 @@ class GenericIE(InfoExtractor):
'info_dict': {
'id': 'uxjb0lwrcz',
'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'title': 'Conversation about Hexagonal Rails Part 1',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
'upload_date': '20140603',
},
},
# Wistia standard embed (async)
{
'url': 'https://www.getdrip.com/university/brennan-dunn-drip-workshop/',
'info_dict': {
'id': '807fafadvk',
'ext': 'mp4',
'title': 'Drip Brennan Dunn Workshop',
'description': 'a JV Webinars video from getdrip-1',
'duration': 4986.95,
'timestamp': 1463607249,
'upload_date': '20160518',
},
'params': {
'skip_download': True,
}
},
# Soundcloud embed
{
'url': 'http://nakedsecurity.sophos.com/2014/10/29/sscc-171-are-you-sure-that-1234-is-a-bad-password-podcast/',
@@ -763,6 +785,19 @@ class GenericIE(InfoExtractor):
'title': 'Rosetta #CometLanding webcast HL 10',
}
},
# Another Livestream embed, without 'new.' in URL
{
'url': 'https://www.freespeech.org/',
'info_dict': {
'id': '123537347',
'ext': 'mp4',
'title': 're:^FSTV [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
# Live stream
'skip_download': True,
},
},
# LazyYT
{
'url': 'http://discourse.ubuntu.com/t/unity-8-desktop-mode-windows-on-mir/1986',
@@ -847,18 +882,6 @@ class GenericIE(InfoExtractor):
'title': 'EP3S5 - Bon Appétit - Baqueira Mi Corazon !',
}
},
# Kaltura embed
{
'url': 'http://www.monumentalnetwork.com/videos/john-carlson-postgame-2-25-15',
'info_dict': {
'id': '1_eergr3h1',
'ext': 'mp4',
'upload_date': '20150226',
'uploader_id': 'MonumentalSports-Kaltura@perfectsensedigital.com',
'timestamp': int,
'title': 'John Carlson Postgame 2/25/15',
},
},
# Kaltura embed (different embed code)
{
'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
@@ -884,9 +907,23 @@ class GenericIE(InfoExtractor):
'uploader_id': 'echojecka',
},
},
# Kaltura embed with single quotes
{
'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
'info_dict': {
'id': '0_izeg5utt',
'ext': 'mp4',
'title': '35871',
'timestamp': 1355743100,
'upload_date': '20121217',
'uploader_id': 'batchUser',
},
'add_ie': ['Kaltura'],
},
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '227304',
'ext': 'mp4',
@@ -901,6 +938,7 @@ class GenericIE(InfoExtractor):
# ClipYou (Eagle.Platform) embed (custom URL)
{
'url': 'http://muz-tv.ru/play/7129/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '12820',
'ext': 'mp4',
@@ -994,16 +1032,31 @@ class GenericIE(InfoExtractor):
'timestamp': 1389118457,
},
},
# NBC News embed
{
'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html',
'md5': '1aa589c675898ae6d37a17913cf68d66',
'info_dict': {
'id': '701714499682',
'ext': 'mp4',
'title': 'PREVIEW: On Assignment: David Letterman',
'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.',
},
},
# UDN embed
{
'url': 'http://www.udn.com/news/story/7314/822787',
'url': 'https://video.udn.com/news/300346',
'md5': 'fd2060e988c326991037b9aff9df21a6',
'info_dict': {
'id': '300346',
'ext': 'mp4',
'title': '中一中男師變性 全校師生力挺',
'thumbnail': 're:^https?://.*\.jpg$',
}
},
'params': {
# m3u8 download
'skip_download': True,
},
},
# Ooyala embed
{
@@ -1020,20 +1073,6 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
# Contains a SMIL manifest
{
'url': 'http://www.telewebion.com/fa/1263668/%D9%82%D8%B1%D8%B9%D9%87%E2%80%8C%DA%A9%D8%B4%DB%8C-%D9%84%DB%8C%DA%AF-%D9%82%D9%87%D8%B1%D9%85%D8%A7%D9%86%D8%A7%D9%86-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7/%2B-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84.html',
'info_dict': {
'id': 'file',
'ext': 'flv',
'title': '+ Football: Lottery Champions League Europe',
'uploader': 'www.telewebion.com',
},
'params': {
# rtmpe downloads
'skip_download': True,
}
},
# Brightcove URL in single quotes
{
'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1171,6 +1210,16 @@ class GenericIE(InfoExtractor):
'uploader': 'Lake8737',
}
},
# Duplicated embedded video URLs
{
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
'info_dict': {
'id': '149298443_480_16c25b74_2',
'ext': 'mp4',
'title': 'vs. Blue Orange Spring Game',
'uploader': 'www.hudl.com',
},
},
]
def report_following_redirect(self, new_url):
@@ -1425,7 +1474,8 @@ class GenericIE(InfoExtractor):
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
video_title = self._html_search_regex(
video_title = self._og_search_title(
webpage, default=None) or self._html_search_regex(
r'(?s)<title>(.*?)</title>', webpage, 'video title',
default='video')
@@ -1443,6 +1493,9 @@ class GenericIE(InfoExtractor):
video_uploader = self._search_regex(
r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)
# Helper method
def _playlist_from_matches(matches, getter=None, ie=None):
urlrs = orderedSet(
@@ -1473,6 +1526,16 @@ class GenericIE(InfoExtractor):
if bc_urls:
return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:
return _playlist_from_matches(tp_urls, ie='ThePlatform')
# Look for Vessel embeds
vessel_urls = VesselIE._extract_urls(webpage)
if vessel_urls:
return _playlist_from_matches(vessel_urls, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@@ -1541,21 +1604,26 @@ class GenericIE(InfoExtractor):
'url': embed_url,
'ie_key': 'Wistia',
'uploader': video_uploader,
'title': video_title,
'id': video_id,
}
match = re.search(r'(?:id=["\']wistia_|data-wistia-?id=["\']|Wistia\.embed\(["\'])(?P<id>[^"\']+)', webpage)
if match:
return {
'_type': 'url_transparent',
'url': 'http://fast.wistia.net/embed/iframe/{0:}'.format(match.group('id')),
'url': 'wistia:%s' % match.group('id'),
'ie_key': 'Wistia',
'uploader': video_uploader,
'title': video_title,
'id': match.group('id')
}
match = re.search(
r'''(?sx)
<script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]+)\b.*?\2
''', webpage)
if match:
return self.url_result(self._proto_relative_url(
'wistia:%s' % match.group('id')), 'Wistia')
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
if svt_url:
@@ -1775,14 +1843,6 @@ class GenericIE(InfoExtractor):
url = unescapeHTML(mobj.group('url'))
return self.url_result(url)
# Look for embedded vulture.com player
mobj = re.search(
r'<iframe src="(?P<url>https?://video\.vulture\.com/[^"]+)"',
webpage)
if mobj is not None:
url = unescapeHTML(mobj.group('url'))
return self.url_result(url, ie='Vulture')
# Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
if mtvservices_url:
@@ -1831,7 +1891,7 @@ class GenericIE(InfoExtractor):
return self.url_result(self._proto_relative_url(mobj.group('url'), scheme='http:'), 'CondeNast')
mobj = re.search(
r'<iframe[^>]+src="(?P<url>https?://new\.livestream\.com/[^"]+/player[^"]+)"',
r'<iframe[^>]+src="(?P<url>https?://(?:new\.)?livestream\.com/[^"]+/player[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Livestream')
@@ -1843,7 +1903,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?(?P<q1>['\"])wid(?P=q1)\s*:\s*(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),", webpage) or
re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
if mobj is not None:
return self.url_result(smuggle_url(
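Note on the Kaltura hunk above: the rewritten pattern accepts either quote character but still requires each closing quote to match its opener, using named groups (`q1` to `q4`) and `(?P=...)` backreferences. The sketch below runs the first pattern from the diff against two invented embed snippets; the snippets and the `find_kaltura_embed` helper are assumptions for illustration.

```python
import re

# First alternative of the pattern from the diff, unchanged.
KWIDGET_RE = (
    r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?"
    r"(?P<q1>['\"])wid(?P=q1)\s*:\s*(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?"
    r"(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),")


def find_kaltura_embed(webpage):
    mobj = re.search(KWIDGET_RE, webpage)
    if mobj is None:
        return None
    return mobj.group('partner_id'), mobj.group('id')


# Hypothetical page fragments, one per quoting style.
SINGLE = "kWidget.embed({'wid': '_1234', 'uiconf_id': '5678', 'entry_id': '0_izeg5utt', 'flashvars': {}});"
DOUBLE = 'kWidget.thumbEmbed({"wid": "_1234", "entry_id": "0_izeg5utt", "flashvars": {}});'
MIXED = "kWidget.embed({'wid\": '_1234', 'entry_id': '0_izeg5utt', });"
```

Note that the pattern requires a comma after the entry ID, so an embed whose `entry_id` is the last key without a trailing comma would not match.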
@@ -1895,6 +1955,12 @@ class GenericIE(InfoExtractor):
if nbc_sports_url:
return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
# Look for NBC News embeds
nbc_news_embed_url = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//www\.nbcnews\.com/widget/video-embed/[^"\']+)\1', webpage)
if nbc_news_embed_url:
return self.url_result(nbc_news_embed_url.group('url'), 'NBCNews')
# Look for Google Drive embeds
google_drive_url = GoogleDriveIE._extract_url(webpage)
if google_drive_url:
@@ -1922,10 +1988,10 @@ class GenericIE(InfoExtractor):
if onionstudios_url:
return self.url_result(onionstudios_url)
# Look for SnagFilms embeds
snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage)
if snagfilms_url:
return self.url_result(snagfilms_url)
# Look for ViewLift embeds
viewlift_url = ViewLiftEmbedIE._extract_url(webpage)
if viewlift_url:
return self.url_result(viewlift_url)
# Look for JWPlatform embeds
jwplatform_url = JWPlatformIE._extract_url(webpage)
@@ -1981,6 +2047,19 @@ class GenericIE(InfoExtractor):
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url:
return {
'_type': 'url_transparent',
'ie_key': ThreeQSDNIE.ie_key(),
'url': self._proto_relative_url(threeqsdn_url),
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
}
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
@@ -2061,7 +2140,7 @@ class GenericIE(InfoExtractor):
raise UnsupportedError(url)
entries = []
for video_url in found:
for video_url in orderedSet(found):
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
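Note on the last generic hunk above: wrapping `found` in `orderedSet` drops repeated embedded URLs while keeping first-seen order, which is what the new "Duplicated embedded video URLs" test exercises. The sketch below uses a stand-in `ordered_set` plus Python 3's `html.unescape` and `urljoin`; these are substitutes for the project's own helpers, and the sample URLs are invented.

```python
from html import unescape
from urllib.parse import urljoin


def ordered_set(iterable):
    # Order-preserving de-duplication (stand-in for utils.orderedSet).
    seen = set()
    result = []
    for item in iterable:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def normalize_found(page_url, found):
    entries = []
    for video_url in ordered_set(found):
        video_url = unescape(video_url)            # &amp; -> &
        video_url = video_url.replace('\\/', '/')  # undo JSON-style \/ escapes
        entries.append(urljoin(page_url, video_url))
    return entries
```

De-duplication happens on the raw matches, before unescaping, so two spellings of the same URL (one escaped, one not) would still both survive.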


@@ -0,0 +1,66 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import js_to_json
class GodTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?god\.tv(?:/[^/]+)*/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://god.tv/jesus-image/video/jesus-conference-2016/randy-needham',
'info_dict': {
'id': 'lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3',
'ext': 'mp4',
'title': 'Randy Needham',
'duration': 3615.08,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://god.tv/playlist/bible-study',
'info_dict': {
'id': 'bible-study',
},
'playlist_mincount': 37,
}, {
'url': 'http://god.tv/node/15097',
'only_matching': True,
}, {
'url': 'http://god.tv/live/africa',
'only_matching': True,
}, {
'url': 'http://god.tv/liveevents',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'settings', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
ooyala_id = None
if settings:
playlist = settings.get('playlist')
if playlist and isinstance(playlist, list):
entries = [
OoyalaIE._build_url_result(video['content_id'])
for video in playlist if video.get('content_id')]
if entries:
return self.playlist_result(entries, display_id)
ooyala_id = settings.get('ooyala', {}).get('content_id')
if not ooyala_id:
ooyala_id = self._search_regex(
r'["\']content_id["\']\s*:\s*(["\'])(?P<id>[\w-]+)\1',
webpage, 'ooyala id', group='id')
return OoyalaIE._build_url_result(ooyala_id)


@@ -4,7 +4,7 @@ from .common import InfoExtractor
class GrouponIE(InfoExtractor):
_VALID_URL = r'https?://www\.groupon\.com/deals/(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:www\.)?groupon\.com/deals/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.groupon.com/deals/bikram-yoga-huntington-beach-2#ooid=tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
@@ -14,17 +14,27 @@ class GrouponIE(InfoExtractor):
'description': 'Studio kept at 105 degrees and 40% humidity with anti-microbial and anti-slip Flotex flooring; certified instructors',
},
'playlist': [{
'md5': '42428ce8a00585f9bc36e49226eae7a1',
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'id': 'fk6OhWpXgIQ',
'ext': 'mp4',
'title': 'Bikram Yoga Huntington Beach | Orange County !tubGNycTo@9Uxg82uESj4i61EYX8nyuf',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
'duration': 45,
'upload_date': '20160405',
'uploader_id': 'groupon',
'uploader': 'Groupon',
},
'add_ie': ['Youtube'],
}],
'params': {
'skip_download': 'HDS',
}
'skip_download': True,
},
}
_PROVIDERS = {
'ooyala': ('ooyala:%s', 'Ooyala'),
'youtube': ('%s', 'Youtube'),
}
def _real_extract(self, url):
@@ -36,12 +46,17 @@ class GrouponIE(InfoExtractor):
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:
if v.get('provider') != 'OOYALA':
provider = v.get('provider')
video_id = v.get('media') or v.get('id') or v.get('baseURL')
if not provider or not video_id:
continue
url_pattern, ie_key = self._PROVIDERS.get(provider.lower(), (None, None))
if not url_pattern:
self.report_warning(
'%s: Unsupported video provider %s, skipping video' %
(playlist_id, v.get('provider')))
(playlist_id, provider))
continue
entries.append(self.url_result('ooyala:%s' % v['media']))
entries.append(self.url_result(url_pattern % video_id, ie_key))
return {
'_type': 'playlist',
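Note on the Groupon provider table above: each carousel video is routed through a `(url_pattern, ie_key)` pair looked up by lowercased provider name. The sketch below shows that dispatch on plain data. It passes `(None, None)` as the lookup default so that an unknown provider reaches the "unsupported" branch; without a default, `dict.get` returns `None` and the tuple unpacking raises `TypeError`. `resolve_videos` and the sample entries are hypothetical.

```python
PROVIDERS = {
    'ooyala': ('ooyala:%s', 'Ooyala'),
    'youtube': ('%s', 'Youtube'),
}


def resolve_videos(videos):
    entries, skipped = [], []
    for v in videos:
        provider = v.get('provider')
        video_id = v.get('media') or v.get('id') or v.get('baseURL')
        if not provider or not video_id:
            continue
        # The default keeps the unpacking safe for unknown providers.
        url_pattern, ie_key = PROVIDERS.get(provider.lower(), (None, None))
        if not url_pattern:
            skipped.append(provider)
            continue
        entries.append((url_pattern % video_id, ie_key))
    return entries, skipped
```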


@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
HEADRequest,
KNOWN_EXTENSIONS,
sanitized_Request,
str_to_int,
urlencode_postdata,
@@ -17,7 +18,7 @@ from ..utils import (
class HearThisAtIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hearthis\.at/(?P<artist>[^/]+)/(?P<title>[A-Za-z0-9\-]+)/?$'
_PLAYLIST_URL = 'https://hearthis.at/playlist.php'
_TEST = {
_TESTS = [{
'url': 'https://hearthis.at/moofi/dr-kreep',
'md5': 'ab6ec33c8fed6556029337c7885eb4e0',
'info_dict': {
@@ -26,7 +27,7 @@ class HearThisAtIE(InfoExtractor):
'title': 'Moofi - Dr. Kreep',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421564134,
'description': 'Creepy Patch. Mutable Instruments Braids Vowel + Formant Mode.',
'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP',
'upload_date': '20150118',
'comment_count': int,
'view_count': int,
@@ -34,7 +35,25 @@ class HearThisAtIE(InfoExtractor):
'duration': 71,
'categories': ['Experimental'],
}
}
}, {
# 'download' link redirects to the original webpage
'url': 'https://hearthis.at/twitchsf/dj-jim-hopkins-totally-bitchin-80s-dance-mix/',
'md5': '5980ceb7c461605d30f1f039df160c6e',
'info_dict': {
'id': '811296',
'ext': 'mp3',
'title': 'TwitchSF - DJ Jim Hopkins - Totally Bitchin\' 80\'s Dance Mix!',
'description': 'Listen to DJ Jim Hopkins - Totally Bitchin\' 80\'s Dance Mix! by TwitchSF on hearthis.at - Dance',
'upload_date': '20160328',
'timestamp': 1459186146,
'thumbnail': 're:^https?://.*\.jpg$',
'comment_count': int,
'view_count': int,
'like_count': int,
'duration': 4360,
'categories': ['Dance'],
},
}]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
@@ -90,13 +109,14 @@ class HearThisAtIE(InfoExtractor):
ext_handle = self._request_webpage(
ext_req, display_id, note='Determining extension')
ext = urlhandle_detect_ext(ext_handle)
formats.append({
'format_id': 'download',
'vcodec': 'none',
'ext': ext,
'url': download_url,
'preference': 2, # Usually better quality
})
if ext in KNOWN_EXTENSIONS:
formats.append({
'format_id': 'download',
'vcodec': 'none',
'ext': ext,
'url': download_url,
'preference': 2, # Usually better quality
})
self._sort_formats(formats)
return {


@@ -8,7 +8,7 @@ class HowcastIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
'md5': '8b743df908c42f60cf6496586c7f12c3',
'md5': '7d45932269a288149483144f01b99789',
'info_dict': {
'id': '390161',
'ext': 'mp4',
@@ -19,9 +19,9 @@ class HowcastIE(InfoExtractor):
'duration': 56.823,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):


@@ -1,10 +1,10 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
mimetype2ext,
qualities,
)
@@ -12,9 +12,9 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
@@ -22,7 +22,16 @@ class ImdbIE(InfoExtractor):
'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
}
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
'only_matching': True,
}, {
'url': 'http://www.imdb.com/title/tt1667889/?ref_=ext_shr_eml_vi#lb-vi2524815897',
'only_matching': True,
}, {
'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -48,13 +57,27 @@ class ImdbIE(InfoExtractor):
json_data = self._search_regex(
r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
format_page, 'json data', flags=re.DOTALL)
info = json.loads(json_data)
format_info = info['videoPlayerObject']['video']
f_id = format_info['ffname']
info = self._parse_json(json_data, video_id, fatal=False)
if not info:
continue
format_info = info.get('videoPlayerObject', {}).get('video', {})
if not format_info:
continue
video_info_list = format_info.get('videoInfoList')
if not video_info_list or not isinstance(video_info_list, list):
continue
video_info = video_info_list[0]
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url:
continue
format_id = format_info.get('ffname')
formats.append({
'format_id': f_id,
'url': format_info['videoInfoList'][0]['videoUrl'],
'quality': quality(f_id),
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
})
self._sort_formats(formats)
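Note on the IMDb hunk above: the old code indexed straight into the decoded JSON and would abort the whole extraction on one malformed format page; the new code checks each level and skips the format instead. The sketch below expresses the same checks as a function that returns `None` where the diff uses `continue`. `extract_format` is a hypothetical name, and the `ext`/`quality` fields are left out because they depend on project helpers.

```python
import json


def extract_format(json_data):
    try:
        info = json.loads(json_data)
    except ValueError:
        return None
    if not info or not isinstance(info, dict):
        return None
    format_info = info.get('videoPlayerObject', {}).get('video', {})
    if not format_info:
        return None
    video_info_list = format_info.get('videoInfoList')
    if not video_info_list or not isinstance(video_info_list, list):
        return None
    video_info = video_info_list[0]
    if not video_info or not isinstance(video_info, dict):
        return None
    video_url = video_info.get('videoUrl')
    if not video_url:
        return None
    return {'format_id': format_info.get('ffname'), 'url': video_url}
```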


@@ -60,7 +60,8 @@ class IndavideoEmbedIE(InfoExtractor):
formats = [{
'url': video_url,
'height': self._search_regex(r'\.(\d{3,4})\.mp4$', video_url, 'height', default=None),
'height': int_or_none(self._search_regex(
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
} for video_url in video_urls]
self._sort_formats(formats)
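
The height change in this hunk can be checked in isolation. The sketch below is a minimal illustration, assuming only the regular expression shown above; `extract_height` and the example URLs are hypothetical and not part of the extractor.

```python
import re

# Pattern taken from the updated hunk: the height sits between two dots
# right before ".mp4", optionally followed by a query string.
HEIGHT_RE = r'\.(\d{3,4})\.mp4(?:\?|$)'


def extract_height(video_url):
    """Return the height encoded in the URL as an int, or None if absent."""
    mobj = re.search(HEIGHT_RE, video_url)
    return int(mobj.group(1)) if mobj else None


# Hypothetical URLs, for illustration only.
print(extract_height('http://example.com/video/clip.720.mp4'))
print(extract_height('http://example.com/video/clip.360.mp4?token=abc'))
print(extract_height('http://example.com/video/clip.mp4'))
```

The old pattern ended in `\.mp4$`, so a URL carrying a query string would not have matched; the new one accepts either a `?` or the end of the string, and the result is converted to an integer instead of being left as a string.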

@@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
limit_length,
lowercase_escape,
try_get,
)
@@ -19,10 +20,16 @@ class InstagramIE(InfoExtractor):
'info_dict': {
'id': 'aye83DjauH',
'ext': 'mp4',
'uploader_id': 'naomipq',
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
'uploader': 'Naomi Leonor Phan-Quang',
'like_count': int,
'comment_count': int,
},
}, {
# missing description
'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
@@ -31,6 +38,13 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'uploader_id': 'britneyspears',
'title': 'Video by britneyspears',
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
'uploader': 'Britney Spears',
'like_count': int,
'comment_count': int,
},
'params': {
'skip_download': True,
@@ -67,21 +81,57 @@ class InstagramIE(InfoExtractor):
url = mobj.group('url')
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
webpage, 'uploader id', fatal=False)
desc = self._search_regex(
r'"caption":"(.+?)"', webpage, 'description', default=None)
if desc is not None:
desc = lowercase_escape(desc)
(video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count) = [None] * 8
shared_data = self._parse_json(
self._search_regex(
r'window\._sharedData\s*=\s*({.+?});',
webpage, 'shared data', default='{}'),
video_id, fatal=False)
if shared_data:
media = try_get(
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
if media:
video_url = media.get('video_url')
description = media.get('caption')
thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count'))
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
if not uploader_id:
uploader_id = self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
webpage, 'uploader id', fatal=False)
if not description:
description = self._search_regex(
r'"caption"\s*:\s*"(.+?)"', webpage, 'description', default=None)
if description is not None:
description = lowercase_escape(description)
if not thumbnail:
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'url': self._og_search_video_url(webpage, secure=False),
'url': video_url,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage),
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,
'description': desc,
'uploader': uploader,
'like_count': like_count,
'comment_count': comment_count,
}
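
The `try_get` call in this hunk guards a deep lookup into the shared-data JSON. The sketch below approximates that idea with a standalone helper; `safe_get` is a hypothetical name, its exception list is an assumption about what such a helper would need to catch, and the sample data is made up.

```python
def safe_get(src, getter, expected_type=None):
    """Apply getter to src; return None on lookup failure or type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Made-up structure mirroring the path used in the hunk above.
shared_data = {
    'entry_data': {
        'PostPage': [{'media': {'video_url': 'http://example.com/v.mp4'}}],
    },
}


def media_getter(x):
    return x['entry_data']['PostPage'][0]['media']


media = safe_get(shared_data, media_getter, dict)
print(media)
print(safe_get({'entry_data': {'PostPage': []}}, media_getter, dict))
```

With a helper of this shape, a missing key, an empty list, or a `None` in the middle of the path all collapse to `None`, which is what lets the extractor fall back to the regex and Open Graph lookups that follow.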

@@ -505,7 +505,10 @@ class IqiyiIE(InfoExtractor):
'enc': md5_text(enc_key + tail),
'qyid': _uuid,
'tn': random.random(),
'um': 0,
# In iQiyi's flash player, um is set to 1 if a user is logged in.
# Some 1080P formats are only available to logged-in users.
# Force um=1 here to trick the iQiyi server.
'um': 1,
'authkey': md5_text(md5_text('') + tail),
'k_tag': 1,
}

@@ -5,33 +5,76 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with the JWPlayer-related code in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
video_data = jwplayer_data['playlist'][0]
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type in ('application/vnd.apple.mpegurl', 'hls'):
if source_type in ('application/vnd.apple.mpegurl', 'hls') or determine_ext(source_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
formats.append({
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
})
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
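
The RTMP branch above splits a source URL into a connection URL and a play path at the first `mp4:`, `mp3:` or `flv:` prefix. A minimal standalone sketch of that split follows; `split_rtmp_url` and the example URLs are hypothetical.

```python
import re


def split_rtmp_url(source_url):
    """Split an RTMP URL at the first mp4:/mp3:/flv: prefix.

    Returns (rtmp_url, play_path), or None when no prefix is present.
    """
    # The capturing group keeps the matched prefix in the result list.
    parts = re.split(r'((?:mp4|mp3|flv):)', source_url, maxsplit=1)
    if len(parts) != 3:
        return None
    rtmp_url, prefix, play_path = parts
    return rtmp_url, prefix + play_path


# Hypothetical URLs, for illustration only.
print(split_rtmp_url('rtmp://example.com/vod/mp4:path/to/video.mp4'))
print(split_rtmp_url('rtmp://example.com/live/stream'))
```

Because the pattern is wrapped in a capturing group, `re.split` returns three items when the prefix is found, which is why the hunk checks for a length of exactly 3 before updating the format.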

@@ -148,8 +148,8 @@ class KuwoAlbumIE(InfoExtractor):
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
'id': '502294',
'title': 'M',
'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
'title': 'Made\xa0Series\xa0《M》',
'description': 'md5:d463f0d8a0ff3c3ea3d6ed7452a9483f',
},
'playlist_count': 2,
}
@@ -209,7 +209,7 @@ class KuwoSingerIE(InfoExtractor):
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
'id': 'bruno+mars',
'title': 'Bruno Mars',
'title': 'Bruno\xa0Mars',
},
'playlist_mincount': 329,
}, {
@@ -283,6 +283,8 @@ class KuwoCategoryIE(InfoExtractor):
category_desc = remove_start(
get_element_by_id('intro', webpage).strip(),
'%s简介:' % category_name)
if category_desc == '暂无':
category_desc = None
jsonm = self._parse_json(self._html_search_regex(
r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id)
@@ -304,7 +306,7 @@ class KuwoMvIE(KuwoBaseIE):
'id': '6480076',
'ext': 'mp4',
'title': 'My HouseMV',
'creator': '2PM',
'creator': 'PM02:00',
},
# In this video, music URLs (anti.s) are blocked outside China and
# USA, while the MV URL (mvurl) is available globally, so force the MV

@@ -0,0 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LearnrIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?learnr\.pro/view/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.learnr.pro/view/video/51624-web-development-tutorial-for-beginners-1-how-to-build-webpages-with-html-css-javascript',
'md5': '3719fdf0a68397f49899e82c308a89de',
'info_dict': {
'id': '51624',
'ext': 'mp4',
'title': 'Web Development Tutorial for Beginners (#1) - How to build webpages with HTML, CSS, Javascript',
'description': 'md5:b36dbfa92350176cdf12b4d388485503',
'uploader': 'LearnCode.academy',
'uploader_id': 'learncodeacademy',
'upload_date': '20131021',
},
'add_ie': ['Youtube'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return {
'_type': 'url_transparent',
'url': self._search_regex(
r"videoId\s*:\s*'([^']+)'", webpage, 'youtube id'),
'id': video_id,
}

@@ -28,7 +28,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@@ -69,6 +69,9 @@ class LeIE(InfoExtractor):
'hls_prefer_native': True,
},
'skip': 'Only available in China',
}, {
'url': 'http://sports.le.com/video/25737697.html',
'only_matching': True,
}]
@staticmethod
@@ -196,7 +199,7 @@ class LeIE(InfoExtractor):
class LePlaylistIE(InfoExtractor):
_VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
_VALID_URL = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'
_TESTS = [{
'url': 'http://www.le.com/tv/46177.html',
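
The `(?!video)` lookahead added to the playlist pattern keeps it from also claiming the new `sports.le.com/video/...` URLs. The sketch below checks this with the three patterns copied from the hunks above; the `matches` helper is hypothetical.

```python
import re

# Patterns copied from the hunks above.
VIDEO_RE = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
OLD_PLAYLIST_RE = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
NEW_PLAYLIST_RE = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'


def matches(pattern, url):
    """Return True when the pattern matches at the start of the URL."""
    return re.match(pattern, url) is not None


sports_url = 'http://sports.le.com/video/25737697.html'
playlist_url = 'http://www.le.com/tv/46177.html'

print(matches(VIDEO_RE, sports_url))         # video pattern accepts it
print(matches(OLD_PLAYLIST_RE, sports_url))  # old playlist pattern also did
print(matches(NEW_PLAYLIST_RE, sports_url))  # lookahead now rejects it
print(matches(NEW_PLAYLIST_RE, playlist_url))
```

Note that the lookahead rejects any path segment that starts with `video`, not only the exact segment `video`.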

@@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
parse_filesize,
)
class LibraryOfCongressIE(InfoExtractor):
IE_NAME = 'loc'
IE_DESC = 'Library of Congress'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
_TESTS = [{
# embedded via <div class="media-player"
'url': 'http://loc.gov/item/90716351/',
'md5': '353917ff7f0255aa6d4b80a034833de8',
'info_dict': {
'id': '90716351',
'ext': 'mp4',
'title': "Pa's trip to Mars",
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 0,
'view_count': int,
},
}, {
# webcast embedded via mediaObjectId
'url': 'https://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=5578',
'info_dict': {
'id': '5578',
'ext': 'mp4',
'title': 'Help! Preservation Training Needs Here, There & Everywhere',
'duration': 3765,
'view_count': int,
'subtitles': 'mincount:1',
},
'params': {
'skip_download': True,
},
}, {
# with direct download links
'url': 'https://www.loc.gov/item/78710669/',
'info_dict': {
'id': '78710669',
'ext': 'mp4',
'title': 'La vie et la passion de Jesus-Christ',
'duration': 0,
'view_count': int,
'formats': 'mincount:4',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media_id = self._search_regex(
(r'id=(["\'])media-player-(?P<id>.+?)\1',
r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
webpage, 'media id', group='id')
data = self._download_json(
'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
video_id)['mediaObject']
derivative = data['derivatives'][0]
media_url = derivative['derivativeUrl']
title = derivative.get('shortName') or data.get('shortName') or self._og_search_title(
webpage)
# The following algorithm was extracted from the setAVSource JS function
# found in the webpage
media_url = media_url.replace('rtmp', 'https')
is_video = data.get('mediaType', 'v').lower() == 'v'
ext = determine_ext(media_url)
if ext not in ('mp4', 'mp3'):
media_url += '.mp4' if is_video else '.mp3'
if 'vod/mp4:' in media_url:
formats = [{
'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
'quality': 1,
}]
elif 'vod/mp3:' in media_url:
formats = [{
'url': media_url.replace('vod/mp3:', ''),
'vcodec': 'none',
}]
download_urls = set()
for m in re.finditer(
r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
format_id = m.group('id').lower()
if format_id == 'gif':
continue
download_url = m.group('url')
if download_url in download_urls:
continue
download_urls.add(download_url)
formats.append({
'url': download_url,
'format_id': format_id,
'filesize_approx': parse_filesize(m.group('size')),
})
self._sort_formats(formats)
duration = float_or_none(data.get('duration'))
view_count = int_or_none(data.get('viewCount'))
subtitles = {}
cc_url = data.get('ccUrl')
if cc_url:
subtitles.setdefault('en', []).append({
'url': cc_url,
'ext': 'ttml',
})
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'view_count': view_count,
'formats': formats,
'subtitles': subtitles,
}
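
The URL rewriting in this extractor (RTMP to HTTPS, extension fix-up, then HLS or plain audio) can be traced on its own. The sketch below is a simplified illustration under stated assumptions: `guess_ext` is a hypothetical stand-in for `determine_ext`, `build_formats` is a hypothetical wrapper, the URLs are made up, and unlike the hunk above it starts from an empty `formats` list so that a URL matching neither branch yields no formats.

```python
import re


def guess_ext(url):
    """Hypothetical stand-in for determine_ext: text after the last dot."""
    guess = url.partition('?')[0].rpartition('.')[2]
    return guess if re.match(r'^[A-Za-z0-9]+$', guess) else None


def build_formats(media_url, is_video=True):
    """Rewrite a derivative URL following the steps in the hunk above."""
    media_url = media_url.replace('rtmp', 'https')
    if guess_ext(media_url) not in ('mp4', 'mp3'):
        media_url += '.mp4' if is_video else '.mp3'
    formats = []  # assumption: start empty instead of leaving it undefined
    if 'vod/mp4:' in media_url:
        formats.append({
            'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
            'format_id': 'hls',
            'ext': 'mp4',
            'protocol': 'm3u8_native',
        })
    elif 'vod/mp3:' in media_url:
        formats.append({
            'url': media_url.replace('vod/mp3:', ''),
            'vcodec': 'none',
        })
    return formats


# Made-up derivative URLs, for illustration only.
print(build_formats('rtmp://stream.example.gov/vod/mp4:dir/clip'))
print(build_formats('rtmp://stream.example.gov/vod/mp3:dir/talk.mp3', is_video=False))
```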

@@ -7,48 +7,53 @@ from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
int_or_none,
remove_end,
unified_strdate,
ExtractorError,
int_or_none,
parse_iso8601,
remove_end,
)
class LifeNewsIE(InfoExtractor):
IE_NAME = 'lifenews'
IE_DESC = 'LIFE | NEWS'
_VALID_URL = r'https?://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
IE_NAME = 'life'
IE_DESC = 'Life.ru'
_VALID_URL = r'https?://life\.ru/t/[^/]+/(?P<id>\d+)'
_TESTS = [{
# single video embedded via video/source
'url': 'http://lifenews.ru/news/98736',
'url': 'https://life.ru/t/новости/98736',
'md5': '77c95eaefaca216e32a76a343ad89d23',
'info_dict': {
'id': '98736',
'ext': 'mp4',
'title': 'Мужчина нашел дома архив оборонного завода',
'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26',
'timestamp': 1344154740,
'upload_date': '20120805',
'view_count': int,
}
}, {
# single video embedded via iframe
'url': 'http://lifenews.ru/news/152125',
'url': 'https://life.ru/t/новости/152125',
'md5': '77d19a6f0886cd76bdbf44b4d971a273',
'info_dict': {
'id': '152125',
'ext': 'mp4',
'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ',
'description': 'Жители двух поселков Днепропетровской области не простили радикалам угрозу лишения плодородных земель и пошли в лобовую. ',
'timestamp': 1427961840,
'upload_date': '20150402',
'view_count': int,
}
}, {
# two videos embedded via iframe
'url': 'http://lifenews.ru/news/153461',
'url': 'https://life.ru/t/новости/153461',
'info_dict': {
'id': '153461',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505',
'timestamp': 1430825520,
'view_count': int,
},
'playlist': [{
'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
@@ -57,6 +62,7 @@ class LifeNewsIE(InfoExtractor):
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 1)',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'timestamp': 1430825520,
'upload_date': '20150505',
},
}, {
@@ -66,22 +72,25 @@ class LifeNewsIE(InfoExtractor):
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 2)',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'timestamp': 1430825520,
'upload_date': '20150505',
},
}],
}, {
'url': 'http://lifenews.ru/video/13035',
'url': 'https://life.ru/t/новости/213035',
'only_matching': True,
}, {
'url': 'https://life.ru/t/%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8/153461',
'only_matching': True,
}, {
'url': 'https://life.ru/t/новости/411489/manuel_vals_nazval_frantsiiu_tsieliu_nomier_odin_dlia_ighil',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
section = mobj.group('section')
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://lifenews.ru/%s/%s' % (section, video_id),
video_id, 'Downloading page')
webpage = self._download_webpage(url, video_id)
video_urls = re.findall(
r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
@@ -95,26 +104,22 @@ class LifeNewsIE(InfoExtractor):
title = remove_end(
self._og_search_title(webpage),
' - Первый по срочным новостям — LIFE | NEWS')
' - Life.ru')
description = self._og_search_description(webpage)
view_count = self._html_search_regex(
r'<div class=\'views\'>\s*(\d+)\s*</div>', webpage, 'view count', fatal=False)
comment_count = self._html_search_regex(
r'=\'commentCount\'[^>]*>\s*(\d+)\s*<',
webpage, 'comment count', fatal=False)
r'<div[^>]+class=(["\']).*?\bhits-count\b.*?\1[^>]*>\s*(?P<value>\d+)\s*</div>',
webpage, 'view count', fatal=False, group='value')
upload_date = self._html_search_regex(
r'<time[^>]*datetime=\'([^\']+)\'', webpage, 'upload date', fatal=False)
if upload_date is not None:
upload_date = unified_strdate(upload_date)
timestamp = parse_iso8601(self._search_regex(
r'<time[^>]+datetime=(["\'])(?P<value>.+?)\1',
webpage, 'upload date', fatal=False, group='value'))
common_info = {
'description': description,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
'upload_date': upload_date,
'timestamp': timestamp,
}
def make_entry(video_id, video_url, index=None):
@@ -183,7 +188,8 @@ class LifeEmbedIE(InfoExtractor):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='m3u8'))
video_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='m3u8'))
else:
formats.append({
'url': video_url,

@@ -98,13 +98,19 @@ class LimelightBaseIE(InfoExtractor):
} for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
subtitles = {}
for caption in properties.get('captions', {}):
for caption in properties.get('captions', []):
lang = caption.get('language_code')
subtitles_url = caption.get('url')
if lang and subtitles_url:
subtitles[lang] = [{
subtitles.setdefault(lang, []).append({
'url': subtitles_url,
}]
})
closed_captions_url = properties.get('closed_captions_url')
if closed_captions_url:
subtitles.setdefault('en', []).append({
'url': closed_captions_url,
'ext': 'ttml',
})
return {
'id': video_id,
@@ -123,7 +129,18 @@ class LimelightBaseIE(InfoExtractor):
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
_VALID_URL = r'(?:limelight:media:|https?://link\.videoplatform\.limelight\.com/media/\??\bmediaId=)(?P<id>[a-z0-9]{32})'
_VALID_URL = r'''(?x)
(?:
limelight:media:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bmediaId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': {
@@ -158,6 +175,9 @@ class LimelightMediaIE(LimelightBaseIE):
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'media'
_API_PATH = 'media'
@@ -176,15 +196,29 @@ class LimelightMediaIE(LimelightBaseIE):
class LimelightChannelIE(LimelightBaseIE):
IE_NAME = 'limelight:channel'
_VALID_URL = r'(?:limelight:channel:|https?://link\.videoplatform\.limelight\.com/media/\??\bchannelId=)(?P<id>[a-z0-9]{32})'
_TEST = {
_VALID_URL = r'''(?x)
(?:
limelight:channel:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code',
},
'playlist_mincount': 3,
}
}, {
'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel'
_API_PATH = 'channels'
@@ -207,15 +241,29 @@ class LimelightChannelIE(LimelightBaseIE):
class LimelightChannelListIE(LimelightBaseIE):
IE_NAME = 'limelight:channel_list'
_VALID_URL = r'(?:limelight:channel_list:|https?://link\.videoplatform\.limelight\.com/media/\?.*?\bchannelListId=)(?P<id>[a-z0-9]{32})'
_TEST = {
_VALID_URL = r'''(?x)
(?:
limelight:channel_list:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelListId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
'info_dict': {
'id': '301b117890c4465c8179ede21fd92e2b',
'title': 'Website - Hero Player',
},
'playlist_mincount': 2,
}
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel_list'
def _real_extract(self, url):
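
The rewritten `_VALID_URL` patterns use verbose mode (`(?x)`) so that the alternatives can be laid out over several lines. The sketch below exercises the media pattern, copied from the hunk above, against the test URLs listed there; `media_id` is a hypothetical helper.

```python
import re

# Pattern copied from the LimelightMediaIE hunk above; whitespace inside
# the pattern is ignored because of the (?x) flag.
MEDIA_RE = r'''(?x)
    (?:
        limelight:media:|
        https?://
            (?:
                link\.videoplatform\.limelight\.com/media/|
                assets\.delvenetworks\.com/player/loader\.swf
            )
            \?.*?\bmediaId=
    )
    (?P<id>[a-z0-9]{32})
'''


def media_id(url):
    """Return the 32-character media id, or None when the URL does not match."""
    mobj = re.match(MEDIA_RE, url)
    return mobj.group('id') if mobj else None


print(media_id('http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86'))
print(media_id('https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452'))
print(media_id('limelight:media:3ffd040b522b4485b6d84effc750cd86'))
```

The channel and channel-list patterns follow the same layout and differ only in the internal prefix and the query parameter name.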

@@ -0,0 +1,137 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
smuggle_url,
unsmuggle_url,
)
class LiTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.litv\.tv/vod/[^/]+/content\.do\?.*?\bid=(?P<id>[^&]+)'
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'
_TESTS = [{
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
'info_dict': {
'id': 'VOD00041606',
'title': '花千骨',
},
'playlist_count': 50,
}, {
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
'info_dict': {
'id': 'VOD00041610',
'ext': 'mp4',
'title': '花千骨第1集',
'thumbnail': 're:https?://.*\.jpg$',
'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f',
'episode_number': 1,
},
'params': {
'noplaylist': True,
'skip_download': True, # m3u8 download
},
'skip': 'Georestricted to Taiwan',
}]
def _extract_playlist(self, season_list, video_id, vod_data, view_data, prompt=True):
episode_title = view_data['title']
content_id = season_list['contentId']
if prompt:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (content_id, video_id))
all_episodes = [
self.url_result(smuggle_url(
self._URL_TEMPLATE % (view_data['contentType'], episode['contentId']),
{'force_noplaylist': True})) # To prevent infinite recursion
for episode in season_list['episode']]
return self.playlist_result(all_episodes, content_id, episode_title)
def _real_extract(self, url):
url, data = unsmuggle_url(url, {})
video_id = self._match_id(url)
noplaylist = self._downloader.params.get('noplaylist')
noplaylist_prompt = True
if 'force_noplaylist' in data:
noplaylist = data['force_noplaylist']
noplaylist_prompt = False
webpage = self._download_webpage(url, video_id)
view_data = dict(map(lambda t: (t[0], t[2]), re.findall(
r'viewData\.([a-zA-Z]+)\s*=\s*(["\'])([^"\']+)\2',
webpage)))
vod_data = self._parse_json(self._search_regex(
r'var\s+vod\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
video_id)
season_list = list(vod_data.get('seasonList', {}).values())
if season_list:
if not noplaylist:
return self._extract_playlist(
season_list[0], video_id, vod_data, view_data,
prompt=noplaylist_prompt)
if noplaylist_prompt:
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
# In browsers, a `getMainUrl` request is always issued. Usually this
# endpoint gives the same result as the data embedded in the webpage.
# If the video is georestricted, there is no embedded data, so an extra
# request is necessary to get the error code
video_data = self._parse_json(self._search_regex(
r'uiHlsUrl\s*=\s*testBackendData\(([^;]+)\);',
webpage, 'video data', default='{}'), video_id)
if not video_data:
payload = {
'assetId': view_data['assetId'],
'watchDevices': vod_data['watchDevices'],
'contentType': view_data['contentType'],
}
video_data = self._download_json(
'https://www.litv.tv/vod/getMainUrl', video_id,
data=json.dumps(payload).encode('utf-8'),
headers={'Content-Type': 'application/json'})
if not video_data.get('fullpath'):
error_msg = video_data.get('errorMessage')
if error_msg == 'vod.error.outsideregionerror':
self.raise_geo_restricted('This video is available in Taiwan only')
if error_msg:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_msg), expected=True)
raise ExtractorError('Unexpected result from %s' % self.IE_NAME)
formats = self._extract_m3u8_formats(
video_data['fullpath'], video_id, ext='mp4', m3u8_id='hls')
for a_format in formats:
# LiTV HLS segments don't like compression
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True
title = view_data['title'] + view_data.get('secondaryMark', '')
description = view_data.get('description')
thumbnail = view_data.get('imageFile')
categories = [item['name'] for item in vod_data.get('category', [])]
episode = int_or_none(view_data.get('episode'))
return {
'id': video_id,
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,
'categories': categories,
'episode_number': episode,
}
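
The `view_data` dictionary in this extractor is built by scanning the page for `viewData.<name> = "<value>"` assignments. The sketch below reproduces that step with the same regular expression; `parse_view_data` and the sample page text are hypothetical.

```python
import re

# Pattern copied from the hunk above: group 1 is the key, group 2 the
# opening quote (matched again by \2), group 3 the value.
VIEW_DATA_RE = r'viewData\.([a-zA-Z]+)\s*=\s*(["\'])([^"\']+)\2'


def parse_view_data(webpage):
    """Collect viewData.<key> = '<value>' assignments into a dict."""
    return dict(
        (key, value)
        for key, _quote, value in re.findall(VIEW_DATA_RE, webpage))


# Made-up page fragment, for illustration only.
sample = '''
viewData.title = "Sample Show";
viewData.contentType = 'drama';
viewData.episode = "1";
viewData.empty = "";
'''

print(parse_view_data(sample))
```

Since the value group requires at least one character and excludes both quote characters, empty assignments and values that contain a quote are skipped; this is why the extractor reads optional fields with `view_data.get(...)`.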

@@ -17,7 +17,8 @@ class LiveLeakIE(InfoExtractor):
'ext': 'flv',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident'
'title': 'Most unlucky car accident',
'thumbnail': 're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
@@ -28,6 +29,7 @@ class LiveLeakIE(InfoExtractor):
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': 're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
@@ -49,7 +51,8 @@ class LiveLeakIE(InfoExtractor):
'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia'
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': 're:^https?://.*\.jpg$'
}
}]
@@ -72,6 +75,7 @@ class LiveLeakIE(InfoExtractor):
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
sources_raw = self._search_regex(
r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None)
@@ -124,4 +128,5 @@ class LiveLeakIE(InfoExtractor):
'uploader': video_uploader,
'formats': formats,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
}

@@ -150,7 +150,7 @@ class LivestreamIE(InfoExtractor):
}
def _extract_stream_info(self, stream_info):
broadcast_id = stream_info['broadcast_id']
broadcast_id = compat_str(stream_info['broadcast_id'])
is_live = stream_info.get('is_live')
formats = []
@@ -203,9 +203,10 @@ class LivestreamIE(InfoExtractor):
if not videos_info:
break
for v in videos_info:
v_id = compat_str(v['id'])
entries.append(self.url_result(
'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v['id']),
'Livestream', v['id'], v['caption']))
'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v_id),
'Livestream', v_id, v.get('caption')))
last_video = videos_info[-1]['id']
return self.playlist_result(entries, event_id, event_data['full_name'])

Some files were not shown because too many files have changed in this diff.