Compare commits


236 Commits

Author SHA1 Message Date
6c152ce20f release 2016.10.02 2016-10-02 15:58:00 +07:00
26406d33c7 [ChangeLog] Actualize 2016-10-02 15:56:33 +07:00
703b3afa93 [amcnetworks] Skip a restricted _TEST 2016-10-02 14:25:06 +08:00
99ed78c79e [jwplatform] Support DASH streams 2016-10-02 14:07:49 +08:00
fd15264172 [jwplatform] Support old-style jwplayer playlists 2016-10-02 13:47:06 +08:00
bd26441205 [utils] Fix xattr error handling 2016-10-02 03:03:41 +08:00
b19e275d99 [__init__] Fix lost xattr if --embed-thumbnail used
Reported at
https://github.com/rg3/youtube-dl/issues/9054#issuecomment-250451823
2016-10-02 02:12:14 +08:00
f6ba581f89 [byutv:event] Add extractor 2016-10-02 00:50:07 +07:00
6d2549fb4f [byutv] Fix id and display id 2016-10-02 00:44:54 +07:00
4da4516973 [byutv] Rely on _match_id and _parse_json 2016-10-02 00:41:18 +07:00
e1e97c2446 [periscope:user] Fix extraction (Closes #10820) 2016-10-01 22:50:47 +07:00
53a7e3d287 [utils] Support xattr as well as pyxattr
Closes #9054

There are two xattr packages in Python, pyxattr [1] and xattr [2]. They
have different APIs.

In the old days, pyxattr supported Linux only, while xattr supported Linux,
Mac, FreeBSD and Solaris. Recently pyxattr added support for Mac OS X. [3]

An old version of [2] is shipped with Mac OS X. However, some Linux
distributions have pyxattr only, for example PLD-Linux [4] and old Arch
Linux. [5] As a result, supporting both is the way to go.

[1] https://github.com/iustin/pyxattr
[2] https://github.com/xattr/xattr
[3] https://github.com/iustin/pyxattr/pull/9
[4] https://github.com/rg3/youtube-dl/issues/5498
[5] https://git.archlinux.org/svntogit/community.git/commit/?id=427c4c76401e386d865ccddea4fbfdc74df80492
    https://git.archlinux.org/svntogit/community.git/commit/?id=59b40da7b69622a6761d364a8b07909e9cccaa56
    python-xattr was added on 2016/06/29, while pyxattr has been there for
    more than 6 years
2016-10-01 20:13:04 +08:00
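The API difference described in this commit message can be handled by probing whichever module was imported under the name `xattr`. The following is a minimal sketch, not the code from `utils.py`: it assumes only that pyxattr exposes `xattr.set` while the xattr package exposes `xattr.setxattr`, and the helper names `pick_setxattr` and `write_attr` are hypothetical. Stand-in objects replace the real packages so the sketch runs without either installed.

```python
import types


def pick_setxattr(xattr_module):
    """Return a setter callable(path, key, value) for either xattr flavour.

    Assumption: pyxattr provides ``set`` and the xattr package provides
    ``setxattr``; both are importable under the module name ``xattr``.
    """
    if hasattr(xattr_module, 'set'):  # pyxattr
        return xattr_module.set
    if hasattr(xattr_module, 'setxattr'):  # xattr
        return xattr_module.setxattr
    raise RuntimeError('unsupported xattr module')


def write_attr(xattr_module, path, key, value):
    # Encode text to bytes, following the "xattr values should be bytes" commit.
    if not isinstance(value, bytes):
        value = value.encode('utf-8')
    pick_setxattr(xattr_module)(path, key, value)


# Stand-ins for the two real packages (hypothetical, for illustration only).
calls = []
fake_pyxattr = types.SimpleNamespace(
    set=lambda p, k, v: calls.append(('pyxattr', p, k, v)))
fake_xattr = types.SimpleNamespace(
    setxattr=lambda p, k, v: calls.append(('xattr', p, k, v)))

write_attr(fake_pyxattr, 'video.mp4', 'user.dublincore.title', 'Example')
write_attr(fake_xattr, 'video.mp4', 'user.dublincore.title', 'Example')
```

Probing for an attribute rather than for a package name is one possible design choice here, since both packages install a module called `xattr`.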
d54739a2e6 [downloader/http] xattr values should be bytes 2016-10-01 19:58:13 +08:00
63e0fd5bcc Merge pull request #10818 from TRox1972/criterion_match_id
[criterion] Rely on _match_id, improve regex and add thumbnail to test
2016-10-01 19:49:18 +08:00
9c51a24642 [criterion] Rely on _match_id, improve regex and add thumbnail to test 2016-10-01 13:46:48 +02:00
9bd7bd0b80 [twitch] Skip a 404 test 2016-10-01 16:38:47 +08:00
4a76b73c6c Merge pull request #10817 from TRox1972/clubic_match_id
[clubic] Rely on _match_id and _parse_json
2016-10-01 16:20:12 +08:00
e295618f9e [dctp] Fix extraction (closes #10734) 2016-10-01 15:22:48 +08:00
d7753d1948 [downloader/http] Use write_xattr function for --xattr-set-filesize 2016-10-01 14:47:20 +08:00
eaf9b22f94 [clubic] Rely on _match_id and _parse_json 2016-09-30 20:03:25 +02:00
a1001f47fc [instagram] PEP 8 2016-10-01 00:16:08 +07:00
1609782258 [Instagram] Extract video dimensions 2016-10-01 00:13:34 +07:00
de6babf922 [tvland] Extend _VALID_URL (Closes #10812) 2016-09-30 22:30:34 +07:00
b0582fc806 [vgtv] Add support for tv.aftonbladet.se (Closes #10800) 2016-09-30 00:15:09 +07:00
af33dd8ee7 [aftonbladet] Remove extractor 2016-09-30 00:13:03 +07:00
70d7b323b6 [vk] Improve view count extraction 2016-09-29 23:52:29 +07:00
a7ee8a00f4 [vk] Extract timestamp (Closes #10760) 2016-09-29 23:52:29 +07:00
c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
3aa3953d28 [vk] Fix date and view count extraction. 2016-09-29 23:52:29 +07:00
efa97bdcf1 Move write_xattr to utils.py
There are some other places that use xattr functions. It's better to
move it to a common place so that others can use it.
2016-09-30 00:28:32 +08:00
475f8a4580 [vk] Add support for running live streams (Closes #10799) 2016-09-29 23:21:39 +07:00
93aa0b6318 [vk] Add support for finished live streams (#10799) 2016-09-29 23:04:10 +07:00
0ce26ef228 Merge pull request #10788 from TRox1972/instagram_comments
[Instagram] Extract comments
2016-09-29 21:54:39 +08:00
0d72ff9c51 [leeco] Recognize more Le Sports URLs (#10794) 2016-09-29 21:39:35 +08:00
a56e74e271 [Instagram] Extract comments 2016-09-28 19:32:40 +02:00
f533490bb7 [ketnet] Extract mzsource formats (#10770) 2016-09-28 22:58:25 +07:00
8bfda726c2 [limelight:media] improve http formats extraction 2016-09-28 16:34:27 +01:00
8f0cf20ab9 release 2016.09.27 2016-09-27 23:09:46 +07:00
c8f45f763c [ChangeLog] Remove duplicate 2016-09-27 23:03:00 +07:00
dd2cffeeec [ChangeLog] Actualize 2016-09-27 22:43:35 +07:00
cdfcc4ce95 [mtv] Improve _VALID_URL 2016-09-27 22:27:10 +07:00
e384552590 [vk] Add support for dailymotion embeds
Fixes #10661
2016-09-27 21:58:14 +07:00
1a2fbe322e [periscope] Treat timed_out state as finished stream 2016-09-27 21:55:51 +07:00
f9dd86a112 [npo] Clarify IE_NAMEs (Closes #10775) 2016-09-27 21:37:33 +07:00
2342733f85 fix tests related to 1978540a5122c53012e17a78841f3da0df77fd34(closes #10774) 2016-09-27 15:31:25 +01:00
93933c9819 [awaan:video] fix test(closes #10773) 2016-09-27 15:31:25 +01:00
d75d9e343e [einthusan] Fix extraction (closes #10714) 2016-09-27 14:38:41 +08:00
72c3d02d29 [promptfile] Improve and modernize 2016-09-26 23:39:54 +07:00
d3dbb46330 [promptfile] Fix extraction (Closes #10634) 2016-09-26 23:20:58 +07:00
fffb9cff94 [kaltura] Speed up embed regexes (#10764) 2016-09-26 22:15:58 +07:00
d3c97bad61 Ignore and cleanup 3gp files 2016-09-26 14:14:37 +08:00
2d5b4af007 [extractors] Add import for anderetijden extractor 2016-09-25 23:30:57 +07:00
f1ee462c82 [PULL_REQUEST_TEMPLATE.md] Fix typo 2016-09-25 22:38:36 +07:00
5742c18bc1 [npo] Add support for anderetijden.nl (Closes #10754) 2016-09-25 22:26:14 +07:00
ddb19772d5 [vpro] Fix playlist title extraction and update tests 2016-09-25 22:26:06 +07:00
a3d8b38168 [npo] Generalize playlist extractors 2016-09-25 22:26:00 +07:00
e590b7ff9e [PULL_REQUEST_TEMPLATE.md] Add checkable Improvement options PR's purpose 2016-09-25 18:09:46 +07:00
f3625cc4ca [PULL_REQUEST_TEMPLATE.md] Add Unlicense notice 2016-09-25 18:08:35 +07:00
2d3d29976b [youtube] Change test URLs from http to https 2016-09-25 17:45:24 +07:00
493353c7fd [prosiebensat1] Add support for advopedia 2016-09-25 06:25:57 +07:00
0a078550b9 [prosiebensat1] Improve _VALID_URL 2016-09-25 06:19:17 +07:00
f92bb612c6 [mwave] Relax _VALID_URLs (Closes #10735, closes #10748) 2016-09-25 06:14:32 +07:00
ddde91952f [prosiebensat1] Fix playlist support (Closes #10745) 2016-09-25 05:36:18 +07:00
63c583eb2c [prosiebensat1] Add support for sat1gold (#10745) 2016-09-25 04:43:10 +07:00
7fd57de6fb [cbsnews:livevideo] fix extraction and extract m3u8 formats 2016-09-24 22:01:33 +01:00
e71a450956 [common] add hdcore sign to akamai f4m formats 2016-09-24 21:55:53 +01:00
27e99078d3 [brightcove:new] add support for live streams 2016-09-24 15:39:48 +01:00
6f126d903f [download/hls] Delegate downloading to ffmpeg for live streams 2016-09-24 15:39:47 +01:00
7518a61d41 [soundcloud] Fix typo in playlist base class name 2016-09-24 19:29:49 +07:00
8e45e1cc4d [soundcloud] Generalize playlist entries extraction (#10733) 2016-09-24 19:18:01 +07:00
f0bc5a8609 [twitter] Support Periscope embeds (closes #10737)
Also update _TESTS
2016-09-24 20:00:29 +08:00
a54ffb8aa7 [mtv] add common IE_NAME prefix for MTVIE and MTVVideoIE 2016-09-24 10:50:14 +01:00
8add4bfecb [mtv] add support for new website urls(closes #8169)(closes #9808) 2016-09-24 10:42:20 +01:00
0711995bca [openload] Support subtitles (closes #10625) 2016-09-24 14:27:08 +08:00
5968d7d2fe [extractor/common] Improved support for HTML5 subtitles
Ref: #10625

In a strict sense, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and I guess there will be more
examples, so I add it to common.py.

Also allow extracting information for subtitles-only <video> or <audio>
tags, which is the case of openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
2016-09-24 14:20:42 +08:00
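The idea in this commit message, treating `<track>` elements with `kind=captions` like subtitles and accepting `<video>` tags that carry only tracks, can be sketched with the standard library. This is an illustration under stated assumptions, not the logic in `common.py`: the class name `TrackCollector`, the function `extract_subtitles`, the `'und'` fallback language and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TrackCollector(HTMLParser):
    """Collect <track> elements that can serve as subtitles."""

    # Assumption: captions are accepted alongside real subtitles.
    SUBTITLE_KINDS = ('subtitles', 'captions')

    def __init__(self):
        super().__init__()
        self.subtitles = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'track':
            return
        attrs = dict(attrs)
        if attrs.get('kind') not in self.SUBTITLE_KINDS or not attrs.get('src'):
            return
        # Fall back to the label, then to 'und', when srclang is missing.
        lang = attrs.get('srclang') or attrs.get('label') or 'und'
        self.subtitles.setdefault(lang, []).append({'url': attrs['src']})


def extract_subtitles(html):
    collector = TrackCollector()
    collector.feed(html)
    return collector.subtitles


# A subtitles-only <video> tag: no <source> element, only tracks.
page = '''
<video>
  <track kind="captions" src="/subs/en.vtt" srclang="en">
  <track kind="metadata" src="/meta.vtt">
</video>
'''
subs = extract_subtitles(page)
```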
e6332059ac release 2016.09.24 2016-09-24 02:16:47 +07:00
8eec691e8a [ChangeLog] Actualize 2016-09-24 02:12:49 +07:00
24628cf7db [soundcloud:playlist] Provide video id for playlist entries (Closes #10733) 2016-09-24 02:01:01 +07:00
71ad00c09f [prosiebensat1] Add support for kabeleinsdoku (Closes #10732) 2016-09-23 21:08:16 +07:00
45cae3b021 [cbs] extract info from thunder videoPlayerService(closes #10728) 2016-09-22 19:28:22 +01:00
4ddcb5999d [openload] Fix extraction (closes #10408, closes #10727)
Thanks to @daniel100097 for providing a working version
2016-09-23 01:47:51 +08:00
628406db96 [Makefile] Cleanup files from fragment-based downloaders 2016-09-23 01:13:56 +08:00
e3d6bdc8fc [ustream] Support HLS streams (closes #10698) 2016-09-23 01:11:13 +08:00
0a439c5c4c [udemy] Stringify video id 2016-09-22 21:48:53 +07:00
1978540a51 [ooyala] extract all hls formats 2016-09-21 21:49:52 +01:00
12f211d0cb [videomore] Fix embed regex 2016-09-21 22:51:36 +07:00
3a5a18705f [adobepass] add support MSO that depend on watchTVeverywhere(closes #10709) 2016-09-21 15:57:27 +01:00
1ae0ae5db0 [cartoonnetwork] add support Adobe Pass auth 2016-09-20 18:52:00 +01:00
f62a77b99a [soundcloud] Modernize 2016-09-20 21:56:57 +07:00
4bfd294e2f [soundcloud] Extract license metadata 2016-09-20 21:56:57 +07:00
e33a7253b2 [fox] add support for Adobe Pass auth(closes #8584) 2016-09-20 15:52:23 +01:00
c38f06818d add support for Adobe Pass auth in tbs,tnt and trutv extractors(fixes #10642)(closes #10222)(closes #10519) 2016-09-20 11:55:30 +01:00
cb57386873 release 2016.09.19 2016-09-19 02:58:32 +07:00
59fd8f931d [ChangeLog] Actualize 2016-09-19 02:57:14 +07:00
70b4cf9b1b [crunchyroll] Check if already logged in (Closes #10700) 2016-09-19 02:50:06 +07:00
cc764a6da8 [twitch:stream] Remove fallback to profile extraction when stream is offline
Main page does not contain profile videos anymore
2016-09-18 19:10:18 +07:00
d8dbf8707d [thisav] Improve title extraction (closes #10682)
I didn't add a test case as the one in #10682 looks like a copyrighted
product.
2016-09-18 18:35:38 +08:00
a1da888d0c [vyborymos] Improve station info extraction 2016-09-18 17:30:55 +07:00
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (the manifest may contain unfragmented data; in this case url is used for the direct media URL and manifest_url for the manifest itself)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
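A rough picture of what a format dictionary might look like after this refactoring, and how a downloader could consume it. Only the field names `url`, `manifest_url` and `fragments` come from the commit message and ChangeLog; the helper names, the `'url'` key inside each fragment entry, the `format_id` value and the example.com addresses are assumptions made for illustration.

```python
def build_fragmented_format(manifest_url, base_url, segment_paths):
    """Build a format dictionary for fragmented media (illustrative only)."""
    return {
        'format_id': 'dash-example',  # hypothetical identifier
        'url': base_url,
        'manifest_url': manifest_url,
        # Assumed shape: one dictionary per fragment, holding its URL.
        'fragments': [{'url': base_url + path} for path in segment_paths],
    }


def media_urls(fmt):
    """Return the URLs a downloader would fetch for this format."""
    fragments = fmt.get('fragments')
    if fragments:
        return [fragment['url'] for fragment in fragments]
    # Unfragmented data: fall back to the direct media URL.
    return [fmt['url']]


fmt = build_fragmented_format(
    'https://example.com/video/manifest.mpd',
    'https://example.com/video/',
    ['init.mp4', 'seg-1.m4s', 'seg-2.m4s'])

# The manifest describes unfragmented data, so only url is fetched.
direct = {
    'url': 'https://example.com/video/full.mp4',
    'manifest_url': 'https://example.com/video/manifest.mpd',
}
```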
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
f5e008d134 release 2016.09.15 2016-09-15 23:46:11 +07:00
e6bf3621e7 [ChangeLog] Actualize 2016-09-15 23:31:16 +07:00
490b755769 Improve some id regexes 2016-09-15 23:12:58 +07:00
1dec2c8a0e [adobepass] Change mvpd cache section name
In order to better emphasize its relation to Adobe Pass
2016-09-15 22:47:45 +07:00
dcce092e0a [extractor/common] Simplify _get_netrc_login_info and carry long lines 2016-09-15 22:35:12 +07:00
32443dd346 [extractor/common] Update _get_login_info's comment 2016-09-15 22:34:29 +07:00
2133565cec [extractor/common] Simplify _get_login_info 2016-09-15 22:26:37 +07:00
1da50aa34e [YoutubeDL] Improve Adobe Pass options' wording 2016-09-15 22:24:55 +07:00
d2522b86ac [options] Actually print Adobe Pass options sections in --help 2016-09-15 22:18:31 +07:00
537f753399 [options] Improve Adobe Pass wording 2016-09-15 22:17:17 +07:00
c849836854 [utils] Improve _hidden_inputs 2016-09-15 21:54:48 +07:00
eb5b1fc021 [crunchyroll] Fix authentication (Closes #10655) 2016-09-15 21:53:35 +07:00
95be29e1c6 [twitch] Fix api calls (Closes #10654, closes #10660) 2016-09-15 20:58:02 +07:00
c035dba19e [bellmedia] add support for more sites 2016-09-15 08:12:12 +01:00
87148bb711 [adobepass] rename --ap-mso-list option to --ap-list-mso 2016-09-14 20:21:09 +01:00
797c636bcb [ap] improve adobe pass names and parse error handling 2016-09-14 18:58:47 +01:00
0002962f3f [franceinter] Improve extraction (Closes #10538) 2016-09-14 23:59:38 +07:00
3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
a942d6cb48 [utils,franceinter] Add french months' names and fix extraction
Update the "FranceInter" radio extractor: the webpage's HTML structure
had changed and the extractor no longer worked, so it now extracts the
mp3 URL and all details.
2016-09-14 23:59:38 +07:00
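The month-name commits above suggest a lookup keyed by language. A minimal sketch of such a helper follows; it is not copied from `utils.py`, and the `MONTH_NAMES` table and the `lang` parameter with its fallback to English are assumptions about one reasonable shape for it.

```python
MONTH_NAMES = {
    'en': [
        'January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December'],
    'fr': [
        'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
        'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
}


def month_by_name(name, lang='en'):
    """Return the month number (1-12) for a full month name, or None."""
    # Unknown languages fall back to English (an assumption of this sketch).
    month_names = MONTH_NAMES.get(lang, MONTH_NAMES['en'])
    try:
        return month_names.index(name) + 1
    except ValueError:
        return None
```

Returning `None` instead of raising keeps date parsing tolerant of pages that use an unexpected spelling.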
961516bfd1 [kuwo:song] Improve error detection (closes #10650) 2016-09-15 00:56:15 +08:00
6db354a9f4 [kuwo] Update _TESTS 2016-09-15 00:53:04 +08:00
353f340e11 [go] fix typo 2016-09-14 17:22:42 +01:00
014b7e6b25 [go] add support for free full episodes(#10439) 2016-09-14 17:08:25 +01:00
925194022c Improve some _VALID_URLs 2016-09-14 22:47:21 +07:00
b690ea15eb [viafree] Fix test 2016-09-14 22:45:23 +07:00
5712c0f426 [adobepass] remove unnecessary option 2016-09-14 16:37:21 +01:00
86d68f906e [bilibili] Fix extraction for videos without backup_url (#10647) 2016-09-14 22:11:49 +08:00
4875ff6847 [bilibili] Remove copyrighted test cases
I can't find any English or Chinese material that claims BiliBili has
bought legal redistribution permissions for copyrighted products from
copyright holders.

References for removed test cases:
"刀语": https://en.wikipedia.org/wiki/Katanagatari, by White Fox
"哆啦A梦": https://en.wikipedia.org/wiki/Doraemon, by Shin-Ei Animation
"岳父岳母真难当": https://en.wikipedia.org/wiki/Serial_(Bad)_Weddings, by Les films du 24
"混沌武士": https://en.wikipedia.org/wiki/Samurai_Champloo, by Manglobe

I shouldn't have added them to _TESTS
2016-09-14 22:09:43 +08:00
1b6712ab23 [adobepass] add specific options for adobe pass authentication
- add --ap-username and --ap-password options to specify the
TV provider username and password on the command line
- add --ap-retries option to limit the number of retries
- add --list-ap-msi-ids to list the supported TV providers
2016-09-13 22:16:01 +01:00
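For scripted use, the Adobe Pass options can be assembled into an argument list. The sketch below uses the option names `--ap-mso`, `--ap-username` and `--ap-password` as they appear in the 2016.09.15 ChangeLog entry; the helper name `adobe_pass_args`, the MSO identifier, the credentials and the URL are placeholders, and the command is only built, not executed.

```python
import shlex


def adobe_pass_args(url, mso, username, password):
    """Build a youtube-dl argument list with Adobe Pass credentials."""
    return [
        'youtube-dl',
        '--ap-mso', mso,
        '--ap-username', username,
        '--ap-password', password,
        url,
    ]


args = adobe_pass_args(
    'https://example.com/episode/1',  # placeholder URL
    'ExampleMSO',                     # placeholder MSO identifier
    'user@example.com',
    'secret')

# Shell-quoted form, suitable for logging with the password masked out.
masked = [a if a != 'secret' else '***' for a in args]
command_line = ' '.join(shlex.quote(a) for a in masked)
```

The list form can be passed to `subprocess.run` without going through a shell, which avoids quoting problems with passwords that contain special characters.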
8414c2da31 [adobepass] PEP 8 2016-09-13 23:22:16 +07:00
45396dd2ed [nhk] Fix extraction (Closes #10633) 2016-09-13 23:20:25 +07:00
7a7309219c [adobepass] add an option to specify mso_id and support for ROGERS TV Provider(closes #10606) 2016-09-12 23:39:35 +01:00
fcba157e80 [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 23:29:43 +07:00
a6ccc3e518 [safari] Improve ids regexes (#10617) 2016-09-12 23:05:52 +07:00
1d16035bb4 [kaltura] Improve audio detection 2016-09-12 22:43:45 +07:00
e8bcd982cc [kaltura] Skip chun format 2016-09-12 22:33:00 +07:00
a5ff05df1a [extractor/generic] Add vimeo embed that requires Referer passed 2016-09-12 21:49:31 +07:00
d002e91986 [vimeo:ondemand] Pass Referer along with embed URL (#10624) 2016-09-12 21:48:45 +07:00
546edb2efa [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 21:01:31 +07:00
be45730226 [nbc] Add new extractor for NBC Olympics (#10295, #10361) 2016-09-12 02:55:15 +08:00
ee7e672eb0 [tube8] Remove proxy settings from test 2016-09-11 23:46:50 +07:00
0307d6fba6 release 2016.09.11.1 2016-09-11 23:33:20 +07:00
fc150cba1d [devscripts/release.sh] Add missing fi 2016-09-11 23:32:01 +07:00
d667ab7fad [ChangeLog] Actualize 2016-09-11 23:30:18 +07:00
eb87d4545a [devscripts/release.sh] Add ChangeLog reminder prompt 2016-09-11 23:29:25 +07:00
1c81476cbb release 2016.09.11 2016-09-11 23:20:09 +07:00
bc9186c882 [tvplay] Remove unused import 2016-09-11 22:51:12 +07:00
6599c72527 [tube8] Extract categories and tags (Closes #10579) 2016-09-11 22:50:36 +07:00
6bb05b32a9 [pornhub] Extract categories and tags (closes #10499) 2016-09-11 19:22:51 +08:00
fea74acad8 [foxnews] Revert to old extractor names 2016-09-11 18:54:24 +08:00
f01115c933 [openload] Temporary fix (#10408) 2016-09-11 18:36:59 +08:00
2cdbc06a1f [foxnews] Support Fox News Articles (closes #10598) 2016-09-11 18:32:45 +08:00
2cb93afcd8 [viafree] Improve video id extraction (Closes #10615) 2016-09-11 14:59:14 +07:00
bfcda07a27 [abc:iview] Skip the test. They are removed soon 2016-09-11 04:06:00 +08:00
001a5fd3d7 [iwara] Fix extraction after relaunch
Closes #10462, closes #3215
2016-09-11 03:02:00 +08:00
1e35999c1e [tfo] Add new extractor 2016-09-10 19:43:31 +01:00
2512b17493 [lrt] Fix audio extraction (Closes #10566) 2016-09-11 01:27:20 +07:00
56c0ead4d3 [9now] Improve video data extraction (Closes #10561) 2016-09-11 00:42:13 +07:00
7324243750 [9now] Fix extraction 2016-09-11 00:16:29 +07:00
84a18e9b90 [polskieradio:category] Improve extraction 2016-09-10 22:01:49 +07:00
b29f842e0e [canalplus] Add support for c8.fr (Closes #10577) 2016-09-10 20:46:45 +07:00
f009fcac0d Merge branch 'master' of github.com:rg3/youtube-dl 2016-09-10 19:21:03 +07:00
6c3affcb18 [newgrounds] Fix uploader extraction
Closes #10584

Also change test URLs to HTTPS, as proposed by
@stepshal in #10593.

Closes #10593
2016-09-10 20:09:09 +08:00
1e19ff2984 Merge branch 'polskie-radio-programme' of https://github.com/JakubAdamWieczorek/youtube-dl 2016-09-10 00:42:36 +07:00
c6129feb7f [ketnet] Add extractor (Closes #10343) 2016-09-09 23:20:45 +07:00
bb5ebd4453 [canvas] Add support for een.be (Closes #10605) 2016-09-09 22:16:21 +07:00
cb9cbd84ed [extractors] add import for TeleQuebecIE 2016-09-08 22:55:27 +01:00
4d5726b0d7 [telequebec] Add new extractor(closes #1999) 2016-09-08 22:53:44 +01:00
4614ad7b59 [parliamentliveuk] fix extraction(closes #9137) 2016-09-08 20:46:12 +01:00
b717837190 release 2016.09.08 2016-09-08 23:46:14 +07:00
2abad67e52 [ChangeLog] Actualize 2016-09-08 23:32:16 +07:00
ad0e2b3359 [abcotvs] Add support for ABC Owned Television Stations 2016-09-08 23:15:58 +07:00
37720844f6 [jwplatform] Extract height from label 2016-09-08 22:53:20 +07:00
6cfcb8ac36 [tvnoe] Do not capture unused groups in _VALID_URL 2016-09-08 22:53:20 +07:00
7a979da8cb [yahoo] Look for Brightcove Legacy Studio embeds(closes #9345) 2016-09-08 16:44:22 +01:00
2fdc7b0e04 [viafree] PEP 8 2016-09-08 22:40:02 +07:00
010d034fca [videomore] Fix extraction (Closes #10592) 2016-09-08 22:38:49 +07:00
02e552886f Merge pull request #10596 from stepshal/r_prefix
Add missing r prefix for _VALID_URLs
2016-09-08 18:31:09 +08:00
25042f7372 Add missing r prefix for _VALID_URLs 2016-09-08 17:04:57 +07:00
3f612f0767 Fix _VALID_URLs further (#10594) 2016-09-08 17:39:29 +08:00
17bf6e71cc Merge pull request #10594 from stepshal/https_support
Add support for https for the rest of the extractors.
2016-09-08 17:28:46 +08:00
881f35479d Credit @xyb for miaopai extractor (#10556) 2016-09-08 17:22:43 +08:00
89f257d6e5 Add support for https for the rest of the extractors. 2016-09-08 13:52:22 +07:00
e78a5428b6 [foxgay] Fix extraction (closes #10480) 2016-09-08 02:01:09 +08:00
6656a82481 [rmcdecouverte] Add new extractor(closes #9709) 2016-09-07 17:33:22 +01:00
d7e794928d [tlc] fix query string parsing 2016-09-07 17:33:22 +01:00
9c27188988 Merge branch 'xyb-miaopai' 2016-09-08 00:31:06 +08:00
b84d311d53 [ChangeLog] Update for #10556 2016-09-08 00:29:55 +08:00
f87feb4b68 [miaopai] Coding style (#10556) 2016-09-08 00:28:33 +08:00
2841bdcebb Merge branch 'miaopai' of https://github.com/xyb/youtube-dl into xyb-miaopai 2016-09-08 00:08:02 +08:00
84b91dd4e3 [gamestar] Fix metadata extraction (closes #10479) 2016-09-07 23:07:50 +08:00
92c9c2a88b [moevideo] Skip another removed test (#10474) 2016-09-07 22:21:59 +08:00
9d54b02bae [puls4] fix extraction(closes #10583) 2016-09-07 14:43:20 +01:00
846d8b76a0 [cctv] Add new extractor(closes #8153) 2016-09-07 10:11:09 +01:00
aa3f9fe695 Explain why and why not to specify --hls-prefer-native
This has been asked at http://stackoverflow.com/questions/39357037/what-does-youtube-dl-option-hls-prefer-native-do-any-downside-adding-to-youtu
2016-09-07 10:38:59 +02:00
8258f4457c [lci] Add new extractor(closes #10573) 2016-09-06 20:47:42 +01:00
948cd5b72d [wat] extract dash formats 2016-09-06 20:44:45 +01:00
8d3737cda7 [polskieradio] Add support for downloading whole programmes.
This extends the Polskie Radio (the Polish national radio) extractor to
enable the user to download all the broadcasts of a single programme.
2016-09-06 21:34:44 +02:00
155bc674c4 [viafree] Improve video id detection (Closes #10569) 2016-09-07 00:41:31 +07:00
c33c962adf [trutv] Add new extractor(#10519) 2016-09-06 15:56:17 +01:00
bdcc046d12 [turner] use android secure hls host and catch token extraction errors 2016-09-06 15:53:03 +01:00
a493f10208 using _parse_html5_media_entries to parse video tag 2016-09-05 23:08:33 +08:00
f3eeaacb4e [nick] Add test for #10559 2016-09-05 21:42:41 +07:00
b4d6a85d60 [nick] Add support for nickelodeon.nl (Closes #10559) 2016-09-05 21:33:14 +07:00
0b36a96212 [abcotvs] extend _VALID_URL and add support for clips.abcotvs.com(closes #9551) 2016-09-05 13:41:21 +01:00
bc22a79694 Credit @mcepl for #10524 2016-09-05 16:44:06 +08:00
340e31ca74 Merge branch 'PeterDing-bilibili' 2016-09-05 13:55:07 +08:00
973dee491f [ChangeLog] Update for #10190 2016-09-05 13:54:35 +08:00
1f85029d82 [bilibili] Simplify 2016-09-05 13:53:58 +08:00
95be19d436 [miaopai] Add new extractor 2016-09-05 13:53:09 +08:00
95843da529 Merge branch 'bilibili' of https://github.com/PeterDing/youtube-dl into PeterDing-bilibili 2016-09-05 13:47:24 +08:00
abf2c79f95 Merge branch 'mcepl-tvnoe' 2016-09-05 13:39:51 +08:00
b49ad71ce1 [ChangeLog] Update for #10524 2016-09-05 13:38:55 +08:00
9127e1533d [tvnoe] PEP8 and coding style 2016-09-05 13:37:36 +08:00
78e762d23c Add new extractor for TV Noe (Czech Christian TV).
Fixes #10520
2016-09-04 19:06:40 +02:00
7be15d4097 [bilibili] Support episodes
[extractor/bilibili] add md5 for testing

[extractor/bilibili] remove unnecessary headers

[extractor/bilibili] correct _TESTS; find thumbnail for episode

[extractor/bilibili] [Fix] restore removed tests
2016-08-29 23:31:08 +08:00
179 changed files with 4736 additions and 1568 deletions

View File

@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.04.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.04.1**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.02**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.04.1
+[debug] youtube-dl version 2016.10.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@ -10,8 +10,13 @@
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] Improvement
- [ ] New extractor
- [ ] New feature

1
.gitignore vendored
View File

@ -29,6 +29,7 @@ updates_key.pem
*.m4a
*.m4v
*.mp3
+*.3gp
*.part
*.swp
test/testdata

View File

@ -183,3 +183,5 @@ Petr Zvoníček
Pratyush Singh
Aleksander Nitecki
Sebastian Blunt
+Matěj Cepl
+Xie Yanbo

173
ChangeLog
View File

@ -1,3 +1,176 @@
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors
+ [tube8] Extract categories and tags (#10579)
+ [pornhub] Extract categories and tags (#10499)
* [openload] Temporary fix (#10408)
+ [foxnews] Add support for Fox News articles (#10598)
* [viafree] Improve video id extraction (#10615)
* [iwara] Fix extraction after relaunch (#10462, #3215)
+ [tfo] Add extractor for tfo.org
* [lrt] Fix audio extraction (#10566)
* [9now] Fix extraction (#10561)
+ [canalplus] Add support for c8.fr (#10577)
* [newgrounds] Fix uploader extraction (#10584)
+ [polskieradio:category] Add support for category lists (#10576)
+ [ketnet] Add extractor for ketnet.be (#10343)
+ [canvas] Add support for een.be (#10605)
+ [telequebec] Add extractor for telequebec.tv (#1999)
* [parliamentliveuk] Fix extraction (#9137)
version 2016.09.08
Extractors
+ [jwplatform] Extract height from format label
+ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
* [videomore] Fix extraction (#10592)
* [foxgay] Fix extraction (#10480)
+ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
* [gamestar] Fix metadata extraction (#10479)
* [puls4] Fix extraction (#10583)
+ [cctv] Add extractor for CCTV and CNTV (#8153)
+ [lci] Add extractor for lci.fr (#10573)
+ [wat] Extract DASH formats
+ [viafree] Improve video id detection (#10569)
+ [trutv] Add extractor for trutv.com (#10519)
+ [nick] Add support for nickelodeon.nl (#10559)
+ [abcotvs:clips] Add support for clips.abcotvs.com
+ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
+ [miaopai] Add extractor for miaopai.com (#10556)
* [gamestar] Fix metadata extraction (#10479)
+ [bilibili] Add support for episodes (#10190)
+ [tvnoe] Add extractor for tvnoe.cz (#10524)
version 2016.09.04.1
Core

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete

View File

@ -358,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
-n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku)
## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
If this option is left out, youtube-dl will
ask interactively.
--ap-list-mso List all supported multiple-system
operators
## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or
@ -851,6 +862,16 @@ will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and cre
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
### Should I add `--hls-prefer-native` into my config?
When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html)) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion in youtube-dl.

View File

@ -60,6 +60,9 @@ if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; e
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean
if $skip_tests ; then

View File

@ -19,9 +19,10 @@
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **Abc7News**
- **abcnews**
- **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
- **abcotvs:clips**
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
@ -33,12 +34,12 @@
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com
- **Aftonbladet**
- **AirMozilla**
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
@ -88,6 +89,7 @@
- **BeatportPro**
- **Beeg**
- **BehindKink**
- **BellMedia**
- **Bet**
- **Bigflix**
- **Bild**: Bild.de
@ -109,6 +111,7 @@
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed**
- **BYUtv**
- **BYUtvEvent**
- **Camdemy**
- **CamdemyFolder**
- **CamWithHer**
@ -125,9 +128,10 @@
- **CBS**
- **CBSInteractive**
- **CBSLocal**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **cbsnews**: CBS News
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **CCTV**
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
@ -167,7 +171,6 @@
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **CTV**
- **CTVNews**
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
@ -245,7 +248,8 @@
- **Formula1**
- **FOX**
- **Foxgay**
- **FoxNews**: Fox News and Fox Business Video
- **foxnews**: Fox News and Fox Business Video
- **foxnews:article**
- **foxnews:insider**
- **FoxSports**
- **france2.fr:generation-quoi**
@ -324,6 +328,7 @@
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Iwara**
- **Izlesene**
- **JeuxVideo**
- **Jove**
@ -337,6 +342,7 @@
- **KarriereVideos**
- **keek**
- **KeezMovies**
- **Ketnet**
- **KhanAcademy**
- **KickStarter**
- **KonserthusetPlay**
@ -352,6 +358,7 @@
- **kuwo:song**: 酷我音乐
- **la7.it**
- **Laola1Tv**
- **LCI**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
@ -382,6 +389,8 @@
- **mailru**: Видео@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
@ -390,6 +399,7 @@
- **Metacritic**
- **Mgoon**
- **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca**
- **MinistryGrid**
- **Minoto**
@ -415,8 +425,9 @@
- **MPORA**
- **MSN**
- **mtg**: MTG services
- **MTV**
- **mtv**
- **mtv.de**
- **mtv:video**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
@ -438,6 +449,7 @@
- **NBA**
- **NBC**
- **NBCNews**
- **NBCOlympics**
- **NBCSports**
- **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk
@ -536,6 +548,7 @@
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **PornCom**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
@ -576,6 +589,7 @@
- **revision3:embed**
- **RICE**
- **RingTV**
- **RMCDecouverte**
- **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes**
@ -696,9 +710,11 @@
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf**
- **TeleMB**
- **TeleQuebec**
- **TeleTask**
- **Telewebion**
- **TF1**
- **TFO**
- **TheIntercept**
- **ThePlatform**
- **ThePlatformFeed**
@ -720,7 +736,7 @@
- **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken)
- **Trilulilu**
- **trollvids**
- **TruTV**
- **Tube8**
- **TubiTv**
- **tudou**
@ -742,6 +758,7 @@
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **TVNoe**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
@ -836,6 +853,7 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
- **VyboryMos**
- **Walla**
- **washingtonpost**
- **washingtonpost:article**
@ -849,7 +867,7 @@
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **WNL**
- **wnl**: npo.nl and ntr.nl
- **WorldStarHipHop**
- **wrzuta.pl**
- **wrzuta.pl:playlist**

View File

@ -40,6 +40,7 @@ from youtube_dl.utils import (
js_to_json,
limit_length,
mimetype2ext,
month_by_name,
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
@ -291,6 +292,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@ -311,6 +313,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -634,6 +637,14 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
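The `test_month_by_name` assertions above pin down the expected behaviour: the lookup is per language, defaults to English, and returns `None` for unknown names or languages. A minimal sketch that satisfies those assertions could look like the following. It is an illustration only, not the actual `youtube_dl.utils` code, and the `MONTH_NAMES` table here is an assumption limited to the two languages the tests mention.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; not the youtube_dl.utils implementation.
# MONTH_NAMES is a hypothetical table covering only 'en' and 'fr'.
MONTH_NAMES = {
    'en': [
        'January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December'],
    'fr': [
        'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
        'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
}


def month_by_name(name, lang='en'):
    """Return the 1-based month number for name in lang, or None."""
    names = MONTH_NAMES.get(lang)
    # Unknown language or unknown month name: report "no match".
    if names is None or name not in names:
        return None
    return names.index(name) + 1
```

Returning `None` instead of raising keeps callers such as date parsers free to try the next format.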

View File

@ -131,6 +131,9 @@ class YoutubeDL(object):
username: Username for authentication purposes.
password: Password for authentication purposes.
videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.

View File

@ -34,12 +34,14 @@ from .utils import (
setproctitle,
std_headers,
write_string,
render_table,
)
from .update import update_self
from .downloader import (
FileDownloader,
)
from .extractor import gen_extractors, list_extractors
from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL
@ -118,18 +120,26 @@ def _real_main(argv=None):
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
write_string(desc + '\n', out=sys.stdout)
sys.exit(0)
if opts.ap_list_mso:
table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
sys.exit(0)
# Conflicting, missing and erroneous options
if opts.usenetrc and (opts.username is not None or opts.password is not None):
parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None:
parser.error('account username missing\n')
if opts.ap_password is not None and opts.ap_username is None:
parser.error('TV Provider account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error('using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid:
parser.error('using title conflicts with using video ID')
if opts.username is not None and opts.password is None:
opts.password = compat_getpass('Type account password and press [Return]: ')
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None:
@ -155,6 +165,8 @@ def _real_main(argv=None):
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@ -254,8 +266,6 @@ def _real_main(argv=None):
postprocessors.append({
'key': 'FFmpegEmbedSubtitle',
})
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
@ -264,6 +274,10 @@ def _real_main(argv=None):
})
if not already_have_thumbnail:
opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd:
@ -271,12 +285,6 @@ def _real_main(argv=None):
'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_cmd,
})
if opts.xattr_set_filesize:
try:
import xattr
xattr # Confuse flake8
except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available')
external_downloader_args = None
if opts.external_downloader_args:
external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@ -293,6 +301,9 @@ def _real_main(argv=None):
'password': opts.password,
'twofactor': opts.twofactor,
'videopassword': opts.videopassword,
'ap_mso': opts.ap_mso,
'ap_username': opts.ap_username,
'ap_password': opts.ap_password,
'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings,
'forceurl': opts.geturl,
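The Adobe Pass checks added to `_real_main` above follow a simple order: reject a password without a username, prompt for a missing password, then reject an unknown MSO. The sketch below reproduces that order in a self-contained form. It uses `argparse` rather than the project's own option parser, and `parse_ap_options`, `ask_password` and the single `MSO_INFO` entry are hypothetical names made up for illustration.

```python
import argparse
import getpass

# Hypothetical provider table. The real MSO_INFO is imported from
# youtube_dl/extractor/adobepass.py and its contents are not shown in this diff.
MSO_INFO = {
    'ExampleMSO': {'name': 'Example Cable'},
}


def parse_ap_options(argv, ask_password=getpass.getpass):
    """Parse and validate the --ap-* options in the same order as the diff."""
    parser = argparse.ArgumentParser(prog='ap-sketch')
    parser.add_argument('--ap-mso', dest='ap_mso')
    parser.add_argument('--ap-username', dest='ap_username')
    parser.add_argument('--ap-password', dest='ap_password')
    opts = parser.parse_args(argv)

    # A password without a username cannot be used.
    if opts.ap_password is not None and opts.ap_username is None:
        parser.error('TV Provider account username missing')
    # A username without a password triggers an interactive prompt.
    if opts.ap_username is not None and opts.ap_password is None:
        opts.ap_password = ask_password(
            'Type TV provider account password and press [Return]: ')
    # Only identifiers present in the provider table are accepted.
    if opts.ap_mso and opts.ap_mso not in MSO_INFO:
        parser.error('Unsupported TV Provider')
    return opts
```

Passing the prompt function as a parameter is a choice made for this sketch so that the logic can be exercised without a terminal.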

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals
import os
import re
from .fragment import FragmentFD
from ..compat import compat_urllib_error
@ -19,34 +18,32 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
base_url = info_dict['url']
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
initialization_url = info_dict.get('initialization_url')
segments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0),
'total_frags': len(segments),
}
self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def process_segment(segment, tmp_filename, fatal):
target_url, segment_name = segment
def process_segment(segment, tmp_filename, num):
segment_url = segment['url']
segment_name = 'Frag%d' % num
target_filename = '%s-%s' % (tmp_filename, segment_name)
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
@ -72,16 +69,8 @@ class DashSegmentsFD(FragmentFD):
return False
return True
segments_to_download = [(initialization_url, 'Init')] if initialization_url else []
segments_to_download.extend([
(segment_url, 'Seg%d' % i)
for i, segment_url in enumerate(segment_urls)])
for i, segment in enumerate(segments_to_download):
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
if not process_segment(segment, ctx['tmpfilename'], fatal):
for i, segment in enumerate(segments):
if not process_segment(segment, ctx['tmpfilename'], i):
return False
self._finish_frag_download(ctx)
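The reworked downloader above treats every entry of `info_dict['fragments']` uniformly and decides per fragment whether a failure is fatal: the first fragment always is, later ones only when `skip_unavailable_fragments` is off. A minimal sketch of that policy, outside the `FragmentFD` machinery, might look like this. `download_fragments` and its `fetch` callable are hypothetical stand-ins, and `IOError` stands in for the HTTP error the real code catches.

```python
def download_fragments(fragments, fetch, fragment_retries=0,
                       skip_unavailable_fragments=True):
    """Fetch fragments in order; return the payloads, or None on a fatal failure.

    fetch is a hypothetical callable taking a URL and returning its payload,
    raising IOError when the fragment is unavailable.
    """
    downloaded = []
    for num, fragment in enumerate(fragments):
        # The first fragment carries the headers needed for a valid file,
        # so a failure there always aborts the download.
        fatal = num == 0 or not skip_unavailable_fragments
        for _attempt in range(fragment_retries + 1):
            try:
                downloaded.append(fetch(fragment['url']))
                break
            except IOError:
                continue
        else:
            # All attempts failed for this fragment.
            if fatal:
                return None
            # Non-fatal: skip the fragment and carry on.
    return downloaded
```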

View File

@ -31,7 +31,7 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest):
def can_download(manifest, info_dict):
UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@ -53,6 +53,7 @@ class HlsFD(FragmentFD):
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def real_download(self, filename, info_dict):
@ -62,7 +63,7 @@ class HlsFD(FragmentFD):
s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s):
if not self.can_download(s, info_dict):
self.report_warning(
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')
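With this change `can_download` takes the `info_dict` as well, so a live stream is handed to ffmpeg even when the manifest itself looks supported. The following sketch shows the combined check in isolation. It lists only the two unsupported-feature patterns visible in this hunk (the real tuple is longer) and turns `can_decrypt_frag` into a parameter, which is an assumption made for testability.

```python
import re

# Only the two patterns visible in the hunk above; the real tuple has more entries.
UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams
    r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files
)


def can_download(manifest, info_dict, can_decrypt_frag=True):
    """Return True if the native HLS downloader may handle this manifest."""
    checks = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
    # AES-128 is acceptable only when fragments can be decrypted.
    checks.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
    # Live streams are delegated to ffmpeg.
    checks.append(not info_dict.get('is_live'))
    return all(checks)
```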

View File

@ -13,6 +13,9 @@ from ..utils import (
encodeFilename,
sanitize_open,
sanitized_Request,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
)
@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
if self.params.get('xattr_set_filesize', False) and data_len is not None:
try:
import xattr
xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len))
except(OSError, IOError, ImportError) as err:
write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
try:

View File

@ -13,7 +13,7 @@ from ..utils import (
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@ -100,6 +100,7 @@ class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
'md5': '979d10b2939101f0d27a06b79edad536',
@ -112,6 +113,7 @@ class ABCIViewIE(InfoExtractor):
'uploader_id': 'abc1',
'timestamp': 1471719600,
},
'skip': 'Video gone',
}]
def _real_extract(self, url):

View File

@ -12,7 +12,7 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@ -49,7 +49,7 @@ class AbcNewsVideoIE(AMPIE):
class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',

View File

@ -1,13 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
from ..utils import (
int_or_none,
parse_iso8601,
)
class Abc7NewsIE(InfoExtractor):
_VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
class ABCOTVSIE(InfoExtractor):
IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
@ -15,7 +21,7 @@ class Abc7NewsIE(InfoExtractor):
'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates history of synthesized music',
'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075,
@ -41,7 +47,7 @@ class Abc7NewsIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True)
'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
@ -66,3 +72,41 @@ class Abc7NewsIE(InfoExtractor):
'uploader': uploader,
'formats': formats,
}
class ABCOTVSClipsIE(InfoExtractor):
IE_NAME = 'abcotvs:clips'
_VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'https://clips.abcotvs.com/kabc/video/214814',
'info_dict': {
'id': '214814',
'ext': 'mp4',
'title': 'SpaceX launch pad explosion destroys rocket, satellite',
'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
'upload_date': '20160901',
'timestamp': 1472756695,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['videoURL'].split('?')[0], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailURL'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('pubDate')),
'formats': formats,
}

File diff suppressed because it is too large

View File

@ -1,64 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {
'id': '36015',
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['aptomaVideoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
formats = []
for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
p = fmt['paths'][0]
formats.append({
'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
'ext': 'mp4',
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'tbr': int_or_none(fmt.get('bitrate')),
'protocol': 'http',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json.get('imageUrl'),
'description': internal_meta_json.get('shortPreamble'),
'timestamp': int_or_none(internal_meta_json.get('timePublished')),
'duration': int_or_none(internal_meta_json.get('duration')),
'view_count': int_or_none(internal_meta_json.get('views')),
}

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@ -28,6 +28,7 @@ class AMCNetworksIE(ThePlatformIE):
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,

View File

@ -238,7 +238,7 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',

View File

@ -50,25 +50,6 @@ class AWAANBaseIE(InfoExtractor):
'is_live': is_live,
}
def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
formats = []
format_url_base = 'http' + self._html_search_regex(
[
r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
r'<a[^>]+href="rtsp(://[^"]+)"'
], webpage, 'format url')
formats.extend(self._extract_mpd_formats(
format_url_base + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_m3u8_formats(
format_url_base + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
format_url_base + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return formats
class AWAANVideoIE(AWAANBaseIE):
IE_NAME = 'awaan:video'
@ -85,6 +66,7 @@ class AWAANVideoIE(AWAANBaseIE):
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
'uploader_id': '71',
},
}, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
@ -99,16 +81,18 @@ class AWAANVideoIE(AWAANBaseIE):
video_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(video_data, video_id, False)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), video_id)
info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloVideo',
})
return info
@ -138,16 +122,18 @@ class AWAANLiveIE(AWAANBaseIE):
channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), channel_id)
info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info

View File

@ -103,7 +103,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',

View File

@ -1028,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles'

View File

@ -6,8 +6,25 @@ import re
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>ctv|tsn|bnn|thecomedynetwork)\.ca/.*?(?:\bvid=|-vid|~|%7E)(?P<id>[0-9.]+)'
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -32,15 +49,27 @@ class CTVIE(InfoExtractor):
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
if domain == 'thecomedynetwork':
domain = 'comedy'
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:%s_web:%s' % (domain, video_id),
'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
'ie_key': 'NineCNineMedia',
}
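The domain handling in `BellMediaIE._real_extract` can be sketched in isolation as below. This is a minimal illustration rather than the extractor itself: `VALID_URL` is a shortened stand-in for `_VALID_URL` (only a few domains kept), `build_9c9media_url` is a hypothetical helper name, and the thecomedynetwork.ca URL in the usage note is made up for the example.

```python
import re

# Shortened stand-in for BellMediaIE._VALID_URL (assumption: only a few
# of the domains from the diff are kept here).
VALID_URL = r'''(?x)https?://(?:www\.)?
    (?P<domain>
        (?:ctv|thecomedynetwork|animalplanet)\.ca|
        much\.com
    )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''

# Site names whose 9c9media identifier differs from the host name.
DOMAINS = {
    'thecomedynetwork': 'comedy',
    'animalplanet': 'aniplan',
}


def build_9c9media_url(url):
    """Hypothetical helper mirroring the logic of _real_extract."""
    domain, video_id = re.match(VALID_URL, url).groups()
    # 'ctv.ca' -> 'ctv', 'much.com' -> 'much'
    domain = domain.split('.')[0]
    # Fall back to the bare domain when no special mapping exists.
    return '9c9media:%s_web:%s' % (DOMAINS.get(domain, domain), video_id)
```

For instance, `build_9c9media_url('http://www.thecomedynetwork.ca/video?vid=123456')` goes through the `_DOMAINS`-style lookup, while the ctv.ca and much.com test URLs from the diff fall back to the bare domain.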

@ -10,13 +10,14 @@ from ..utils import (
int_or_none,
float_or_none,
unified_timestamp,
urlencode_postdata,
)
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
_TESTS = [{
_TEST = {
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
@ -31,66 +32,26 @@ class BiliBiliIE(InfoExtractor):
'uploader': '菊子桑',
'uploader_id': '156160',
},
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'duration': 3382.259,
'timestamp': 1396530060,
'upload_date': '20140403',
'thumbnail': 're:^https?://.+\.jpg',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
'duration': 1493.995,
'timestamp': 1464564180,
'upload_date': '20160529',
'thumbnail': 're:^https?://.+\.jpg',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '1867637',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'duration': 5760.0,
'uploader': '黑夜为猫',
'uploader_id': '610729',
'thumbnail': 're:^https?://.+\.jpg',
},
'params': {
# Just to test metadata extraction
'skip_download': True,
},
'expected_warnings': ['upload time'],
}]
}
_APP_KEY = '6f90a59ac58a4123'
_BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
if 'anime/v' not in url:
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
else:
js = self._download_json(
'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
cid = js['result']['cid']
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
@ -106,7 +67,7 @@ class BiliBiliIE(InfoExtractor):
'url': durl['url'],
'filesize': int_or_none(durl['size']),
}]
for backup_url in durl['backup_url']:
for backup_url in durl.get('backup_url', []):
formats.append({
'url': backup_url,
# backup URLs have lower priorities
@ -125,6 +86,7 @@ class BiliBiliIE(InfoExtractor):
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
# TODO 'view_count' requires deobfuscating Javascript
info = {
@ -132,7 +94,7 @@ class BiliBiliIE(InfoExtractor):
'title': title,
'description': description,
'timestamp': timestamp,
'thumbnail': self._html_search_meta('thumbnailUrl', webpage),
'thumbnail': thumbnail,
'duration': float_or_none(video_info.get('timelength'), scale=1000),
}
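The request signing that appears in this file's context lines (payload plus secret key, hashed with MD5) can be sketched on its own as follows. The two key values are copied from the diff; `sign_payload` and the sample `cid` are hypothetical, and how the extractor uses the signature afterwards lies outside the hunks shown here.

```python
import hashlib

# Values copied from the diff (_APP_KEY and _BILIBILI_KEY).
APP_KEY = '6f90a59ac58a4123'
BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'


def sign_payload(cid):
    """Hypothetical helper: build the query payload and its MD5 signature."""
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (APP_KEY, cid)
    # The signature is the hex MD5 of the payload with the secret appended.
    sign = hashlib.md5((payload + BILIBILI_KEY).encode('utf-8')).hexdigest()
    return payload, sign


payload, sign = sign_payload('1234')  # '1234' is a made-up cid
```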

@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'],
})
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live,
}
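The live-stream detection added here treats a negative duration as the marker for a live stream. A minimal sketch under stated assumptions: `float_or_none` below is a simplified stand-in for the helper in `..utils`, and `detect_live` is a hypothetical name.

```python
def float_or_none(value, scale=1):
    # Simplified stand-in for the helper of the same name in ..utils.
    if value is None:
        return None
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return None


def detect_live(json_data):
    """Hypothetical helper: return (duration in seconds, is_live)."""
    # The API reports milliseconds, hence the scale of 1000.
    duration = float_or_none(json_data.get('duration'), 1000)
    # A negative duration is taken to mean "live"; None and 0 are not live.
    is_live = bool(duration and duration < 0)
    return duration, is_live
```

With this shape the title can be wrapped (as `_live_title` does in the diff) only when `is_live` is true, and a missing duration is treated as a regular video.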

@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@ -8,15 +7,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': {
'id': 'studio-c-season-5-episode-5',
'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486,
},
@ -24,28 +23,71 @@ class BYUtvIE(InfoExtractor):
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala':
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
else:
raise ExtractorError('Unsupported provider %s' % ep['provider'])
ooyala_id = self._search_regex(
r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'ooyala id', group='id')
title = self._search_regex(
r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
'title').strip()
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ooyala_id,
'id': video_id,
'title': title,
}
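The `transform_source` callback passed to `_parse_json` in `BYUtvIE` rewrites a JavaScript object literal into JSON before parsing. The sketch below applies the same regular expression to a made-up sample; `js_object_to_json` and `SAMPLE` are hypothetical, and the approach only covers unquoted keys with single-quoted string values that contain no double quotes.

```python
import json
import re


def js_object_to_json(code):
    """Quote bare keys and convert single-quoted values to double-quoted."""
    return re.sub(
        r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', code)


# Made-up page fragment in the shape the regex expects.
SAMPLE = (
    "{\n"
    "    title: 'Season 5 Episode 5',\n"
    "    providerType: 'Ooyala',\n"
    "    providerId: 'abc123'\n"
    "}"
)

ep = json.loads(js_object_to_json(SAMPLE))
```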

@ -112,7 +112,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
_TESTS = [{
# links with trailing slash
'url': 'http://www.camdemy.com/folder/450',

@ -23,6 +23,7 @@ class CanalplusIE(InfoExtractor):
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
@ -35,6 +36,7 @@ class CanalplusIE(InfoExtractor):
'canalplus': 'cplus',
'piwiplus': 'teletoon',
'd8': 'd8',
'c8': 'd8',
'd17': 'd17',
'itele': 'itele',
}

@ -1,11 +1,13 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c',
@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 3788.06,
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
site_id, display_id = mobj.group('site_id'), mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._search_regex(
title = (self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title(webpage)
webpage, 'title', default=None) or self._og_search_title(
webpage)).strip()
video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), display_id)
formats = []
for target in data['targetUrls']:

@ -30,7 +30,13 @@ class CartoonNetworkIE(TurnerBaseIE):
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://apple-secure.cdn.turner.com/toon/big',
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

@ -4,7 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
int_or_none,
find_xpath_attr,
ExtractorError,
xpath_element,
xpath_text,
update_url_query,
)
@ -47,27 +49,49 @@ class CBSIE(CBSBaseIE):
'only_matching': True,
}]
def _extract_video_info(self, guid):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('OnceURL&formats=M3U', 'HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)
except ExtractorError:
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
metadata = self._download_theplatform_metadata(path, guid)
info = self._parse_theplatform_metadata(metadata)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': guid,
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
'series': metadata.get('cbs$SeriesTitle'),
'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
'episode': metadata.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
})
return info
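The per-asset-type query construction in the new `_extract_video_info` can be sketched without any network access. Both function names below are hypothetical; the asset type names and format strings are taken from the diff.

```python
def unique_asset_types(asset_types):
    """Keep the first occurrence of each non-empty asset type, in order."""
    seen = []
    for asset_type in asset_types:
        if not asset_type or asset_type in seen:
            continue
        seen.append(asset_type)
    return seen


def build_smil_query(asset_type):
    """Build the thePlatform release URL query for one asset type."""
    query = {
        'mbr': 'true',
        'assetTypes': asset_type,
    }
    if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
        query['formats'] = 'MPEG4,M3U'
    elif asset_type in ('RTMP', 'WIFI', '3G'):
        query['formats'] = 'MPEG4,FLV'
    # Any other asset type is requested without a 'formats' restriction.
    return query
```

Compared with the previous hard-coded list of asset types, this asks only for the asset types the player service actually reports for the video.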

@ -9,6 +9,7 @@ from ..utils import (
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@ -68,15 +69,16 @@ class CBSNewsIE(CBSIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
@ -84,25 +86,22 @@ class CBSNewsLiveVideoIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce a 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
formats = self._extract_akamai_formats(video_info['url'], display_id)
self._sort_formats(formats)
return {
'id': video_id,
'id': display_id,
'display_id': display_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
'formats': formats,
}

@ -4,7 +4,7 @@ from .cbs import CBSBaseIE
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',

@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CCTVIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:.+?\.)?
(?:
cctv\.(?:com|cn)|
cntv\.cn
)/
(?:
video/[^/]+/(?P<id>[0-9a-f]{32})|
\d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
)'''
_TESTS = [{
'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
'md5': '819c7b49fc3927d529fb4cd555621823',
'info_dict': {
'id': '454368eb19ad44a1925bf1eb96140a61',
'ext': 'mp4',
'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
}
}, {
'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
'only_matching': True,
}, {
'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
webpage, 'video_id')
api_data = self._download_json(
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
return {
'id': video_id,
'title': api_data['title'],
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
}

@ -17,7 +17,7 @@ from ..utils import (
class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {

@ -65,7 +65,7 @@ class ChirbitIE(InfoExtractor):
class ChirbitProfileIE(InfoExtractor):
IE_NAME = 'chirbit:profile'
_VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
_TEST = {
'url': 'http://chirbit.com/ScarletBeauty',
'info_dict': {

@ -1,9 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex(
config = self._parse_json(self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration')
config = json.loads(config_json)
'configuration'), video_id)
video_info = config['videoInfo']
sources = config['sources']

@ -6,7 +6,7 @@ from ..utils import ExtractorError
class CMTIE(MTVIE):
IE_NAME = 'cmt.com'
_VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [{

@ -87,6 +87,9 @@ class InfoExtractor(object):
Potential fields:
* url Mandatory. The URL of the video file
* manifest_url
The URL of the manifest file in case of
fragmented media (DASH, HLS, HDS)
* ext Will be calculated from URL if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
@ -115,6 +118,11 @@ class InfoExtractor(object):
download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments".
* fragments A list of fragments of the fragmented media,
with the following entries:
* "url" (mandatory) - fragment's URL
* "duration" (optional, int or float)
* "filesize" (optional, int)
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field, regardless of all other values.
@ -674,33 +682,36 @@ class InfoExtractor(object):
username = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % netrc_machine)
raise netrc.NetrcParseError(
'No authenticators for %s' % netrc_machine)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
self._downloader.report_warning(
'parsing .netrc: %s' % error_to_compat_str(err))
return (username, password)
return username, password
def _get_login_info(self):
def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
"""
Get the login info as (username, password)
It will look in the netrc file using the _NETRC_MACHINE value
First look for manually specified credentials, using username_option
and password_option as keys in the params dictionary. If no such credentials
are available, look in the netrc file using the netrc_machine or
_NETRC_MACHINE value.
If there's no info available, return (None, None)
"""
if self._downloader is None:
return (None, None)
username = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
if downloader_params.get(username_option) is not None:
username = downloader_params[username_option]
password = downloader_params[password_option]
else:
username, password = self._get_netrc_login_info()
username, password = self._get_netrc_login_info(netrc_machine)
return (username, password)
return username, password
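The lookup order described in the `_get_login_info` docstring (explicit params first, then netrc) can be sketched as a free function. This is an illustration under assumptions: `get_login_info` takes the params dictionary and a netrc lookup callable as arguments instead of reading them from `self`, it uses `.get()` for the password where the diff indexes the dictionary directly, and the `ap_username` / `ap_password` / `my_mso` names in the example are hypothetical.

```python
def get_login_info(params, netrc_lookup, username_option='username',
                   password_option='password', netrc_machine=None):
    """Return (username, password) from params, else from the netrc lookup."""
    if params.get(username_option) is not None:
        # Manually specified credentials win over .netrc data.
        return params[username_option], params.get(password_option)
    return netrc_lookup(netrc_machine)


def fake_netrc_lookup(machine):
    # Hypothetical stand-in for _get_netrc_login_info.
    return ('netrc-user', 'netrc-pass') if machine == 'my_mso' else (None, None)


explicit = get_login_info(
    {'ap_username': 'u', 'ap_password': 'p'}, fake_netrc_lookup,
    'ap_username', 'ap_password', 'my_mso')
fallback = get_login_info({}, fake_netrc_lookup, netrc_machine='my_mso')
```

Parameterizing the option names lets one extractor ask for a second set of credentials without duplicating the lookup logic.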
def _get_tfa_info(self, note='two-factor verification code'):
"""
@ -888,16 +899,16 @@ class InfoExtractor(object):
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
hidden_inputs = {}
for input in re.findall(r'(?i)<input([^>]+)>', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
for input in re.findall(r'(?i)(<input[^>]+>)', html):
attrs = extract_attributes(input)
if not attrs:
continue
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
if not name:
if attrs.get('type') not in ('hidden', 'submit'):
continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
if not value:
continue
hidden_inputs[name.group('value')] = value.group('value')
name = attrs.get('name') or attrs.get('id')
value = attrs.get('value')
if name and value is not None:
hidden_inputs[name] = value
return hidden_inputs
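The attribute-based version of `_hidden_inputs` can be approximated with the standard library alone. In this sketch `extract_attributes` is a simplified stand-in (built on `html.parser`) for the helper of the same name in `..utils`, and `hidden_inputs` is a free-function copy of the new logic; neither is the project's actual implementation.

```python
import re
from html.parser import HTMLParser


class _AttrParser(HTMLParser):
    """Collects the attributes of the last start tag it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs = dict(attrs)


def extract_attributes(tag_html):
    # Simplified stand-in for the utils helper of the same name.
    parser = _AttrParser()
    parser.feed(tag_html)
    parser.close()
    return parser.attrs


def hidden_inputs(html):
    # Drop HTML comments so commented-out inputs are ignored.
    html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
    result = {}
    for tag in re.findall(r'(?i)(<input[^>]+>)', html):
        attrs = extract_attributes(tag)
        if attrs.get('type') not in ('hidden', 'submit'):
            continue
        name = attrs.get('name') or attrs.get('id')
        value = attrs.get('value')
        # Empty values are kept; only inputs with no value attribute are skipped.
        if name and value is not None:
            result[name] = value
    return result
```

Parsing attributes instead of matching `name=` and `value=` with separate regexes makes the result independent of attribute order and quoting style.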
def _form_hidden_inputs(self, form_id, html):
@ -1139,6 +1150,7 @@ class InfoExtractor(object):
formats.append({
'format_id': format_id,
'url': manifest_url,
'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr,
'width': width,
@ -1244,9 +1256,11 @@ class InfoExtractor(object):
# format_id intact.
if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
manifest_url = format_url(line.strip())
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
'url': manifest_url,
'manifest_url': manifest_url,
'tbr': tbr,
'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
@ -1518,9 +1532,10 @@ class InfoExtractor(object):
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
"""
Parse formats from MPD manifest.
References:
@ -1541,42 +1556,52 @@ class InfoExtractor(object):
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
# As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
# common attributes and elements. We will only extract those relevant
# for us.
def extract_common(source):
segment_timeline = source.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
start_number = source.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
timescale = source.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
else:
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
extract_common(segment_template)
media_template = segment_template.get('media')
if media_template:
ms_info['media_template'] = media_template
@ -1584,11 +1609,14 @@ class InfoExtractor(object):
if initialization:
ms_info['initialization_url'] = initialization
else:
initialization = segment_template.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
extract_Initialization(segment_template)
return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
@ -1631,6 +1659,7 @@ class InfoExtractor(object):
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
@ -1645,9 +1674,7 @@ class InfoExtractor(object):
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
@ -1656,46 +1683,79 @@ class InfoExtractor(object):
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template:
representation_ms_info['segment_urls'] = [
media_template % {
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
representation_ms_info['segment_urls'] = []
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url():
representation_ms_info['segment_urls'].append(
media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
}
)
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
'Number': segment_number,
}
representation_ms_info['fragments'].append({
'url': segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += s['d']
segment_time += segment_d
add_segment_url()
segment_time += s['d']
if 'segment_urls' in representation_ms_info:
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
s_num = 0
for segment_url in representation_ms_info['segment_urls']:
s = representation_ms_info['s'][s_num]
for r in range(s.get('r', 0) + 1):
fragments.append({
'url': segment_url,
'duration': float_or_none(s['d'], representation_ms_info['timescale']),
})
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
'fragments': [],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
try:
existing_format = next(
fo for fo in formats
@ -1768,7 +1828,7 @@ class InfoExtractor(object):
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
if not src:
continue
@ -1776,22 +1836,70 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
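
Note on the hunk above: `_extract_akamai_formats` derives both the HDS and the HLS manifest URL from whichever one it is given, by swapping the `/i/` and `/z/` path markers and the manifest file name. A minimal standalone sketch of just that rewriting, assuming a hypothetical helper name `akamai_manifest_urls` and an example host that are not part of the commit:

```python
import re


def akamai_manifest_urls(manifest_url):
    """Return (f4m_url, m3u8_url) derived from either kind of Akamai manifest URL.

    Hypothetical helper for illustration; it mirrors the two re.sub calls
    in the hunk above and does nothing else (no hdcore parameter handling).
    """
    # HDS variant: '/i/' becomes '/z/', master.m3u8 becomes manifest.f4m
    f4m_url = re.sub(
        r'(https?://.+?)/i/', r'\1/z/', manifest_url
    ).replace('/master.m3u8', '/manifest.f4m')
    # HLS variant: '/z/' becomes '/i/', manifest.f4m becomes master.m3u8
    m3u8_url = re.sub(
        r'(https?://.+?)/z/', r'\1/i/', manifest_url
    ).replace('/manifest.f4m', '/master.m3u8')
    return f4m_url, m3u8_url
```

Because each substitution is a no-op when the URL is already in the target form, the same function works for both input kinds.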
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
http_base_url = 'http' + url_base
formats = []
if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',
video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
else:
for protocol in ('rtmp', 'rtsp'):
if protocol not in skip_protocols:
formats.append({
'url': protocol + url_base,
'format_id': protocol,
'protocol': protocol,
})
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
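
Note on the `_parse_mpd_formats` hunk in this file: when a `SegmentTimeline` is present, each `S` entry carries a duration `d`, an optional start time `t`, and an optional repeat count `r` that means "this many additional segments". The following is a simplified sketch of that expansion, not the code from the commit; the function names `expand_segment_timeline` and `build_fragment_urls` are assumptions made for illustration, and a negative `r` (repeat until the end of the period) is deliberately not handled.

```python
def expand_segment_timeline(s_list, start_number=1):
    """Expand SegmentTimeline entries into (number, time, duration) tuples.

    s_list is a list of dicts with key 'd' and optional keys 't' and 'r',
    all already converted to integers (an assumption of this sketch).
    """
    segments = []
    segment_time = 0
    segment_number = start_number
    for s in s_list:
        # An explicit 't' resets the running start time; otherwise it carries over
        segment_time = s.get('t') or segment_time
        duration = s['d']
        # 'r' counts additional repeats, so each entry yields r + 1 segments
        for _ in range(s.get('r', 0) + 1):
            segments.append((segment_number, segment_time, duration))
            segment_number += 1
            segment_time += duration
    return segments


def build_fragment_urls(media_template, s_list, start_number=1):
    """Fill a %-style template that may use %(Number)d and/or %(Time)d."""
    return [
        media_template % {'Number': number, 'Time': time}
        for number, time, _ in expand_segment_timeline(s_list, start_number)
    ]
```

Passing both `Number` and `Time` to every template keeps one code path for `$Number$` and `$Time$` manifests, since unused keys in a %-format mapping are ignored.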


@ -1,13 +1,11 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
_VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
@ -16,20 +14,20 @@ class CriterionIE(InfoExtractor):
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {


@ -34,22 +34,58 @@ from ..aes import (
class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.crunchyroll.com/login'
_LOGIN_FORM = 'login_form'
_NETRC_MACHINE = 'crunchyroll'
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
self.report_login()
login_url = 'https://www.crunchyroll.com/?a=formhandler'
data = urlencode_postdata({
'formname': 'RpcApiUser_Login',
'name': username,
'password': password,
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
def is_logged(webpage):
return '<title>Redirecting' in webpage
# Already logged in
if is_logged(login_page):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
post_url = extract_attributes(login_form_str).get('action')
if not post_url:
post_url = self._LOGIN_URL
elif not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
login_form.update({
'login_form[name]': username,
'login_form[password]': password,
})
login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(login_request, None, False, 'Wrong login info')
response = self._download_webpage(
post_url, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if is_logged(response):
return
error = self._html_search_regex(
'(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()


@ -394,7 +394,7 @@ class DailymotionUserIE(DailymotionPlaylistIE):
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX


@ -1,61 +1,54 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import unified_strdate
class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '174dd4a8a6225cf5655952f969cfbe24',
'info_dict': {
'id': '1324',
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade'
'ext': 'mp4',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'upload_date': '20110407',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
base_url = 'http://dctp-ivms2-restapi.s3.amazonaws.com/'
version_json = self._download_json(
base_url + 'version.json',
video_id, note='Determining file version')
version = version_json['version_name']
info_json = self._download_json(
'{0}{1}/restapi/slugs/{2}.json'.format(base_url, version, video_id),
video_id, note='Fetching object ID')
object_id = compat_str(info_json['object_id'])
meta_json = self._download_json(
'{0}{1}/restapi/media/{2}.json'.format(base_url, version, object_id),
video_id, note='Downloading metadata')
uuid = meta_json['uuid']
title = meta_json['title']
wide = meta_json['is_wide']
if wide:
ratio = '16x9'
else:
ratio = '4x3'
play_path = 'mp4:{0}_dctp_0500_{1}.m4v'.format(uuid, ratio)
webpage = self._download_webpage(url, video_id)
object_id = self._html_search_meta('DC.identifier', webpage)
servers_json = self._download_json(
'http://www.dctp.tv/streaming_servers/',
'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/',
video_id, note='Downloading server list')
url = servers_json[0]['endpoint']
server = servers_json[0]['server']
m3u8_path = self._search_regex(
r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
formats = self._extract_m3u8_formats(
'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
entry_protocol='m3u8_native')
title = self._og_search_title(webpage)
description = self._html_search_meta('DC.description', webpage)
upload_date = unified_strdate(
self._html_search_meta('DC.date.created', webpage))
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': object_id,
'title': title,
'format': 'rtmp',
'url': url,
'play_path': play_path,
'rtmp_real_time': True,
'ext': 'flv',
'display_id': video_id
'formats': formats,
'display_id': video_id,
'description': description,
'upload_date': upload_date,
'thumbnail': thumbnail,
}


@ -13,7 +13,7 @@ from ..utils import (
class DemocracynowIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
_VALID_URL = r'https?://(?:www\.)?democracynow\.org/(?P<id>[^\?]*)'
IE_NAME = 'democracynow'
_TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3',


@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
'md5': 'af244f4458cd667205e513d75da5b8b1',
'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': {
'id': '2447',
'ext': 'mp4',
@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
},
{
'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'ef63c7a803e22315880ed182c10d1c5c',
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': {
'id': '1671',
'ext': 'mp4',
'title': 'Soodhu Kavvuum',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51',
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
video_url = self._download_webpage(
m3u8_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id)
% video_id, video_id, headers={'Referer': url})
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
description = self._html_search_meta('description', webpage)
thumbnail = self._html_search_regex(
@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
}


@ -4,7 +4,7 @@ from .common import InfoExtractor
class EngadgetIE(InfoExtractor):
_VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{
# video with 5min ID


@ -8,7 +8,7 @@ from ..utils import (
class ExpoTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_VALID_URL = r'https?://(?:www\.)?expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
_TEST = {
'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',


@ -5,11 +5,14 @@ from .abc import (
ABCIE,
ABCIViewIE,
)
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
AbcNewsVideoIE,
)
from .abcotvs import (
ABCOTVSIE,
ABCOTVSClipsIE,
)
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
@ -28,7 +31,6 @@ from .aenetworks import (
HistoryTopicIE,
)
from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
@ -90,6 +92,7 @@ from .bbc import (
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
from .beatportpro import BeatportProIE
from .bet import BetIE
from .bigflix import BigflixIE
@ -113,7 +116,10 @@ from .brightcove import (
BrightcoveNewIE,
)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .byutv import (
BYUtvIE,
BYUtvEventIE,
)
from .c56 import C56IE
from .camdemy import (
CamdemyIE,
@ -143,6 +149,7 @@ from .cbsnews import (
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cctv import CCTVIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
@ -191,7 +198,6 @@ from .crunchyroll import (
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
@ -289,6 +295,7 @@ from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import (
FoxNewsIE,
FoxNewsArticleIE,
FoxNewsInsiderIE,
)
from .foxsports import FoxSportsIE
@ -391,6 +398,7 @@ from .ivi import (
IviCompilationIE
)
from .ivideon import IvideonIE
from .iwara import IwaraIE
from .izlesene import IzleseneIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
@ -403,6 +411,7 @@ from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE
from .ketnet import KetnetIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
@ -421,6 +430,7 @@ from .kuwo import (
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lci import LCIIE
from .lcp import (
LcpPlayIE,
LcpIE,
@ -464,6 +474,10 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
MangomoloVideoIE,
MangomoloLiveIE,
)
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .meta import METAIE
@ -471,6 +485,7 @@ from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .miaopai import MiaoPaiIE
from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
@ -503,6 +518,7 @@ from .movingimage import MovingImageIE
from .msn import MSNIE
from .mtv import (
MTVIE,
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
)
@ -525,6 +541,7 @@ from .nbc import (
CSNNEIE,
NBCIE,
NBCNewsIE,
NBCOlympicsIE,
NBCSportsIE,
NBCSportsVPlayerIE,
)
@ -597,13 +614,14 @@ from .nowtv import (
)
from .noz import NozIE
from .npo import (
AndereTijdenIE,
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
WNLIE,
)
from .npr import NprIE
from .nrk import (
@ -664,7 +682,10 @@ from .pluralsight import (
)
from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornhd import PornHdIE
@ -718,6 +739,7 @@ from .revision3 import (
)
from .rice import RICEIE
from .ringtv import RingTVIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
@ -854,10 +876,12 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import TeleQuebecIE
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .tfo import TFOIE
from .theintercept import TheInterceptIE
from .theplatform import (
ThePlatformIE,
@ -886,7 +910,7 @@ from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE
from .trutv import TruTVIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
from .tudou import (
@ -916,6 +940,7 @@ from .tvc import (
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvnoe import TVNoeIE
from .tvp import (
TVPEmbedIE,
TVPIE,
@ -1048,6 +1073,7 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,


@ -11,9 +11,13 @@ class Formula1IE(InfoExtractor):
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv',
'ext': 'mp4',
'title': 'Race highlights - Spain 2016',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',


@ -1,14 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
@ -30,14 +30,26 @@ class FOXIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url']
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), video_id)
fox_pdk_player = settings['fox_pdk_player']
release_url = fox_pdk_player['release_url']
query = {
'mbr': 'true',
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'id': video_id,
}


@ -1,18 +1,24 @@
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
remove_end,
)
class FoxgayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_TEST = {
'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
'md5': '80d72beab5d04e1655a56ad37afe6841',
'md5': '344558ccfea74d33b7adbce22e577f54',
'info_dict': {
'id': '2582',
'ext': 'mp4',
'title': 'md5:6122f7ae0fc6b21ebdf59c5e083ce25a',
'description': 'md5:5e51dc4405f1fd315f7927daed2ce5cf',
'title': 'Fuck Turkish-style',
'description': 'md5:6ae2d9486921891efe89231ace13ffdf',
'age_limit': 18,
'thumbnail': 're:https?://.*\.jpg$',
},
@ -22,27 +28,35 @@ class FoxgayIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(?P<title>.*?)</title>',
webpage, 'title', fatal=False)
description = self._html_search_regex(
r'<div class="ico_desc"><h2>(?P<description>.*?)</h2>',
webpage, 'description', fatal=False)
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' - Foxgay.com')
description = get_element_by_id('inf_tit', webpage)
# The default user-agent with foxgay cookies leads to pages without videos
self._downloader.cookiejar.clear('.foxgay.com')
# Find the URL for the iFrame which contains the actual video.
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1', webpage,
'video frame', group='url')
iframe = self._download_webpage(
self._html_search_regex(r'iframe src="(?P<frame>.*?)"', webpage, 'video frame'),
video_id)
video_url = self._html_search_regex(
r"v_path = '(?P<vid>http://.*?)'", iframe, 'url')
thumb_url = self._html_search_regex(
r"t_path = '(?P<thumb>http://.*?)'", iframe, 'thumbnail', fatal=False)
iframe_url, video_id, headers={'User-Agent': 'curl/7.50.1'},
note='Downloading video frame')
video_data = self._parse_json(self._search_regex(
r'video_data\s*=\s*([^;]+);', iframe, 'video data'), video_id)
formats = [{
'url': source,
'height': resolution,
} for source, resolution in zip(
video_data['sources'], video_data.get('resolutions', itertools.repeat(None)))]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'description': description,
'thumbnail': thumb_url,
'thumbnail': video_data.get('act_vid', {}).get('thumb'),
'age_limit': 18,
}
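
Note on the `formats` list built in the hunk above: `zip` stops at the shorter input, so pairing the finite `sources` list with `itertools.repeat(None)` yields one entry per source with `height` set to `None` when the page provides no `resolutions`. A minimal sketch of that pattern in isolation, assuming a hypothetical helper name `pair_sources_with_heights` and made-up input data:

```python
import itertools


def pair_sources_with_heights(video_data):
    """Build one format dict per source, with an optional height.

    video_data is assumed to be a dict with a 'sources' list and an
    optional 'resolutions' list of the same length.
    """
    sources = video_data['sources']
    # repeat(None) is infinite; zip() ends when 'sources' is exhausted
    resolutions = video_data.get('resolutions', itertools.repeat(None))
    return [
        {'url': source, 'height': resolution}
        for source, resolution in zip(sources, resolutions)
    ]
```

The fallback avoids a separate branch for the missing-key case, at the cost of silently truncating if `resolutions` is present but shorter than `sources`.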


@ -7,6 +7,7 @@ from .common import InfoExtractor
class FoxNewsIE(AMPIE):
IE_NAME = 'foxnews'
IE_DESC = 'Fox News and Fox Business Video'
_VALID_URL = r'https?://(?P<host>video\.(?:insider\.)?fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
_TESTS = [
@ -66,6 +67,35 @@ class FoxNewsIE(AMPIE):
return info
class FoxNewsArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:article'
_TEST = {
'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html',
'md5': '62aa5a781b308fdee212ebb6f33ae7ef',
'info_dict': {
'id': '5116295019001',
'ext': 'mp4',
'title': 'Trump and Clinton asked to defend positions on Iraq War',
'description': 'Veterans react on \'The Kelly File\'',
'timestamp': 1473299755,
'upload_date': '20160908',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex(
r'data-video-id=([\'"])(?P<id>[^\'"]+)\1',
webpage, 'video ID', group='id')
return self.url_result(
'http://video.foxnews.com/v/' + video_id,
FoxNewsIE.ie_key())
class FoxNewsInsiderIE(InfoExtractor):
_VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:insider'
@ -83,6 +113,10 @@ class FoxNewsInsiderIE(InfoExtractor):
'upload_date': '20160825',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': [FoxNewsIE.ie_key()],
}


@ -2,21 +2,21 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import month_by_name
class FranceInterIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/emissions/(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',
'url': 'https://www.franceinter.fr/emissions/affaires-sensibles/affaires-sensibles-07-septembre-2016',
'md5': '9e54d7bdb6fdc02a841007f8a975c094',
'info_dict': {
'id': '793962',
'id': 'affaires-sensibles/affaires-sensibles-07-septembre-2016',
'ext': 'mp3',
'title': 'L’Histoire dans les jeux vidéo',
'description': 'md5:7e93ddb4451e7530022792240a3049c7',
'timestamp': 1387369800,
'upload_date': '20131218',
'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
'description': 'md5:401969c5d318c061f86bda1fa359292b',
'upload_date': '20160907',
},
}
@ -25,23 +25,30 @@ class FranceInterIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
path = self._search_regex(
r'<a id="player".+?href="([^"]+)"', webpage, 'video url')
video_url = 'http://www.franceinter.fr/' + path
video_url = self._search_regex(
r'(?s)<div[^>]+class=["\']page-diffusion["\'][^>]*>.*?<button[^>]+data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video url', group='url')
title = self._html_search_regex(
r'<span class="title-diffusion">(.+?)</span>', webpage, 'title')
description = self._html_search_regex(
r'<span class="description">(.*?)</span>',
webpage, 'description', fatal=False)
timestamp = int_or_none(self._search_regex(
r'data-date="(\d+)"', webpage, 'upload date', fatal=False))
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
upload_date_str = self._search_regex(
r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
webpage, 'upload date', fatal=False)
if upload_date_str:
upload_date_list = upload_date_str.split()
upload_date_list.reverse()
upload_date_list[1] = '%02d' % (month_by_name(upload_date_list[1], lang='fr') or 0)
upload_date_list[2] = '%02d' % int(upload_date_list[2])
upload_date = ''.join(upload_date_list)
else:
upload_date = None
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'upload_date': upload_date,
'formats': [{
'url': video_url,
'vcodec': 'none',


@ -8,7 +8,7 @@ from .common import InfoExtractor
class FreespeechIE(InfoExtractor):
IE_NAME = 'freespeech.org'
_VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
_VALID_URL = r'https?://(?:www\.)?freespeech\.org/video/(?P<title>.+)'
_TEST = {
'add_ie': ['Youtube'],
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',


@ -1,19 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
str_to_int,
unified_strdate,
remove_end,
)
class GameStarIE(InfoExtractor):
_VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
@ -21,8 +17,9 @@ class GameStarIE(InfoExtractor):
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den vollständigen Trailer an.',
'thumbnail': 'http://images.gamestar.de/images/idgwpgsgp/bdb/2494525/600x.jpg',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1406542020,
'upload_date': '20140728',
'duration': 17
}
@ -32,41 +29,27 @@ class GameStarIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_title = self._og_search_title(webpage)
title = re.sub(r'\s*- Video (bei|-) GameStar\.de$', '', og_title)
url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
description = self._og_search_description(webpage).strip()
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')
upload_date = unified_strdate(self._html_search_regex(
r'<span style="float:left;font-size:11px;">Datum: ([0-9]+\.[0-9]+\.[0-9]+)&nbsp;&nbsp;',
webpage, 'upload_date', fatal=False))
duration = parse_duration(self._html_search_regex(
r'&nbsp;&nbsp;Länge: ([0-9]+:[0-9]+)</span>', webpage, 'duration',
fatal=False))
view_count = str_to_int(self._html_search_regex(
r'&nbsp;&nbsp;Zuschauer: ([0-9\.]+)&nbsp;&nbsp;', webpage,
'view_count', fatal=False))
# TODO: there are multiple ld+json objects in the webpage,
# while _search_json_ld finds only the first one
json_ld = self._parse_json(self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
webpage, 'JSON-LD', group='json_ld'), video_id)
info_dict = self._json_ld(json_ld, video_id)
info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
view_count = json_ld.get('interactionCount')
comment_count = int_or_none(self._html_search_regex(
r'>Kommentieren \(([0-9]+)\)</a>', webpage, 'comment_count',
r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
fatal=False))
return {
info_dict.update({
'id': video_id,
'title': title,
'url': url,
'ext': 'mp4',
'thumbnail': thumbnail,
'description': description,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count
}
})
return info_dict


@ -1369,6 +1369,11 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Vimeo'],
},
{
# generic vimeo embed that requires original URL passed as Referer
'url': 'http://racing4everyone.eu/2016/07/30/formula-1-2016-round12-germany/',
'only_matching': True,
},
{
'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
@ -1652,7 +1657,9 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0])
doc, video_id,
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
@ -2249,6 +2256,35 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group('url'))), 'VODPlatform')
# Look for Mangomolo embeds
mobj = re.search(
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
(?:
video\?.*?\bid=(?P<video_id>\d+)|
index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
).+?)\1''', webpage)
if mobj is not None:
info = {
'_type': 'url_transparent',
'url': self._proto_relative_url(unescapeHTML(mobj.group('url'))),
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
}
video_id = mobj.group('video_id')
if video_id:
info.update({
'ie_key': 'MangomoloVideo',
'id': video_id,
})
else:
info.update({
'ie_key': 'MangomoloLive',
'id': mobj.group('channel_id'),
})
return info
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
@ -2296,12 +2332,23 @@ class GenericIE(InfoExtractor):
info_dict.update(json_ld)
return info_dict
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
for entry in entries:
entry.update({
'id': video_id,
'title': video_title,
})
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml')
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
def filter_video(urls):
return list(filter(check_video, urls))
@ -2351,9 +2398,6 @@ class GenericIE(InfoExtractor):
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
if not found:
# HTML5 video
found = re.findall(r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search(

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals
import random
import re
import math
from .common import InfoExtractor
@ -14,12 +15,13 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
orderedSet,
str_or_none,
)
class GloboIE(InfoExtractor):
_VALID_URL = '(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
_SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
}, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}]
class MD5(object):
@ -396,7 +401,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',
@ -408,15 +413,20 @@ class GloboArticleIE(InfoExtractor):
_TESTS = [{
'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
'info_dict': {
'id': '3652183',
'ext': 'mp4',
'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
'duration': 110.711,
'uploader': 'Rede Globo',
'uploader_id': '196',
}
'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
},
'playlist_count': 1,
}, {
'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
'info_dict': {
'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
},
'playlist_count': 6,
}, {
'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
'only_matching': True,
@ -435,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
return self.url_result('globo:%s' % video_id, 'Globo')
video_ids = []
for video_regex in self._VIDEOID_REGEXES:
video_ids.extend(re.findall(video_regex, webpage))
entries = [
self.url_result('globo:%s' % video_id, GloboIE.ie_key())
for video_id in orderedSet(video_ids)]
title = self._og_search_title(webpage, fatal=False)
description = self._html_search_meta('description', webpage)
return self.playlist_result(entries, display_id, title, description)
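
The change above replaces a single `_search_regex` hit with every match from every regex, de-duplicated in order before a playlist is built. A minimal standalone sketch of that collection step follows. It assumes a hypothetical `ordered_unique` helper standing in for `orderedSet`, a made-up second attribute name (`data-other-video-id`), and invented sample markup; none of these are taken from the extractor itself.

```python
import re

# First pattern mirrors the one visible in the diff; the second attribute
# name is hypothetical and only there to show multi-regex collection.
VIDEO_ID_REGEXES = [
    r'\bdata-video-id=["\'](\d{7,})',
    r'\bdata-other-video-id=["\'](\d{7,})',
]


def ordered_unique(items):
    """Drop duplicates while keeping first-seen order (stand-in for orderedSet)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def collect_video_ids(webpage):
    """Gather all ids matched by any regex, without repeats."""
    video_ids = []
    for regex in VIDEO_ID_REGEXES:
        video_ids.extend(re.findall(regex, webpage))
    return ordered_unique(video_ids)


# Invented sample page: the same id appears under both attributes.
SAMPLE_PAGE = (
    '<div data-video-id="5289686"></div>'
    '<div data-video-id="5289683"></div>'
    '<div data-other-video-id="5289686"></div>'
)

print(collect_video_ids(SAMPLE_PAGE))
```

Keeping first-seen order means the playlist entries follow the order in which the videos appear on the page, which a plain `set` would not guarantee.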

View File

@ -8,6 +8,8 @@ from ..utils import (
int_or_none,
determine_ext,
parse_age_limit,
urlencode_postdata,
ExtractorError,
)
@ -19,7 +21,7 @@ class GoIE(InfoExtractor):
'watchdisneyjunior': '008',
'watchdisneyxd': '009',
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/.*?vdka(?P<id>\w+)' % '|'.join(_BRANDS.keys())
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_BRANDS.keys())
_TESTS = [{
'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
'info_dict': {
@ -38,9 +40,13 @@ class GoIE(InfoExtractor):
}]
def _real_extract(self, url):
sub_domain, video_id = re.match(self._VALID_URL, url).groups()
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
brand = self._BRANDS[sub_domain]
video_data = self._download_json(
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (self._BRANDS[sub_domain], video_id),
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
video_id)['video'][0]
title = video_data['title']
@ -52,6 +58,21 @@ class GoIE(InfoExtractor):
format_id = asset.get('format')
ext = determine_ext(asset_url)
if ext == 'm3u8':
video_type = video_data.get('type')
if video_type == 'lf':
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata({
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}))
errors = entitlement.get('errors', {}).get('errors', [])
if errors:
error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey']
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
else:

View File

@ -10,7 +10,7 @@ from ..utils import unified_strdate
class GooglePlusIE(InfoExtractor):
IE_DESC = 'Google Plus'
_VALID_URL = r'https://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
_VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
IE_NAME = 'plus.google'
_TEST = {
'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',

View File

@ -11,7 +11,7 @@ from ..utils import (
class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_VALID_URL = r'https?://(?:www\.)?goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',

View File

@ -5,7 +5,7 @@ from .common import InfoExtractor
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(?P<id>.+?)-.+'
_VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
_TEST = {
'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
'md5': '6783a58491b47b92c7c1af5a77d4cbee',

View File

@ -12,7 +12,7 @@ from ..utils import (
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_VALID_URL = r'https?://(?:www\.)?hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_TEST = {
'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',

View File

@ -94,7 +94,7 @@ class ImdbIE(InfoExtractor):
class ImdbListIE(InfoExtractor):
IE_NAME = 'imdb:list'
IE_DESC = 'Internet Movie Database lists'
_VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_TEST = {
'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
'info_dict': {

View File

@ -29,6 +29,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Naomi Leonor Phan-Quang',
'like_count': int,
'comment_count': int,
'comments': list,
},
}, {
# missing description
@ -44,6 +45,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Britney Spears',
'like_count': int,
'comment_count': int,
'comments': list,
},
'params': {
'skip_download': True,
@ -82,7 +84,7 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count) = [None] * 8
uploader_id, like_count, comment_count, height, width) = [None] * 10
shared_data = self._parse_json(
self._search_regex(
@ -94,6 +96,8 @@ class InstagramIE(InfoExtractor):
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
if media:
video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width'))
description = media.get('caption')
thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date'))
@ -101,10 +105,24 @@ class InstagramIE(InfoExtractor):
uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count'))
comments = [{
'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'),
'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
formats = [{
'url': video_url,
'width': width,
'height': height,
}]
if not uploader_id:
uploader_id = self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
@ -121,7 +139,7 @@ class InstagramIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'formats': formats,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'description': description,
@ -131,6 +149,7 @@ class InstagramIE(InfoExtractor):
'uploader': uploader,
'like_count': like_count,
'comment_count': comment_count,
'comments': comments,
}

View File

@ -0,0 +1,77 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import remove_end
class IwaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
'md5': '1d53866b2c514b23ed69e4352fdc9839',
'info_dict': {
'id': 'amVwUl1EHpAD9RD',
'ext': 'mp4',
'title': '【MMD R-18】ガールフレンド carry_me_off',
'age_limit': 18,
},
}, {
'url': 'http://ecchi.iwara.tv/videos/Vb4yf2yZspkzkBO',
'md5': '7e5f1f359cd51a027ba4a7b7710a50f0',
'info_dict': {
'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
'ext': 'mp4',
'title': '[3D Hentai] Kyonyu Ã\x97 Genkai Ã\x97 Emaki Shinobi Girls.mp4',
'age_limit': 18,
},
'add_ie': ['GoogleDrive'],
}, {
'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
'md5': '1d85f1e5217d2791626cff5ec83bb189',
'info_dict': {
'id': '6liAP9s2Ojc',
'ext': 'mp4',
'age_limit': 0,
'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
'upload_date': '20160910',
'uploader': 'aMMDsork',
'uploader_id': 'UCVOFyOSCyFkXTYYHITtqB7A',
},
'add_ie': ['Youtube'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, video_id)
hostname = compat_urllib_parse_urlparse(urlh.geturl()).hostname
# ecchi is 'sexy' in Japanese
age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
webpage, 'iframe URL', group='url')
return {
'_type': 'url_transparent',
'url': iframe_url,
'age_limit': age_limit,
}
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
info_dict = entries[0]
info_dict.update({
'id': video_id,
'title': title,
'age_limit': age_limit,
})
return info_dict

View File

@ -9,6 +9,7 @@ from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
)
@ -19,24 +20,32 @@ class JWPlatformBaseIE(InfoExtractor):
# TODO: Merge this with JWPlayer-related code in generic.py
mobj = re.search(
'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id)
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
@ -55,6 +64,9 @@ class JWPlatformBaseIE(InfoExtractor):
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
@ -63,10 +75,17 @@ class JWPlatformBaseIE(InfoExtractor):
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
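
The height fallback in the hunk above can be illustrated in isolation. The sketch below is an approximation under stated assumptions: `height_from_source` is a hypothetical name, and it uses plain `int()` conversion in place of the `int_or_none` and `_search_regex` helpers the extractor actually calls.

```python
import re


def height_from_source(source):
    """Return an int height from 'height', else from a label such as '1080p'.

    Hypothetical helper; returns None when neither field yields a height.
    """
    height = source.get('height')
    if height is not None:
        try:
            return int(height)
        except (TypeError, ValueError):
            pass  # unusable value: fall through to the label
    # Same pattern as in the diff: three or more digits followed by p or P.
    match = re.match(r'^(\d{3,})[pP]$', source.get('label') or '')
    return int(match.group(1)) if match else None


print(height_from_source({'height': '720'}))
print(height_from_source({'label': '1080p'}))
print(height_from_source({'label': 'HD'}))
```

Anchoring the pattern with `^` and `$` and requiring at least three digits keeps labels such as `HD` or `48p` from being read as a resolution.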

View File

@ -105,20 +105,20 @@ class KalturaIE(InfoExtractor):
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
(?P<q1>['\"])wid(?P=q1)\s*:\s*
(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
(?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
(?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
""", webpage) or
re.search(
r'''(?xs)
(?P<q1>["\'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
(?P=q1).*?
(?:
entry_?[Ii]d|
(?P<q2>["\'])entry_?[Ii]d(?P=q2)
)\s*:\s*
(?P<q3>["\'])(?P<id>.+?)(?P=q3)
(?P<q3>["\'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage))
if mobj:
embed_info = mobj.groupdict()
@ -262,8 +262,16 @@ class KalturaIE(InfoExtractor):
# Continue if asset is not ready
if f.get('status') != 2:
continue
# Original format that's not available (e.g. kaltura:1926081:0_c03e1b5g)
# skip for now.
if f.get('fileExt') == 'chun':
continue
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId')
formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f,
'ext': f.get('fileExt'),
@ -271,7 +279,7 @@ class KalturaIE(InfoExtractor):
'fps': int_or_none(f.get('frameRate')),
'filesize_approx': int_or_none(f.get('size'), invscale=1024),
'container': f.get('containerFormat'),
'vcodec': f.get('videoCodecId'),
'vcodec': vcodec,
'height': int_or_none(f.get('height')),
'width': int_or_none(f.get('width')),
'url': video_url,
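
The regex changes in the first hunk of this file swap a `[^'\"]+` character class for `(?:(?!(?P=qN)).)+`, which stops only at the same quote character that opened the value. A small sketch of the difference, using simplified patterns and an invented sample value (neither is the extractor's real input):

```python
import re

# Simplified versions of the two approaches; the group names are illustrative.
CHAR_CLASS = r'''(?P<q>['"])(?P<id>[^'"]+)(?P=q)'''
LOOKAHEAD = r'''(?P<q>['"])(?P<id>(?:(?!(?P=q)).)+)(?P=q)'''


def extract_quoted(pattern, text):
    """Return the quoted value matched by pattern, or None."""
    mobj = re.search(pattern, text)
    return mobj.group('id') if mobj else None


# Invented value: double-quoted, but containing a single quote inside.
SAMPLE = '''"it's_an_id"'''

print(extract_quoted(CHAR_CLASS, SAMPLE))
print(extract_quoted(LOOKAHEAD, SAMPLE))
```

The character class rejects both quote characters, so a value that contains the other kind of quote cannot match. The lookahead version consumes any character that is not the opening quote, so it handles mixed quoting while still ending at the correct delimiter.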

View File

@ -5,7 +5,7 @@ from .common import InfoExtractor
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'http://www.karaoketv.co.il/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {

View File

@ -0,0 +1,72 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class KetnetIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes',
'md5': 'd907f7b1814ef0fa285c0475d9994ed7',
'info_dict': {
'id': 'zomerse-filmpjes',
'ext': 'mp4',
'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
'description': 'Gluur mee met Ghost Rockers op de filmset',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
'only_matching': True,
}, {
'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
'only_matching': True,
}, {
# mzsource, geo restricted to Belgium
'url': 'https://www.ketnet.be/kijken/nachtwacht/de-bermadoe',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._search_regex(
r'(?s)playerConfig\s*=\s*({.+?})\s*;', webpage,
'player config'),
video_id)
title = config['title']
formats = []
for source_key in ('', 'mz'):
source = config.get('%ssource' % source_key)
if not isinstance(source, dict):
continue
for format_id, format_url in source.items():
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False))
elif format_id == 'hds':
formats.extend(self._extract_f4m_formats(
format_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': config.get('description'),
'thumbnail': config.get('image'),
'series': config.get('program'),
'episode': config.get('episode'),
'formats': formats,
}

View File

@ -6,7 +6,7 @@ from ..utils import smuggle_url
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_VALID_URL = r'https?://(?:www\.)?kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_TESTS = [{
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
'md5': 'c81addca81327ffa66c642b5d8b08cab',

View File

@ -59,7 +59,7 @@ class KuwoBaseIE(InfoExtractor):
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
@ -82,7 +82,7 @@ class KuwoIE(KuwoBaseIE):
'upload_date': '20150518',
},
'params': {
'format': 'mp3-320'
'format': 'mp3-320',
},
}, {
'url': 'http://www.kuwo.cn/yinyue/3197154?catalog=yueku2016',
@ -91,10 +91,10 @@ class KuwoIE(KuwoBaseIE):
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
webpage, urlh = self._download_webpage_handle(
url, song_id, note='Download song detail info',
errnote='Unable to get song detail info')
if '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
if song_id not in urlh.geturl() or '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex(
@ -139,7 +139,7 @@ class KuwoIE(KuwoBaseIE):
class KuwoAlbumIE(InfoExtractor):
IE_NAME = 'kuwo:album'
IE_DESC = '酷我音乐 - 专辑'
_VALID_URL = r'https?://www\.kuwo\.cn/album/(?P<id>\d+?)/'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/album/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
@ -181,7 +181,7 @@ class KuwoChartIE(InfoExtractor):
'info_dict': {
'id': '香港中文龙虎榜',
},
'playlist_mincount': 10,
'playlist_mincount': 7,
}
def _real_extract(self, url):
@ -200,7 +200,7 @@ class KuwoChartIE(InfoExtractor):
class KuwoSingerIE(InfoExtractor):
IE_NAME = 'kuwo:singer'
IE_DESC = '酷我音乐 - 歌手'
_VALID_URL = r'https?://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mingxing/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
@ -296,14 +296,14 @@ class KuwoCategoryIE(InfoExtractor):
class KuwoMvIE(KuwoBaseIE):
IE_NAME = 'kuwo:mv'
IE_DESC = '酷我音乐 - MV'
_VALID_URL = r'https?://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
_VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mv/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/mv/6480076/',
'info_dict': {
'id': '6480076',
'ext': 'mp4',
'title': 'My HouseMV',
'creator': 'PM02:00',
'creator': '2PM',
},
# In this video, music URLs (anti.s) are blocked outside China and
# USA, while the MV URL (mvurl) is available globally, so force the MV

View File

@ -0,0 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LCIIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lci\.fr/[^/]+/[\w-]+-(?P<id>\d+)\.html'
_TEST = {
'url': 'http://www.lci.fr/international/etats-unis-a-j-62-hillary-clinton-reste-sans-voix-2001679.html',
'md5': '2fdb2538b884d4d695f9bd2bde137e6c',
'info_dict': {
'id': '13244802',
'ext': 'mp4',
'title': 'Hillary Clinton et sa quinte de toux, en plein meeting',
'description': 'md5:a4363e3a960860132f8124b62f4a01c9',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
wat_id = self._search_regex(r'data-watid=[\'"](\d+)', webpage, 'wat id')
return self.url_result('wat:' + wat_id, 'Wat', wat_id)

View File

@ -29,7 +29,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@ -73,6 +73,12 @@ class LeIE(InfoExtractor):
}, {
'url': 'http://sports.le.com/video/25737697.html',
'only_matching': True,
}, {
'url': 'http://www.lesports.com/match/1023203003.html',
'only_matching': True,
}, {
'url': 'http://sports.le.com/match/1023203003.html',
'only_matching': True,
}]
# ror() and calc_time_key() are reversed from an embedded swf file in KLetvPlayer.swf

View File

@ -59,7 +59,7 @@ class LimelightBaseIE(InfoExtractor):
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_url = 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:])
http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
urls.append(http_url)
http_fmt = fmt.copy()
http_fmt.update({

View File

@ -14,7 +14,7 @@ from ..utils import (
class LiTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
_VALID_URL = r'https?://(?:www\.)?litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'

View File

@ -1,8 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
remove_end,
@ -12,8 +15,10 @@ from ..utils import (
class LRTIE(InfoExtractor):
IE_NAME = 'lrt.lt'
_VALID_URL = r'https?://(?:www\.)?lrt\.lt/mediateka/irasas/(?P<id>[0-9]+)'
_TEST = {
_TESTS = [{
# m3u8 download
'url': 'http://www.lrt.lt/mediateka/irasas/54391/',
'md5': 'fe44cf7e4ab3198055f2c598fc175cb0',
'info_dict': {
'id': '54391',
'ext': 'mp4',
@ -23,20 +28,45 @@ class LRTIE(InfoExtractor):
'view_count': int,
'like_count': int,
},
'params': {
'skip_download': True, # m3u8 download
}, {
# direct mp3 download
'url': 'http://www.lrt.lt/mediateka/irasas/1013074524/',
'md5': '389da8ca3cad0f51d12bed0c844f6a0a',
'info_dict': {
'id': '1013074524',
'ext': 'mp3',
'title': 'Kita tema 2016-09-05 15:05',
'description': 'md5:1b295a8fc7219ed0d543fc228c931fb5',
'duration': 3008,
'view_count': int,
'like_count': int,
},
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' - LRT')
m3u8_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
webpage, 'm3u8 url', group='url')
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
formats = []
for _, file_url in re.findall(
r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
ext = determine_ext(file_url)
if ext not in ('m3u8', 'mp3'):
continue
# mp3 served as m3u8 produces stuttered media file
if ext == 'm3u8' and '.mp3' in file_url:
continue
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, 'mp4', entry_protocol='m3u8_native',
fatal=False))
elif ext == 'mp3':
formats.append({
'url': file_url,
'vcodec': 'none',
})
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)

View File

@ -94,7 +94,7 @@ class LyndaBaseIE(InfoExtractor):
class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'

View File

@ -7,7 +7,7 @@ from ..utils import ExtractorError
class MacGameStoreIE(InfoExtractor):
IE_NAME = 'macgamestore'
IE_DESC = 'MacGameStore trailers'
_VALID_URL = r'https?://www\.macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
_TEST = {
'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',

View File

@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
)
class MangomoloBaseIE(InfoExtractor):
def _get_real_id(self, page_id):
return page_id
def _real_extract(self, url):
page_id = self._get_real_id(self._match_id(url))
webpage = self._download_webpage(url, page_id)
hidden_inputs = self._hidden_inputs(webpage)
m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
format_url = self._html_search_regex(
[
r'file\s*:\s*"(https?://[^"]+?/playlist.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url')
formats = self._extract_wowza_formats(
format_url, page_id, m3u8_entry_protocol, ['smil'])
self._sort_formats(formats)
return {
'id': page_id,
'title': self._live_title(page_id) if self._IS_LIVE else page_id,
'uploader_id': hidden_inputs.get('userid'),
'duration': int_or_none(hidden_inputs.get('duration')),
'is_live': self._IS_LIVE,
'formats': formats,
}
class MangomoloVideoIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:video'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
_IS_LIVE = False
class MangomoloLiveIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:live'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_IS_LIVE = True
def _get_real_id(self, page_id):
return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
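
The live extractor's `_get_real_id` above treats the `channelid` query value as URL-quoted base64. A standalone sketch of that decoding, assuming Python 3's `urllib.parse.unquote` in place of `compat_urllib_parse_unquote` and using made-up channel id values:

```python
import base64
from urllib.parse import unquote


def decode_channel_id(page_id):
    """Undo percent-encoding, then base64-decode to the underlying id string."""
    return base64.b64decode(unquote(page_id).encode()).decode()


# Made-up inputs: the second carries its '=' padding percent-encoded as %3D.
print(decode_channel_id('MTIz'))
print(decode_channel_id('MTI%3D'))
```

Unquoting first matters because base64 padding and the `+` and `/` characters are commonly percent-encoded in query strings, which is also why `_VALID_URL` accepts `%2B`, `%2F` and `%3D` inside the id.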

View File

@ -9,7 +9,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',

View File

@ -6,7 +6,7 @@ from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TESTS = [{

View File

@ -0,0 +1,40 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MiaoPaiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?miaopai\.com/show/(?P<id>[-A-Za-z0-9~_]+)'
_TEST = {
'url': 'http://www.miaopai.com/show/n~0hO7sfV1nBEw4Y29-Hqg__.htm',
'md5': '095ed3f1cd96b821add957bdc29f845b',
'info_dict': {
'id': 'n~0hO7sfV1nBEw4Y29-Hqg__',
'ext': 'mp4',
'title': '西游记音乐会的秒拍视频',
'thumbnail': 're:^https?://.*/n~0hO7sfV1nBEw4Y29-Hqg___m.jpg',
}
}
_USER_AGENT_IPAD = 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers={'User-Agent': self._USER_AGENT_IPAD})
title = self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title')
thumbnail = self._html_search_regex(
r'<div[^>]+class=(?P<q1>[\'"]).*\bvideo_img\b.*(?P=q1)[^>]+data-url=(?P<q2>[\'"])(?P<url>[^\'"]+)(?P=q2)',
webpage, 'thumbnail', fatal=False, group='url')
videos = self._parse_html5_media_entries(url, webpage, video_id)
info = videos[0]
info.update({
'id': video_id,
'title': title,
'thumbnail': thumbnail,
})
return info

View File

@ -8,7 +8,7 @@ from ..utils import (
class MinistryGridIE(InfoExtractor):
_VALID_URL = r'https?://www\.ministrygrid.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?ministrygrid\.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
_TEST = {
'url': 'http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers',

View File

@ -74,7 +74,7 @@ class MiTeleBaseIE(InfoExtractor):
class MiTeleIE(MiTeleBaseIE):
IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',

View File

@ -35,7 +35,8 @@ class MoeVideoIE(InfoExtractor):
'height': 360,
'duration': 179,
'filesize': 17822500,
}
},
'skip': 'Video has been removed',
},
{
'url': 'http://playreplay.net/video/77107.7f325710a627383d40540d8e991a',

View File

@ -9,7 +9,7 @@ from ..compat import (
class MotorsportIE(InfoExtractor):
IE_DESC = 'motorsport.com'
_VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
_TEST = {
'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
'info_dict': {

@ -7,7 +7,7 @@ from .common import InfoExtractor
class MoviezineIE(InfoExtractor):
_VALID_URL = r'https?://www\.moviezine\.se/video/(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:www\.)?moviezine\.se/video/(?P<id>[^?#]+)'
_TEST = {
'url': 'http://www.moviezine.se/video/205866',

@ -270,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
class MTVIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv'
_VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.mtv.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
'info_dict': {
'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
'ext': 'mp4',
'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
'timestamp': 1468846800,
'upload_date': '20160718',
},
}, {
'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
'only_matching': True,
}]
class MTVVideoIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv:video'
_VALID_URL = r'''(?x)^https?://
(?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''

@ -9,9 +9,9 @@ from ..utils import (
class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = {
_TESTS = [{
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
# md5 is unstable
'info_dict': {
@ -23,7 +23,10 @@ class MwaveIE(InfoExtractor):
'duration': 206,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -60,8 +63,8 @@ class MwaveIE(InfoExtractor):
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
_TESTS = [{
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
@ -72,7 +75,10 @@ class MwaveMeetGreetIE(InfoExtractor):
'duration': 3634,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/meetgreet/view/256',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
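
The Mwave hunks add `(?:[^/]+/)?` so that a single optional path segment, such as the `en/` language prefix in the new `only_matching` test, is accepted. A minimal sketch with the standard `re` module follows; the two-segment URL is a made-up input showing that only one extra segment is allowed.

```python
import re

# New pattern from the diff: one optional path segment before "mnettv".
VALID_URL = (r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m'
             r'\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)')

# Both URLs come from the _TESTS entries in the diff.
urls = [
    'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
    'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
]
clip_ids = [re.match(VALID_URL, u).group('id') for u in urls]

# Hypothetical URL with two extra segments; the pattern permits only one.
two_segments = ('http://mwave.interest.me/a/b/mnettv/videodetail.m'
                '?searchVideoDetailVO.clip_id=1')
two_segments_match = re.match(VALID_URL, two_segments)
```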

@ -11,7 +11,7 @@ from ..utils import (
class MySpassIE(InfoExtractor):
_VALID_URL = r'https?://www\.myspass\.de/.*'
_VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
_TEST = {
'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
'md5': '0b49f4844a068f8b33f4b7c88405862b',

@ -13,7 +13,7 @@ from ..utils import (
class NBCIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
_VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
_TESTS = [
{
@ -138,7 +138,7 @@ class NBCSportsVPlayerIE(InfoExtractor):
class NBCSportsIE(InfoExtractor):
# Does not include https because its certificate is invalid
_VALID_URL = r'https?://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
@ -161,7 +161,7 @@ class NBCSportsIE(InfoExtractor):
class CSNNEIE(InfoExtractor):
_VALID_URL = r'https?://www\.csnne\.com/video/(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
@ -335,3 +335,43 @@ class NBCNewsIE(ThePlatformIE):
'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
'ie_key': 'ThePlatformFeed',
}
class NBCOlympicsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
_TEST = {
# Geo-restricted to US
'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'md5': '54fecf846d05429fbaa18af557ee523a',
'info_dict': {
'id': 'WjTBzDXx5AUq',
'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'ext': 'mp4',
'title': 'Rose\'s son Leo was in tears after his dad won gold',
'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.',
'timestamp': 1471274964,
'upload_date': '20160815',
'uploader': 'NBCU-SPORTS',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
iframe_url = drupal_settings['vod']['iframe_url']
theplatform_url = iframe_url.replace(
'vplayer.nbcolympics.com', 'player.theplatform.com')
return {
'_type': 'url_transparent',
'url': theplatform_url,
'ie_key': ThePlatformIE.ie_key(),
'display_id': display_id,
}
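
The extraction steps in `NBCOlympicsIE._real_extract` can be sketched outside youtube-dl with the standard library alone: find the `Drupal.settings` JSON blob, read `vod.iframe_url`, and swap the player host. The page fragment and the `/p/ACCOUNT/select/media/MEDIA_ID` path are placeholders, since the real markup is not shown in this diff, and `theplatform_url_from_page` is a hypothetical helper name.

```python
import json
import re

# Placeholder page fragment; the real page markup is an assumption.
SAMPLE_PAGE = (
    '<script>jQuery.extend(Drupal.settings, {"vod": {"iframe_url": '
    '"http://vplayer.nbcolympics.com/p/ACCOUNT/select/media/MEDIA_ID"}});</script>'
)


def theplatform_url_from_page(webpage):
    # Same regex as in the extractor: capture the settings object literal.
    mobj = re.search(
        r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage)
    if mobj is None:
        raise ValueError('drupal settings not found')
    drupal_settings = json.loads(mobj.group(1))
    iframe_url = drupal_settings['vod']['iframe_url']
    # Rewrite the NBC Olympics player host to the ThePlatform player host.
    return iframe_url.replace(
        'vplayer.nbcolympics.com', 'player.theplatform.com')


print(theplatform_url_from_page(SAMPLE_PAGE))
```

In the extractor itself the rewritten URL is not downloaded directly; it is handed to `ThePlatformIE` through a `url_transparent` result, so the metadata comes from that extractor while `display_id` is kept from this one.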

@ -23,7 +23,7 @@ class NDRBaseIE(InfoExtractor):
class NDRIE(NDRBaseIE):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
_VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
@ -105,7 +105,7 @@ class NDRIE(NDRBaseIE):
class NJoyIE(NDRBaseIE):
IE_NAME = 'njoy'
IE_DESC = 'N-JOY'
_VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
_VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
@ -238,7 +238,7 @@ class NDREmbedBaseIE(InfoExtractor):
class NDREmbedIE(NDREmbedBaseIE):
IE_NAME = 'ndr:embed'
_VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_TESTS = [{
'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
@ -332,7 +332,7 @@ class NDREmbedIE(NDREmbedBaseIE):
class NJoyEmbedIE(NDREmbedBaseIE):
IE_NAME = 'njoy:embed'
_VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
_VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
_TESTS = [{
# httpVideo
'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',

@ -1,15 +1,12 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.newgrounds.com/audio/listen/549479',
'url': 'https://www.newgrounds.com/audio/listen/549479',
'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': {
'id': '549479',
@ -18,7 +15,7 @@ class NewgroundsIE(InfoExtractor):
'uploader': 'Burn7',
}
}, {
'url': 'http://www.newgrounds.com/portal/view/673111',
'url': 'https://www.newgrounds.com/portal/view/673111',
'md5': '3394735822aab2478c31b1004fe5e5bc',
'info_dict': {
'id': '673111',
@ -29,24 +26,20 @@ class NewgroundsIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id)
title = self._html_search_regex(
r'<title>([^>]+)</title>', webpage, 'title')
uploader = self._html_search_regex(
[r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
webpage, 'uploader')
r'Author\s*<a[^>]+>([^<]+)', webpage, 'uploader', fatal=False)
music_url_json_string = self._html_search_regex(
r'({"url":"[^"]+"),', webpage, 'music url') + '}'
music_url_json = json.loads(music_url_json_string)
music_url = music_url_json['url']
music_url = self._parse_json(self._search_regex(
r'"url":("[^"]+"),', webpage, ''), media_id)
return {
'id': music_id,
'id': media_id,
'title': title,
'url': music_url,
'uploader': uploader,
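
The rewritten Newgrounds code captures only the quoted string after `"url":` and parses that string as JSON, instead of capturing a partial object and appending `}`. One effect is that JSON escapes such as `\/` are decoded. A rough stdlib equivalent is sketched below; the page fragment and the `example.com` URL are made up, and `media_url_from_page` is a hypothetical helper standing in for `_search_regex` plus `_parse_json`.

```python
import json
import re

# Made-up page fragment with JSON-escaped slashes in the URL.
SAMPLE_PAGE = r'var embed = {"url":"http:\/\/example.com\/audio\/549479.mp3","name":"x"};'


def media_url_from_page(webpage):
    # Capture the quoted JSON string that follows "url": (same regex as the diff).
    mobj = re.search(r'"url":("[^"]+"),', webpage)
    if mobj is None:
        raise ValueError('media url not found')
    # A quoted string is itself valid JSON, so json.loads returns the decoded text.
    return json.loads(mobj.group(1))


print(media_url_from_page(SAMPLE_PAGE))
```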

@ -7,7 +7,7 @@ from ..utils import parse_iso8601
class NextMediaIE(InfoExtractor):
IE_DESC = '蘋果日報'
_VALID_URL = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://hk\.apple\.nextmedia\.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
'md5': 'dff9fad7009311c421176d1ac90bfe4f',
@ -68,7 +68,7 @@ class NextMediaIE(InfoExtractor):
class NextMediaActionNewsIE(NextMediaIE):
IE_DESC = '蘋果日報 - 動新聞'
_VALID_URL = r'https?://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_VALID_URL = r'https?://hk\.dv\.nextmedia\.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_TESTS = [{
'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
'md5': '05fce8ffeed7a5e00665d4b7cf0f9201',
@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'https?://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',

Some files were not shown because too many files have changed in this diff