Compare commits

183 Commits

Author SHA1 Message Date
8c6c88c7da release 2017.02.21 2017-02-21 23:48:24 +07:00
159aaaa9d0 [ChangeLog] Actualize 2017-02-21 23:46:58 +07:00
eea0716cae [extractor/common] Print origin country for fake IP 2017-02-21 23:14:33 +07:00
336a76551b [extractor/common] Do not quit _initialize_geo_bypass on empty countries 2017-02-21 23:09:41 +07:00
dc0a869e5e [extractor/common] Fix typo 2017-02-21 23:05:31 +07:00
e39b5d4ab8 [extractor/common] Allow calling _initialize_geo_bypass from extractors (#11970) 2017-02-21 23:00:43 +07:00
e469ab2528 [ninecninemedia] use geo bypass mechanism 2017-02-21 14:38:00 +01:00
890d44b005 [adobepass] add support for Time Warner Cable(closes #12191) 2017-02-20 19:00:40 +01:00
6926304472 [spankbang] Make uploader optional (closes #12193) 2017-02-21 00:54:43 +07:00
3ccdde8cb7 [extractor/common] Emphasize geo bypass APIs are experimental 2017-02-20 23:21:15 +07:00
da42ff0668 [iprima] Improve geo restriction detection and disable geo bypass 2017-02-20 23:17:19 +07:00
82f662182b [iprima] Modernize 2017-02-20 23:16:14 +07:00
2cc7fcd338 [commonmistakes] Disable UnicodeBOM extractor test for python 3.2 2017-02-20 03:06:52 +07:00
6d4c259765 [svt] PEP 8 2017-02-20 02:25:55 +07:00
c78dd35491 [nrk] PEP 8 2017-02-20 02:25:39 +07:00
8ffb8e63fe [prosiebensat1] Throw ExtractionError on unsupported page type (closes #12180) 2017-02-20 01:00:53 +07:00
983e9b7746 [nrk] Update _API_HOST and relax _VALID_URL 2017-02-20 00:59:31 +07:00
8936f68a0b [travis] Run tests in parallel
[test_download] Print test names in case of network errors

[test_download] Add comments for nose parameters

[test_download] Modify outtmpl to prevent info JSON filename conflicts

Thanks @jaimeMF for the idea.

[travis] Only download tests should be run in parallel
2017-02-19 21:26:35 +08:00
c58b7ffef4 [tv4] Bypass geo restriction and improve detection 2017-02-19 06:25:59 +07:00
f1a78ee4ef [tv4] Switch to hls3 protocol (closes #12177) 2017-02-19 06:16:00 +07:00
de64e23c56 [downloader/ism] Honor HTTP headers when downloading fragments 2017-02-19 04:18:36 +07:00
553f6dbac7 [downloader/dash] Honor HTTP headers when downloading fragments
For example, https://www.oppetarkiv.se/video/1196142/natten-ar-dagens-mor
2017-02-19 04:18:22 +07:00
0aa10994f4 [options] Move geo restriction related options to separate section 2017-02-19 05:10:08 +08:00
4248dad92b Improve geo bypass mechanism
* Rename options to preffixly match with --geo-verification-proxy
* Introduce _GEO_COUNTRIES for extractors
* Implement faking IP right away for sites with known geo restriction
2017-02-19 05:10:08 +08:00
0a840f584c Rename bypass geo restriction options 2017-02-19 05:10:08 +08:00
0016b84e16 Add faked X-Forwarded-For to formats' HTTP headers 2017-02-19 05:10:08 +08:00
18a0defab0 [utils] Make random_ipv4 return unicode string 2017-02-19 05:10:08 +08:00
5d3fbf77d9 [viki] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
80b59020e0 [vgtv] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
71631862f4 [srgssr] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
89cc7fe770 [vbox7] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
04d906eae3 [svt] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
8ab8066cf0 [pbs] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
01b1aa9ff4 [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
ff4007891f [nrk] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
28200e654b [itv] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
e633f21a96 [go] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
d392005a79 [dramafever] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
773f291dcb Add experimental geo restriction bypass mechanism
Based on faking X-Forwarded-For HTTP header
2017-02-19 05:10:08 +08:00
bf5b9d859a [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions 2017-02-19 05:10:08 +08:00
049a0f4d6d [brightcove:legacy] restrict videoPlayer value(closes #12040) 2017-02-18 21:08:40 +01:00
ac33accd96 [options] Mention quoted string literals for --match-filter 2017-02-18 23:59:26 +07:00
e84888b432 [tvn24] Improve extraction (closes #11679) 2017-02-18 23:34:09 +07:00
02d9b82a23 [tvn24] Add extractor 2017-02-18 23:33:49 +07:00
a2e3286676 [thisav] Add support for html5 media (closes #11771) 2017-02-18 20:21:53 +07:00
f75caf059e [metacafe] Improve (closes #10371) 2017-02-18 19:58:25 +07:00
bdabbc220c [metacafe] Bypass family filter
If you don't send this user=ffilter: false cookie, it will 301 redirect you to a page asking about it, and then the title check will fail.
2017-02-18 19:47:33 +07:00
70bcc444a9 [viceland] improve info extraction and update test 2017-02-18 09:52:43 +01:00
28e35f5070 release 2017.02.17 2017-02-17 23:59:56 +07:00
cf3704c132 [ChangeLog] Actualize 2017-02-17 23:48:30 +07:00
2c1f442c2b [options] Add missing spaces 2017-02-17 23:18:26 +07:00
bad4ccdb5d [heise] Improve (closes #9725) 2017-02-17 23:09:40 +07:00
db76c30c6e [heise] Support videos embedded in any article. 2017-02-17 22:55:53 +07:00
c2bde5d081 [ellentv] Improve 2017-02-17 22:45:51 +07:00
90fad0e74c [openload] Fix extraction (closes #12002) 2017-02-17 22:31:16 +07:00
d94badc755 [openload] Semifix extraction (closes #10408)
just updated the code. i don't do much python still i tried to convert my code. lemme know if there is any prob with it
2017-02-17 22:30:05 +07:00
fef51645d6 [theplatform] Recognize URLs with whitespaces (closes #12044) 2017-02-17 23:13:51 +08:00
4cead6a614 [einthusan] Relax _VALID_URL (closes #12141, closes #12159) 2017-02-17 22:02:01 +07:00
a4a554a793 [generic] Try parsing JWPlayer embedded videos (closes #12030) 2017-02-16 23:44:03 +08:00
b898f0a173 [elpais] Fix typo and improve extraction (closes #12139) 2017-02-16 04:57:42 +07:00
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes recent changes in iVysilani. Proper patch should migrate to
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
58a65ba852 release 2017.02.14 2017-02-14 01:09:18 +07:00
cedf08ff54 [ChangeLog] Actualize 2017-02-14 01:07:35 +07:00
50de3dbad3 [zdf] Fix extraction (closes #12117) 2017-02-14 01:00:06 +07:00
085f169ffe [xtube] Fix extraction for both kinds of video id (closes #12088) 2017-02-13 23:44:43 +07:00
f6d6ca1db3 [xtube] Improve title extraction 2017-02-13 23:34:14 +07:00
6e5956e6ba [lemonde] Fallback delegate extraction to generic extractor (closes #12115, closes #12116) 2017-02-13 23:17:48 +07:00
50fd3c2c69 Merge branch 'master' of github.com:rg3/youtube-dl 2017-02-13 22:58:50 +07:00
89c6691f9d [bellmedia] accept longer video id(closes #12114) 2017-02-13 15:08:48 +01:00
454e5cdb17 [limelight] add support referer protected videos 2017-02-13 14:29:05 +01:00
1de9f78e71 [travis] Separate builds for core and download 2017-02-13 18:56:05 +08:00
9dad941853 [disney] improve extraction
- add support for more urls
- detect expired videos
- skip Adobe Flash Access protected videos

closes #4975
closes #11000
closes #11882
closes #11936
2017-02-13 11:43:20 +01:00
1e2c3f61fc [travis] Separate builds for core and download 2017-02-13 17:36:13 +07:00
0dac7cbb09 [hotstar] improve extraction(closes #12096)
- extract all qualities
- detect drm protected videos
- extract more metadata
2017-02-12 17:35:24 +01:00
f8514630db [einthusan] Fix extraction (closes #11416)
The old test URLs are no longer valid, so I replace them with the one
from #11416
2017-02-12 20:53:55 +08:00
459818e280 [aenetworks] Add support for lifetimemovieclub.com 2017-02-12 20:18:11 +08:00
6310acf512 [youtube] Fix parsing codecs (closes #12091) 2017-02-12 18:09:53 +07:00
8d38dafbbf ChangeLog: update after #12085 2017-02-12 00:45:37 +08:00
f3915452de Merge pull request #12085 from wiiaboo/python2
utils.py: Workaround TypeError with Python 2.7.13 in Windows
2017-02-12 00:42:43 +08:00
2f49bcd690 utils.py: Workaround TypeError with Python 2.7.13 in Windows
Fixes #11540

Tested with Windows Python 2.7.12 and 2.7.13.
2017-02-11 14:51:28 +00:00
68c22c4c15 [iqiyi] Update _TESTS 2017-02-11 22:27:45 +08:00
9b92a5917b release 2017.02.11 2017-02-11 03:24:00 +07:00
3e2274c8b7 [ChangeLog] Actualize 2017-02-11 17:08:22 +07:00
3d7e3aaa0e [pluralsight:course] Fix extraction (closes #12075) 2017-02-11 17:00:52 +07:00
624c4b92ff [facebook] Add coding cookie 2017-02-11 16:18:45 +07:00
2af12ad9d2 Introduce get_elements_by_class and get_elements_by_attribute utility functions 2017-02-11 17:16:54 +08:00
97eb9bd2ac [bbc] extract m3u8 formats with 320k audio 2017-02-10 19:46:15 +01:00
71cdd75628 [facebook] Relax video id matching (closes #11017, closes #12055, closes #12056) 2017-02-11 01:05:22 +07:00
c7d6f614f3 [corus] Add new extractor(closes #12060)(#9164) 2017-02-10 17:00:09 +01:00
08a00eef79 [extractor/common] skip m3u8 manifests protected with Adobe Flash Access 2017-02-10 17:00:09 +01:00
9dd5408c99 [pluralsight] Detect blocked account error message (#12070) 2017-02-10 22:48:11 +07:00
9510709575 [bloomberg] Add another video id regex (closes #12062) 2017-02-10 22:16:20 +07:00
5abcca9060 [sixplay] use raw string for regex 2017-02-10 09:34:59 +01:00
e01bfc19c3 [extractor/commonmistakes] Restrict _VALID_URL (closes #12050) 2017-02-10 09:39:24 +07:00
4d32b63851 [tvplayer] Add new extractor 2017-02-09 23:09:21 +01:00
55d4de2283 release 2017.02.10 2017-02-10 01:27:33 +07:00
61ee556aea [ChangeLog] Actualize 2017-02-10 01:26:00 +07:00
ff24261ba0 [kaltura] Add explicit port to regexes
They should not match e.g. cdnapi.kaltura.computernetworks.com/...
2017-02-10 01:24:14 +07:00
fbc6dc525e [xtube] Fix shortcuts 2017-02-10 01:06:23 +07:00
9150d1eb69 [xtube] Fix extraction (closes #12023) 2017-02-10 01:03:35 +07:00
b7f9843bec [pornhub] Simplify (closes #12018) 2017-02-10 00:57:44 +07:00
e64b0fca14 [pornhub] Fix extraction (closes #12007) 2017-02-10 00:56:12 +07:00
78ef214d2d [facebook] Improve JS data regex (closes #12042) 2017-02-09 23:42:40 +07:00
be670b8e8f [external:ffmpeg] do not assume that ffmpeg unknown version format is new 2017-02-09 17:36:59 +01:00
37084f6641 [kaltura] improve embed partner id extraction(fixes #12041) 2017-02-09 16:24:54 +01:00
b04975733c [sprout] Add new extractor 2017-02-09 09:13:29 +01:00
c8b8fb0a99 [sixplay] improve extraction
- skip drm protected formats
- extract more and better formats
- skip duplicate asset urls
2017-02-08 22:56:10 +01:00
8298018273 [scrippsnetworks:watch] Add new extractor(closes #10765) 2017-02-08 20:44:23 +01:00
ae8d5a5c59 [go] add support for adobe pass auth(closes #11468)(closes #10831) 2017-02-08 18:57:07 +01:00
b9c9cb5f79 [6play] Fix extraction (closes #12011) 2017-02-08 23:15:39 +07:00
fdf9b959bc [nbc] add support adobe pass auth(closes #12006) 2017-02-08 16:23:42 +01:00
013877298d release 2017.02.07 2017-02-07 02:04:50 +07:00
c87f95f991 [ChangeLog] Actualize 2017-02-07 01:58:57 +07:00
f28aeff264 [pornhub] Fix extraction (closes #11997) 2017-02-07 01:52:59 +07:00
242a14a1f6 [extractor/common] Fix audio only with audio group in m3u8 (closes #11995) 2017-02-07 00:22:16 +07:00
d5d904ff7d [canalplus] Add support for cstar.fr (#11990) 2017-02-06 23:53:42 +07:00
5620f840f6 [extractor/generic] Add test for #11993 and more metadata for rtmp 2017-02-06 23:31:58 +07:00
b7a8c1bcfa [extractor/generic] Improve rtmp support (closes #11993) 2017-02-06 23:23:40 +07:00
7097bffba6 [downloader/fragment] Respect --no-part 2017-02-06 23:07:59 +07:00
2aec7256ae [extractor/common] Speed-up media tags regex (closes #11979) 2017-02-06 00:20:30 +07:00
815482d4eb Credit @motophil for gaskrank.py (#11685) 2017-02-06 00:38:22 +08:00
9c14fe9681 [gaskrank] Minor change and update ChangeLog after #11685 2017-02-06 00:25:28 +08:00
e705755739 [gaskrank] Add new extractor (#11685)
* [gaskrank] Add new extractor

* [gaskrank] Add new extractor - fixes as requested

* [gaskrank] Add new extractor - style fix

* [Gaskrank] Add new extractor - requested fixes

* [Gaskrank] Add new extractor - fix md5 checksum

* [gaskrank] Add new extractor - more requested fixes

* [Gaskrank] Add new extractor - fixed all but one quantified code issues

* [Gaskrank] add new extractor - more fields extracted, added second test

* [Gaskrank] Add new extractor - requested fixes.

* [Gaskrank] Add new extractor - requested changes.

* [Gaskrank] Add new extractor - final(?) fixes.
2017-02-06 00:19:37 +08:00
019f4c0371 [bandcamp] Fix extraction for incomplete albums
Closes #11727
2017-02-05 22:47:04 +08:00
2ab2c0d1f5 [iwara] Add width (closes #11724)
The heuristic is from #11724
2017-02-05 22:30:13 +08:00
caf0f5f8b7 [iwara] Fix extraction (closes #11781) 2017-02-05 21:48:13 +08:00
e4e50f60b1 [googledrive] Fix extraction on Python 3.6
Since Python 3.6, invalid escape sequences are deprecated. It's likely
that there are invalid escape sequences somewhere on the webpage, so
instead of unescaping the whole webpage, just unescape the URL.

See https://bugs.python.org/issue27364. That change was designed for
string literals, while it affects the 'unicode_escape' encoding as well.
The code path is:

str.decode('unicode_escape')
    codecs.unicode_escape_decode()
        PyUnicode_DecodeUnicodeEscape()
2017-02-05 21:41:08 +08:00
6ef3e65a7b [videopress] Add extractor 2017-02-05 13:37:27 +07:00
6fd138bed8 [sportbox] PEP 8 2017-02-05 13:36:52 +07:00
49bd8d5e2e [travis] Add python 3.6 2017-02-05 02:41:22 +07:00
3d2c2752c5 [afreecatv] extract rtmp formats 2017-02-04 18:18:28 +01:00
a713a86755 release 2017.02.04.1 2017-02-04 23:26:39 +07:00
7bccd5fc8a [ChangeLog] Actualize 2017-02-04 23:23:38 +07:00
3144eccf55 [ChangeLog] Actualize 2017-02-04 23:22:28 +07:00
9db8f6c540 [twitch:stream] Improve _VALID_URL (closes #11971) 2017-02-04 23:21:07 +07:00
8e4041cf3f [radiocanada] fix extraction for toutv rtmp formats 2017-02-04 17:05:35 +01:00
31487eb974 release 2017.02.04 2017-02-04 22:57:48 +07:00
c2521c1ac6 [Piksel] Add another app token regex 2017-02-04 23:23:14 +08:00
643dc0fcfe [vk] Catch author blocked error message
Example link (video in blocked group):
https://vk.com/search?c%5Bq%5D=%D0%9F%D1%80%D1%8B%D0%B6%D0%BE%D0%BA%20c%20%D0%BA%D1%80%D0%B0%D0%BD%D0%B0%20%D0%B2%20%D1%81%D1%82%D0%B8%D0%BB%D0%B5%20%D0%A7%D0%B5%D0%BB%D0%BE%D0%B2%D0%B5%D0%BA%D0%B0-%D0%BF%D0%B0%D1%83%D0%BA%D0%B0&c%5Bsection%5D=video&c%5Bsort%5D=2&z=video-10639516_456240611
2017-02-04 22:21:09 +07:00
36fce54816 [turner] fix downloading of secure hls formats using ffmpeg(closes #11358)(closes #11373)(closes #11800) 2017-02-04 15:23:46 +01:00
2c15db829c [drtv] add support for live and radio sections(closes #1827)(closes #3427) 2017-02-04 08:38:28 +01:00
f65dba7cdb [myspace] fix extraction and extract hls and http formats 2017-02-03 22:25:19 +01:00
605fd6392f [youtube] add format info for itag 325 and 328 2017-02-03 17:59:48 +01:00
f962790ee5 [vine] Fix extraction (closes #11955) 2017-02-03 21:56:48 +07:00
b7cc5f078e [extractors] Remove remnants of sportbox extractor (#11954) 2017-02-03 21:56:10 +07:00
f7a10d8cd6 [sportbox] Remove extractor (closes #11954)
Covered by generic extractor
2017-02-03 21:25:44 +07:00
daac118bf4 [ChangeLog] Update after #11901 2017-02-03 18:56:40 +08:00
8939f784d9 Merge pull request #11901 from ThomasChr/randonplaylistorder
New parameter --playlist-random to randomize playlist download order. Fixes #11889
2017-02-03 18:53:14 +08:00
df0588a31f Merge branch 'fstirlitz-filmon' 2017-02-03 10:15:52 +01:00
4ce3407d08 [filmon] improve extraction 2017-02-03 10:15:03 +01:00
d7f9242e30 [ChangeLog] Update after #11565 2017-02-03 12:13:24 +08:00
45024183ae [infoq] Add audio only format if available (#11565)
* [infoq] Add audio only format if available

Refactor cookie code into a function.
Renamed formats to http_video, http_audio, rtmp_video
Renamed extract functions to video instead of videos as they return
one or no video.

* [infoq] Rename to _extract_cookies as it more than one

* [infoq] Remove redundant determine_ext

* [infoq] Add comment about hardcoded URL

* [infoq] Use _hidden_inputs instead of messy regex

* [infoq] Probe if audio URL is valid

Make it possible to pass headers to _is_valid_url

* [infoq] Add audio only test
2017-02-03 12:10:13 +08:00
33da98f493 [douyutv] Improve room id regex
http://www.douyu.com/t/lpl  source get extra '\' with "room_id\" (from js coding)
2017-02-03 03:26:41 +07:00
4195096ea8 [utils] Improve comments processing in js_to_json (closes #11947) 2017-02-03 03:04:33 +07:00
0bbcc8a10a [iprima] Fix extraction (closes #11920, closes #11896) 2017-02-03 03:04:33 +07:00
b3ee552e4b [utils] Handle single-line comments in js_to_json 2017-02-03 03:04:33 +07:00
a22b2fd19b [youtube] Fix ytsearch* when cookies are provided
Closes #11924

The API with `page` is no longer used in browsers, and YouTube always
returns {'reload': 'now'} when cookies are provided.

See http://youtube.github.io/spfjs/documentation/start/ for how SPF
works. Basically appending static link with a `spf` parameter yields the
corresponding dynamic link.
2017-02-03 01:28:24 +08:00
c54c01f82d [go] Relax video id regex (closes #11937) 2017-02-02 23:04:46 +07:00
5a116e1302 [facebook] Fix title extraction (closes #11941) 2017-02-02 22:45:18 +07:00
a685751051 [youtube:playlist] Recognize TL playlists (closes #11945) 2017-02-02 22:01:11 +07:00
bd8f48c78b [bilibili] Support new Bangumi URLs (closes #11845)
To reduce complexity, I don't support old Bangumi URLs directly via
_VALID_URL. Instead, I choose to let it go to generic redirection. An
example can be found in #10190:

http://bangumi.bilibili.com/anime/v/40062
2017-02-02 21:51:31 +08:00
81aeafeb44 [cbc:watch] extract audio codec for audion only formats(fixes #11893) 2017-02-02 08:07:28 +01:00
8bdc149441 [downloader/external:ffmpeg] minimize the use of aac_adtstoasc filter 2017-02-02 08:07:28 +01:00
020c5df52d [elpais] Fix extraction for some URLs (closes #11765) 2017-02-01 23:48:34 +01:00
da162c1135 [compat] add compat_etree_register_namespace to __all__ list 2017-02-01 20:15:59 +01:00
75822ca790 New parameter --playlist-random to randomize playlist download order. Fixes #11889 2017-01-31 10:03:31 +01:00
a0758dfa1a [filmon] new extractor 2016-11-13 17:28:17 +01:00
112 changed files with 3407 additions and 1021 deletions
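
The geo bypass commits above (773f291dcb, 4248dad92b, 0016b84e16) describe the mechanism only briefly: pick an IP address that appears to belong to an allowed country and send it in the `X-Forwarded-For` HTTP header. The following is a minimal Python 3 sketch of that idea, not the code from these commits. The `COUNTRY_BLOCKS` table, the country codes `XA`/`XB`, and both function names are assumptions made for illustration; the address blocks are documentation-only ranges rather than real country allocations.

```python
import ipaddress
import random

# Hypothetical country -> CIDR block table. 192.0.2.0/24 and 198.51.100.0/24
# are documentation-only ranges standing in for real per-country allocations.
COUNTRY_BLOCKS = {
    'XA': '192.0.2.0/24',
    'XB': '198.51.100.0/24',
}


def random_ipv4(country_code, rng=random):
    """Return a random address from the block assumed for country_code, or None."""
    block = COUNTRY_BLOCKS.get(country_code.upper())
    if block is None:
        return None
    network = ipaddress.ip_network(block)
    # Indexing a network yields the address at that offset inside the block.
    return str(network[rng.randrange(network.num_addresses)])


def with_fake_origin(headers, country_code):
    """Return a copy of headers with a faked X-Forwarded-For, when a block is known."""
    result = dict(headers)
    fake_ip = random_ipv4(country_code)
    if fake_ip is not None:
        result['X-Forwarded-For'] = fake_ip
    return result


if __name__ == '__main__':
    print(with_fake_origin({'User-Agent': 'example'}, 'XA'))
```

Whether a site honors the header at all is up to the site, which is presumably why the commits label the feature experimental.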

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.21*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.01**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.21**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2017.02.01
+[debug] youtube-dl version 2017.02.21
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

@@ -6,8 +6,12 @@ python:
   - "3.3"
   - "3.4"
   - "3.5"
+  - "3.6"
 sudo: false
-script: nosetests test --verbose
+env:
+  - YTDL_TEST_SET=core
+  - YTDL_TEST_SET=download
+script: ./devscripts/run_tests.sh
 notifications:
   email:
     - filippo.valsorda@gmail.com
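
The `YTDL_TEST_SET` matrix splits the suite into a core run and a download run, and `devscripts/run_tests.sh` in this diff implements the split with two nose `-I` (ignore files) patterns. The Python 3 sketch below applies those two patterns to a few test file names to show which files each run would keep. It assumes nose applies `-I` as a regular-expression search against each file's base name; the `selected` helper and the `FILES` list are made up for illustration.

```python
import re

# Value copied from devscripts/run_tests.sh.
DOWNLOAD_TESTS = 'age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter'

# Ignore patterns copied from devscripts/run_tests.sh, one per test set.
IGNORE = {
    # core: ignore the download-type test files.
    'core': r'test_(%s)\.py' % DOWNLOAD_TESTS,
    # download: ignore every test file that is NOT a download-type test.
    'download': r'test_(?!%s).+\.py' % DOWNLOAD_TESTS,
}


def selected(test_set, filenames):
    """Return the file names that are not ignored for the given test set."""
    pattern = re.compile(IGNORE[test_set])
    return [name for name in filenames if not pattern.search(name)]


# Illustrative file names only.
FILES = ['test_utils.py', 'test_YoutubeDL.py', 'test_download.py', 'test_subtitles.py']

if __name__ == '__main__':
    print('core:', selected('core', FILES))
    print('download:', selected('download', FILES))
```

The negative lookahead in the download pattern is what lets one list of names drive both runs: the same alternation is matched in one case and excluded in the other.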

@@ -201,3 +201,4 @@ Stephen Chen
 Fabian Stahl
 Bagira
 Odd Stråbø
+Philip Herzog

185
ChangeLog
@@ -1,3 +1,188 @@
+version 2017.02.21
+Core
+* [extractor/common] Allow calling _initialize_geo_bypass from extractors
+(#11970)
++ [adobepass] Add support for Time Warner Cable (#12191)
++ [travis] Run tests in parallel
++ [downloader/ism] Honor HTTP headers when downloading fragments
++ [downloader/dash] Honor HTTP headers when downloading fragments
++ [utils] Add GeoUtils class for working with geo tools and GeoUtils.random_ipv4
++ Add option --geo-bypass-country for explicit geo bypass on behalf of
+specified country
++ Add options to control geo bypass mechanism --geo-bypass and --no-geo-bypass
++ Add experimental geo restriction bypass mechanism based on faking
+X-Forwarded-For HTTP header
++ [utils] Introduce GeoRestrictedError for geo restricted videos
++ [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions
+Extractors
++ [ninecninemedia] Use geo bypass mechanism
+* [spankbang] Make uploader optional (#12193)
++ [iprima] Improve geo restriction detection and disable geo bypass
+* [iprima] Modernize
+* [commonmistakes] Disable UnicodeBOM extractor test for python 3.2
++ [prosiebensat1] Throw ExtractionError on unsupported page type (#12180)
+* [nrk] Update _API_HOST and relax _VALID_URL
++ [tv4] Bypass geo restriction and improve detection
+* [tv4] Switch to hls3 protocol (#12177)
++ [viki] Improve geo restriction detection
++ [vgtv] Improve geo restriction detection
++ [srgssr] Improve geo restriction detection
++ [vbox7] Improve geo restriction detection and use geo bypass mechanism
++ [svt] Improve geo restriction detection and use geo bypass mechanism
++ [pbs] Improve geo restriction detection and use geo bypass mechanism
++ [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism
++ [nrk] Improve geo restriction detection and use geo bypass mechanism
++ [itv] Improve geo restriction detection and use geo bypass mechanism
++ [go] Improve geo restriction detection and use geo bypass mechanism
++ [dramafever] Improve geo restriction detection and use geo bypass mechanism
+* [brightcove:legacy] Restrict videoPlayer value (#12040)
++ [tvn24] Add support for tvn24.pl and tvn24bis.pl (#11679)
++ [thisav] Add support for HTML5 media (#11771)
+* [metacafe] Bypass family filter (#10371)
+* [viceland] Improve info extraction
+version 2017.02.17
+Extractors
+* [heise] Improve extraction (#9725)
+* [ellentv] Improve (#11653)
+* [openload] Fix extraction (#10408, #12002)
++ [theplatform] Recognize URLs with whitespaces (#12044)
+* [einthusan] Relax URL regular expression (#12141, #12159)
++ [generic] Support complex JWPlayer embedded videos (#12030)
+* [elpais] Improve extraction (#12139)
+version 2017.02.16
+Core
++ [utils] Add support for quoted string literals in --match-filter (#8050,
+#12142, #12144)
+Extractors
+* [ceskatelevize] Lower priority for audio description sources (#12119)
+* [amcnetworks] Fix extraction (#12127)
+* [pinkbike] Fix uploader extraction (#12054)
++ [onetpl] Add support for businessinsider.com.pl and plejada.pl
++ [onetpl] Add support for onet.pl (#10507)
++ [onetmvp] Add shortcut extractor
++ [vodpl] Add support for vod.pl (#12122)
++ [pornhub] Extract video URL from tv platform site (#12007, #12129)
++ [ceskatelevize] Extract DASH formats (#12119, #12133)
+version 2017.02.14
+Core
+* TypeError is fixed with Python 2.7.13 on Windows (#11540, #12085)
+Extractor
+* [zdf] Fix extraction (#12117)
+* [xtube] Fix extraction for both kinds of video id (#12088)
+* [xtube] Improve title extraction (#12088)
++ [lemonde] Fallback delegate extraction to generic extractor (#12115, #12116)
+* [bellmedia] Allow video id longer than 6 characters (#12114)
++ [limelight] Add support for referer protected videos
+* [disney] Improve extraction (#4975, #11000, #11882, #11936)
+* [hotstar] Improve extraction (#12096)
+* [einthusan] Fix extraction (#11416)
++ [aenetworks] Add support for lifetimemovieclub.com (#12097)
+* [youtube] Fix parsing codecs (#12091)
+version 2017.02.11
+Core
++ [utils] Introduce get_elements_by_class and get_elements_by_attribute
+utility functions
++ [extractor/common] Skip m3u8 manifests protected with Adobe Flash Access
+Extractor
+* [pluralsight:course] Fix extraction (#12075)
++ [bbc] Extract m3u8 formats with 320k audio
+* [facebook] Relax video id matching (#11017, #12055, #12056)
++ [corus] Add support for Corus Entertainment sites (#12060, #9164)
++ [pluralsight] Detect blocked account error message (#12070)
++ [bloomberg] Add another video id pattern (#12062)
+* [extractor/commonmistakes] Restrict URL regular expression (#12050)
++ [tvplayer] Add support for tvplayer.com
+version 2017.02.10
+Extractors
+* [xtube] Fix extraction (#12023)
+* [pornhub] Fix extraction (#12007, #12018)
+* [facebook] Improve JS data regular expression (#12042)
+* [kaltura] Improve embed partner id extraction (#12041)
++ [sprout] Add support for sproutonline.com
+* [6play] Improve extraction
++ [scrippsnetworks:watch] Add support for Scripps Networks sites (#10765)
++ [go] Add support for Adobe Pass authentication (#11468, #10831)
+* [6play] Fix extraction (#12011)
++ [nbc] Add support for Adobe Pass authentication (#12006)
+version 2017.02.07
+Core
+* [extractor/common] Fix audio only with audio group in m3u8 (#11995)
++ [downloader/fragment] Respect --no-part
+* [extractor/common] Speed-up HTML5 media entries extraction (#11979)
+Extractors
+* [pornhub] Fix extraction (#11997)
++ [canalplus] Add support for cstar.fr (#11990)
++ [extractor/generic] Improve RTMP support (#11993)
++ [gaskrank] Add support for gaskrank.tv (#11685)
+* [bandcamp] Fix extraction for incomplete albums (#11727)
+* [iwara] Fix extraction (#11781)
+* [googledrive] Fix extraction on Python 3.6
++ [videopress] Add support for videopress.com
++ [afreecatv] Extract RTMP formats
+version 2017.02.04.1
+Extractors
++ [twitch:stream] Add support for player.twitch.tv (#11971)
+* [radiocanada] Fix extraction for toutv rtmp formats
+version 2017.02.04
+Core
++ Add --playlist-random to shuffle playlists (#11889, #11901)
+* [utils] Improve comments processing in js_to_json (#11947)
+* [utils] Handle single-line comments in js_to_json
+* [downloader/external:ffmpeg] Minimize the use of aac_adtstoasc filter
+Extractors
++ [piksel] Add another app token pattern (#11969)
++ [vk] Capture and output author blocked error message (#11965)
++ [turner] Fix secure HLS formats downloading with ffmpeg (#11358, #11373,
+#11800)
++ [drtv] Add support for live and radio sections (#1827, #3427)
+* [myspace] Fix extraction and extract HLS and HTTP formats
++ [youtube] Add format info for itag 325 and 328
+* [vine] Fix extraction (#11955)
+- [sportbox] Remove extractor (#11954)
++ [filmon] Add support for filmon.com (#11187)
++ [infoq] Add audio only formats (#11565)
+* [douyutv] Improve room id regular expression (#11931)
+* [iprima] Fix extraction (#11920, #11896)
+* [youtube] Fix ytsearch when cookies are provided (#11924)
+* [go] Relax video id regular expression (#11937)
+* [facebook] Fix title extraction (#11941)
++ [youtube:playlist] Recognize TL playlists (#11945)
++ [bilibili] Support new Bangumi URLs (#11845)
++ [cbc:watch] Extract audio codec for audio only formats (#11893)
++ [elpais] Fix extraction for some URLs (#11765)
 version 2017.02.01
 Extractors
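
The 2017.02.21 entries mention a `YoutubeDLError` base class for all youtube-dl exceptions and a `GeoRestrictedError` for geo restricted videos. A rough Python sketch of what such a hierarchy allows is shown below. The constructors are simplified assumptions, the `describe` helper is hypothetical, and the placement of `GeoRestrictedError` under `ExtractorError` is an assumption rather than something this diff shows.

```python
class YoutubeDLError(Exception):
    """Common base class, so callers can catch every library error at once."""


class ExtractorError(YoutubeDLError):
    """An extractor could not obtain the requested information."""


class GeoRestrictedError(ExtractorError):
    """The content is not available from the client's location."""

    def __init__(self, msg, countries=None):
        super(GeoRestrictedError, self).__init__(msg)
        self.msg = msg
        # Optional list of country codes where the content is known to work.
        self.countries = countries


def describe(err):
    """Hypothetical helper: turn an error into a short user-facing message."""
    if isinstance(err, GeoRestrictedError) and err.countries:
        return '%s (available in: %s)' % (err.msg, ', '.join(err.countries))
    return str(err)


if __name__ == '__main__':
    try:
        raise GeoRestrictedError('Video is geo restricted', countries=['XA'])
    except YoutubeDLError as err:  # one except clause covers the whole family
        print(describe(err))
```

The benefit of a shared base class is that embedding code needs only one `except` clause to separate the library's own failures from unrelated exceptions.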

@ -99,11 +99,21 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--source-address IP Client-side IP address to bind to --source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4 -4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6 -6, --force-ipv6 Make all connections via IPv6
## Geo Restriction:
--geo-verification-proxy URL Use this proxy to verify the IP address for --geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default some geo-restricted sites. The default
proxy specified by --proxy (or none, if the proxy specified by --proxy (or none, if the
options is not present) is used for the options is not present) is used for the
actual downloading. actual downloading.
--geo-bypass Bypass geographic restriction via faking
X-Forwarded-For HTTP header (experimental)
--no-geo-bypass Do not bypass geographic restriction via
faking X-Forwarded-For HTTP header
(experimental)
--geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-2
country code (experimental)
## Video Selection: ## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1) --playlist-start NUMBER Playlist video to start at (default is 1)
@ -137,20 +147,22 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--match-filter FILTER Generic video filter. Specify any key (see --match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys) help for -o for a list of available keys)
to match if the key is present, !key to to match if the key is present, !key to
check if the key is not present,key > check if the key is not present, key >
NUMBER (like "comment_count > 12", also NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare works with >=, <, <=, !=, =) to compare
against a number, and & to require multiple against a number, key = 'LITERAL' (like
matches. Values which are not known are "uploader = 'Mike Smith'", also works with
excluded unless you put a question mark (?) !=) to match against a string literal and &
after the operator.For example, to only to require multiple matches. Values which
match videos that have been liked more than are not known are excluded unless you put a
100 times and disliked less than 50 times question mark (?) after the operator. For
(or the dislike functionality is not example, to only match videos that have
available at the given service), but who been liked more than 100 times and disliked
also have a description, use --match-filter less than 50 times (or the dislike
"like_count > 100 & dislike_count <? 50 & functionality is not available at the given
description" . service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
--no-playlist Download only the video, if the URL refers --no-playlist Download only the video, if the URL refers
to a video and a playlist. to a video and a playlist.
--yes-playlist Download the playlist, if the URL refers to --yes-playlist Download the playlist, if the URL refers to
@ -182,6 +194,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
automatically resized from an initial value automatically resized from an initial value
of SIZE. of SIZE.
--playlist-reverse Download playlist videos in reverse order --playlist-reverse Download playlist videos in reverse order
--playlist-random Download playlist videos in random order
--xattr-set-filesize Set file xattribute ytdl.filesize with --xattr-set-filesize Set file xattribute ytdl.filesize with
expected file size (experimental) expected file size (experimental)
--hls-prefer-native Use the native HLS downloader instead of --hls-prefer-native Use the native HLS downloader instead of
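
The `key = 'LITERAL'` comparison described in the --match-filter help text can be illustrated with a minimal sketch. This is not youtube-dl's own `match_filter_func`: the names `match_string_literal` and `_STRING_FILTER_RE` are hypothetical, only `=` and `!=` against a quoted literal are handled, and the `?` suffix is left out.

```python
import operator
import re

# Hypothetical, simplified pattern for one filter part such as
#   uploader = 'Mike Smith'
# Backslash-escaped quotes are allowed inside the literal.
_STRING_FILTER_RE = re.compile(r'''(?x)
    \s*(?P<key>[a-z_]+)
    \s*(?P<op>!=|=)
    \s*(?P<quote>["'])
    (?P<value>(?:\\.|(?!(?P=quote)|\\).)+?)
    (?P=quote)\s*$''')

_OPS = {'=': operator.eq, '!=': operator.ne}


def match_string_literal(filter_part, info):
    """Return True if `info` satisfies a single key = 'LITERAL' filter part."""
    m = _STRING_FILTER_RE.match(filter_part)
    if m is None:
        raise ValueError('Invalid filter part %r' % filter_part)
    quote = m.group('quote')
    # Turn \' (or \") back into the bare quote character.
    expected = m.group('value').replace('\\' + quote, quote)
    actual = info.get(m.group('key'))
    if actual is None:
        # Values which are not known are excluded.
        return False
    return _OPS[m.group('op')](actual, expected)
```

With this sketch, `match_string_literal("uploader = 'Mike Smith'", {'uploader': 'Mike Smith'})` evaluates to `True`, and a video without an `uploader` field is rejected for both operators.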

devscripts/run_tests.sh Executable file

@ -0,0 +1,21 @@
#!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
test_set=""
multiprocess_args=""
case "$YTDL_TEST_SET" in
core)
test_set="-I test_($DOWNLOAD_TESTS)\.py"
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
multiprocess_args="--processes=4 --process-timeout=540"
;;
*)
: # YTDL_TEST_SET unset or unrecognized: run the full suite ("break" is only valid inside a loop)
;;
esac
nosetests test --verbose $test_set $multiprocess_args
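
The two `-I` (ignore) patterns in the script split the test files into a core set and a download set. A rough sketch of that split in Python, assuming the patterns are applied to file names with `re.search`; the helper `selected` and the sample file list are illustrative only:

```python
import re

DOWNLOAD_TESTS = 'age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter'

# "core" ignores the download-style test files.
CORE_IGNORE = re.compile(r'test_(%s)\.py' % DOWNLOAD_TESTS)
# "download" ignores everything that does NOT start with one of those names
# (negative lookahead right after the "test_" prefix).
DOWNLOAD_IGNORE = re.compile(r'test_(?!%s).+\.py' % DOWNLOAD_TESTS)


def selected(files, ignore_re):
    """Return the files that survive the ignore pattern."""
    return [f for f in files if not ignore_re.search(f)]


files = ['test_utils.py', 'test_download.py', 'test_subtitles.py', 'test_YoutubeDL.py']
core_files = selected(files, CORE_IGNORE)
download_files = selected(files, DOWNLOAD_IGNORE)
```

Only the download set is given `--processes=4`, so the network-bound tests are the ones that run in parallel.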


@ -11,6 +11,7 @@
- **4tube** - **4tube**
- **56.com** - **56.com**
- **5min** - **5min**
- **6play**
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media** - **9c9media**
@ -84,6 +85,7 @@
- **bambuser:channel** - **bambuser:channel**
- **Bandcamp** - **Bandcamp**
- **Bandcamp:album** - **Bandcamp:album**
- **bangumi.bilibili.com**: BiliBili番剧
- **bbc**: BBC - **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer - **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
@ -167,6 +169,7 @@
- **ComedyCentralShortname** - **ComedyCentralShortname**
- **ComedyCentralTV** - **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Corus**
- **Coub** - **Coub**
- **Cracked** - **Cracked**
- **Crackle** - **Crackle**
@ -211,7 +214,8 @@
- **DRBonanza** - **DRBonanza**
- **Dropbox** - **Dropbox**
- **DrTuber** - **DrTuber**
- **DRTV** - **drtv**
- **drtv:live**
- **Dumpert** - **Dumpert**
- **dvtv**: http://video.aktualne.cz/ - **dvtv**: http://video.aktualne.cz/
- **dw** - **dw**
@ -247,6 +251,8 @@
- **fc2:embed** - **fc2:embed**
- **Fczenit** - **Fczenit**
- **fernsehkritik.tv** - **fernsehkritik.tv**
- **filmon**
- **filmon:channel**
- **Firstpost** - **Firstpost**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
@ -278,6 +284,7 @@
- **Gamersyde** - **Gamersyde**
- **GameSpot** - **GameSpot**
- **GameStar** - **GameStar**
- **Gaskrank**
- **Gazeta** - **Gazeta**
- **GDCVault** - **GDCVault**
- **generic**: Generic downloader that works on some sites - **generic**: Generic downloader that works on some sites
@ -303,7 +310,6 @@
- **HellPorno** - **HellPorno**
- **Helsinki**: helsinki.fi - **Helsinki**: helsinki.fi
- **HentaiStigma** - **HentaiStigma**
- **HGTV**
- **hgtv.com:show** - **hgtv.com:show**
- **HistoricFilms** - **HistoricFilms**
- **history:topic**: History.com Topic - **history:topic**: History.com Topic
@ -540,8 +546,10 @@
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **OnDemandKorea** - **OnDemandKorea**
- **onet.pl**
- **onet.tv** - **onet.tv**
- **onet.tv:channel** - **onet.tv:channel**
- **OnetMVP**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -662,6 +670,7 @@
- **screen.yahoo:search**: Yahoo screen search - **screen.yahoo:search**: Yahoo screen search
- **Screencast** - **Screencast**
- **ScreencastOMatic** - **ScreencastOMatic**
- **scrippsnetworks:watch**
- **Seeker** - **Seeker**
- **SenateISVP** - **SenateISVP**
- **SendtoNews** - **SendtoNews**
@ -671,7 +680,6 @@
- **Shared**: shared.sx - **Shared**: shared.sx
- **ShowRoomLive** - **ShowRoomLive**
- **Sina** - **Sina**
- **SixPlay**
- **skynewsarabia:article** - **skynewsarabia:article**
- **skynewsarabia:video** - **skynewsarabia:video**
- **SkySports** - **SkySports**
@ -703,10 +711,10 @@
- **Spiegeltv** - **Spiegeltv**
- **Spike** - **Spike**
- **Sport5** - **Sport5**
- **SportBox**
- **SportBoxEmbed** - **SportBoxEmbed**
- **SportDeutschland** - **SportDeutschland**
- **Sportschau** - **Sportschau**
- **Sprout**
- **sr:mediathek**: Saarländischer Rundfunk - **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR** - **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
@ -796,10 +804,12 @@
- **TVCArticle** - **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVN24**
- **TVNoe** - **TVNoe**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska - **tvp:embed**: Telewizja Polska
- **tvp:series** - **tvp:series**
- **TVPlayer**
- **Tweakers** - **Tweakers**
- **twitch:chapter** - **twitch:chapter**
- **twitch:clips** - **twitch:clips**
@ -856,6 +866,7 @@
- **videomore:season** - **videomore:season**
- **videomore:video** - **videomore:video**
- **VideoPremium** - **VideoPremium**
- **VideoPress**
- **videoweed**: VideoWeed - **videoweed**: VideoWeed
- **Vidio** - **Vidio**
- **vidme** - **vidme**
@ -892,6 +903,7 @@
- **vlive** - **vlive**
- **vlive:channel** - **vlive:channel**
- **Vodlocker** - **Vodlocker**
- **VODPl**
- **VODPlatform** - **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**


@ -1,4 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '') self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'vbr': 10, 'vbr': 10,
}), '^\s*10k$') }), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'fps': 30, 'fps': 30,
}), '^30fps$') }), r'^30fps$')
def test_postprocessors(self): def test_postprocessors(self):
filename = 'post-processor-testfile.mp4' filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42', 'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
} }
second = { second = {
'id': '2', 'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43', 'playlist_id': '43',
'uploader': "тест 123",
} }
videos = [first, second] videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),


@ -65,6 +65,10 @@ defs = gettestcases()
class TestDownload(unittest.TestCase): class TestDownload(unittest.TestCase):
# Parallel testing in nosetests. See
# http://nose.readthedocs.org/en/latest/doc_tests/test_multiprocess/multiprocess.html
_multiprocess_shared_ = True
maxDiff = None maxDiff = None
def setUp(self): def setUp(self):
@ -73,7 +77,7 @@ class TestDownload(unittest.TestCase):
# Dynamically generate tests # Dynamically generate tests
def generator(test_case): def generator(test_case, tname):
def test_template(self): def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name']) ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
@ -102,6 +106,7 @@ def generator(test_case):
return return
params = get_params(test_case.get('params', {})) params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case: if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist') params.setdefault('extract_flat', 'in_playlist')
params.setdefault('skip_download', True) params.setdefault('skip_download', True)
@ -146,7 +151,7 @@ def generator(test_case):
raise raise
if try_num == RETRIES: if try_num == RETRIES:
report_warning('Failed due to network errors, skipping...') report_warning('%s failed due to network errors, skipping...' % tname)
return return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num)) print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
@ -221,12 +226,12 @@ def generator(test_case):
# And add them to TestDownload # And add them to TestDownload
for n, test_case in enumerate(defs): for n, test_case in enumerate(defs):
test_method = generator(test_case)
tname = 'test_' + str(test_case['name']) tname = 'test_' + str(test_case['name'])
i = 1 i = 1
while hasattr(TestDownload, tname): while hasattr(TestDownload, tname):
tname = 'test_%s_%d' % (test_case['name'], i) tname = 'test_%s_%d' % (test_case['name'], i)
i += 1 i += 1
test_method = generator(test_case, tname)
test_method.__name__ = str(tname) test_method.__name__ = str(tname)
setattr(TestDownload, test_method.__name__, test_method) setattr(TestDownload, test_method.__name__, test_method)
del test_method del test_method
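
The reordering in this hunk matters because the generated test now needs its final, de-duplicated name (it is used as an `outtmpl` prefix so parallel runs do not clash on info JSON file names). A minimal sketch of that naming scheme follows; `unique_test_name`, `register` and `Holder` are hypothetical stand-ins, not names from the test suite.

```python
def unique_test_name(cls, base_name):
    """Pick test_<name>, then test_<name>_1, test_<name>_2, ... until unused."""
    tname = 'test_' + base_name
    i = 1
    while hasattr(cls, tname):
        tname = 'test_%s_%d' % (base_name, i)
        i += 1
    return tname


class Holder(object):
    """Stand-in for the TestDownload class."""


def register(cls, base_name):
    # The name must be settled before the test function is built,
    # because the function closes over it.
    tname = unique_test_name(cls, base_name)

    def test_method(self):
        # Mimics: params['outtmpl'] = tname + '_' + params['outtmpl']
        return tname + '_' + '%(id)s.%(ext)s'

    test_method.__name__ = str(tname)
    setattr(cls, tname, test_method)
    return tname


names = [register(Holder, 'Youtube') for _ in range(3)]
```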


@ -34,6 +34,9 @@ from youtube_dl.utils import (
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
get_element_by_class, get_element_by_class,
get_element_by_attribute,
get_elements_by_class,
get_elements_by_attribute,
InAdvancePagedList, InAdvancePagedList,
intlist_to_bytes, intlist_to_bytes,
is_html, is_html,
@ -785,12 +788,27 @@ class TestUtil(unittest.TestCase):
on = js_to_json('["abc", "def",]') on = js_to_json('["abc", "def",]')
self.assertEqual(json.loads(on), ['abc', 'def']) self.assertEqual(json.loads(on), ['abc', 'def'])
on = js_to_json('[/*comment\n*/"abc"/*comment\n*/,/*comment\n*/"def",/*comment\n*/]')
self.assertEqual(json.loads(on), ['abc', 'def'])
on = js_to_json('[//comment\n"abc" //comment\n,//comment\n"def",//comment\n]')
self.assertEqual(json.loads(on), ['abc', 'def'])
on = js_to_json('{"abc": "def",}') on = js_to_json('{"abc": "def",}')
self.assertEqual(json.loads(on), {'abc': 'def'}) self.assertEqual(json.loads(on), {'abc': 'def'})
on = js_to_json('{/*comment\n*/"abc"/*comment\n*/:/*comment\n*/"def"/*comment\n*/,/*comment\n*/}')
self.assertEqual(json.loads(on), {'abc': 'def'})
on = js_to_json('{ 0: /* " \n */ ",]" , }') on = js_to_json('{ 0: /* " \n */ ",]" , }')
self.assertEqual(json.loads(on), {'0': ',]'}) self.assertEqual(json.loads(on), {'0': ',]'})
on = js_to_json('{ /*comment\n*/0/*comment\n*/: /* " \n */ ",]" , }')
self.assertEqual(json.loads(on), {'0': ',]'})
on = js_to_json('{ 0: // comment\n1 }')
self.assertEqual(json.loads(on), {'0': 1})
on = js_to_json(r'["<p>x<\/p>"]') on = js_to_json(r'["<p>x<\/p>"]')
self.assertEqual(json.loads(on), ['<p>x</p>']) self.assertEqual(json.loads(on), ['<p>x</p>'])
@ -800,15 +818,27 @@ class TestUtil(unittest.TestCase):
on = js_to_json("['a\\\nb']") on = js_to_json("['a\\\nb']")
self.assertEqual(json.loads(on), ['ab']) self.assertEqual(json.loads(on), ['ab'])
on = js_to_json("/*comment\n*/[/*comment\n*/'a\\\nb'/*comment\n*/]/*comment\n*/")
self.assertEqual(json.loads(on), ['ab'])
on = js_to_json('{0xff:0xff}') on = js_to_json('{0xff:0xff}')
self.assertEqual(json.loads(on), {'255': 255}) self.assertEqual(json.loads(on), {'255': 255})
on = js_to_json('{/*comment\n*/0xff/*comment\n*/:/*comment\n*/0xff/*comment\n*/}')
self.assertEqual(json.loads(on), {'255': 255})
on = js_to_json('{077:077}') on = js_to_json('{077:077}')
self.assertEqual(json.loads(on), {'63': 63}) self.assertEqual(json.loads(on), {'63': 63})
on = js_to_json('{/*comment\n*/077/*comment\n*/:/*comment\n*/077/*comment\n*/}')
self.assertEqual(json.loads(on), {'63': 63})
on = js_to_json('{42:42}') on = js_to_json('{42:42}')
self.assertEqual(json.loads(on), {'42': 42}) self.assertEqual(json.loads(on), {'42': 42})
on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
self.assertEqual(json.loads(on), {'42': 42})
def test_extract_attributes(self): def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'}) self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'}) self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
@ -1097,6 +1127,32 @@ The first line
self.assertEqual(get_element_by_class('foo', html), 'nice') self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None) self.assertEqual(get_element_by_class('no-such-class', html), None)
def test_get_element_by_attribute(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_attribute('class', 'foo bar', html), 'nice')
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
def test_get_elements_by_class(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
self.assertEqual(get_elements_by_class('foo', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_class('no-such-class', html), [])
def test_get_elements_by_attribute(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
self.assertEqual(get_elements_by_attribute('class', 'foo bar', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()


@ -24,6 +24,7 @@ import sys
import time import time
import tokenize import tokenize
import traceback import traceback
import random
from .compat import ( from .compat import (
compat_basestring, compat_basestring,
@ -55,6 +56,8 @@ from .utils import (
ExtractorError, ExtractorError,
format_bytes, format_bytes,
formatSeconds, formatSeconds,
GeoRestrictedError,
ISO3166Utils,
locked_file, locked_file,
make_HTTPS_handler, make_HTTPS_handler,
MaxDownloadsReached, MaxDownloadsReached,
@ -159,6 +162,7 @@ class YoutubeDL(object):
playlistend: Playlist item to end at. playlistend: Playlist item to end at.
playlist_items: Specific indices of playlist to download. playlist_items: Specific indices of playlist to download.
playlistreverse: Download playlist items in reverse order. playlistreverse: Download playlist items in reverse order.
playlistrandom: Download playlist items in random order.
matchtitle: Download only matching titles. matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles. rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance. logger: Log messages to a logging.Logger instance.
@ -270,6 +274,12 @@ class YoutubeDL(object):
If it returns None, the video is downloaded. If it returns None, the video is downloaded.
match_filter_func in utils.py is one example for this. match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output. no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
HTTP header (experimental)
geo_bypass_country:
Two-letter ISO 3166-1 country code that will be used for
explicit geographic restriction bypassing via faking
X-Forwarded-For HTTP header (experimental)
The following options determine which downloader is picked: The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call. external_downloader: Executable of the external downloader to call.
@ -705,6 +715,14 @@ class YoutubeDL(object):
return self.process_ie_result(ie_result, download, extra_info) return self.process_ie_result(ie_result, download, extra_info)
else: else:
return ie_result return ie_result
except GeoRestrictedError as e:
msg = e.msg
if e.countries:
msg += '\nThis video is available in %s.' % ', '.join(
map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback()) self.report_error(compat_str(e), e.format_traceback())
break break
@ -842,8 +860,17 @@ class YoutubeDL(object):
if self.params.get('playlistreverse', False): if self.params.get('playlistreverse', False):
entries = entries[::-1] entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1): for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries)) self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = { extra = {
'n_entries': n_entries, 'n_entries': n_entries,
'playlist': playlist, 'playlist': playlist,
@ -1228,6 +1255,11 @@ class YoutubeDL(object):
if cookies: if cookies:
res['Cookie'] = cookies res['Cookie'] = cookies
if 'X-Forwarded-For' not in res:
x_forwarded_for_ip = info_dict.get('__x_forwarded_for_ip')
if x_forwarded_for_ip:
res['X-Forwarded-For'] = x_forwarded_for_ip
return res return res
def _calc_cookies(self, info_dict): def _calc_cookies(self, info_dict):
@ -1370,6 +1402,9 @@ class YoutubeDL(object):
full_format_info = info_dict.copy() full_format_info = info_dict.copy()
full_format_info.update(format) full_format_info.update(format)
format['http_headers'] = self._calc_headers(full_format_info) format['http_headers'] = self._calc_headers(full_format_info)
# Remove private housekeeping stuff
if '__x_forwarded_for_ip' in info_dict:
del info_dict['__x_forwarded_for_ip']
# TODO Central sorting goes here # TODO Central sorting goes here
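
The playlist ordering and the `__x_forwarded_for_ip` hand-off added in this file can be summarized in a short sketch. The three functions below are simplified, hypothetical stand-ins for the corresponding parts of `YoutubeDL`, not the methods themselves; the `rng` and `base_headers` parameters are assumptions added so the sketch stays self-contained.

```python
import random


def order_entries(entries, params, rng=random):
    """Reverse first, then shuffle, in the same order as the hunk above."""
    entries = list(entries)
    if params.get('playlistreverse', False):
        entries = entries[::-1]
    if params.get('playlistrandom', False):
        rng.shuffle(entries)
    return entries


def propagate_fake_ip(ie_result, entries):
    """Copy the playlist-level fake IP down to every entry."""
    x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
    if x_forwarded_for:
        for entry in entries:
            entry['__x_forwarded_for_ip'] = x_forwarded_for
    return entries


def calc_headers(info_dict, base_headers=None):
    """Add X-Forwarded-For only when no explicit header is already set."""
    res = dict(base_headers or {})
    res.update(info_dict.get('http_headers') or {})
    if 'X-Forwarded-For' not in res:
        ip = info_dict.get('__x_forwarded_for_ip')
        if ip:
            res['X-Forwarded-For'] = ip
    return res
```

An explicitly supplied `X-Forwarded-For` header wins over the fake IP, which matches the `not in res` guard in the diff.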


@ -344,6 +344,7 @@ def _real_main(argv=None):
'playliststart': opts.playliststart, 'playliststart': opts.playliststart,
'playlistend': opts.playlistend, 'playlistend': opts.playlistend,
'playlistreverse': opts.playlist_reverse, 'playlistreverse': opts.playlist_reverse,
'playlistrandom': opts.playlist_random,
'noplaylist': opts.noplaylist, 'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-', 'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle, 'consoletitle': opts.consoletitle,
@ -413,6 +414,8 @@ def _real_main(argv=None):
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy, 'geo_verification_proxy': opts.geo_verification_proxy,
'config_location': opts.config_location, 'config_location': opts.config_location,
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:


@ -2883,6 +2883,7 @@ __all__ = [
'compat_cookiejar', 'compat_cookiejar',
'compat_cookies', 'compat_cookies',
'compat_etree_fromstring', 'compat_etree_fromstring',
'compat_etree_register_namespace',
'compat_expanduser', 'compat_expanduser',
'compat_get_terminal_size', 'compat_get_terminal_size',
'compat_getenv', 'compat_getenv',


@ -43,7 +43,10 @@ class DashSegmentsFD(FragmentFD):
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, {'url': segment_url}) success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') down, target_sanitized = sanitize_open(target_filename, 'rb')


@ -17,6 +17,7 @@ from ..utils import (
encodeArgument, encodeArgument,
handle_youtubedl_headers, handle_youtubedl_headers,
check_executable, check_executable,
is_outdated_version,
) )
@ -198,6 +199,15 @@ class FFmpegFD(ExternalFD):
args = [ffpp.executable, '-y'] args = [ffpp.executable, '-y']
seekable = info_dict.get('_seekable')
if seekable is not None:
# setting -seekable prevents ffmpeg from guessing if the server
# supports seeking(by adding the header `Range: bytes=0-`), which
# can cause problems in some cases
# https://github.com/rg3/youtube-dl/issues/11800#issuecomment-275037127
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
args += self._configuration_args() args += self._configuration_args()
# start_time = info_dict.get('start_time') or 0 # start_time = info_dict.get('start_time') or 0
@ -264,7 +274,9 @@ class FFmpegFD(ExternalFD):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-': if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts'] args += ['-f', 'mpegts']
else: else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc'] args += ['-f', 'mp4']
if (ffpp.basename == 'ffmpeg' and is_outdated_version(ffpp._versions['ffmpeg'], '3.2', False)) and (not info_dict.get('acodec') or info_dict['acodec'].split('.')[0] in ('aac', 'mp4a')):
args += ['-bsf:a', 'aac_adtstoasc']
elif protocol == 'rtmp': elif protocol == 'rtmp':
args += ['-f', 'flv'] args += ['-f', 'flv']
else: else:
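
The new condition for `-bsf:a aac_adtstoasc` reads more easily when split into its two parts. The sketch below assumes a simplified version check; `is_older_than` is a hypothetical helper that only approximates `is_outdated_version(version, limit, False)` by treating missing or unparseable versions as old.

```python
import re


def _parse_version(v):
    # Raises ValueError for non-numeric parts such as git snapshot names.
    return tuple(int(p) for p in re.split(r'[.-]', v))


def is_older_than(version, limit):
    """Hypothetical helper: unknown or unparseable versions count as old."""
    if not version:
        return True
    try:
        return _parse_version(version) < _parse_version(limit)
    except ValueError:
        return True


def needs_aac_adtstoasc(basename, version, acodec):
    """Mirror the condition in the hunk above for mp4 output."""
    old_ffmpeg = basename == 'ffmpeg' and is_older_than(version, '3.2')
    aac_or_unknown = not acodec or acodec.split('.')[0] in ('aac', 'mp4a')
    return old_ffmpeg and aac_or_unknown
```

So the filter is kept for ffmpeg older than 3.2 when the audio codec is AAC or unknown, and dropped for avconv, for newer ffmpeg, and for other codecs.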


@ -61,6 +61,7 @@ class FragmentFD(FileDownloader):
'noprogress': True, 'noprogress': True,
'ratelimit': self.params.get('ratelimit'), 'ratelimit': self.params.get('ratelimit'),
'retries': self.params.get('retries', 0), 'retries': self.params.get('retries', 0),
'nopart': self.params.get('nopart', False),
'test': self.params.get('test', False), 'test': self.params.get('test', False),
} }
) )


@ -238,7 +238,10 @@ class IsmFD(FragmentFD):
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, {'url': segment_url}) success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') down, target_sanitized = sanitize_open(target_filename, 'rb')


@ -31,6 +31,11 @@ MSO_INFO = {
'username_field': 'user', 'username_field': 'user',
'password_field': 'passwd', 'password_field': 'passwd',
}, },
'TWC': {
'name': 'Time Warner Cable | Spectrum',
'username_field': 'Ecom_User_ID',
'password_field': 'Ecom_Password',
},
'thr030': { 'thr030': {
'name': '3 Rivers Communications' 'name': '3 Rivers Communications'
}, },


@ -23,7 +23,7 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)' _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime|lifetimemovieclub)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': 'a97a65f7e823ae10e9244bc5433d5fe6', 'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
@ -62,11 +62,15 @@ class AENetworksIE(AENetworksBaseIE):
}, { }, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie', 'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = { _DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY', 'history.com': 'HISTORY',
'aetv.com': 'AETV', 'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME', 'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI', 'fyi.tv': 'FYI',
} }


@ -221,10 +221,23 @@ class AfreecaTVGlobalIE(AfreecaTVIE):
s_url = s.get('purl') s_url = s.get('purl')
if not s_url: if not s_url:
continue continue
# TODO: extract rtmp formats stype = s.get('stype')
if s.get('stype') == 'HLS': if stype == 'HLS':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
s_url, channel_id, 'mp4', fatal=False)) s_url, channel_id, 'mp4', m3u8_id=stype, fatal=False))
elif stype == 'RTMP':
format_id = [stype]
label = s.get('label')
if label:
format_id.append(label)
formats.append({
'format_id': '-'.join(format_id),
'url': s_url,
'tbr': int_or_none(s.get('bps')),
'height': int_or_none(s.get('brt')),
'ext': 'flv',
'rtmp_live': True,
})
self._sort_formats(formats) self._sort_formats(formats)
info.update({ info.update({


@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url') media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id) r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating'] rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required') auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true': if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id') requestor_id = self._search_regex(
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating) r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource) webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query) media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id) formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats) self._sort_formats(formats)
info.update({ info.update({
'id': video_id, 'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys: if ns_keys:
ns = list(ns_keys)[0] ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show') series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season')) season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle') episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode')) episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number: if season_number:
title = 'Season %d - %s' % (season_number, title) title = 'Season %d - %s' % (season_number, title)
if series: if series:


@ -1,13 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
clean_html, clean_html,
) )
class ArchiveOrgIE(JWPlatformBaseIE): class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org' IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos' IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$' _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'


@ -209,6 +209,15 @@ class BandcampAlbumIE(InfoExtractor):
'id': 'entropy-ep', 'id': 'entropy-ep',
}, },
'playlist_mincount': 3, 'playlist_mincount': 3,
}, {
# not all tracks have songs
'url': 'https://insulters.bandcamp.com/album/we-are-the-plague',
'info_dict': {
'id': 'we-are-the-plague',
'title': 'WE ARE THE PLAGUE',
'uploader_id': 'insulters',
},
'playlist_count': 2,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -217,12 +226,16 @@ class BandcampAlbumIE(InfoExtractor):
album_id = mobj.group('album_id') album_id = mobj.group('album_id')
playlist_id = album_id or uploader_id playlist_id = album_id or uploader_id
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage) track_elements = re.findall(
if not tracks_paths: r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage)
if not track_elements:
raise ExtractorError('The page doesn\'t contain any tracks') raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info have songs
entries = [ entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key()) self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths] for elem_content, t_path in track_elements
if self._html_search_meta('duration', elem_content, default=None)]
title = self._html_search_regex( title = self._html_search_regex(
r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"', r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False) webpage, 'title', fatal=False)
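Review note: the hunk above now captures each track's enclosing `<div>` along with its link, and keeps only the tracks whose markup carries duration metadata. A minimal standalone sketch of that filter follows. The sample HTML is invented for illustration, and `DURATION_RE` is a simplified stand-in for `_html_search_meta('duration', ...)`, not the helper's real behaviour.

```python
import re
from urllib.parse import urljoin

# Same track pattern as in the hunk: group 1 is the <div> content,
# group 2 is the track path.
TRACK_RE = (
    r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"'
    r'[^>]+itemprop="url"[^>]*>.*?)</div>')

# Assumption: a simplified stand-in for _html_search_meta('duration', ...).
DURATION_RE = r'<meta[^>]+itemprop="duration"'

# Invented sample markup: the second track has no duration, hence no song.
SAMPLE = (
    '<div class="title"><a href="/track/one" itemprop="url">One</a>'
    '<meta itemprop="duration" content="100.0"></div>\n'
    '<div class="title"><a href="/track/two" itemprop="url">Two</a></div>')


def playable_track_urls(page_url, webpage):
    """Return absolute URLs of tracks that carry duration metadata."""
    return [
        urljoin(page_url, path)
        for content, path in re.findall(TRACK_RE, webpage)
        if re.search(DURATION_RE, content)]
```

Calling `playable_track_urls('https://artist.example/album/x', SAMPLE)` keeps only the first track, which mirrors why the new test expects `playlist_count` of 2 on an album that lists more tracks than it has songs.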


@ -225,6 +225,8 @@ class BBCCoUkIE(InfoExtractor):
} }
] ]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
class MediaSelectionError(Exception): class MediaSelectionError(Exception):
def __init__(self, id): def __init__(self, id):
self.id = id self.id = id
@ -336,6 +338,15 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native', href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds': elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False)) href, programme_id, f4m_id=format_id, fatal=False))
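Review note: the `_USP_RE` rewrite in this hunk can be tried in isolation. The sketch below reuses the pattern and replacement string from the diff; the helper names and example URLs are hypothetical, and the height filter is a plain-list version of the loop above.

```python
import re

USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'


def usp_master_url(href):
    """Return the rewritten playlist URL, or None if href does not match."""
    if not re.search(USP_RE, href):
        return None
    # '/<name>.ism[.hlsv2.ism]/<anything>.m3u8' -> '/<name>.ism/<name>.m3u8'
    return re.sub(USP_RE, r'/\1.ism/\1.m3u8', href)


def drop_above_720p(formats):
    """Keep formats whose height is unknown or at most 720."""
    return [
        f for f in formats
        if not (f.get('height') and f['height'] > 720)]
```

For example, `usp_master_url('http://cdn.example/path/abc123.ism.hlsv2.ism/iptv.m3u8')` yields `http://cdn.example/path/abc123.ism/abc123.m3u8`, while a URL without an `.ism` path component returns `None` and is left to the regular HLS branch.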


@ -24,7 +24,7 @@ class BellMediaIE(InfoExtractor):
space space
)\.ca| )\.ca|
much\.com much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})''' )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966', 'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0', 'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -55,6 +55,9 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6', 'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',
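Review note: the only functional change to `_VALID_URL` here is `{6}` becoming `{6,}`. The sketch below isolates the ID tail of the pattern (the domain part is not reproduced, and `re.search` is used instead of the extractor's anchored match) to show why the new seven-digit test URL needs it.

```python
import re

# Only the tail of _VALID_URL, before and after the change.
ID_TAIL_OLD = r'(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'
ID_TAIL_NEW = r'(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'

URL = 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430'


def extract_id(pattern, url):
    """Return the captured id, or None when the pattern does not match."""
    mobj = re.search(pattern, url)
    return mobj.group('id') if mobj else None
```

With the old tail, `extract_id(ID_TAIL_OLD, URL)` stops after six digits and silently yields a truncated ID; the new tail captures all seven.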


@ -5,19 +5,27 @@ import hashlib
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_parse_qs from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
float_or_none, float_or_none,
parse_iso8601,
smuggle_url,
strip_jsonp,
unified_timestamp, unified_timestamp,
unsmuggle_url,
urlencode_postdata, urlencode_postdata,
) )
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e', 'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': { 'info_dict': {
@ -32,25 +40,61 @@ class BiliBiliIE(InfoExtractor):
'uploader': '菊子桑', 'uploader': '菊子桑',
'uploader_id': '156160', 'uploader_id': '156160',
}, },
} }, {
# Tested in BiliBiliBangumiIE
'url': 'http://bangumi.bilibili.com/anime/1869/play#40062',
'only_matching': True,
}, {
'url': 'http://bangumi.bilibili.com/anime/5802/play#100643',
'md5': '3f721ad1e75030cc06faf73587cfec57',
'info_dict': {
'id': '100643',
'ext': 'mp4',
'title': 'CHAOS;CHILD',
'description': '如果你是神明并且能够让妄想成为现实。那你会进行怎么样的妄想是淫靡的世界独裁社会毁灭性的制裁还是……2015年涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
},
'skip': 'Geo-restricted to China',
}]
_APP_KEY = '84956560bc028eb7' _APP_KEY = '84956560bc028eb7'
_BILIBILI_KEY = '94aba54af9065f71de72f5508f1cd42e' _BILIBILI_KEY = '94aba54af9065f71de72f5508f1cd42e'
def _report_error(self, result):
if 'message' in result:
raise ExtractorError('%s said: %s' % (self.IE_NAME, result['message']), expected=True)
elif 'code' in result:
raise ExtractorError('%s returns error %d' % (self.IE_NAME, result['code']), expected=True)
else:
raise ExtractorError('Can\'t extract Bangumi episode ID')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if 'anime/v' not in url: if 'anime/' not in url:
cid = compat_parse_qs(self._search_regex( cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)', [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'], r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0] webpage, 'player parameters'))['cid'][0]
else: else:
if 'no_bangumi_tip' not in smuggled_data:
self.to_screen('Downloading episode %s. To download all videos in anime %s, re-run youtube-dl with %s' % (
video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}
headers.update(self.geo_verification_headers())
js = self._download_json( js = self._download_json(
'http://bangumi.bilibili.com/web_api/get_source', video_id, 'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}), data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}) headers=headers)
if 'result' not in js:
self._report_error(js)
cid = js['result']['cid'] cid = js['result']['cid']
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid) payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
@ -58,7 +102,11 @@ class BiliBiliIE(InfoExtractor):
video_info = self._download_json( video_info = self._download_json(
'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign), 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page') video_id, note='Downloading video info page',
headers=self.geo_verification_headers())
if 'durl' not in video_info:
self._report_error(video_info)
entries = [] entries = []
@ -85,7 +133,7 @@ class BiliBiliIE(InfoExtractor):
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title') title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex( timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)) r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage) thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
# TODO 'view_count' requires deobfuscating Javascript # TODO 'view_count' requires deobfuscating Javascript
@ -99,7 +147,7 @@ class BiliBiliIE(InfoExtractor):
} }
uploader_mobj = re.search( uploader_mobj = re.search(
r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"', r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
webpage) webpage)
if uploader_mobj: if uploader_mobj:
info.update({ info.update({
@ -123,3 +171,70 @@ class BiliBiliIE(InfoExtractor):
'description': description, 'description': description,
'entries': entries, 'entries': entries,
} }
class BiliBiliBangumiIE(InfoExtractor):
_VALID_URL = r'https?://bangumi\.bilibili\.com/anime/(?P<id>\d+)'
IE_NAME = 'bangumi.bilibili.com'
IE_DESC = 'BiliBili番剧'
_TESTS = [{
'url': 'http://bangumi.bilibili.com/anime/1869',
'info_dict': {
'id': '1869',
'title': '混沌武士',
'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
},
'playlist_count': 26,
}, {
'url': 'http://bangumi.bilibili.com/anime/1869',
'info_dict': {
'id': '1869',
'title': '混沌武士',
'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
},
'playlist': [{
'md5': '91da8621454dd58316851c27c68b0c13',
'info_dict': {
'id': '40062',
'ext': 'mp4',
'title': '混沌武士',
'description': '故事发生在日本的江户时代。风是一个小酒馆的打工女。一日,酒馆里来了一群恶霸,虽然他们的举动令风十分不满,但是毕竟风只是一届女流,无法对他们采取什么行动,只能在心里嘟哝。这时,酒家里又进来了个“不良份子...',
'timestamp': 1414538739,
'upload_date': '20141028',
'episode': '疾风怒涛 Tempestuous Temperaments',
'episode_number': 1,
},
}],
'params': {
'playlist_items': '1',
},
}]
@classmethod
def suitable(cls, url):
return False if BiliBiliIE.suitable(url) else super(BiliBiliBangumiIE, cls).suitable(url)
def _real_extract(self, url):
bangumi_id = self._match_id(url)
# Sometimes this API returns a JSONP response
season_info = self._download_json(
'http://bangumi.bilibili.com/jsonp/seasoninfo/%s.ver' % bangumi_id,
bangumi_id, transform_source=strip_jsonp)['result']
entries = [{
'_type': 'url_transparent',
'url': smuggle_url(episode['webplay_url'], {'no_bangumi_tip': 1}),
'ie_key': BiliBiliIE.ie_key(),
'timestamp': parse_iso8601(episode.get('update_time'), delimiter=' '),
'episode': episode.get('index_title'),
'episode_number': int_or_none(episode.get('index')),
} for episode in season_info['episodes']]
entries = sorted(entries, key=lambda entry: entry.get('episode_number'))
return self.playlist_result(
entries, bangumi_id,
season_info.get('bangumi_title'), season_info.get('evaluate'))
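Review note: `BiliBiliBangumiIE.suitable()` defers to `BiliBiliIE` because the season pattern is a prefix of the episode URL form. A small sketch of that precedence, using the two `_VALID_URL` patterns from this diff; the `pick_extractor` function and its string return values are illustrative only.

```python
import re

EPISODE_RE = (
    r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/'
    r'(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)')
SEASON_RE = r'https?://bangumi\.bilibili\.com/anime/(?P<id>\d+)'


def pick_extractor(url):
    """Episode URLs win; season URLs are matched only when they are not episodes."""
    if re.match(EPISODE_RE, url):
        return 'BiliBili'
    if re.match(SEASON_RE, url):
        return 'BiliBiliBangumi'
    return None
```

Without the episode check coming first, `http://bangumi.bilibili.com/anime/1869/play#40062` would also satisfy the season pattern, since `re.match` only anchors at the start of the string.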


@ -33,6 +33,10 @@ class BloombergIE(InfoExtractor):
'params': { 'params': {
'format': 'best[format_id^=hds]', 'format': 'best[format_id^=hds]',
}, },
}, {
# data-bmmrid=
'url': 'https://www.bloomberg.com/politics/articles/2017-02-08/le-pen-aide-briefed-french-central-banker-on-plan-to-print-money',
'only_matching': True,
}, { }, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True, 'only_matching': True,
@ -45,9 +49,10 @@ class BloombergIE(InfoExtractor):
name = self._match_id(url) name = self._match_id(url)
webpage = self._download_webpage(url, name) webpage = self._download_webpage(url, name)
video_id = self._search_regex( video_id = self._search_regex(
(r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', (r'["\']bmmrId["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
r'videoId\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'), r'videoId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'id', group='url', default=None) r'data-bmmrid=(["\'])(?P<id>(?:(?!\1).)+)\1'),
webpage, 'id', group='id', default=None)
if not video_id: if not video_id:
bplayer_data = self._parse_json(self._search_regex( bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name) r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
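Review note: the three ID patterns share one shape: capture the opening quote, then consume characters until the same quote reappears. A standalone sketch of the lookup order follows; `find_video_id` is a hypothetical helper that approximates passing a tuple of patterns to `_search_regex` with `group='id'` and `default=None`.

```python
import re

ID_PATTERNS = (
    r'["\']bmmrId["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
    r'videoId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
    r'data-bmmrid=(["\'])(?P<id>(?:(?!\1).)+)\1',
)


def find_video_id(webpage):
    """Try each pattern in order and return the first captured id, else None."""
    for pattern in ID_PATTERNS:
        mobj = re.search(pattern, webpage)
        if mobj:
            return mobj.group('id')
    return None
```

The `(?!\1).` construct lets a value contain the other quote character, so `videoId: 'x"y'` captures `x"y` rather than stopping at the double quote.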


@ -191,6 +191,10 @@ class BrightcoveLegacyIE(InfoExtractor):
# These fields hold the id of the video # These fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList') videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
if videoPlayer is not None: if videoPlayer is not None:
if isinstance(videoPlayer, list):
videoPlayer = videoPlayer[0]
if not (videoPlayer.isdigit() or videoPlayer.startswith('ref:')):
return None
params['@videoPlayer'] = videoPlayer params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL') linkBase = find_param('linkBaseURL')
if linkBase is not None: if linkBase is not None:


@ -27,6 +27,7 @@ class CanalplusIE(InfoExtractor):
(?:www\.)?d8\.tv| (?:www\.)?d8\.tv|
(?:www\.)?c8\.fr| (?:www\.)?c8\.fr|
(?:www\.)?d17\.tv| (?:www\.)?d17\.tv|
(?:(?:football|www)\.)?cstar\.fr|
(?:www\.)?itele\.fr (?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?| )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+) player\.canalplus\.fr/#/(?P<id>\d+)
@ -40,6 +41,7 @@ class CanalplusIE(InfoExtractor):
'd8': 'd8', 'd8': 'd8',
'c8': 'd8', 'c8': 'd8',
'd17': 'd17', 'd17': 'd17',
'cstar': 'd17',
'itele': 'itele', 'itele': 'itele',
} }
@ -86,6 +88,19 @@ class CanalplusIE(InfoExtractor):
'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.', 'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20161014', 'upload_date': '20161014',
}, },
}, {
'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
'info_dict': {
'id': '1416769',
'display_id': 'pid7566-feminines-videos',
'ext': 'mp4',
'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
'upload_date': '20160921',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://m.canalplus.fr/?vid=1398231', 'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True, 'only_matching': True,


@ -296,6 +296,12 @@ class CBCWatchVideoIE(CBCWatchBaseIE):
formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False) formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
if len(formats) < 2: if len(formats) < 2:
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4') formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
for f in formats:
format_id = f.get('format_id')
if format_id.startswith('AAC'):
f['acodec'] = 'aac'
elif format_id.startswith('AC3'):
f['acodec'] = 'ac-3'
self._sort_formats(formats) self._sort_formats(formats)
info = { info = {


@ -13,6 +13,7 @@ from ..utils import (
float_or_none, float_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
USER_AGENTS,
) )
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {
'id': '61924494876951776', 'id': '61924494877246241',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hyde Park Civilizace', 'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a', 'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350, 'duration': 3350,
}, },
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani', 'requestSource': 'iVysilani',
} }
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = [] entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId'] for user_agent in (None, USER_AGENTS['Safari']):
title = item['title'] req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration')) req.add_header('Content-type', 'application/x-www-form-urlencoded')
thumbnail = item.get('previewImageUrl') req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {} playlistpage = self._download_json(req, playlist_id, fatal=False)
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1: if not playlistpage:
final_title = playlist_title or title continue
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({ playlist_url = playlistpage['url']
'id': item_id, if playlist_url == 'error_region':
'title': final_title, raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail, req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
'duration': duration, req.add_header('Referer', url)
'formats': formats,
'subtitles': subtitles, playlist_title = self._og_search_title(webpage, default=None)
'is_live': is_live, playlist_description = self._og_search_description(webpage, default=None)
})
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)


@ -6,6 +6,7 @@ import hashlib
import json import json
import netrc import netrc
import os import os
import random
import re import re
import socket import socket
import sys import sys
@ -39,7 +40,10 @@ from ..utils import (
ExtractorError, ExtractorError,
fix_xml_ampersands, fix_xml_ampersands,
float_or_none, float_or_none,
GeoRestrictedError,
GeoUtils,
int_or_none, int_or_none,
js_to_json,
parse_iso8601, parse_iso8601,
RegexNotFoundError, RegexNotFoundError,
sanitize_filename, sanitize_filename,
@ -319,17 +323,34 @@ class InfoExtractor(object):
_real_extract() methods and define a _VALID_URL regexp. _real_extract() methods and define a _VALID_URL regexp.
Probably, they should also be added to the list of extractors. Probably, they should also be added to the list of extractors.
_GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor.
This does not, however, disable the explicit geo restriction bypass based
on the country code provided with geo_bypass_country. (experimental)
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by the
geo restriction bypass mechanism right away in order to bypass
geo restriction, unless the mechanism is disabled. (experimental)
NB: both of these geo attributes are experimental and may change in the
future or be removed completely.
Finally, the _WORKING attribute should be set to False for broken IEs Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests. in order to warn the users and skip the tests.
""" """
_ready = False _ready = False
_downloader = None _downloader = None
_x_forwarded_for_ip = None
_GEO_BYPASS = True
_GEO_COUNTRIES = None
_WORKING = True _WORKING = True
def __init__(self, downloader=None): def __init__(self, downloader=None):
"""Constructor. Receives an optional downloader.""" """Constructor. Receives an optional downloader."""
self._ready = False self._ready = False
self._x_forwarded_for_ip = None
self.set_downloader(downloader) self.set_downloader(downloader)
@classmethod @classmethod
@ -358,15 +379,59 @@ class InfoExtractor(object):
def initialize(self): def initialize(self):
"""Initializes an instance (authentication, etc).""" """Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES)
if not self._ready: if not self._ready:
self._real_initialize() self._real_initialize()
self._ready = True self._ready = True
def _initialize_geo_bypass(self, countries):
"""
Initialize geo restriction bypass mechanism.
This method is used to initialize the geo bypass mechanism based on faking
the X-Forwarded-For HTTP header. A random country from the provided country
list is selected and a random IP belonging to this country is generated.
This IP will be passed as the X-Forwarded-For HTTP header in all subsequent
HTTP requests.
This method will be used for the initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES.
You may also call it manually from the extractor's code if the geo countries
information is not available beforehand (e.g. obtained during
extraction) or for some other reason.
"""
if not self._x_forwarded_for_ip:
country_code = self._downloader.params.get('geo_bypass_country', None)
# If there is no explicit country for geo bypass specified and
# the extractor is known to be geo restricted let's fake IP
# as X-Forwarded-For right away.
if (not country_code and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
countries):
country_code = random.choice(countries)
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._downloader.params.get('verbose', False):
self._downloader.to_stdout(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
def extract(self, url): def extract(self, url):
"""Extracts URL information and returns it in list of dicts.""" """Extracts URL information and returns it in list of dicts."""
try: try:
self.initialize() for _ in range(2):
return self._real_extract(url) try:
self.initialize()
ie_result = self._real_extract(url)
if self._x_forwarded_for_ip:
ie_result['__x_forwarded_for_ip'] = self._x_forwarded_for_ip
return ie_result
except GeoRestrictedError as e:
if self.__maybe_fake_ip_and_retry(e.countries):
continue
raise
except ExtractorError: except ExtractorError:
raise raise
except compat_http_client.IncompleteRead as e: except compat_http_client.IncompleteRead as e:
@ -374,6 +439,21 @@ class InfoExtractor(object):
except (KeyError, StopIteration) as e: except (KeyError, StopIteration) as e:
raise ExtractorError('An extractor error has occurred.', cause=e) raise ExtractorError('An extractor error has occurred.', cause=e)
def __maybe_fake_ip_and_retry(self, countries):
if (not self._downloader.params.get('geo_bypass_country', None) and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
not self._x_forwarded_for_ip and
countries):
country_code = random.choice(countries)
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._x_forwarded_for_ip:
self.report_warning(
'Video is geo restricted. Retrying extraction with fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
return True
return False
def set_downloader(self, downloader): def set_downloader(self, downloader):
"""Sets the downloader for this IE.""" """Sets the downloader for this IE."""
self._downloader = downloader self._downloader = downloader
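Review note: the country selection in `_initialize_geo_bypass` reduces to a small decision: an explicit `geo_bypass_country` always wins; otherwise a random country is picked only when both the extractor and the user allow bypassing and a country list exists. The sketch below models that decision outside the extractor class. `GeoUtils.random_ipv4` is not shown in this part of the diff, so the `random_ipv4` here is an assumed stand-in, and the country table is hypothetical, using reserved documentation address ranges rather than real allocations.

```python
import ipaddress
import random

# Hypothetical country -> CIDR table (documentation-only ranges).
COUNTRY_BLOCKS = {
    'AA': '192.0.2.0/24',
    'BB': '198.51.100.0/24',
}


def random_ipv4(country_code):
    """Assumed stand-in for GeoUtils.random_ipv4: random address in a block."""
    block = COUNTRY_BLOCKS.get(country_code.upper())
    if block is None:
        return None
    network = ipaddress.ip_network(block)
    offset = random.randint(0, network.num_addresses - 1)
    return str(network[offset])


def pick_fake_ip(params, extractor_geo_bypass, countries):
    """Mirror the selection order used in _initialize_geo_bypass."""
    country_code = params.get('geo_bypass_country')
    if (not country_code and extractor_geo_bypass
            and params.get('geo_bypass', True) and countries):
        country_code = random.choice(countries)
    if country_code:
        return random_ipv4(country_code)
    return None
```

Note that an explicit country is honoured even when the extractor sets `_GEO_BYPASS = False`, which matches the class docstring: `pick_fake_ip({'geo_bypass_country': 'bb'}, False, None)` still returns an address from the `BB` block.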
@ -433,6 +513,15 @@ class InfoExtractor(object):
if isinstance(url_or_request, (compat_str, str)): if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0] url_or_request = url_or_request.partition('#')[0]
# Some sites check the X-Forwarded-For HTTP header in order to figure out
# the origin of a client behind a proxy. This allows bypassing geo
# restriction by setting this header to an IP that belongs to some
# geo unrestricted country. We will do so once we encounter any
# geo restriction error.
if self._x_forwarded_for_ip:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query) urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
if urlh is False: if urlh is False:
assert not fatal assert not fatal
@ -608,10 +697,8 @@ class InfoExtractor(object):
expected=True) expected=True)
@staticmethod @staticmethod
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction'): def raise_geo_restricted(msg='This video is not available from your location due to geo restriction', countries=None):
raise ExtractorError( raise GeoRestrictedError(msg, countries=countries)
'%s. You might want to use --proxy to workaround.' % msg,
expected=True)
# Methods for following #608 # Methods for following #608
@staticmethod @staticmethod
@ -1025,13 +1112,13 @@ class InfoExtractor(object):
unique_formats.append(f) unique_formats.append(f)
formats[:] = unique_formats formats[:] = unique_formats
def _is_valid_url(self, url, video_id, item='video'): def _is_valid_url(self, url, video_id, item='video', headers={}):
url = self._proto_relative_url(url, scheme='http:') url = self._proto_relative_url(url, scheme='http:')
# For now assume non HTTP(S) URLs always valid # For now assume non HTTP(S) URLs always valid
if not (url.startswith('http://') or url.startswith('https://')): if not (url.startswith('http://') or url.startswith('https://')):
return True return True
try: try:
self._request_webpage(url, video_id, 'Checking %s URL' % item) self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True return True
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_urllib_error.URLError): if isinstance(e.cause, compat_urllib_error.URLError):
@ -1208,6 +1295,9 @@ class InfoExtractor(object):
m3u8_doc, urlh = res m3u8_doc, urlh = res
m3u8_url = urlh.geturl() m3u8_url = urlh.geturl()
if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access
return []
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)] formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: ( format_url = lambda u: (
@ -1315,8 +1405,8 @@ class InfoExtractor(object):
'abr': abr, 'abr': abr,
}) })
f.update(parse_codecs(last_info.get('CODECS'))) f.update(parse_codecs(last_info.get('CODECS')))
if audio_in_video_stream.get(last_info.get('AUDIO')) is False: if audio_in_video_stream.get(last_info.get('AUDIO')) is False and f['vcodec'] != 'none':
# TODO: update acodec for for audio only formats with the same GROUP-ID # TODO: update acodec for audio only formats with the same GROUP-ID
f['acodec'] = 'none' f['acodec'] = 'none'
formats.append(f) formats.append(f)
last_info = {} last_info = {}
@ -1959,7 +2049,12 @@ class InfoExtractor(object):
media_tags = [(media_tag, media_type, '') media_tags = [(media_tag, media_type, '')
for media_tag, media_type for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)] in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage)) media_tags.extend(re.findall(
# We only allow video|audio followed by whitespace or '>'.
# Allowing more characters may result in a significant slowdown (see
# https://github.com/rg3/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags: for media_tag, media_type, media_content in media_tags:
media_info = { media_info = {
'formats': [], 'formats': [],
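Review note: the tightened media-tag pattern requires the tag name to be followed by whitespace or `>`, so unrelated tags that merely start with `video` or `audio` no longer open a match. A short comparison of the two patterns from this hunk; the sample strings are invented.

```python
import re

OLD_RE = r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>'
NEW_RE = r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>'

LOOKALIKE = '<videos>a</video>'
REAL = '<video src="a.mp4">x</video>'


def media_tags(pattern, webpage):
    """Return (opening tag, tag name, inner content) tuples."""
    return re.findall(pattern, webpage)
```

On `LOOKALIKE` the old pattern returns one bogus match while the new one returns none; on `REAL` both return the same tuple. On a large sitemap full of lookalike tags, the old pattern's lazy `(.*?)` would presumably scan ahead for a closing tag from every such opening, which would account for the slowdown the code comment refers to.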
@ -2065,6 +2160,123 @@ class InfoExtractor(object):
}) })
return formats return formats
@staticmethod
def _find_jwplayer_data(webpage):
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()
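The JWPlayer helpers added in this hunk normalize three legacy config shapes (flattened playlist, single playlist item, flattened sources) before building formats, and fall back to a label such as 1080p when no height is given. A minimal standalone sketch of those two steps follows, assuming plain dicts as input. `normalize_jwplayer_config` and `height_from_label` are hypothetical names used only for illustration and are not part of youtube-dl. Unlike the code in the diff, the sketch copies items instead of mutating them in place.

```python
import re


def normalize_jwplayer_config(config):
    """Return a list of playlist items, each carrying a 'sources' list.

    Hypothetical helper that mirrors the backward-compatibility steps above.
    """
    # Flattened playlist: the config itself is the only item.
    if 'playlist' not in config:
        config = {'playlist': [config]}
    playlist = config['playlist']
    # Single playlist item given as a dict instead of a list.
    if not isinstance(playlist, list):
        playlist = [playlist]
    items = []
    for item in playlist:
        item = dict(item)  # copy so the caller's data is not mutated
        # Flattened sources: the item itself is the only source.
        if 'sources' not in item:
            item['sources'] = [dict(item)]
        items.append(item)
    return items


def height_from_label(label):
    """Parse a height from a label such as '1080p'; None if it does not match."""
    mobj = re.match(r'^(\d{3,})[pP]$', label or '')
    return int(mobj.group(1)) if mobj else None
```

Labels that are not of the form digits plus p (for example 'HD') yield None, which matches the `default=None` behaviour of the regex search in the hunk.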


@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import sys
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import ExtractorError
@ -7,7 +9,7 @@ from ..utils import ExtractorError
class CommonMistakesIE(InfoExtractor): class CommonMistakesIE(InfoExtractor):
IE_DESC = False # Do not list IE_DESC = False # Do not list
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?:url|URL) (?:url|URL)$
''' '''
_TESTS = [{ _TESTS = [{
@ -33,7 +35,9 @@ class UnicodeBOMIE(InfoExtractor):
IE_DESC = False IE_DESC = False
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$' _VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
_TESTS = [{ # Disable test for python 3.2 since BOM is broken in re in this version
# (see https://github.com/rg3/youtube-dl/issues/9751)
_TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc', 'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True, 'only_matching': True,
}] }]
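The version gate above relies on Python's element-wise tuple comparison. A short sketch of how that condition evaluates for a few interpreter versions follows. `bom_test_disabled` is a hypothetical name for illustration, and the version tuples are made-up samples in the shape of `sys.version_info`.

```python
def bom_test_disabled(version_info):
    """Mirror the condition in the diff: (3, 0) < version_info <= (3, 3)."""
    return (3, 0) < tuple(version_info) <= (3, 3)


# Sample tuples in the shape of sys.version_info (assumed values).
PY27 = (2, 7, 13, 'final', 0)
PY32 = (3, 2, 5, 'final', 0)
PY33 = (3, 3, 0, 'final', 0)
```

Because a longer tuple compares greater than its own prefix, a 3.3.0 tuple is greater than (3, 3), so the condition covers the 3.0 to 3.2 series and excludes 3.3 and later.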


@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformFeedIE
from ..utils import int_or_none
class CorusIE(ThePlatformFeedIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
'info_dict': {
'id': '870923331648',
'ext': 'mp4',
'title': 'Movie Night Popcorn with Bryan',
'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
'uploader': 'SHWM-NEW',
'upload_date': '20170206',
'timestamp': 1486392197,
},
}, {
'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
'only_matching': True,
}, {
'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/',
'only_matching': True,
}]
_TP_FEEDS = {
'globaltv': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'etcanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'hgtv': {
'feed_id': 'L0BMHXi2no43',
'account_id': 2414428465,
},
'foodnetwork': {
'feed_id': 'ukK8o58zbRmJ',
'account_id': 2414429569,
},
'slice': {
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
feed_info = self._TP_FEEDS[domain.split('.')[0]]
return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: {
'episode_number': int_or_none(e.get('pl1$episode')),
'season_number': int_or_none(e.get('pl1$season')),
'series': e.get('pl1$show'),
}, {
'HLS': {
'manifest': 'm3u',
},
'DesktopHLS Default': {
'manifest': 'm3u',
},
'MP4 MBR': {
'manifest': 'm3u',
},
}, feed_info['account_id'])
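The extractor above derives the `_TP_FEEDS` key from the first label of the matched domain. The sketch below runs the same `_VALID_URL` pattern outside the class, using only the standard library. `feed_key_and_id` is a hypothetical helper name, and the URLs in the usage note are the test URLs from the diff.

```python
import re

# Pattern copied from CorusIE._VALID_URL in the diff above.
VALID_URL = (
    r'https?://(?:www\.)?'
    r'(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/'
    r'(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))'
    r'(?P<id>\d+)')


def feed_key_and_id(url):
    """Return (feed key, video id) for a matching URL, else None."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    domain, video_id = mobj.groups()
    # 'hgtv.ca' -> 'hgtv', 'etcanada.com' -> 'etcanada'
    return domain.split('.')[0], video_id
```

For `http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/` this gives the key `hgtv` and the ID `870923331648`.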


@ -9,13 +9,15 @@ from ..utils import (
unified_strdate, unified_strdate,
compat_str, compat_str,
determine_ext, determine_ext,
ExtractorError,
) )
class DisneyIE(InfoExtractor): class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})''' https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{ _TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977', 'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
'info_dict': { 'info_dict': {
'id': '545ed1857afee5a0ec239977', 'id': '545ed1857afee5a0ec239977',
@ -28,6 +30,20 @@ class DisneyIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# Grill.burger
'url': 'http://www.starwars.com/video/rogue-one-a-star-wars-story-intro-featurette',
'info_dict': {
'id': '5454e9f4e9804a552e3524c8',
'ext': 'mp4',
'title': '"Intro" Featurette: Rogue One: A Star Wars Story',
'upload_date': '20170104',
'description': 'Go behind-the-scenes of Rogue One: A Star Wars Story in this featurette with Director Gareth Edwards and the cast of the film.',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2', 'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
'only_matching': True, 'only_matching': True,
@ -43,31 +59,55 @@ class DisneyIE(InfoExtractor):
}, { }, {
'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097', 'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/embed/522900d2ced3c565e4cc0677',
'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/videos/contest-of-champions-part-four-clip-1',
'only_matching': True,
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups() domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage( if not video_id:
'http://%s/embed/%s' % (domain, video_id), video_id) webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(self._search_regex( grill = re.sub(r'"\s*\+\s*"', '', self._search_regex(
r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video'] r'Grill\.burger\s*=\s*({.+})\s*:',
webpage, 'grill data'))
page_data = next(s for s in self._parse_json(grill, display_id)['stack'] if s.get('type') == 'video')
video_data = page_data['data'][0]
else:
webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id)
page_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo\s*=\s*({.+});',
webpage, 'embed data'), video_id)
video_data = page_data['video']
for external in video_data.get('externals', []): for external in video_data.get('externals', []):
if external.get('source') == 'vevo': if external.get('source') == 'vevo':
return self.url_result('vevo:' + external['data_id'], 'Vevo') return self.url_result('vevo:' + external['data_id'], 'Vevo')
video_id = video_data['id']
title = video_data['title'] title = video_data['title']
formats = [] formats = []
for flavor in video_data.get('flavors', []): for flavor in video_data.get('flavors', []):
flavor_format = flavor.get('format') flavor_format = flavor.get('format')
flavor_url = flavor.get('url') flavor_url = flavor.get('url')
if not flavor_url or not re.match(r'https?://', flavor_url): if not flavor_url or not re.match(r'https?://', flavor_url) or flavor_format == 'mp4_access':
continue continue
tbr = int_or_none(flavor.get('bitrate')) tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999: if tbr == 99999:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False)) flavor_url, video_id, 'mp4',
m3u8_id=flavor_format, fatal=False))
continue continue
format_id = [] format_id = []
if flavor_format: if flavor_format:
@ -88,6 +128,10 @@ class DisneyIE(InfoExtractor):
'ext': ext, 'ext': ext,
'vcodec': 'none' if (width == 0 and height == 0) else None, 'vcodec': 'none' if (width == 0 and height == 0) else None,
}) })
if not formats and video_data.get('expired'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, page_data['translations']['video_expired']),
expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
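The new Grill.burger branch removes JavaScript string concatenations before parsing the page data as JSON, then picks the first stack entry of type video. A minimal sketch of that step on a made-up payload follows. `join_js_string_concat` is a hypothetical name and the sample data is an assumption, not real page content.

```python
import json
import re


def join_js_string_concat(js_object_literal):
    """Collapse '"abc" + "def"' into '"abcdef"' so the literal parses as JSON.

    Caveat: the substitution is textual, so the sequence '" + "' inside a
    string value would be removed as well.
    """
    return re.sub(r'"\s*\+\s*"', '', js_object_literal)


# Made-up sample shaped like the 'stack' structure read in the diff.
raw = '{"stack": [{"type": "promo"}, {"type": "video", "data": [{"id": "abc" + "123"}]}]}'
data = json.loads(join_js_string_concat(raw))
video = next(s for s in data['stack'] if s.get('type') == 'video')
video_data = video['data'][0]
```

As in the diff, `next()` without a default raises `StopIteration` when no entry of type video exists.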


@ -18,7 +18,7 @@ from ..utils import (
class DouyuTVIE(InfoExtractor): class DouyuTVIE(InfoExtractor):
IE_DESC = '斗鱼' IE_DESC = '斗鱼'
_VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)' _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?:[^/]+/)*(?P<id>[A-Za-z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.douyutv.com/iseven', 'url': 'http://www.douyutv.com/iseven',
'info_dict': { 'info_dict': {
@ -68,6 +68,10 @@ class DouyuTVIE(InfoExtractor):
}, { }, {
'url': 'http://www.douyu.com/xiaocang', 'url': 'http://www.douyu.com/xiaocang',
'only_matching': True, 'only_matching': True,
}, {
# \"room_id\"
'url': 'http://www.douyu.com/t/lpl',
'only_matching': True,
}] }]
# Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf # Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf
@ -82,7 +86,7 @@ class DouyuTVIE(InfoExtractor):
else: else:
page = self._download_webpage(url, video_id) page = self._download_webpage(url, video_id)
room_id = self._html_search_regex( room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id') r'"room_id\\?"\s*:\s*(\d+),', page, 'room id')
room = self._download_json( room = self._download_json(
'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id, 'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id,
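The relaxed pattern above accepts an optional backslash before the closing quote, so it also matches `room_id` keys inside an escaped JSON string. A small sketch with made-up page fragments follows. `find_room_id` is a hypothetical helper name.

```python
import re

# Pattern from the new side of the diff.
ROOM_ID_RE = r'"room_id\\?"\s*:\s*(\d+),'


def find_room_id(page):
    """Return the room id as a string, or None when the page has none."""
    mobj = re.search(ROOM_ID_RE, page)
    return mobj.group(1) if mobj else None


plain_page = '{"room_id":12345,"name":"x"}'
escaped_page = r'var data = "{\"room_id\":67890,\"name\":\"x\"}";'
```

The old pattern would match only `plain_page`. The new one matches both fragments.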


@ -20,6 +20,7 @@ from ..utils import (
class DramaFeverBaseIE(AMPIE): class DramaFeverBaseIE(AMPIE):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/' _LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever' _NETRC_MACHINE = 'dramafever'
_GEO_COUNTRIES = ['US', 'CA']
_CONSUMER_SECRET = 'DA59dtVXYLxajktV' _CONSUMER_SECRET = 'DA59dtVXYLxajktV'
@ -116,8 +117,9 @@ class DramaFeverIE(DramaFeverBaseIE):
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id) 'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError): if isinstance(e.cause, compat_HTTPError):
raise ExtractorError( self.raise_geo_restricted(
'Currently unavailable in your country.', expected=True) msg='Currently unavailable in your country',
countries=self._GEO_COUNTRIES)
raise raise
series_id, episode_number = video_id.split('.') series_id, episode_number = video_id.split('.')


@ -9,12 +9,13 @@ from ..utils import (
mimetype2ext, mimetype2ext,
parse_iso8601, parse_iso8601,
remove_end, remove_end,
update_url_query,
) )
class DRTVIE(InfoExtractor): class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)' _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio/ondemand)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
IE_NAME = 'drtv'
_TESTS = [{ _TESTS = [{
'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10', 'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
'md5': '25e659cccc9a2ed956110a299fdf5983', 'md5': '25e659cccc9a2ed956110a299fdf5983',
@ -79,9 +80,10 @@ class DRTVIE(InfoExtractor):
subtitles = {} subtitles = {}
for asset in data['Assets']: for asset in data['Assets']:
if asset.get('Kind') == 'Image': kind = asset.get('Kind')
if kind == 'Image':
thumbnail = asset.get('Uri') thumbnail = asset.get('Uri')
elif asset.get('Kind') == 'VideoResource': elif kind in ('VideoResource', 'AudioResource'):
duration = float_or_none(asset.get('DurationInMilliseconds'), 1000) duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
restricted_to_denmark = asset.get('RestrictedToDenmark') restricted_to_denmark = asset.get('RestrictedToDenmark')
spoken_subtitles = asset.get('Target') == 'SpokenSubtitles' spoken_subtitles = asset.get('Target') == 'SpokenSubtitles'
@ -96,9 +98,13 @@ class DRTVIE(InfoExtractor):
preference = -1 preference = -1
format_id += '-spoken-subtitles' format_id += '-spoken-subtitles'
if target == 'HDS': if target == 'HDS':
formats.extend(self._extract_f4m_formats( f4m_formats = self._extract_f4m_formats(
uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43', uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
video_id, preference, f4m_id=format_id)) video_id, preference, f4m_id=format_id)
if kind == 'AudioResource':
for f in f4m_formats:
f['vcodec'] = 'none'
formats.extend(f4m_formats)
elif target == 'HLS': elif target == 'HLS':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
uri, video_id, 'mp4', entry_protocol='m3u8_native', uri, video_id, 'mp4', entry_protocol='m3u8_native',
@ -112,6 +118,7 @@ class DRTVIE(InfoExtractor):
'format_id': format_id, 'format_id': format_id,
'tbr': int_or_none(bitrate), 'tbr': int_or_none(bitrate),
'ext': link.get('FileFormat'), 'ext': link.get('FileFormat'),
'vcodec': 'none' if kind == 'AudioResource' else None,
}) })
subtitles_list = asset.get('SubtitlesList') subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list): if isinstance(subtitles_list, list):
@ -144,3 +151,58 @@ class DRTVIE(InfoExtractor):
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
} }
class DRTVLiveIE(InfoExtractor):
IE_NAME = 'drtv:live'
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv|TV)/live/(?P<id>[\da-z-]+)'
_TEST = {
'url': 'https://www.dr.dk/tv/live/dr1',
'info_dict': {
'id': 'dr1',
'ext': 'mp4',
'title': 're:^DR1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
channel_id = self._match_id(url)
channel_data = self._download_json(
'https://www.dr.dk/mu-online/api/1.0/channel/' + channel_id,
channel_id)
title = self._live_title(channel_data['Title'])
formats = []
for streaming_server in channel_data.get('StreamingServers', []):
server = streaming_server.get('Server')
if not server:
continue
link_type = streaming_server.get('LinkType')
for quality in streaming_server.get('Qualities', []):
for stream in quality.get('Streams', []):
stream_path = stream.get('Stream')
if not stream_path:
continue
stream_url = update_url_query(
'%s/%s' % (server, stream_path), {'b': ''})
if link_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
stream_url, channel_id, 'mp4',
m3u8_id=link_type, fatal=False, live=True))
elif link_type == 'HDS':
formats.extend(self._extract_f4m_formats(update_url_query(
'%s/%s' % (server, stream_path), {'hdcore': '3.7.0'}),
channel_id, f4m_id=link_type, fatal=False))
self._sort_formats(formats)
return {
'id': channel_id,
'title': title,
'thumbnail': channel_data.get('PrimaryImageUri'),
'formats': formats,
'is_live': True,
}
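DRTVLiveIE builds each stream URL by joining the server and stream path and then adding a query parameter through `update_url_query`. The sketch below is a standard-library stand-in for that step, assuming Python 3. `add_query` is a hypothetical name and is not youtube-dl's implementation, and the server and path values are made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_query(url, params):
    """Merge params into the query string of url (hypothetical stand-in)."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query.update(params)
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


server = 'http://example.com/live'      # assumed value of 'Server'
stream_path = 'dr1/master.m3u8'         # assumed value of 'Stream'
hls_url = add_query('%s/%s' % (server, stream_path), {'b': ''})
hds_url = add_query('%s/%s' % (server, stream_path), {'hdcore': '3.7.0'})
```

An empty value is kept as `b=`, which is what the HLS branch passes in the diff.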


@ -1,67 +1,97 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
remove_start, extract_attributes,
sanitized_Request, ExtractorError,
get_elements_by_class,
urlencode_postdata,
) )
class EinthusanIE(InfoExtractor): class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?einthusan\.com/movies/watch.php\?([^#]*?)id=(?P<id>[0-9]+)' _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [ _TESTS = [{
{ 'url': 'https://einthusan.tv/movie/watch/9097/',
'url': 'http://www.einthusan.com/movies/watch.php?id=2447', 'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'md5': 'd71379996ff5b7f217eca034c34e3461', 'info_dict': {
'info_dict': { 'id': '9097',
'id': '2447', 'ext': 'mp4',
'ext': 'mp4', 'title': 'Ae Dil Hai Mushkil',
'title': 'Ek Villain', 'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:9d29fc91a7abadd4591fb862fa560d93', }
} }, {
}, 'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
{ 'only_matching': True,
'url': 'http://www.einthusan.com/movies/watch.php?id=1671', }]
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': { # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
'id': '1671', def _decrypt(self, encrypted_data, video_id):
'ext': 'mp4', return self._parse_json(base64.b64decode((
'title': 'Soodhu Kavvuum', encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
'thumbnail': r're:^https?://.*\.jpg$', ).encode('ascii')).decode('utf-8'), video_id)
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
request = sanitized_Request(url) webpage = self._download_webpage(url, video_id)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
webpage = self._download_webpage(request, video_id)
title = self._html_search_regex( title = self._html_search_regex(r'<h3>([^<]+)</h3>', webpage, 'title')
r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
webpage, 'title')
video_id = self._search_regex( player_params = extract_attributes(self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id) r'(<section[^>]+id="UIVideoPlayer"[^>]+>)', webpage, 'player parameters'))
m3u8_url = self._download_webpage( page_id = self._html_search_regex(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/' '<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
% video_id, video_id, headers={'Referer': url}) video_data = self._download_json(
formats = self._extract_m3u8_formats( 'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native') data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
'EJOutcomes': player_params['data-ejpingables'],
'NativeHLS': False
}),
'arcVersion': 3,
'appVersion': 59,
'gorilla.csrf.Token': page_id,
}))['Data']
description = self._html_search_meta('description', webpage) if isinstance(video_data, compat_str) and video_data.startswith('/ratelimited/'):
raise ExtractorError(
'Download rate reached. Please try again later.', expected=True)
ej_links = self._decrypt(video_data['EJLinks'], video_id)
formats = []
m3u8_url = ej_links.get('HLSLink')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native'))
mp4_url = ej_links.get('MP4Link')
if mp4_url:
formats.append({
'url': mp4_url,
})
self._sort_formats(formats)
description = get_elements_by_class('synopsis', webpage)[0]
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''', r'''<img[^>]+src=(["'])(?P<url>(?!\1).+?/moviecovers/(?!\1).+?)\1''',
webpage, "thumbnail url", fatal=False) webpage, 'thumbnail url', fatal=False, group='url')
if thumbnail is not None: if thumbnail is not None:
thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..')) thumbnail = compat_urlparse.urljoin(url, thumbnail)
return { return {
'id': video_id, 'id': video_id,
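The `_decrypt` helper added above rebuilds a base64 string by taking the first ten characters, then the last character, then everything from index 12 up to the last character, so the two characters at indices 10 and 11 are discarded. A standalone sketch of that rearrangement follows. `encrypt` is a hypothetical inverse written only to produce test input, and the links are made-up examples.

```python
import base64
import json


def decrypt(encrypted_data):
    """Same slicing as _decrypt in the diff, returning the parsed JSON."""
    b64 = encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
    return json.loads(base64.b64decode(b64.encode('ascii')).decode('utf-8'))


def encrypt(obj, filler='xx'):
    """Hypothetical inverse: move character 10 to the end, insert two filler characters."""
    b64 = base64.b64encode(json.dumps(obj).encode('utf-8')).decode('ascii')
    return b64[:10] + filler + b64[11:] + b64[10]


links = {
    'HLSLink': 'http://example.com/a.m3u8',
    'MP4Link': 'http://example.com/a.mp4',
}
round_trip = decrypt(encrypt(links))
```

The round trip only shows that the slicing is self-consistent. It says nothing about how the site actually generates its payload.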


@ -1,13 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from .kaltura import KalturaIE
ExtractorError, from ..utils import NO_DEFAULT
NO_DEFAULT,
)
class EllenTVIE(InfoExtractor): class EllenTVIE(InfoExtractor):
@ -65,7 +61,7 @@ class EllenTVIE(InfoExtractor):
if partner_id and kaltura_id: if partner_id and kaltura_id:
break break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura') return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor): class EllenTVClipsIE(InfoExtractor):
@ -77,14 +73,14 @@ class EllenTVClipsIE(InfoExtractor):
'id': 'meryl-streep-vanessa-hudgens', 'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens', 'title': 'Meryl Streep, Vanessa Hudgens',
}, },
'playlist_mincount': 7, 'playlist_mincount': 5,
} }
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage) playlist = self._extract_playlist(webpage, playlist_id)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -93,16 +89,13 @@ class EllenTVClipsIE(InfoExtractor):
'entries': self._extract_entries(playlist) 'entries': self._extract_entries(playlist)
} }
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json') json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try: return self._parse_json('[{' + json_string + '}]', playlist_id)
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
def _extract_entries(self, playlist): def _extract_entries(self, playlist):
return [ return [
self.url_result( self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']), 'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
'Kaltura') KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist] for item in playlist]


@ -2,7 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate from ..utils import strip_jsonp, unified_strdate
class ElPaisIE(InfoExtractor): class ElPaisIE(InfoExtractor):
@ -29,6 +29,28 @@ class ElPaisIE(InfoExtractor):
'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.', 'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.',
'upload_date': '20160303', 'upload_date': '20160303',
} }
}, {
'url': 'http://elpais.com/elpais/2017/01/26/ciencia/1485456786_417876.html',
'md5': '9c79923a118a067e1a45789e1e0b0f9c',
'info_dict': {
'id': '1485456786_417876',
'ext': 'mp4',
'title': 'Hallado un barco de la antigua Roma que naufragó en Baleares hace 1.800 años',
'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
'upload_date': '20170127',
},
}, {
'url': 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
'info_dict': {
'id': '1487062137_075943',
'ext': 'mp4',
'title': 'Disyuntivas',
'description': 'md5:a0fb1485c4a6a8a917e6f93878e66218',
'upload_date': '20170214',
},
'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -37,19 +59,27 @@ class ElPaisIE(InfoExtractor):
prefix = self._html_search_regex( prefix = self._html_search_regex(
r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix') r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix')
video_suffix = self._search_regex( id_multimedia = self._search_regex(
r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL') r"id_multimedia\s*=\s*'([^']+)'", webpage, 'ID multimedia', default=None)
if id_multimedia:
url_info = self._download_json(
'http://elpais.com/vdpep/1/?pepid=' + id_multimedia, video_id, transform_source=strip_jsonp)
video_suffix = url_info['mp4']
else:
video_suffix = self._search_regex(
r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
video_url = prefix + video_suffix video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex( thumbnail_suffix = self._search_regex(
r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
webpage, 'thumbnail URL', fatal=False) webpage, 'thumbnail URL', default=None)
thumbnail = ( thumbnail = (
None if thumbnail_suffix is None None if thumbnail_suffix is None
else prefix + thumbnail_suffix) else prefix + thumbnail_suffix) or self._og_search_thumbnail(webpage)
title = self._html_search_regex( title = self._html_search_regex(
(r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title', (r"tituloVideo\s*=\s*'([^']+)'",
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'), r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
webpage, 'title') r'<h1[^>]+class="titulo"[^>]*>([^<]+)'),
webpage, 'title', default=None) or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">', r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', default=None) or self._html_search_meta( webpage, 'upload date', default=None) or self._html_search_meta(


@ -103,7 +103,10 @@ from .beatport import BeatportIE
from .bet import BetIE from .bet import BetIE
from .bigflix import BigflixIE from .bigflix import BigflixIE
from .bild import BildIE from .bild import BildIE
from .bilibili import BiliBiliIE from .bilibili import (
BiliBiliIE,
BiliBiliBangumiIE,
)
from .biobiochiletv import BioBioChileTVIE from .biobiochiletv import BioBioChileTVIE
from .biqle import BIQLEIE from .biqle import BIQLEIE
from .bleacherreport import ( from .bleacherreport import (
@ -199,6 +202,7 @@ from .commonprotocols import (
RtmpIE, RtmpIE,
) )
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .corus import CorusIE
from .cracked import CrackedIE from .cracked import CrackedIE
from .crackle import CrackleIE from .crackle import CrackleIE
from .criterion import CriterionIE from .criterion import CriterionIE
@ -245,7 +249,10 @@ from .dramafever import (
from .dreisat import DreiSatIE from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .drtv import DRTVIE from .drtv import (
DRTVIE,
DRTVLiveIE,
)
from .dvtv import DVTVIE from .dvtv import DVTVIE
from .dumpert import DumpertIE from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE from .defense import DefenseGouvFrIE
@ -296,6 +303,10 @@ from .fc2 import (
FC2EmbedIE, FC2EmbedIE,
) )
from .fczenit import FczenitIE from .fczenit import FczenitIE
from .filmon import (
FilmOnIE,
FilmOnChannelIE,
)
from .firstpost import FirstpostIE from .firstpost import FirstpostIE
from .firsttv import FirstTVIE from .firsttv import FirstTVIE
from .fivemin import FiveMinIE from .fivemin import FiveMinIE
@ -339,6 +350,7 @@ from .gameone import (
from .gamersyde import GamersydeIE from .gamersyde import GamersydeIE
from .gamespot import GameSpotIE from .gamespot import GameSpotIE
from .gamestar import GameStarIE from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE from .gdcvault import GDCVaultIE
from .generic import GenericIE from .generic import GenericIE
@ -370,10 +382,7 @@ from .heise import HeiseIE
from .hellporno import HellPornoIE from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE from .hentaistigma import HentaiStigmaIE
from .hgtv import ( from .hgtv import HGTVComShowIE
HGTVIE,
HGTVComShowIE,
)
from .historicfilms import HistoricFilmsIE from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE from .hitbox import HitboxIE, HitboxLiveIE
from .hitrecord import HitRecordIE from .hitrecord import HitRecordIE
@ -685,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import ( from .onet import (
OnetIE, OnetIE,
OnetChannelIE, OnetChannelIE,
OnetMVPIE,
OnetPlIE,
) )
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .ooyala import ( from .ooyala import (
@ -827,6 +838,7 @@ from .sbs import SBSIE
from .scivee import SciVeeIE from .scivee import SciVeeIE
from .screencast import ScreencastIE from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE
from .seeker import SeekerIE from .seeker import SeekerIE
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
@ -881,12 +893,10 @@ from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE from .spike import SpikeIE
from .stitcher import StitcherIE from .stitcher import StitcherIE
from .sport5 import Sport5IE from .sport5 import Sport5IE
from .sportbox import ( from .sportbox import SportBoxEmbedIE
SportBoxIE,
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE from .sportdeutschland import SportDeutschlandIE
from .sportschau import SportschauIE from .sportschau import SportschauIE
from .sprout import SproutIE
from .srgssr import ( from .srgssr import (
SRGSSRIE, SRGSSRIE,
SRGSSRPlayIE, SRGSSRPlayIE,
@ -999,6 +1009,7 @@ from .tvc import (
) )
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE from .tvnoe import TVNoeIE
from .tvp import ( from .tvp import (
TVPEmbedIE, TVPEmbedIE,
@ -1009,6 +1020,7 @@ from .tvplay import (
TVPlayIE, TVPlayIE,
ViafreeIE, ViafreeIE,
) )
from .tvplayer import TVPlayerIE
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
@ -1088,6 +1100,7 @@ from .videomore import (
VideomoreSeasonIE, VideomoreSeasonIE,
) )
from .videopremium import VideoPremiumIE from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE
from .vidio import VidioIE from .vidio import VidioIE
from .vidme import ( from .vidme import (
VidmeIE, VidmeIE,
@ -1137,6 +1150,7 @@ from .vlive import (
VLiveChannelIE VLiveChannelIE
) )
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE from .voxmedia import VoxMediaIE


@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -73,7 +74,7 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '274175099429670', 'id': '274175099429670',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Facebook video #274175099429670', 'title': 'Asif Nawab Butt posted a video to his Timeline.',
'uploader': 'Asif Nawab Butt', 'uploader': 'Asif Nawab Butt',
'upload_date': '20140506', 'upload_date': '20140506',
'timestamp': 1399398998, 'timestamp': 1399398998,
@ -134,6 +135,46 @@ class FacebookIE(InfoExtractor):
'upload_date': '20161030', 'upload_date': '20161030',
'uploader': 'CNN', 'uploader': 'CNN',
}, },
}, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
'url': 'https://www.facebook.com/yaroslav.korpan/videos/1417995061575415/',
'info_dict': {
'id': '1417995061575415',
'ext': 'mp4',
'title': 'md5:a7b86ca673f51800cd54687b7f4012fe',
'timestamp': 1486648217,
'upload_date': '20170209',
'uploader': 'Yaroslav Korpan',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.facebook.com/LaGuiaDelVaron/posts/1072691702860471',
'info_dict': {
'id': '1072691702860471',
'ext': 'mp4',
'title': 'md5:ae2d22a93fbb12dad20dc393a869739d',
'timestamp': 1477305000,
'upload_date': '20161024',
'uploader': 'La Guía Del Varón',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/',
'info_dict': {
'id': '1396382447100162',
'ext': 'mp4',
'title': 'md5:e2d2700afdf84e121f5d0f999bad13a3',
'timestamp': 1486035494,
'upload_date': '20170202',
'uploader': 'Elisabeth Ahtn',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104', 'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True, 'only_matching': True,
@ -249,7 +290,7 @@ class FacebookIE(InfoExtractor):
for item in instances: for item in instances:
if item[1][0] == 'VideoConfig': if item[1][0] == 'VideoConfig':
video_item = item[2][0] video_item = item[2][0]
if video_item.get('video_id') == video_id: if video_item.get('video_id'):
return video_item['videoData'] return video_item['videoData']
server_js_data = self._parse_json(self._search_regex( server_js_data = self._parse_json(self._search_regex(
@ -262,7 +303,7 @@ class FacebookIE(InfoExtractor):
if not video_data: if not video_data:
server_js_data = self._parse_json( server_js_data = self._parse_json(
self._search_regex( self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+stream_pagelet', r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall)',
webpage, 'js data', default='{}'), webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False) video_id, transform_source=js_to_json, fatal=False)
if server_js_data: if server_js_data:
@ -318,10 +359,16 @@ class FacebookIE(InfoExtractor):
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>', r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', default=None) webpage, 'alternative title', default=None)
video_title = limit_length(video_title, 80)
if not video_title: if not video_title:
video_title = self._html_search_meta(
'description', webpage, 'title')
if video_title:
video_title = limit_length(video_title, 80)
else:
video_title = 'Facebook video #%s' % video_id video_title = 'Facebook video #%s' % video_id
uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage)) uploader = clean_html(get_element_by_id(
'fbPhotoPageAuthorName', webpage)) or self._search_regex(
r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
timestamp = int_or_none(self._search_regex( timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage, r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None)) 'timestamp', default=None))
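The title handling in the hunk above reads as a fallback chain: use a caption if one was found, otherwise the page's meta description, truncate whichever was found to 80 characters, and fall back to a generic title if neither exists. A minimal standalone sketch of that chain follows. `pick_title` is a hypothetical name, and `limit_length` here is a local stand-in written for illustration, not an import from the project's utilities.

```python
def limit_length(s, length):
    """Truncate s to at most `length` characters, marking the cut with '...'."""
    if s is None:
        return None
    ellipsis = '...'
    if len(s) > length:
        return s[:length - len(ellipsis)] + ellipsis
    return s


def pick_title(caption, meta_description, video_id, max_len=80):
    """Hypothetical helper mirroring the fallback order in the diff."""
    title = caption or meta_description
    if title:
        return limit_length(title, max_len)
    # Nothing usable on the page: fall back to a generic title.
    return 'Facebook video #%s' % video_id
```

Truncating only after a title has been found avoids passing `None` through the length limiter, which appears to be the point of moving the `limit_length` call in the new version.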


@ -0,0 +1,178 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..utils import (
qualities,
strip_or_none,
int_or_none,
ExtractorError,
)
class FilmOnIE(InfoExtractor):
IE_NAME = 'filmon'
_VALID_URL = r'(?:https?://(?:www\.)?filmon\.com/vod/view/|filmon:)(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.filmon.com/vod/view/24869-0-plan-9-from-outer-space',
'info_dict': {
'id': '24869',
'ext': 'mp4',
'title': 'Plan 9 From Outer Space',
'description': 'Dead human, zombies and vampires',
},
}, {
'url': 'https://www.filmon.com/vod/view/2825-1-popeye-series-1',
'info_dict': {
'id': '2825',
'title': 'Popeye Series 1',
'description': 'The original series of Popeye.',
},
'playlist_mincount': 8,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
try:
response = self._download_json(
'https://www.filmon.com/api/vod/movie?id=%s' % video_id,
video_id)['response']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
errmsg = self._parse_json(e.cause.read().decode(), video_id)['reason']
raise ExtractorError('%s said: %s' % (self.IE_NAME, errmsg), expected=True)
raise
title = response['title']
description = strip_or_none(response.get('description'))
if response.get('type_id') == 1:
entries = [self.url_result('filmon:' + episode_id) for episode_id in response.get('episodes', [])]
return self.playlist_result(entries, video_id, title, description)
QUALITY = qualities(('low', 'high'))
formats = []
for format_id, stream in response.get('streams', {}).items():
stream_url = stream.get('url')
if not stream_url:
continue
formats.append({
'format_id': format_id,
'url': stream_url,
'ext': 'mp4',
'quality': QUALITY(stream.get('quality')),
'protocol': 'm3u8_native',
})
self._sort_formats(formats)
thumbnails = []
poster = response.get('poster', {})
thumbs = poster.get('thumbs', {})
thumbs['poster'] = poster
for thumb_id, thumb in thumbs.items():
thumb_url = thumb.get('url')
if not thumb_url:
continue
thumbnails.append({
'id': thumb_id,
'url': thumb_url,
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
})
return {
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'thumbnails': thumbnails,
}
class FilmOnChannelIE(InfoExtractor):
IE_NAME = 'filmon:channel'
_VALID_URL = r'https?://(?:www\.)?filmon\.com/(?:tv|channel)/(?P<id>[a-z0-9-]+)'
_TESTS = [{
# VOD
'url': 'http://www.filmon.com/tv/sports-haters',
'info_dict': {
'id': '4190',
'ext': 'mp4',
'title': 'Sports Haters',
'description': 'md5:dabcb4c1d9cfc77085612f1a85f8275d',
},
}, {
# LIVE
'url': 'https://www.filmon.com/channel/filmon-sports',
'only_matching': True,
}, {
'url': 'https://www.filmon.com/tv/2894',
'only_matching': True,
}]
_THUMBNAIL_RES = [
('logo', 56, 28),
('big_logo', 106, 106),
('extra_big_logo', 300, 300),
]
def _real_extract(self, url):
channel_id = self._match_id(url)
try:
channel_data = self._download_json(
'http://www.filmon.com/api-v2/channel/' + channel_id, channel_id)['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
errmsg = self._parse_json(e.cause.read().decode(), channel_id)['message']
raise ExtractorError('%s said: %s' % (self.IE_NAME, errmsg), expected=True)
raise
channel_id = compat_str(channel_data['id'])
is_live = not channel_data.get('is_vod') and not channel_data.get('is_vox')
title = channel_data['title']
QUALITY = qualities(('low', 'high'))
formats = []
for stream in channel_data.get('streams', []):
stream_url = stream.get('url')
if not stream_url:
continue
if not is_live:
formats.extend(self._extract_wowza_formats(
stream_url, channel_id, skip_protocols=['dash', 'rtmp', 'rtsp']))
continue
quality = stream.get('quality')
formats.append({
'format_id': quality,
# this is an m3u8 stream, but we are deliberately not using _extract_m3u8_formats
# because it doesn't have bitrate variants anyway
'url': stream_url,
'ext': 'mp4',
'quality': QUALITY(quality),
})
self._sort_formats(formats)
thumbnails = []
for name, width, height in self._THUMBNAIL_RES:
thumbnails.append({
'id': name,
'url': 'http://static.filmon.com/assets/channels/%s/%s.png' % (channel_id, name),
'width': width,
'height': height,
})
return {
'id': channel_id,
'display_id': channel_data.get('alias'),
'title': self._live_title(title) if is_live else title,
'description': channel_data.get('description'),
'thumbnails': thumbnails,
'formats': formats,
'is_live': is_live,
}
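Both FilmOn extractors rank streams with `qualities(('low', 'high'))`. The sketch below shows how such a ranking helper can work, assuming it maps a quality name to its position in the given tuple and returns -1 for unknown names. It is a standalone stand-in written for illustration, not the project's own implementation.

```python
def qualities(quality_ids):
    """Return a function mapping a quality name to its rank in quality_ids."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            # Unknown or missing quality sorts below every known one.
            return -1
    return rank


QUALITY = qualities(('low', 'high'))

# Hypothetical stream records, ordered worst to best by rank.
streams = [{'quality': 'high'}, {'quality': None}, {'quality': 'low'}]
ordered = sorted(streams, key=lambda s: QUALITY(s['quality']))
```

Under this assumption a stream with no `quality` field still gets a sortable value, so the later format sorting does not need a special case for it.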


@ -0,0 +1,123 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
js_to_json,
unified_strdate,
)
class GaskrankIE(InfoExtractor):
"""InfoExtractor for gaskrank.tv"""
_VALID_URL = r'https?://(?:www\.)?gaskrank\.tv/tv/(?P<categories>[^/]+)/(?P<id>[^/]+)\.html?'
_TESTS = [
{
'url': 'http://www.gaskrank.tv/tv/motorrad-fun/strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden.htm',
'md5': '1ae88dbac97887d85ebd1157a95fc4f9',
'info_dict': {
'id': '201601/26955',
'ext': 'mp4',
'title': 'Strike! Einparken können nur Männer - Flurschaden hält sich in Grenzen *lol*',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['motorrad-fun'],
'display_id': 'strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden',
'uploader_id': 'Bikefun',
'upload_date': '20170110',
'uploader_url': None,
}
},
{
'url': 'http://www.gaskrank.tv/tv/racing/isle-of-man-tt-2011-michael-du-15920.htm',
'md5': 'c33ee32c711bc6c8224bfcbe62b23095',
'info_dict': {
'id': '201106/15920',
'ext': 'mp4',
'title': 'Isle of Man - Michael Dunlop vs Guy Martin - schwindelig kucken',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['racing'],
'display_id': 'isle-of-man-tt-2011-michael-du-15920',
'uploader_id': 'IOM',
'upload_date': '20160506',
'uploader_url': 'www.iomtt.com',
}
}
]
def _real_extract(self, url):
"""extract information from gaskrank.tv"""
def fix_json(code):
"""Removes trailing comma in json: {{},} --> {{}}"""
return re.sub(r',\s*}', r'}', js_to_json(code))
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
categories = [re.match(self._VALID_URL, url).group('categories')]
title = self._search_regex(
r'movieName\s*:\s*\'([^\']*)\'',
webpage, 'title')
thumbnail = self._search_regex(
r'poster\s*:\s*\'([^\']*)\'',
webpage, 'thumbnail', default=None)
mobj = re.search(
r'Video von:\s*(?P<uploader_id>[^|]*?)\s*\|\s*vom:\s*(?P<upload_date>[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9])',
webpage)
uploader_id = upload_date = None
if mobj is not None:
uploader_id = mobj.groupdict().get('uploader_id')
upload_date = unified_strdate(mobj.groupdict().get('upload_date'))
uploader_url = self._search_regex(
r'Homepage:\s*<[^>]*>(?P<uploader_url>[^<]*)',
webpage, 'uploader_url', default=None)
tags = re.findall(
r'/tv/tags/[^/]+/"\s*>(?P<tag>[^<]*?)<',
webpage)
view_count = self._search_regex(
r'class\s*=\s*"gkRight"(?:[^>]*>\s*<[^>]*)*icon-eye-open(?:[^>]*>\s*<[^>]*)*>\s*(?P<view_count>[0-9\.]*)',
webpage, 'view_count', default=None)
if view_count:
view_count = int_or_none(view_count.replace('.', ''))
average_rating = self._search_regex(
r'itemprop\s*=\s*"ratingValue"[^>]*>\s*(?P<average_rating>[0-9,]+)',
webpage, 'average_rating', default=None)
if average_rating:
average_rating = float_or_none(average_rating.replace(',', '.'))
playlist = self._parse_json(
self._search_regex(
r'playlist\s*:\s*\[([^\]]*)\]',
webpage, 'playlist', default='{}'),
display_id, transform_source=fix_json, fatal=False)
video_id = self._search_regex(
r'https?://movies\.gaskrank\.tv/([^-]*?)(-[^\.]*)?\.mp4',
playlist.get('0').get('src'), 'video id')
formats = []
for key in playlist:
formats.append({
'url': playlist[key]['src'],
'format_id': key,
'quality': playlist[key].get('quality')})
self._sort_formats(formats, field_preference=['format_id'])
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'categories': categories,
'display_id': display_id,
'uploader_id': uploader_id,
'upload_date': upload_date,
'uploader_url': uploader_url,
'tags': tags,
'view_count': view_count,
'average_rating': average_rating,
}
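The `fix_json` helper in the Gaskrank extractor removes trailing commas before closing braces so that the embedded playlist parses as JSON. The standalone sketch below applies the same regular expression without the `js_to_json` step. The function name and the sample data are hypothetical.

```python
import json
import re


def strip_trailing_commas(code):
    """Drop a comma that directly precedes a closing brace: {{},} -> {{}}."""
    return re.sub(r',\s*}', '}', code)


# Hypothetical playlist fragment with trailing commas, as a page might embed it.
raw = '{"0": {"src": "http://example.invalid/a.mp4", "quality": "hd",},}'
playlist = json.loads(strip_trailing_commas(raw))
```

The substitution is purely textual, so it would also rewrite a `,}` sequence that occurs inside a string value. That is acceptable for short, predictable snippets like this playlist, but it is not a general JSON repair.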


@ -20,6 +20,7 @@ from ..utils import (
float_or_none, float_or_none,
HEADRequest, HEADRequest,
is_html, is_html,
js_to_json,
orderedSet, orderedSet,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
@ -29,6 +30,7 @@ from ..utils import (
UnsupportedError, UnsupportedError,
xpath_text, xpath_text,
) )
from .commonprotocols import RtmpIE
from .brightcove import ( from .brightcove import (
BrightcoveLegacyIE, BrightcoveLegacyIE,
BrightcoveNewIE, BrightcoveNewIE,
@ -81,6 +83,7 @@ from .videa import VideaIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .ustream import UstreamIE from .ustream import UstreamIE
from .openload import OpenloadIE from .openload import OpenloadIE
from .videopress import VideoPressIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -946,6 +949,29 @@ class GenericIE(InfoExtractor):
'title': 'Webinar: Using Discovery, The National Archives online catalogue', 'title': 'Webinar: Using Discovery, The National Archives online catalogue',
}, },
}, },
# jwplayer rtmp
{
'url': 'http://www.suffolk.edu/sjc/',
'info_dict': {
'id': 'sjclive',
'ext': 'flv',
'title': 'Massachusetts Supreme Judicial Court Oral Arguments',
'uploader': 'www.suffolk.edu',
},
'params': {
'skip_download': True,
}
},
# Complex jwplayer
{
'url': 'http://www.indiedb.com/games/king-machine/videos',
'info_dict': {
'id': 'videos',
'ext': 'mp4',
'title': 'king machine trailer 1',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
# rtl.nl embed # rtl.nl embed
{ {
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen', 'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -976,19 +1002,6 @@ class GenericIE(InfoExtractor):
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014', 'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
}, },
}, },
# Kaltura embed protected with referrer
{
'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
'info_dict': {
'id': '1_g4fbemnq',
'ext': 'mp4',
'title': 'Violetta - Achter De Schermen - Ruggero',
'description': 'Achter de schermen met Ruggero',
'timestamp': 1435133761,
'upload_date': '20150624',
'uploader_id': 'echojecka',
},
},
# Kaltura embed with single quotes # Kaltura embed with single quotes
{ {
'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY', 'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
@ -1473,7 +1486,27 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [TwentyMinutenIE.ie_key()], 'add_ie': [TwentyMinutenIE.ie_key()],
} },
{
# VideoPress embed
'url': 'https://en.support.wordpress.com/videopress/',
'info_dict': {
'id': 'OcobLTqC',
'ext': 'm4v',
'title': 'IMG_5786',
'timestamp': 1435711927,
'upload_date': '20150701',
},
'params': {
'skip_download': True,
},
'add_ie': [VideoPressIE.ie_key()],
},
{
# ThePlatform embedded with whitespaces in URLs
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -2320,8 +2353,9 @@ class GenericIE(InfoExtractor):
'Channel': 'channel', 'Channel': 'channel',
'ChannelList': 'channel_list', 'ChannelList': 'channel_list',
} }
return self.url_result('limelight:%s:%s' % ( return self.url_result(smuggle_url('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2)) lm[mobj.group(1)], mobj.group(2)), {'source_url': url}),
'Limelight%s' % mobj.group(1), mobj.group(2))
mobj = re.search( mobj = re.search(
r'''(?sx) r'''(?sx)
@ -2331,7 +2365,9 @@ class GenericIE(InfoExtractor):
value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32}) value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32})
''', webpage) ''', webpage)
if mobj: if mobj:
return self.url_result('limelight:media:%s' % mobj.group('id')) return self.url_result(smuggle_url(
'limelight:media:%s' % mobj.group('id'),
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
# Look for AdobeTVVideo embeds # Look for AdobeTVVideo embeds
mobj = re.search( mobj = re.search(
@ -2438,6 +2474,12 @@ class GenericIE(InfoExtractor):
return _playlist_from_matches( return _playlist_from_matches(
openload_urls, ie=OpenloadIE.ie_key()) openload_urls, ie=OpenloadIE.ie_key())
# Look for VideoPress embeds
videopress_urls = VideoPressIE._extract_urls(webpage)
if videopress_urls:
return _playlist_from_matches(
videopress_urls, ie=VideoPressIE.ie_key())
# Looking for http://schema.org/VideoObject # Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld( json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject') webpage, video_id, default={}, expected_type='VideoObject')
@ -2462,9 +2504,20 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats']) self._sort_formats(entry['formats'])
return self.playlist_result(entries) return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage)
if jwplayer_data_str:
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
except ExtractorError:
pass
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
return True return True
if RtmpIE.suitable(vurl):
return True
vpath = compat_urlparse.urlparse(vurl).path vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath) vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js') return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
@ -2572,6 +2625,15 @@ class GenericIE(InfoExtractor):
'age_limit': age_limit, 'age_limit': age_limit,
} }
if RtmpIE.suitable(video_url):
entry_info_dict.update({
'_type': 'url_transparent',
'ie_key': RtmpIE.ie_key(),
'url': video_url,
})
entries.append(entry_info_dict)
continue
ext = determine_ext(video_url) ext = determine_ext(video_url)
if ext == 'smil': if ext == 'smil':
entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id) entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .adobepass import AdobePassIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
determine_ext, determine_ext,
@ -13,15 +13,31 @@ from ..utils import (
) )
class GoIE(InfoExtractor): class GoIE(AdobePassIE):
_BRANDS = { _SITE_INFO = {
'abc': '001', 'abc': {
'freeform': '002', 'brand': '001',
'watchdisneychannel': '004', 'requestor_id': 'ABC',
'watchdisneyjunior': '008', },
'watchdisneyxd': '009', 'freeform': {
'brand': '002',
'requestor_id': 'ABCFamily',
},
'watchdisneychannel': {
'brand': '004',
'requestor_id': 'Disney',
},
'watchdisneyjunior': {
'brand': '008',
'requestor_id': 'DisneyJunior',
},
'watchdisneyxd': {
'brand': '009',
'requestor_id': 'DisneyXD',
}
} }
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_BRANDS.keys()) _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_SITE_INFO.keys())
_GEO_COUNTRIES = ['US']
_TESTS = [{ _TESTS = [{
'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx', 'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
'info_dict': { 'info_dict': {
@ -43,8 +59,12 @@ class GoIE(InfoExtractor):
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups() sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id: if not video_id:
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id') video_id = self._search_regex(
brand = self._BRANDS[sub_domain] # There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*VDKA(\w+)', webpage, 'video id')
site_info = self._SITE_INFO[sub_domain]
brand = site_info['brand']
video_data = self._download_json( video_data = self._download_json(
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id), 'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
video_id)['video'][0] video_id)['video'][0]
@ -60,16 +80,32 @@ class GoIE(InfoExtractor):
if ext == 'm3u8': if ext == 'm3u8':
video_type = video_data.get('type') video_type = video_data.get('type')
if video_type == 'lf': if video_type == 'lf':
data = {
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}
if video_data.get('accesslevel') == '1':
requestor_id = site_info['requestor_id']
resource = self._get_mvpd_resource(
requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
data.update({
'token': auth,
'token_type': 'ap',
'adobe_requestor_id': requestor_id,
})
entitlement = self._download_json( entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json', 'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata({ video_id, data=urlencode_postdata(data), headers=self.geo_verification_headers())
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}))
errors = entitlement.get('errors', {}).get('errors', []) errors = entitlement.get('errors', {}).get('errors', [])
if errors: if errors:
for error in errors:
if error.get('code') == 1002:
self.raise_geo_restricted(
error['message'], countries=self._GEO_COUNTRIES)
error_message = ', '.join([error['message'] for error in errors]) error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True) raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey'] asset_url += '?' + entitlement['uplynkData']['sessionKey']
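The relaxed `data-video-id` pattern in this file accepts any run of quote characters before `VDKA`, which covers the nested-quote markup mentioned in the comment. The sketch below exercises that pattern on hypothetical HTML fragments; `find_video_id` is an illustrative name, not part of the extractor.

```python
import re

# Same pattern as in the diff: zero or more quotes of either kind before VDKA.
VIDEO_ID_RE = r'data-video-id=["\']*VDKA(\w+)'


def find_video_id(html):
    """Return the ID following the VDKA prefix, or None if absent."""
    match = re.search(VIDEO_ID_RE, html)
    return match.group(1) if match else None
```

Because `\w` does not match quote characters, the capture stops at the first closing quote regardless of how many quotes opened the attribute value.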


@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
lowercase_escape,
) )
@ -13,12 +14,12 @@ class GoogleDriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})' _VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})'
_TESTS = [{ _TESTS = [{
'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1', 'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1',
'md5': '881f7700aec4f538571fa1e0eed4a7b6', 'md5': 'd109872761f7e7ecf353fa108c0dbe1e',
'info_dict': { 'info_dict': {
'id': '0ByeS4oOUV-49Zzh4R1J6R09zazQ', 'id': '0ByeS4oOUV-49Zzh4R1J6R09zazQ',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny.mp4', 'title': 'Big Buck Bunny.mp4',
'duration': 46, 'duration': 45,
} }
}, { }, {
# video id is longer than 28 characters # video id is longer than 28 characters
@ -55,7 +56,7 @@ class GoogleDriveIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( webpage = self._download_webpage(
'http://docs.google.com/file/d/%s' % video_id, video_id, encoding='unicode_escape') 'http://docs.google.com/file/d/%s' % video_id, video_id)
reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None) reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
if reason: if reason:
@ -74,7 +75,7 @@ class GoogleDriveIE(InfoExtractor):
resolution = fmt.split('/')[1] resolution = fmt.split('/')[1]
width, height = resolution.split('x') width, height = resolution.split('x')
formats.append({ formats.append({
'url': fmt_url, 'url': lowercase_escape(fmt_url),
'format_id': fmt_id, 'format_id': fmt_id,
'resolution': resolution, 'resolution': resolution,
'width': int_or_none(width), 'width': int_or_none(width),
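The Google Drive change stops decoding the whole page as `unicode_escape` and instead unescapes only each format URL. The sketch below shows one way to expand lowercase `\uXXXX` sequences in a single string. It is a standalone stand-in for illustration, the function name is hypothetical, and the project's `lowercase_escape` may be implemented differently.

```python
import re


def unescape_lowercase_u(s):
    """Replace each literal \\uXXXX sequence with the character it encodes."""
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        s)


# Hypothetical escaped URL as it might appear inside embedded script data.
escaped = 'https://example.invalid/videoplayback?id=abc\\u0026itag=18\\u003dx'
unescaped = unescape_lowercase_u(escaped)
```

Unescaping only the URL leaves the rest of the page decoded normally, which presumably avoids mangling non-ASCII text such as titles.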


@ -6,59 +6,58 @@ from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
xpath_text,
) )
class HeiseIE(InfoExtractor): class HeiseIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
https?://(?:www\.)?heise\.de/video/artikel/ _TESTS = [{
.+?(?P<id>[0-9]+)\.html(?:$|[?#]) 'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'''
_TEST = {
'url': (
'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html'
),
'md5': 'ffed432483e922e88545ad9f2f15d30e', 'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': { 'info_dict': {
'id': '2404147', 'id': '2404147',
'ext': 'mp4', 'ext': 'mp4',
'title': ( 'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
"Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone"
),
'format_id': 'mp4_720p', 'format_id': 'mp4_720p',
'timestamp': 1411812600, 'timestamp': 1411812600,
'upload_date': '20140927', 'upload_date': '20140927',
'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.', 'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:^https?://.*/gallery/$',
} }
} }, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
}, {
'url': 'http://www.heise.de/newsticker/meldung/c-t-uplink-Owncloud-Tastaturen-Peilsender-Smartphone-2404251.html?wt_mc=rss.ho.beitrag.atom',
'only_matching': True,
}, {
'url': 'http://www.heise.de/ct/ausgabe/2016-12-Spiele-3214137.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
container_id = self._search_regex( container_id = self._search_regex(
r'<div class="videoplayerjw".*?data-container="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID') webpage, 'container ID')
sequenz_id = self._search_regex( sequenz_id = self._search_regex(
r'<div class="videoplayerjw".*?data-sequenz="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID') webpage, 'sequenz ID')
data_url = 'http://www.heise.de/videout/feed?container=%s&sequenz=%s' % (container_id, sequenz_id)
doc = self._download_xml(data_url, video_id)
info = { title = self._html_search_meta('fulltitle', webpage, default=None)
'id': video_id, if not title or title == "c't":
'thumbnail': self._og_search_thumbnail(webpage), title = self._search_regex(
'timestamp': parse_iso8601( r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
self._html_search_meta('date', webpage)), webpage, 'title')
'description': self._og_search_description(webpage),
}
title = self._html_search_meta('fulltitle', webpage) doc = self._download_xml(
if title: 'http://www.heise.de/videout/feed', video_id, query={
info['title'] = title 'container': container_id,
else: 'sequenz': sequenz_id,
info['title'] = self._og_search_title(webpage) })
formats = [] formats = []
for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'): for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'):
@ -74,6 +73,18 @@ class HeiseIE(InfoExtractor):
'height': height, 'height': height,
}) })
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats
return info description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': (xpath_text(doc, './/{http://rss.jwpcdn.com/}image') or
self._og_search_thumbnail(webpage)),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'formats': formats,
}
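The Heise feed request now passes `container` and `sequenz` through a `query` dictionary instead of formatting them into the URL by hand. The sketch below shows roughly what that achieves using only the standard library. How `_download_xml` handles `query` internally is not shown in this diff, so treat the equivalence as an assumption; `build_feed_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def build_feed_url(container_id, sequenz_id):
    """Build the feed URL with properly encoded query parameters."""
    query = urlencode({'container': container_id, 'sequenz': sequenz_id})
    return 'http://www.heise.de/videout/feed?' + query
```

Letting an encoder build the query string means values containing reserved characters are escaped rather than silently altering the request.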


@ -2,50 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
smuggle_url,
)
class HGTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hgtv\.ca/[^/]+/video/(?P<id>[^/]+)/video.html'
_TEST = {
'url': 'http://www.hgtv.ca/homefree/video/overnight-success/video.html?v=738081859718&p=1&s=da#video',
'md5': '',
'info_dict': {
'id': 'aFH__I_5FBOX',
'ext': 'mp4',
'title': 'Overnight Success',
'description': 'After weeks of hard work, high stakes, breakdowns and pep talks, the final 2 contestants compete to win the ultimate dream.',
'uploader': 'SHWM-NEW',
'timestamp': 1470320034,
'upload_date': '20160804',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_vars = self._parse_json(self._search_regex(
r'(?s)embed_vars\s*=\s*({.*?});',
webpage, 'embed vars'), display_id, js_to_json)
return {
'_type': 'url_transparent',
'url': smuggle_url(
'http://link.theplatform.com/s/dtjsEC/%s?mbr=true&manifest=m3u' % embed_vars['pid'], {
'force_smil_url': True
}),
'series': embed_vars.get('show'),
'season_number': int_or_none(embed_vars.get('season')),
'episode_number': int_or_none(embed_vars.get('episode')),
'ie_key': 'ThePlatform',
}
class HGTVComShowIE(InfoExtractor): class HGTVComShowIE(InfoExtractor):


@ -34,11 +34,9 @@ class HotStarIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s' def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True, query=None):
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s' json_data = super(HotStarIE, self)._download_json(
url_or_request, video_id, note, fatal=fatal, query=query)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True):
json_data = super(HotStarIE, self)._download_json(url_or_request, video_id, note, fatal=fatal)
if json_data['resultCode'] != 'OK': if json_data['resultCode'] != 'OK':
if fatal: if fatal:
raise ExtractorError(json_data['errorDescription']) raise ExtractorError(json_data['errorDescription'])
@ -48,20 +46,37 @@ class HotStarIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
self._GET_CONTENT_TEMPLATE % video_id, 'http://account.hotstar.com/AVS/besc', video_id, query={
video_id)['contentInfo'][0] 'action': 'GetAggregatedContentDetails',
'channel': 'PCTV',
'contentId': video_id,
})['contentInfo'][0]
title = video_data['episodeTitle']
if video_data.get('encrypted') == 'Y':
raise ExtractorError('This video is DRM protected.', expected=True)
formats = [] formats = []
# PCTV for extracting f4m manifest for f in ('JIO',):
for f in ('TABLET',):
format_data = self._download_json( format_data = self._download_json(
self._GET_CDN_TEMPLATE % (f, video_id, 'VOD'), 'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f, fatal=False) video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data: if format_data:
format_url = format_data['src'] format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url) ext = determine_ext(format_url)
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
# produce broken files # produce broken files
continue continue
@ -75,9 +90,12 @@ class HotStarIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': video_data['episodeTitle'], 'title': title,
'description': video_data.get('description'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')), 'timestamp': int_or_none(video_data.get('broadcastDate')),
'formats': formats, 'formats': formats,
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
} }
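
The HotStar hunk above replaces two `%s`-style URL templates with a base URL plus a `query` dict. A minimal sketch of why the two forms are equivalent, using only the standard library; `build_url` is a hypothetical helper and the content id is made up, neither comes from the extractor:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, query):
    """Append a query dict to a base URL (hypothetical helper, for illustration only)."""
    return base + '?' + urlencode(query)


video_id = '1000000001'  # made-up content id

# Old style: the whole query string is baked into a format template.
old_style = (
    'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails'
    '&channel=PCTV&contentId=%s' % video_id)

# New style: base URL and parameters are kept apart and encoded on demand.
new_style = build_url('http://account.hotstar.com/AVS/besc', {
    'action': 'GetAggregatedContentDetails',
    'channel': 'PCTV',
    'contentId': video_id,
})

# Both URLs carry the same parameters.
same_params = parse_qs(urlsplit(old_style).query) == parse_qs(urlsplit(new_style).query)
```

Keeping the parameters in a dict also means values are percent-encoded automatically, which the template form does not do.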


@ -4,7 +4,10 @@ from __future__ import unicode_literals
import base64 import base64
from ..compat import compat_urllib_parse_unquote from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import determine_ext from ..utils import determine_ext
from .bokecc import BokeCCBaseIE from .bokecc import BokeCCBaseIE
@ -33,9 +36,21 @@ class InfoQIE(BokeCCBaseIE):
'ext': 'flv', 'ext': 'flv',
'description': 'md5:308d981fb28fa42f49f9568322c683ff', 'description': 'md5:308d981fb28fa42f49f9568322c683ff',
}, },
}, {
'url': 'https://www.infoq.com/presentations/Simple-Made-Easy',
'md5': '0e34642d4d9ef44bf86f66f6399672db',
'info_dict': {
'id': 'Simple-Made-Easy',
'title': 'Simple Made Easy',
'ext': 'mp3',
'description': 'md5:3e0e213a8bbd074796ef89ea35ada25b',
},
'params': {
'format': 'bestaudio',
},
}] }]
def _extract_rtmp_videos(self, webpage): def _extract_rtmp_video(self, webpage):
# The server URL is hardcoded # The server URL is hardcoded
video_url = 'rtmpe://video.infoq.com/cfx/st/' video_url = 'rtmpe://video.infoq.com/cfx/st/'
@ -47,28 +62,53 @@ class InfoQIE(BokeCCBaseIE):
playpath = 'mp4:' + real_id playpath = 'mp4:' + real_id
return [{ return [{
'format_id': 'rtmp', 'format_id': 'rtmp_video',
'url': video_url, 'url': video_url,
'ext': determine_ext(playpath), 'ext': determine_ext(playpath),
'play_path': playpath, 'play_path': playpath,
}] }]
def _extract_http_videos(self, webpage): def _extract_cookies(self, webpage):
http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
policy = self._search_regex(r'InfoQConstants.scp\s*=\s*\'([^\']+)\'', webpage, 'policy') policy = self._search_regex(r'InfoQConstants.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
signature = self._search_regex(r'InfoQConstants.scs\s*=\s*\'([^\']+)\'', webpage, 'signature') signature = self._search_regex(r'InfoQConstants.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
key_pair_id = self._search_regex(r'InfoQConstants.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id') key_pair_id = self._search_regex(r'InfoQConstants.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
policy, signature, key_pair_id)
def _extract_http_video(self, webpage):
http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
return [{ return [{
'format_id': 'http', 'format_id': 'http_video',
'url': http_video_url, 'url': http_video_url,
'http_headers': { 'http_headers': {
'Cookie': 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % ( 'Cookie': self._extract_cookies(webpage)
policy, signature, key_pair_id),
}, },
}] }]
def _extract_http_audio(self, webpage, video_id):
fields = self._hidden_inputs(webpage)
http_audio_url = fields.get('filename')
if http_audio_url is None:
return []
cookies_header = {'Cookie': self._extract_cookies(webpage)}
# base URL is found in the Location header in the response returned by
# GET https://www.infoq.com/mp3download.action?filename=... when logged in.
http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url)
# the audio file is sometimes missing even when there is a download link,
# so probe the URL to make sure
if not self._is_valid_url(http_audio_url, video_id, headers=cookies_header):
return []
return [{
'format_id': 'http_audio',
'url': http_audio_url,
'vcodec': 'none',
'http_headers': cookies_header,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -80,7 +120,10 @@ class InfoQIE(BokeCCBaseIE):
# for China videos, HTTP video URL exists but always fails with 403 # for China videos, HTTP video URL exists but always fails with 403
formats = self._extract_bokecc_formats(webpage, video_id) formats = self._extract_bokecc_formats(webpage, video_id)
else: else:
formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage) formats = (
self._extract_rtmp_video(webpage) +
self._extract_http_video(webpage) +
self._extract_http_audio(webpage, video_id))
self._sort_formats(formats) self._sort_formats(formats)
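
The InfoQ hunk above factors the CloudFront cookie string into one helper and resolves the audio file name against a fixed download base. A small sketch of both steps, assuming the standard library's `urljoin` behaves like the `compat_urlparse.urljoin` used in the diff; the function name, the credential values and the file name are placeholders:

```python
from urllib.parse import urljoin

AUDIO_BASE = 'http://res.infoq.com/downloads/mp3downloads/'


def cloudfront_cookie(policy, signature, key_pair_id):
    """Build the Cookie header value shared by the video and audio formats."""
    return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
        policy, signature, key_pair_id)


# Placeholder values; on the real page they are scraped from InfoQConstants.
cookie = cloudfront_cookie('POLICY', 'SIGNATURE', 'KEYPAIR')

# A relative file name is appended to the base because the base ends in '/'.
audio_url = urljoin(AUDIO_BASE, 'presentations/example.mp3')

# An absolute URL passes through unchanged.
absolute = urljoin(AUDIO_BASE, 'http://example.com/other.mp3')
```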


@ -8,12 +8,12 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
js_to_json, js_to_json,
sanitized_Request,
) )
class IPrimaIE(InfoExtractor): class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)' _VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_GEO_BYPASS = False
_TESTS = [{ _TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33', 'url': 'http://play.iprima.cz/gondici-s-r-o-33',
@ -29,6 +29,10 @@ class IPrimaIE(InfoExtractor):
}, { }, {
'url': 'http://play.iprima.cz/particka/particka-92', 'url': 'http://play.iprima.cz/particka/particka-92',
'only_matching': True, 'only_matching': True,
}, {
# geo restricted
'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -38,11 +42,13 @@ class IPrimaIE(InfoExtractor):
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id') video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
req = sanitized_Request( playerpage = self._download_webpage(
'http://play.iprima.cz/prehravac/init?_infuse=1' 'http://play.iprima.cz/prehravac/init',
'&_ts=%s&productId=%s' % (round(time.time()), video_id)) video_id, note='Downloading player', query={
req.add_header('Referer', url) '_infuse': 1,
playerpage = self._download_webpage(req, video_id, note='Downloading player') '_ts': round(time.time()),
'productId': video_id,
}, headers={'Referer': url})
formats = [] formats = []
@ -65,7 +71,7 @@ class IPrimaIE(InfoExtractor):
options = self._parse_json( options = self._parse_json(
self._search_regex( self._search_regex(
r'(?s)var\s+playerOptions\s*=\s*({.+?});', r'(?s)(?:TDIPlayerOptions|playerOptions)\s*=\s*({.+?});\s*\]\]',
playerpage, 'player options', default='{}'), playerpage, 'player options', default='{}'),
video_id, transform_source=js_to_json, fatal=False) video_id, transform_source=js_to_json, fatal=False)
if options: if options:
@ -82,7 +88,7 @@ class IPrimaIE(InfoExtractor):
extract_formats(src) extract_formats(src)
if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage: if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
self.raise_geo_restricted() self.raise_geo_restricted(countries=['CZ'])
self._sort_formats(formats) self._sort_formats(formats)


@ -173,11 +173,12 @@ class IqiyiIE(InfoExtractor):
} }
}, { }, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html', 'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': '667171934041350c5de3f5015f7f1152', 'md5': 'b7dc800a4004b1b57749d9abae0472da',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb', 'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4', 'ext': 'mp4',
'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇', # This can be either Simplified Chinese or Traditional Chinese
'title': r're:^(?:名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇|名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇)$',
}, },
'skip': 'Geo-restricted to China', 'skip': 'Geo-restricted to China',
}, { }, {
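
The Iqiyi test above switches the expected title from a literal string to an `re:` pattern that accepts either script. A sketch of how such an expectation can be checked; `title_matches` is a hypothetical stand-in for the test helper, assuming that expected values prefixed with `re:` are treated as regular expressions:

```python
import re

EXPECTED_TITLE = (
    r're:^(?:名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇'
    r'|名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇)$')


def title_matches(expected, actual):
    """Compare literally, or as a regex when the expectation starts with 're:'."""
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], actual) is not None
    return expected == actual


simplified_ok = title_matches(EXPECTED_TITLE, '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇')
traditional_ok = title_matches(EXPECTED_TITLE, '名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇')
other_ok = title_matches(EXPECTED_TITLE, '名侦探柯南 国语版第753集')
```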


@ -24,6 +24,7 @@ from ..utils import (
class ITVIE(InfoExtractor): class ITVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
_GEO_COUNTRIES = ['GB']
_TEST = { _TEST = {
'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053', 'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053',
'info_dict': { 'info_dict': {
@ -98,7 +99,11 @@ class ITVIE(InfoExtractor):
headers=headers, data=etree.tostring(req_env)) headers=headers, data=etree.tostring(req_env))
playlist = xpath_element(resp_env, './/Playlist') playlist = xpath_element(resp_env, './/Playlist')
if playlist is None: if playlist is None:
fault_code = xpath_text(resp_env, './/faultcode')
fault_string = xpath_text(resp_env, './/faultstring') fault_string = xpath_text(resp_env, './/faultstring')
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string)) raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string))
title = xpath_text(playlist, 'EpisodeTitle', fatal=True) title = xpath_text(playlist, 'EpisodeTitle', fatal=True)
video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True) video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)


@ -3,14 +3,18 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse from ..compat import compat_urllib_parse_urlparse
from ..utils import remove_end from ..utils import (
int_or_none,
mimetype2ext,
remove_end,
)
class IwaraIE(InfoExtractor): class IwaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD', 'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
'md5': '1d53866b2c514b23ed69e4352fdc9839', # md5 is unstable
'info_dict': { 'info_dict': {
'id': 'amVwUl1EHpAD9RD', 'id': 'amVwUl1EHpAD9RD',
'ext': 'mp4', 'ext': 'mp4',
@ -23,17 +27,17 @@ class IwaraIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc', 'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
'ext': 'mp4', 'ext': 'mp4',
'title': '[3D Hentai] Kyonyu Ã\x97 Genkai Ã\x97 Emaki Shinobi Girls.mp4', 'title': '[3D Hentai] Kyonyu × Genkai × Emaki Shinobi Girls.mp4',
'age_limit': 18, 'age_limit': 18,
}, },
'add_ie': ['GoogleDrive'], 'add_ie': ['GoogleDrive'],
}, { }, {
'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq', 'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
'md5': '1d85f1e5217d2791626cff5ec83bb189', # md5 is unstable
'info_dict': { 'info_dict': {
'id': '6liAP9s2Ojc', 'id': '6liAP9s2Ojc',
'ext': 'mp4', 'ext': 'mp4',
'age_limit': 0, 'age_limit': 18,
'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)', 'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
'description': 'md5:590c12c0df1443d833fbebe05da8c47a', 'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
'upload_date': '20160910', 'upload_date': '20160910',
@ -52,9 +56,9 @@ class IwaraIE(InfoExtractor):
# ecchi is 'sexy' in Japanese # ecchi is 'sexy' in Japanese
age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0 age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
entries = self._parse_html5_media_entries(url, webpage, video_id) video_data = self._download_json('http://www.iwara.tv/api/video/%s' % video_id, video_id)
if not entries: if not video_data:
iframe_url = self._html_search_regex( iframe_url = self._html_search_regex(
r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1', r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
webpage, 'iframe URL', group='url') webpage, 'iframe URL', group='url')
@ -67,11 +71,25 @@ class IwaraIE(InfoExtractor):
title = remove_end(self._html_search_regex( title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara') r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
info_dict = entries[0] formats = []
info_dict.update({ for a_format in video_data:
format_id = a_format.get('resolution')
height = int_or_none(self._search_regex(
r'(\d+)p', format_id, 'height', default=None))
formats.append({
'url': a_format['uri'],
'format_id': format_id,
'ext': mimetype2ext(a_format.get('mime')) or 'mp4',
'height': height,
'width': int_or_none(height / 9.0 * 16.0 if height else None),
'quality': 1 if format_id == 'Source' else 0,
})
self._sort_formats(formats)
return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'age_limit': age_limit, 'age_limit': age_limit,
}) 'formats': formats,
}
return info_dict
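
The Iwara hunk above builds formats from an API response, reading the height from a label such as `540p` and deriving the width on a 16:9 assumption. A standalone sketch of that mapping; `build_format` and the sample entries are illustrative, and unlike the diff this version tolerates a missing `resolution` key:

```python
import re


def build_format(entry):
    """Turn one API entry into a format dict (illustrative, not the extractor's code)."""
    format_id = entry.get('resolution')
    match = re.search(r'(\d+)p', format_id or '')
    height = int(match.group(1)) if match else None
    return {
        'url': entry['uri'],
        'format_id': format_id,
        'height': height,
        # Width is not reported by the API; assume a 16:9 frame.
        'width': int(height / 9.0 * 16.0) if height else None,
        # The untranscoded upload is preferred over the resized variants.
        'quality': 1 if format_id == 'Source' else 0,
    }


sample = [
    {'resolution': 'Source', 'uri': 'http://example.com/source.mp4'},
    {'resolution': '540p', 'uri': 'http://example.com/540.mp4'},
    {'resolution': '360p', 'uri': 'http://example.com/360.mp4'},
]
formats = [build_format(entry) for entry in sample]
```

The `quality` field is what lets the `Source` entry rank first even though its height is unknown.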


@ -4,139 +4,9 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
class JWPlatformBaseIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',


@ -23,11 +23,11 @@ class KalturaIE(InfoExtractor):
(?: (?:
kaltura:(?P<partner_id>\d+):(?P<id>[0-9a-z_]+)| kaltura:(?P<partner_id>\d+):(?P<id>[0-9a-z_]+)|
https?:// https?://
(:?(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/ (:?(?:www|cdnapi(?:sec)?)\.)?kaltura\.com(?::\d+)?/
(?: (?:
(?: (?:
# flash player # flash player
index\.php/kwidget| index\.php/(?:kwidget|extwidget/preview)|
# html5 player # html5 player
html5/html5lib/[^/]+/mwEmbedFrame\.php html5/html5lib/[^/]+/mwEmbedFrame\.php
) )
@ -94,6 +94,14 @@ class KalturaIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
},
{
'url': 'https://www.kaltura.com/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True,
},
{
'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True,
} }
] ]
@ -112,7 +120,7 @@ class KalturaIE(InfoExtractor):
re.search( re.search(
r'''(?xs) r'''(?xs)
(?P<q1>["\']) (?P<q1>["\'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
(?P=q1).*? (?P=q1).*?
(?: (?:
entry_?[Ii]d| entry_?[Ii]d|
@ -209,6 +217,8 @@ class KalturaIE(InfoExtractor):
partner_id = params['wid'][0][1:] partner_id = params['wid'][0][1:]
elif 'p' in params: elif 'p' in params:
partner_id = params['p'][0] partner_id = params['p'][0]
elif 'partner_id' in params:
partner_id = params['partner_id'][0]
else: else:
raise ExtractorError('Invalid URL', expected=True) raise ExtractorError('Invalid URL', expected=True)
if 'entry_id' in params: if 'entry_id' in params:
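
The Kaltura hunks above accept an optional port after the host, the `extwidget/preview` path, and `partner_id/` as an alternative to `p/`. A simplified sketch of those three changes; both patterns are cut-down assumptions for illustration, not the extractor's full `_VALID_URL`:

```python
import re

# Host with optional subdomain and optional port, then one of the two widget paths.
KALTURA_URL_RE = re.compile(
    r'https?://(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com(?::\d+)?/'
    r'index\.php/(?:kwidget|extwidget/preview)/')

# \b keeps 'p/' from matching inside a longer path component.
PARTNER_ID_RE = re.compile(r'\b(?:p|partner_id)/(?P<partner_id>\d+)')


def partner_id_from_url(url):
    """Return the partner id embedded in a Kaltura widget URL, or None."""
    if not KALTURA_URL_RE.match(url):
        return None
    found = PARTNER_ID_RE.search(url)
    return found.group('partner_id') if found else None


plain = partner_id_from_url(
    'https://www.kaltura.com/index.php/extwidget/preview/partner_id/1770401'
    '/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe')
with_port = partner_id_from_url(
    'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401'
    '/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe')
```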


@ -7,20 +7,40 @@ class LemondeIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html', 'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
'md5': '01fb3c92de4c12c573343d63e163d302', 'md5': 'da120c8722d8632eec6ced937536cc98',
'info_dict': { 'info_dict': {
'id': 'lqm3kl', 'id': 'lqm3kl',
'ext': 'mp4', 'ext': 'mp4',
'title': "Comprendre l'affaire Bygmalion en 5 minutes", 'title': "Comprendre l'affaire Bygmalion en 5 minutes",
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 320, 'duration': 309,
'upload_date': '20160119', 'upload_date': '20160119',
'timestamp': 1453194778, 'timestamp': 1453194778,
'uploader_id': '3pmkp', 'uploader_id': '3pmkp',
}, },
}, {
# standard iframe embed
'url': 'http://www.lemonde.fr/les-decodeurs/article/2016/10/18/tout-comprendre-du-ceta-le-petit-cousin-du-traite-transatlantique_5015920_4355770.html',
'info_dict': {
'id': 'uzsxms',
'ext': 'mp4',
'title': "CETA : quelles suites pour l'accord commercial entre l'Europe et le Canada ?",
'thumbnail': r're:^https?://.*\.jpg',
'duration': 325,
'upload_date': '20161021',
'timestamp': 1477044540,
'uploader_id': '3pmkp',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html', 'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
'only_matching': True, 'only_matching': True,
}, {
# YouTube embeds
'url': 'http://www.lemonde.fr/pixels/article/2016/12/09/pourquoi-pewdiepie-superstar-de-youtube-a-menace-de-fermer-sa-chaine_5046649_4408996.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -30,5 +50,9 @@ class LemondeIE(InfoExtractor):
digiteka_url = self._proto_relative_url(self._search_regex( digiteka_url = self._proto_relative_url(self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1', r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
webpage, 'digiteka url', group='url')) webpage, 'digiteka url', group='url', default=None))
return self.url_result(digiteka_url, 'Digiteka')
if digiteka_url:
return self.url_result(digiteka_url, 'Digiteka')
return self.url_result(url, 'Generic')
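
The Lemonde hunk above makes the Digiteka lookup optional and hands the page to the generic extractor when no embed is found. A sketch of that decision with plain `re`; `pick_extractor` is a hypothetical function, the embed path in the usage is invented, and the `http:` prefix mirrors what `_proto_relative_url` is assumed to add by default:

```python
import re

DIGITEKA_RE = (
    r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?'
    r'(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1')


def pick_extractor(page_url, webpage):
    """Return (url, extractor key): Digiteka when an embed is present, else Generic."""
    found = re.search(DIGITEKA_RE, webpage)
    if found:
        embed_url = found.group('url')
        if embed_url.startswith('//'):
            embed_url = 'http:' + embed_url  # resolve the protocol-relative URL
        return embed_url, 'Digiteka'
    return page_url, 'Generic'


with_embed = pick_extractor(
    'http://www.lemonde.fr/a.html',
    "player({url: '//www.ultimedia.com/deliver/generic/iframe/src/abc123'})")
without_embed = pick_extractor(
    'http://www.lemonde.fr/b.html', '<iframe src="https://www.youtube.com/embed/x"></iframe>')
```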


@ -8,6 +8,7 @@ from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
unsmuggle_url,
) )
@ -15,20 +16,23 @@ class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s' _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json' _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
def _call_playlist_service(self, item_id, method, fatal=True): def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json( return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method), self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal) item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
def _call_api(self, organization_id, item_id, method): def _call_api(self, organization_id, item_id, method):
return self._download_json( return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method), self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method) item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method): def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method) pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method) metadata = self._call_api(pc['orgId'], item_id, meta_method)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False) mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata return pc, mobile, metadata
def _extract_info(self, streams, mobile_urls, properties): def _extract_info(self, streams, mobile_urls, properties):
@ -207,10 +211,13 @@ class LimelightMediaIE(LimelightBaseIE):
_API_PATH = 'media' _API_PATH = 'media'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
pc, mobile, metadata = self._extract( pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId', 'getMobilePlaylistByMediaId', 'properties') video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info( return self._extract_info(
pc['playlistItems'][0].get('streams', []), pc['playlistItems'][0].get('streams', []),
@ -247,11 +254,13 @@ class LimelightChannelIE(LimelightBaseIE):
_API_PATH = 'channels' _API_PATH = 'channels'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url) channel_id = self._match_id(url)
pc, mobile, medias = self._extract( pc, mobile, medias = self._extract(
channel_id, 'getPlaylistByChannelId', channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'media') 'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url'))
entries = [ entries = [
self._extract_info( self._extract_info(


@ -6,12 +6,12 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
urlencode_postdata,
get_element_by_attribute, get_element_by_attribute,
mimetype2ext, mimetype2ext,
) )
@ -50,6 +50,21 @@ class MetacafeIE(InfoExtractor):
}, },
'skip': 'Page is temporarily unavailable.', 'skip': 'Page is temporarily unavailable.',
}, },
# metacafe video with family filter
{
'url': 'http://www.metacafe.com/watch/2155630/adult_art_by_david_hart_156/',
'md5': 'b06082c5079bbdcde677a6291fbdf376',
'info_dict': {
'id': '2155630',
'ext': 'mp4',
'title': 'Adult Art By David Hart 156',
'uploader': '63346',
'description': 'md5:9afac8fc885252201ad14563694040fc',
},
'params': {
'skip_download': True,
},
},
# AnyClip video # AnyClip video
{ {
'url': 'http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/', 'url': 'http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/',
@ -112,22 +127,6 @@ class MetacafeIE(InfoExtractor):
def report_disclaimer(self): def report_disclaimer(self):
self.to_screen('Retrieving disclaimer') self.to_screen('Retrieving disclaimer')
def _confirm_age(self):
# Retrieve disclaimer
self.report_disclaimer()
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
# Confirm age
self.report_age_confirmation()
self._download_webpage(
self._FILTER_POST, None, False, 'Unable to confirm age',
data=urlencode_postdata({
'filters': '0',
'submit': "Continue - I'm over 18",
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
def _real_extract(self, url): def _real_extract(self, url):
# Extract id and simplified title from URL # Extract id and simplified title from URL
video_id, display_id = re.match(self._VALID_URL, url).groups() video_id, display_id = re.match(self._VALID_URL, url).groups()
@ -143,13 +142,15 @@ class MetacafeIE(InfoExtractor):
if prefix == 'cb': if prefix == 'cb':
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform') return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# self._confirm_age() headers = {
# Disable family filter
'Cookie': 'user=%s; ' % compat_urllib_parse_urlencode({'ffilter': False})
}
# AnyClip videos require the flashversion cookie so that we get the link # AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file # to the mp4 file
headers = {}
if video_id.startswith('an-'): if video_id.startswith('an-'):
headers['Cookie'] = 'flashVersion=0;' headers['Cookie'] += 'flashVersion=0; '
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
webpage = self._download_webpage(url, video_id, headers=headers) webpage = self._download_webpage(url, video_id, headers=headers)
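
The Metacafe hunk above drops the age-confirmation round trip and instead sends a `user` cookie that turns the family filter off, appending `flashVersion=0` for AnyClip ids. A sketch of the resulting header, assuming `compat_urllib_parse_urlencode` behaves like the standard library's `urlencode`; `build_headers` is a hypothetical name:

```python
from urllib.parse import urlencode


def build_headers(video_id):
    """Build request headers: family filter off, plus flashVersion for AnyClip ids."""
    headers = {
        # urlencode renders the boolean as the string 'False'.
        'Cookie': 'user=%s; ' % urlencode({'ffilter': False}),
    }
    if video_id.startswith('an-'):
        headers['Cookie'] += 'flashVersion=0; '
    return headers


regular = build_headers('2155630')
anyclip = build_headers('an-dVVXnuY7Jh77J')
```

Because the first cookie already ends in `; `, the AnyClip cookie can be appended with `+=` without clobbering it, which is why the diff changes `=` to `+=`.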


@ -17,9 +17,10 @@ class MySpaceIE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'https://myspace.com/fiveminutestothestage/video/little-big-town/109594919', 'url': 'https://myspace.com/fiveminutestothestage/video/little-big-town/109594919',
'md5': '9c1483c106f4a695c47d2911feed50a7',
'info_dict': { 'info_dict': {
'id': '109594919', 'id': '109594919',
'ext': 'flv', 'ext': 'mp4',
'title': 'Little Big Town', 'title': 'Little Big Town',
'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.', 'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.',
'uploader': 'Five Minutes to the Stage', 'uploader': 'Five Minutes to the Stage',
@ -27,37 +28,30 @@ class MySpaceIE(InfoExtractor):
'timestamp': 1414108751, 'timestamp': 1414108751,
'upload_date': '20141023', 'upload_date': '20141023',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
# songs # songs
{ {
'url': 'https://myspace.com/killsorrow/music/song/of-weakened-soul...-93388656-103880681', 'url': 'https://myspace.com/killsorrow/music/song/of-weakened-soul...-93388656-103880681',
'md5': '1d7ee4604a3da226dd69a123f748b262',
'info_dict': { 'info_dict': {
'id': '93388656', 'id': '93388656',
'ext': 'flv', 'ext': 'm4a',
'title': 'Of weakened soul...', 'title': 'Of weakened soul...',
'uploader': 'Killsorrow', 'uploader': 'Killsorrow',
'uploader_id': 'killsorrow', 'uploader_id': 'killsorrow',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, { }, {
'add_ie': ['Vevo'], 'add_ie': ['Youtube'],
'url': 'https://myspace.com/threedaysgrace/music/song/animal-i-have-become-28400208-28218041', 'url': 'https://myspace.com/threedaysgrace/music/song/animal-i-have-become-28400208-28218041',
'info_dict': { 'info_dict': {
'id': 'USZM20600099', 'id': 'xqds0B_meys',
'ext': 'mp4', 'ext': 'webm',
'title': 'Animal I Have Become', 'title': 'Three Days Grace - Animal I Have Become',
'uploader': 'Three Days Grace', 'description': 'md5:8bd86b3693e72a077cf863a8530c54bb',
'timestamp': int, 'uploader': 'ThreeDaysGraceVEVO',
'upload_date': '20060502', 'uploader_id': 'ThreeDaysGraceVEVO',
'upload_date': '20091002',
}, },
'skip': 'VEVO is only available in some countries',
}, { }, {
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
'url': 'https://myspace.com/starset2/music/song/first-light-95799905-106964426', 'url': 'https://myspace.com/starset2/music/song/first-light-95799905-106964426',
@ -76,24 +70,46 @@ class MySpaceIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
is_song = mobj.group('mediatype').startswith('music/song')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
player_url = self._search_regex( player_url = self._search_regex(
r'playerSwf":"([^"?]*)', webpage, 'player URL') r'videoSwf":"([^"?]*)', webpage, 'player URL', fatal=False)
def rtmp_format_from_stream_url(stream_url, width=None, height=None): def formats_from_stream_urls(stream_url, hls_stream_url, http_stream_url, width=None, height=None):
rtmp_url, play_path = stream_url.split(';', 1) formats = []
return { vcodec = 'none' if is_song else None
'format_id': 'rtmp', if hls_stream_url:
'url': rtmp_url, formats.append({
'play_path': play_path, 'format_id': 'hls',
'player_url': player_url, 'url': hls_stream_url,
'protocol': 'rtmp', 'protocol': 'm3u8_native',
'ext': 'flv', 'ext': 'm4a' if is_song else 'mp4',
'width': width, 'vcodec': vcodec,
'height': height, })
} if stream_url and player_url:
rtmp_url, play_path = stream_url.split(';', 1)
formats.append({
'format_id': 'rtmp',
'url': rtmp_url,
'play_path': play_path,
'player_url': player_url,
'protocol': 'rtmp',
'ext': 'flv',
'width': width,
'height': height,
'vcodec': vcodec,
})
if http_stream_url:
formats.append({
'format_id': 'http',
'url': http_stream_url,
'width': width,
'height': height,
'vcodec': vcodec,
})
return formats
if mobj.group('mediatype').startswith('music/song'): if is_song:
# songs don't store any useful info in the 'context' variable # songs don't store any useful info in the 'context' variable
song_data = self._search_regex( song_data = self._search_regex(
r'''<button.*data-song-id=(["\'])%s\1.*''' % video_id, r'''<button.*data-song-id=(["\'])%s\1.*''' % video_id,
@ -108,8 +124,10 @@ class MySpaceIE(InfoExtractor):
return self._search_regex( return self._search_regex(
r'''data-%s=([\'"])(?P<data>.*?)\1''' % name, r'''data-%s=([\'"])(?P<data>.*?)\1''' % name,
song_data, name, default='', group='data') song_data, name, default='', group='data')
stream_url = search_data('stream-url') formats = formats_from_stream_urls(
if not stream_url: search_data('stream-url'), search_data('hls-stream-url'),
search_data('http-stream-url'))
if not formats:
vevo_id = search_data('vevo-id') vevo_id = search_data('vevo-id')
youtube_id = search_data('youtube-id') youtube_id = search_data('youtube-id')
if vevo_id: if vevo_id:
@ -121,6 +139,7 @@ class MySpaceIE(InfoExtractor):
else: else:
raise ExtractorError( raise ExtractorError(
'Found song but don\'t know how to download it') 'Found song but don\'t know how to download it')
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
@ -128,27 +147,16 @@ class MySpaceIE(InfoExtractor):
'uploader_id': search_data('artist-username'), 'uploader_id': search_data('artist-username'),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'duration': int_or_none(search_data('duration')), 'duration': int_or_none(search_data('duration')),
'formats': [rtmp_format_from_stream_url(stream_url)] 'formats': formats,
} }
else: else:
video = self._parse_json(self._search_regex( video = self._parse_json(self._search_regex(
r'context = ({.*?});', webpage, 'context'), r'context = ({.*?});', webpage, 'context'),
video_id)['video'] video_id)['video']
formats = [] formats = formats_from_stream_urls(
hls_stream_url = video.get('hlsStreamUrl') video.get('streamUrl'), video.get('hlsStreamUrl'),
if hls_stream_url: video.get('mp4StreamUrl'), int_or_none(video.get('width')),
formats.append({ int_or_none(video.get('height')))
'format_id': 'hls',
'url': hls_stream_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
stream_url = video.get('streamUrl')
if stream_url:
formats.append(rtmp_format_from_stream_url(
stream_url,
int_or_none(video.get('width')),
int_or_none(video.get('height'))))
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,

@ -4,23 +4,26 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
find_xpath_attr, find_xpath_attr,
lowercase_escape, lowercase_escape,
smuggle_url, smuggle_url,
unescapeHTML, unescapeHTML,
update_url_query, update_url_query,
int_or_none,
) )
class NBCIE(InfoExtractor): class NBCIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)' _VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.nbc.com/the-tonight-show/segments/112966', 'url': 'http://www.nbc.com/the-tonight-show/video/jimmy-fallon-surprises-fans-at-ben-jerrys/2848237',
'info_dict': { 'info_dict': {
'id': '112966', 'id': '2848237',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s', 'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.', 'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
@ -69,7 +72,7 @@ class NBCIE(InfoExtractor):
# HLS streams requires the 'hdnea3' cookie # HLS streams requires the 'hdnea3' cookie
'url': 'http://www.nbc.com/Kings/video/goliath/n1806', 'url': 'http://www.nbc.com/Kings/video/goliath/n1806',
'info_dict': { 'info_dict': {
'id': 'n1806', 'id': '101528f5a9e8127b107e98c5e6ce4638',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Goliath', 'title': 'Goliath',
'description': 'When an unknown soldier saves the life of the King\'s son in battle, he\'s thrust into the limelight and politics of the kingdom.', 'description': 'When an unknown soldier saves the life of the King\'s son in battle, he\'s thrust into the limelight and politics of the kingdom.',
@ -87,21 +90,57 @@ class NBCIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
theplatform_url = unescapeHTML(lowercase_escape(self._html_search_regex( info = {
[
r'(?:class="video-player video-player-full" data-mpx-url|class="player" src)="(.*?)"',
r'<iframe[^>]+src="((?:https?:)?//player\.theplatform\.com/[^"]+)"',
r'"embedURL"\s*:\s*"([^"]+)"'
],
webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
if theplatform_url.startswith('//'):
theplatform_url = 'http:' + theplatform_url
return {
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',
'url': smuggle_url(theplatform_url, {'source_url': url}),
'id': video_id, 'id': video_id,
} }
video_data = None
preload = self._search_regex(
r'PRELOAD\s*=\s*({.+})', webpage, 'preload data', default=None)
if preload:
preload_data = self._parse_json(preload, video_id)
path = compat_urllib_parse_urlparse(url).path.rstrip('/')
entity_id = preload_data.get('xref', {}).get(path)
video_data = preload_data.get('entities', {}).get(entity_id)
if video_data:
query = {
'mbr': 'true',
'manifest': 'm3u',
}
video_id = video_data['guid']
title = video_data['title']
if video_data.get('entitlement') == 'auth':
resource = self._get_mvpd_resource(
'nbcentertainment', title, video_id,
video_data.get('vChipRating'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/' + video_id,
query), {'force_smil_url': True})
info.update({
'id': video_id,
'title': title,
'url': theplatform_url,
'description': video_data.get('description'),
'keywords': video_data.get('keywords'),
'season_number': int_or_none(video_data.get('seasonNumber')),
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('showName'),
})
else:
theplatform_url = unescapeHTML(lowercase_escape(self._html_search_regex(
[
r'(?:class="video-player video-player-full" data-mpx-url|class="player" src)="(.*?)"',
r'<iframe[^>]+src="((?:https?:)?//player\.theplatform\.com/[^"]+)"',
r'"embedURL"\s*:\s*"([^"]+)"'
],
webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
if theplatform_url.startswith('//'):
theplatform_url = 'http:' + theplatform_url
info['url'] = smuggle_url(theplatform_url, {'source_url': url})
return info
class NBCSportsVPlayerIE(InfoExtractor): class NBCSportsVPlayerIE(InfoExtractor):

@ -19,6 +19,7 @@ class NineCNineMediaBaseIE(InfoExtractor):
class NineCNineMediaStackIE(NineCNineMediaBaseIE): class NineCNineMediaStackIE(NineCNineMediaBaseIE):
IE_NAME = '9c9media:stack' IE_NAME = '9c9media:stack'
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)' _VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)'
def _real_extract(self, url): def _real_extract(self, url):

@ -1,7 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import random
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -15,24 +14,7 @@ from ..utils import (
class NRKBaseIE(InfoExtractor): class NRKBaseIE(InfoExtractor):
_faked_ip = None _GEO_COUNTRIES = ['NO']
def _download_webpage_handle(self, *args, **kwargs):
# NRK checks X-Forwarded-For HTTP header in order to figure out the
# origin of the client behind proxy. This allows to bypass geo
# restriction by faking this header's value to some Norway IP.
# We will do so once we encounter any geo restriction error.
if self._faked_ip:
# NB: str is intentional
kwargs.setdefault(str('headers'), {})['X-Forwarded-For'] = self._faked_ip
return super(NRKBaseIE, self)._download_webpage_handle(*args, **kwargs)
def _fake_ip(self):
# Use fake IP from 37.191.128.0/17 in order to workaround geo
# restriction
def octet(lb=0, ub=255):
return random.randint(lb, ub)
self._faked_ip = '37.191.%d.%d' % (octet(128), octet())
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -44,8 +26,6 @@ class NRKBaseIE(InfoExtractor):
title = data.get('fullTitle') or data.get('mainTitle') or data['title'] title = data.get('fullTitle') or data.get('mainTitle') or data['title']
video_id = data.get('id') or video_id video_id = data.get('id') or video_id
http_headers = {'X-Forwarded-For': self._faked_ip} if self._faked_ip else {}
entries = [] entries = []
conviva = data.get('convivaStatistics') or {} conviva = data.get('convivaStatistics') or {}
@ -90,7 +70,6 @@ class NRKBaseIE(InfoExtractor):
'duration': duration, 'duration': duration,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
'http_headers': http_headers,
}) })
if not entries: if not entries:
@ -107,19 +86,17 @@ class NRKBaseIE(InfoExtractor):
}] }]
if not entries: if not entries:
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type and not self._faked_ip:
self.report_warning(
'Video is geo restricted, trying to fake IP')
self._fake_ip()
return self._real_extract(url)
MESSAGES = { MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet', 'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut', 'ProgramRightsHasExpired': 'Programmet har gått ut',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge', 'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
} }
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type:
self.raise_geo_restricted(
msg=MESSAGES.get('ProgramIsGeoBlocked'),
countries=self._GEO_COUNTRIES)
raise ExtractorError( raise ExtractorError(
'%s said: %s' % (self.IE_NAME, MESSAGES.get( '%s said: %s' % (self.IE_NAME, MESSAGES.get(
message_type, message_type)), message_type, message_type)),
@ -188,12 +165,12 @@ class NRKIE(NRKBaseIE):
https?:// https?://
(?: (?:
(?:www\.)?nrk\.no/video/PS\*| (?:www\.)?nrk\.no/video/PS\*|
v8-psapi\.nrk\.no/mediaelement/ v8[-.]psapi\.nrk\.no/mediaelement/
) )
) )
(?P<id>[^/?#&]+) (?P<id>[^?#&]+)
''' '''
_API_HOST = 'v8.psapi.nrk.no' _API_HOST = 'v8-psapi.nrk.no'
_TESTS = [{ _TESTS = [{
# video # video
'url': 'http://www.nrk.no/video/PS*150533', 'url': 'http://www.nrk.no/video/PS*150533',
@ -219,6 +196,9 @@ class NRKIE(NRKBaseIE):
}, { }, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9', 'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True, 'only_matching': True,
}, {
'url': 'nrk:clip/7707d5a3-ebe7-434a-87d5-a3ebe7a34a70',
'only_matching': True,
}, { }, {
'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9', 'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True, 'only_matching': True,
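
The helper removed from NRKBaseIE above drew a random address from 37.191.128.0/17 and sent it as X-Forwarded-For; the diff drops it in favour of declaring `_GEO_COUNTRIES`. For reference, the removed idea can be sketched with the standard library as below. This assumes Python 3; the function names are hypothetical, and it is not the geo bypass code in extractor/common.

```python
import ipaddress
import random

# Address block used by the removed NRK helper.
NORWAY_BLOCK = '37.191.128.0/17'


def random_ip_in_block(cidr, rng=random):
    """Pick a uniformly random address inside the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    # Indexing a network object yields the address at that offset.
    return str(network[rng.randrange(network.num_addresses)])


def forwarded_for_headers(cidr=NORWAY_BLOCK):
    """Build the header a client would send to claim an origin in the block."""
    return {'X-Forwarded-For': random_ip_in_block(cidr)}
```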

@ -1,15 +1,16 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
) )
class OnDemandKoreaIE(JWPlatformBaseIE): class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_GEO_COUNTRIES = ['US', 'CA']
_TEST = { _TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html', 'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
'info_dict': { 'info_dict': {
@ -35,7 +36,8 @@ class OnDemandKoreaIE(JWPlatformBaseIE):
if 'msg_block_01.png' in webpage: if 'msg_block_01.png' in webpage:
self.raise_geo_restricted( self.raise_geo_restricted(
'This content is not available in your region') msg='This content is not available in your region',
countries=self._GEO_COUNTRIES)
if 'This video is only available to ODK PLUS members.' in webpage: if 'This video is only available to ODK PLUS members.' in webpage:
raise ExtractorError( raise ExtractorError(

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex( return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id') r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage): def _extract_from_id(self, video_id, webpage=None):
response = self._download_json( response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id, 'http://qi.ckm.onetapi.pl/', video_id,
query={ query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {}) meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title'] title = (self._og_search_title(
description = self._og_search_description(webpage, default=None) or meta.get('description') webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght') duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ') timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
} }
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE): class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)' _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv' IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage)) channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage)) channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description) return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)
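
OnetPlIE above only locates an MVP id in the page and hands it to OnetMVPIE through an internal `onetmvp:` URL. The hand-off can be sketched in isolation as below, reusing the two regular expressions from the diff. The function names and the sample markup in the usage note are made up for illustration.

```python
import re

# Both patterns are copied from the diff above.
MVP_ATTR_RE = r'data-params-mvp=["\'](\d+\.\d+)'
ONETMVP_URL_RE = r'onetmvp:(?P<id>\d+\.\d+)'


def internal_url_from_page(webpage):
    """Return an 'onetmvp:<id>' URL for the first MVP id found, else None."""
    match = re.search(MVP_ATTR_RE, webpage)
    if match is None:
        return None
    return 'onetmvp:%s' % match.group(1)


def mvp_id_from_internal_url(url):
    """Recover the MVP id from an internal URL, as _match_id would."""
    match = re.match(ONETMVP_URL_RE, url)
    return match.group('id') if match else None
```

For a page containing `<div data-params-mvp="1561707.1685479">`, the first function yields `onetmvp:1561707.1685479`, and the second recovers `1561707.1685479` from it.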

@ -75,17 +75,17 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>', '<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
webpage, 'openload ID') webpage, 'openload ID')
first_three_chars = int(float(ol_id[0:][:3])) first_two_chars = int(float(ol_id[0:][:2]))
fifth_char = int(float(ol_id[3:5])) urlcode = []
urlcode = '' num = 2
num = 5
while num < len(ol_id): while num < len(ol_id):
urlcode += compat_chr(int(float(ol_id[num:][:3])) + key = int(float(ol_id[num + 3:][:2]))
first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2]))) urlcode.append((key, compat_chr(int(float(ol_id[num:][:3])) - first_two_chars)))
num += 5 num += 5
video_url = 'https://openload.co/stream/' + urlcode video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
title = self._og_search_title(webpage, default=None) or self._search_regex( title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage, r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
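
The revised Openload decoding above reads a two-digit offset, then walks the rest of the id in five-digit chunks: three digits give a character code (minus the offset) and two digits give that character's position. A minimal standalone sketch, assuming Python 3 and well-formed input, is given below. `encode_stream_id` is a hypothetical inverse written only to exercise the decoder and does not describe how the site produces ids.

```python
import random


def decode_stream_id(ol_id):
    """Decode an id laid out as <2-digit offset>(<3-digit code><2-digit key>)*."""
    offset = int(ol_id[:2])
    pairs = []
    for pos in range(2, len(ol_id), 5):
        code = int(ol_id[pos:pos + 3]) - offset   # character code
        key = int(ol_id[pos + 3:pos + 5])         # position in the output
        pairs.append((key, chr(code)))
    # Chunks arrive in arbitrary order; the key restores the original order.
    return ''.join(ch for _, ch in sorted(pairs, key=lambda p: p[0]))


def encode_stream_id(text, offset=17, seed=0):
    """Hypothetical inverse of decode_stream_id, for demonstration only."""
    chunks = ['%03d%02d' % (ord(ch) + offset, i) for i, ch in enumerate(text)]
    random.Random(seed).shuffle(chunks)
    return '%02d' % offset + ''.join(chunks)
```

Sorting on the key is the step that differs from the previous version, which concatenated characters in the order they appeared.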

@ -193,6 +193,8 @@ class PBSIE(InfoExtractor):
) )
''' % '|'.join(list(zip(*_STATIONS))[0]) ''' % '|'.join(list(zip(*_STATIONS))[0])
_GEO_COUNTRIES = ['US']
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/', 'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/',
@ -489,11 +491,13 @@ class PBSIE(InfoExtractor):
headers=self.geo_verification_headers()) headers=self.geo_verification_headers())
if redirect_info['status'] == 'error': if redirect_info['status'] == 'error':
message = self._ERRORS.get(
redirect_info['http_code'], redirect_info['message'])
if redirect_info['http_code'] == 403:
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError( raise ExtractorError(
'%s said: %s' % ( '%s said: %s' % (self.IE_NAME, message), expected=True)
self.IE_NAME,
self._ERRORS.get(redirect_info['http_code'], redirect_info['message'])),
expected=True)
format_url = redirect_info.get('url') format_url = redirect_info.get('url')
if not format_url: if not format_url:

@ -16,18 +16,33 @@ from ..utils import (
class PikselIE(InfoExtractor): class PikselIE(InfoExtractor):
_VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)' _VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
_TEST = { _TESTS = [
'url': 'http://player.piksel.com/v/nv60p12f', {
'md5': 'd9c17bbe9c3386344f9cfd32fad8d235', 'url': 'http://player.piksel.com/v/nv60p12f',
'info_dict': { 'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
'id': 'nv60p12f', 'info_dict': {
'ext': 'mp4', 'id': 'nv60p12f',
'title': 'فن الحياة - الحلقة 1', 'ext': 'mp4',
'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور', 'title': 'فن الحياة - الحلقة 1',
'timestamp': 1465231790, 'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
'upload_date': '20160606', 'timestamp': 1465231790,
'upload_date': '20160606',
}
},
{
# Original source: http://www.uscourts.gov/cameras-courts/state-washington-vs-donald-j-trump-et-al
'url': 'https://player.piksel.com/v/v80kqp41',
'md5': '753ddcd8cc8e4fa2dda4b7be0e77744d',
'info_dict': {
'id': 'v80kqp41',
'ext': 'mp4',
'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
'timestamp': 1486171129,
'upload_date': '20170204',
}
} }
} ]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
@ -40,8 +55,10 @@ class PikselIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
app_token = self._search_regex( app_token = self._search_regex([
r'clientAPI\s*:\s*"([^"]+)"', webpage, 'app token') r'clientAPI\s*:\s*"([^"]+)"',
r'data-de-api-key\s*=\s*"([^"]+)"'
], webpage, 'app token')
response = self._download_json( response = self._download_json(
'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token, 'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token,
video_id, query={ video_id, query={

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration')) 'video:duration', webpage, 'duration'))
uploader = self._search_regex( uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False) r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"', r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False)) webpage, 'upload date', fatal=False))

@ -18,6 +18,7 @@ from ..utils import (
parse_duration, parse_duration,
qualities, qualities,
srt_subtitles_timecode, srt_subtitles_timecode,
update_url_query,
urlencode_postdata, urlencode_postdata,
) )
@ -92,6 +93,10 @@ class PluralsightIE(PluralsightBaseIE):
raise ExtractorError('Unable to login: %s' % error, expected=True) raise ExtractorError('Unable to login: %s' % error, expected=True)
if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')): if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
BLOCKED = 'Your account has been blocked due to suspicious activity'
if BLOCKED in response:
raise ExtractorError(
'Unable to login: %s' % BLOCKED, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id): def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
@ -327,25 +332,44 @@ class PluralsightCourseIE(PluralsightBaseIE):
# TODO: PSM cookie # TODO: PSM cookie
course = self._download_json( course = self._download_json(
'%s/data/course/%s' % (self._API_BASE, course_id), '%s/player/functions/rpc' % self._API_BASE, course_id,
course_id, 'Downloading course JSON') 'Downloading course JSON',
data=json.dumps({
'fn': 'bootstrapPlayer',
'payload': {
'courseId': course_id,
}
}).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8'
})['payload']['course']
title = course['title'] title = course['title']
course_name = course['name']
course_data = course['modules']
description = course.get('description') or course.get('shortDescription') description = course.get('description') or course.get('shortDescription')
course_data = self._download_json(
'%s/data/course/content/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course data JSON')
entries = [] entries = []
for num, module in enumerate(course_data, 1): for num, module in enumerate(course_data, 1):
author = module.get('author')
module_name = module.get('name')
if not author or not module_name:
continue
for clip in module.get('clips', []): for clip in module.get('clips', []):
player_parameters = clip.get('playerParameters') clip_index = int_or_none(clip.get('index'))
if not player_parameters: if clip_index is None:
continue continue
clip_url = update_url_query(
'%s/player' % self._API_BASE, query={
'mode': 'live',
'course': course_name,
'author': author,
'name': module_name,
'clip': clip_index,
})
entries.append({ entries.append({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': '%s/training/player?%s' % (self._API_BASE, player_parameters), 'url': clip_url,
'ie_key': PluralsightIE.ie_key(), 'ie_key': PluralsightIE.ie_key(),
'chapter': module.get('title'), 'chapter': module.get('title'),
'chapter_number': num, 'chapter_number': num,
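
The course extractor above now assembles each clip URL from course, author, module and clip index instead of a pre-built `playerParameters` string. A rough standard-library equivalent of that URL construction is sketched below, assuming Python 3.7+. `with_query` stands in for `update_url_query`, `clip_url` is a hypothetical wrapper, and the host used in the usage note is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_query(url, query):
    """Return url with the given parameters merged into its query string."""
    parsed = urlparse(url)
    merged = parse_qs(parsed.query)
    merged.update(query)
    # doseq=True expands the lists produced by parse_qs back into pairs.
    return urlunparse(parsed._replace(query=urlencode(merged, doseq=True)))


def clip_url(api_base, course_name, author, module_name, clip_index):
    """Build a player URL from the same five parameters as the diff."""
    return with_query('%s/player' % api_base, {
        'mode': 'live',
        'course': course_name,
        'author': author,
        'name': module_name,
        'clip': clip_index,
    })
```

For example, `clip_url('https://app.example.com', 'c1', 'jane-doe', 'm1', 0)` gives `https://app.example.com/player?mode=live&course=c1&author=jane-doe&name=m1&clip=0`.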

@ -2,27 +2,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools
import os # import os
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_urllib_parse_unquote, # compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, # compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse, # compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json, js_to_json,
orderedSet, orderedSet,
sanitized_Request, # sanitized_Request,
str_to_int, str_to_int,
) )
from ..aes import ( # from ..aes import (
aes_decrypt_text # aes_decrypt_text
) # )
class PornHubIE(InfoExtractor): class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
req = sanitized_Request( def dl_webpage(platform):
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id) return self._download_webpage(
req.add_header('Cookie', 'age_verified=1') 'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
webpage = self._download_webpage(req, video_id) video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>', r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see # video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
title = self._html_search_meta( title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex( 'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)', (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1', r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,37 +169,6 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count( comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment') r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_urls = list(map(compat_urllib_parse_unquote, re.findall(r"player_quality_[0-9]{3}p\s*=\s*'([^']+)'", webpage)))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
height = None
tbr = None
else:
height = int(m.group('height'))
tbr = int(m.group('tbr'))
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'tbr': tbr,
'height': height,
})
self._sort_formats(formats)
page_params = self._parse_json(self._search_regex( page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})', r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
webpage, 'page parameters', group='data', default='{}'), webpage, 'page parameters', group='data', default='{}'),
@ -198,6 +180,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'uploader': video_uploader, 'uploader': video_uploader,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
@ -206,7 +189,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'dislike_count': dislike_count, 'dislike_count': dislike_count,
'comment_count': comment_count, 'comment_count': comment_count,
'formats': formats, # 'formats': formats,
'age_limit': 18, 'age_limit': 18,
'tags': tags, 'tags': tags,
'categories': categories, 'categories': categories,


@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
str_to_int, str_to_int,
) )
class PornoXOIE(JWPlatformBaseIE): class PornoXOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html', 'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',


@ -424,3 +424,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
return self._extract_clip(url, webpage) return self._extract_clip(url, webpage)
elif page_type == 'playlist': elif page_type == 'playlist':
return self._extract_playlist(url, webpage) return self._extract_playlist(url, webpage)
else:
raise ExtractorError(
'Unsupported page type %s' % page_type, expected=True)


@ -54,9 +54,8 @@ class RadioCanadaIE(InfoExtractor):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
device_types = ['ipad'] device_types = ['ipad']
if app_code != 'toutv':
device_types.append('flash')
if not smuggled_data: if not smuggled_data:
device_types.append('flash')
device_types.append('android') device_types.append('android')
formats = [] formats = []
@ -103,7 +102,7 @@ class RadioCanadaIE(InfoExtractor):
continue continue
f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url) f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url)
protocol = determine_protocol({'url': f_url}) protocol = determine_protocol({'url': f_url})
formats.append({ f = {
'format_id': '%s-%d' % (protocol, tbr), 'format_id': '%s-%d' % (protocol, tbr),
'url': f_url, 'url': f_url,
'ext': 'flv' if protocol == 'rtmp' else ext, 'ext': 'flv' if protocol == 'rtmp' else ext,
@ -111,7 +110,14 @@ class RadioCanadaIE(InfoExtractor):
'width': int_or_none(url_e.get('width')), 'width': int_or_none(url_e.get('width')),
'height': int_or_none(url_e.get('height')), 'height': int_or_none(url_e.get('height')),
'tbr': tbr, 'tbr': tbr,
}) }
mobj = re.match(r'(?P<url>rtmp://[^/]+/[^/]+)/(?P<playpath>[^?]+)(?P<auth>\?.+)', f_url)
if mobj:
f.update({
'url': mobj.group('url') + mobj.group('auth'),
'play_path': mobj.group('playpath'),
})
formats.append(f)
if protocol == 'rtsp': if protocol == 'rtsp':
base_url = self._search_regex( base_url = self._search_regex(
r'rtsp://([^?]+)', f_url, 'base url', default=None) r'rtsp://([^?]+)', f_url, 'base url', default=None)
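The RTMP handling added in this hunk splits a stream URL into a connection URL (with the auth query re-attached) and a play path. A minimal standalone sketch of that split, reusing the regex from the diff; the helper name `split_rtmp_url` and the sample URL are hypothetical and not part of the commit:

```python
import re

# Regex copied from the hunk: application URL, play path, auth query string.
RTMP_RE = re.compile(
    r'(?P<url>rtmp://[^/]+/[^/]+)/(?P<playpath>[^?]+)(?P<auth>\?.+)')


def split_rtmp_url(f_url):
    """Return (connection_url, play_path), or None if f_url does not match."""
    mobj = RTMP_RE.match(f_url)
    if not mobj:
        return None
    # The auth query moves from the end of the play path to the connection URL.
    return mobj.group('url') + mobj.group('auth'), mobj.group('playpath')


# Hypothetical sample URL, for illustration only.
SAMPLE_RTMP = 'rtmp://example.invalid/ondemand/mp4:clip_1200.mp4?auth=abc'
print(split_rtmp_url(SAMPLE_RTMP))
```

Note that the pattern requires a query string, so RTMP URLs without one are left untouched, as are non-RTMP URLs.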


@ -2,11 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE): class RENTVIE(InfoExtractor):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)' _VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://ren.tv/video/epizod/118577', 'url': 'http://ren.tv/video/epizod/118577',


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
get_element_by_class, get_element_by_class,
@ -11,7 +11,7 @@ from ..utils import (
) )
class RudoIE(JWPlatformBaseIE): class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {


@ -1,11 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import js_to_json from ..utils import js_to_json
class ScreencastOMaticIE(JWPlatformBaseIE): class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl', 'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',


@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
int_or_none,
smuggle_url,
update_url_query,
)
class ScrippsNetworksWatchIE(AdobePassIE):
IE_NAME = 'scrippsnetworks:watch'
_VALID_URL = r'https?://watch\.(?:hgtv|foodnetwork|travelchannel|diynetwork|cookingchanneltv)\.com/player\.[A-Z0-9]+\.html#(?P<id>\d+)'
_TEST = {
'url': 'http://watch.hgtv.com/player.HNT.html#0256538',
'md5': '26545fd676d939954c6808274bdb905a',
'info_dict': {
'id': '0256538',
'ext': 'mp4',
'title': 'Seeking a Wow House',
'description': 'Buyers retiring in Palm Springs, California, want a modern house with major wow factor. They\'re also looking for a pool and a large, open floorplan with tall windows looking out at the views.',
'uploader': 'SCNI',
'upload_date': '20170207',
'timestamp': 1486450493,
},
'skip': 'requires TV provider authentication',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
channel = self._parse_json(self._search_regex(
r'"channels"\s*:\s*(\[.+\])',
webpage, 'channels'), video_id)[0]
video_data = next(v for v in channel['videos'] if v.get('nlvid') == video_id)
title = video_data['title']
release_url = video_data['releaseUrl']
if video_data.get('restricted'):
requestor_id = self._search_regex(
r'requestorId\s*=\s*"([^"]+)";', webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id,
video_data.get('ratings', [{}])[0].get('rating'))
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
release_url = update_url_query(release_url, {'auth': auth})
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'url': smuggle_url(release_url, {'force_smil_url': True}),
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailUrl'),
'series': video_data.get('showTitle'),
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episodeNumber')),
'ie_key': 'ThePlatform',
}


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
@ -14,7 +14,7 @@ from ..utils import (
) )
class SendtoNewsIE(JWPlatformBaseIE): class SendtoNewsIE(InfoExtractor):
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)' _VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = { _TEST = {


@ -1,64 +1,101 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
qualities,
int_or_none,
mimetype2ext,
determine_ext, determine_ext,
int_or_none,
try_get,
qualities,
) )
class SixPlayIE(InfoExtractor): class SixPlayIE(InfoExtractor):
IE_NAME = '6play'
_VALID_URL = r'(?:6play:|https?://(?:www\.)?6play\.fr/.+?-c_)(?P<id>[0-9]+)' _VALID_URL = r'(?:6play:|https?://(?:www\.)?6play\.fr/.+?-c_)(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.6play.fr/jamel-et-ses-amis-au-marrakech-du-rire-p_1316/jamel-et-ses-amis-au-marrakech-du-rire-2015-c_11495320', 'url': 'http://www.6play.fr/le-meilleur-patissier-p_1807/le-meilleur-patissier-special-fetes-mercredi-a-21-00-sur-m6-c_11638450',
'md5': '42310bffe4ba3982db112b9cd3467328', 'md5': '42310bffe4ba3982db112b9cd3467328',
'info_dict': { 'info_dict': {
'id': '11495320', 'id': '11638450',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Jamel et ses amis au Marrakech du rire 2015', 'title': 'Le Meilleur Pâtissier, spécial fêtes mercredi à 21:00 sur M6',
'description': 'md5:ba2149d5c321d5201b78070ee839d872', 'description': 'md5:308853f6a5f9e2d55a30fc0654de415f',
'duration': 39,
'series': 'Le meilleur pâtissier',
},
'params': {
'skip_download': True,
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
clip_data = self._download_json(
'https://player.m6web.fr/v2/video/config/6play-auth/FR/%s.json' % video_id,
video_id)
video_data = clip_data['videoInfo']
data = self._download_json(
'https://pc.middleware.6play.fr/6play/v2/platforms/m6group_web/services/6play/videos/clip_%s' % video_id,
video_id, query={
'csa': 5,
'with': 'clips',
})
clip_data = data['clips'][0]
title = clip_data['title']
urls = []
quality_key = qualities(['lq', 'sd', 'hq', 'hd']) quality_key = qualities(['lq', 'sd', 'hq', 'hd'])
formats = [] formats = []
for source in clip_data['sources']: for asset in clip_data['assets']:
source_type, source_url = source.get('type'), source.get('src') asset_url = asset.get('full_physical_path')
if not source_url or source_type == 'hls/primetime': protocol = asset.get('protocol')
if not asset_url or protocol == 'primetime' or asset_url in urls:
continue continue
ext = mimetype2ext(source_type) or determine_ext(source_url) urls.append(asset_url)
if ext == 'm3u8': container = asset.get('video_container')
formats.extend(self._extract_m3u8_formats( ext = determine_ext(asset_url)
source_url, video_id, 'mp4', 'm3u8_native', if container == 'm3u8' or ext == 'm3u8':
m3u8_id='hls', fatal=False)) if protocol == 'usp':
formats.extend(self._extract_f4m_formats( asset_url = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)
source_url.replace('.m3u8', '.f4m'), formats.extend(self._extract_m3u8_formats(
video_id, f4m_id='hds', fatal=False)) asset_url, video_id, 'mp4', 'm3u8_native',
elif ext == 'mp4': m3u8_id='hls', fatal=False))
quality = source.get('quality') formats.extend(self._extract_f4m_formats(
asset_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
asset_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
formats.extend(self._extract_ism_formats(
re.sub(r'/[^/]+\.m3u8', '/Manifest', asset_url),
video_id, ism_id='mss', fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif container == 'mp4' or ext == 'mp4':
quality = asset.get('video_quality')
formats.append({ formats.append({
'url': source_url, 'url': asset_url,
'format_id': quality, 'format_id': quality,
'quality': quality_key(quality), 'quality': quality_key(quality),
'ext': ext, 'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats)
def get(getter):
for src in (data, clip_data):
v = try_get(src, getter, compat_str)
if v:
return v
return { return {
'id': video_id, 'id': video_id,
'title': video_data['title'].strip(), 'title': title,
'description': video_data.get('description'), 'description': get(lambda x: x['description']),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(clip_data.get('duration')),
'series': video_data.get('titlePgm'), 'series': get(lambda x: x['program']['title']),
'formats': formats, 'formats': formats,
} }
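For `usp` assets, the new 6play code derives HDS, DASH and Smooth Streaming manifest URLs from the HLS URL by string substitution. A rough standalone sketch of those rewrites, using the same regexes as the diff; the function name `usp_manifest_urls` and the sample URL are assumptions made for illustration:

```python
import re


def usp_manifest_urls(asset_url):
    """Derive sibling manifest URLs from a USP-style HLS URL (sketch only)."""
    # Rename the playlist so that it matches the .ism directory name.
    hls = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)
    return {
        'hls': hls,
        'hds': hls.replace('.m3u8', '.f4m'),
        'dash': hls.replace('.m3u8', '.mpd'),
        'mss': re.sub(r'/[^/]+\.m3u8', '/Manifest', hls),
    }


# Hypothetical sample URL, for illustration only.
SAMPLE_USP = 'https://example.invalid/vod/clip.ism/playlist.m3u8'
for name, manifest_url in sorted(usp_manifest_urls(SAMPLE_USP).items()):
    print(name, manifest_url)
```

In the extractor each derived URL is requested with `fatal=False`, so a manifest that does not exist on the server is skipped rather than aborting extraction.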


@ -23,6 +23,10 @@ class SpankBangIE(InfoExtractor):
# 480p only # 480p only
'url': 'http://spankbang.com/1vt0/video/solvane+gangbang', 'url': 'http://spankbang.com/1vt0/video/solvane+gangbang',
'only_matching': True, 'only_matching': True,
}, {
# no uploader
'url': 'http://spankbang.com/lklg/video/sex+with+anyone+wedding+edition+2',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -48,7 +52,7 @@ class SpankBangIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
uploader = self._search_regex( uploader = self._search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)', r'class="user"[^>]*><img[^>]+>([^<]+)',
webpage, 'uploader', fatal=False) webpage, 'uploader', default=None)
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)


@ -4,65 +4,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..utils import js_to_json
from ..utils import (
js_to_json,
unified_strdate,
)
class SportBoxIE(InfoExtractor):
_VALID_URL = r'https?://news\.sportbox\.ru/(?:[^/]+/)+spbvideo_NI\d+_(?P<display_id>.+)'
_TESTS = [{
'url': 'http://news.sportbox.ru/Vidy_sporta/Avtosport/Rossijskij/spbvideo_NI483529_Gonka-2-zaezd-Obyedinenniy-2000-klassi-Turing-i-S',
'md5': 'ff56a598c2cf411a9a38a69709e97079',
'info_dict': {
'id': '80822',
'ext': 'mp4',
'title': 'Гонка 2 заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн',
'description': 'md5:3d72dc4a006ab6805d82f037fdc637ad',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20140928',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://news.sportbox.ru/Vidy_sporta/billiard/spbvideo_NI486287_CHempionat-mira-po-dinamichnoy-piramide-4',
'only_matching': True,
}, {
'url': 'http://news.sportbox.ru/video/no_ads/spbvideo_NI536574_V_Novorossijske_proshel_detskij_turnir_Pole_slavy_bojevoj?ci=211355',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
player = self._search_regex(
r'src="/?(vdl/player/[^"]+)"', webpage, 'player')
title = self._html_search_regex(
[r'"nodetitle"\s*:\s*"([^"]+)"', r'class="node-header_{1,2}title">([^<]+)'],
webpage, 'title')
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._html_search_meta(
'dateCreated', webpage, 'upload date'))
return {
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, '/%s' % player),
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
}
class SportBoxEmbedIE(InfoExtractor): class SportBoxEmbedIE(InfoExtractor):


@ -0,0 +1,52 @@
# coding: utf-8
from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
extract_attributes,
update_url_query,
smuggle_url,
)
class SproutIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?sproutonline\.com/watch/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.sproutonline.com/watch/cowboy-adventure',
'md5': '74bf14128578d1e040c3ebc82088f45f',
'info_dict': {
'id': '9dexnwtmh8_X',
'ext': 'mp4',
'title': 'A Cowboy Adventure',
'description': 'Ruff-Ruff, Tweet and Dave get to be cowboys for the day at Six Cow Corral.',
'timestamp': 1437758640,
'upload_date': '20150724',
'uploader': 'NBCU-SPROUT-NEW',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_component = self._search_regex(
r'(?s)(<div[^>]+data-component="video"[^>]*?>)',
webpage, 'video component', default=None)
if video_component:
options = self._parse_json(extract_attributes(
video_component)['data-options'], video_id)
theplatform_url = options['video']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
if options.get('protected'):
query['auth'] = self._extract_mvpd_auth(url, options['pid'], 'sprout', 'sprout')
theplatform_url = smuggle_url(update_url_query(
theplatform_url, query), {'force_smil_url': True})
else:
iframe = self._search_regex(
r'(<iframe[^>]+id="sproutVideoIframe"[^>]*?>)',
webpage, 'iframe')
theplatform_url = extract_attributes(iframe)['src']
return self.url_result(theplatform_url, 'ThePlatform')


@ -14,6 +14,8 @@ from ..utils import (
class SRGSSRIE(InfoExtractor): class SRGSSRIE(InfoExtractor):
_VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)' _VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['CH']
_ERRORS = { _ERRORS = {
'AGERATING12': 'To protect children under the age of 12, this video is only available between 8 p.m. and 6 a.m.', 'AGERATING12': 'To protect children under the age of 12, this video is only available between 8 p.m. and 6 a.m.',
@ -40,8 +42,12 @@ class SRGSSRIE(InfoExtractor):
media_id)[media_type.capitalize()] media_id)[media_type.capitalize()]
if media_data.get('block') and media_data['block'] in self._ERRORS: if media_data.get('block') and media_data['block'] in self._ERRORS:
raise ExtractorError('%s said: %s' % ( message = self._ERRORS[media_data['block']]
self.IE_NAME, self._ERRORS[media_data['block']]), expected=True) if media_data['block'] == 'GEOBLOCK':
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, message), expected=True)
return media_data return media_data


@ -13,6 +13,8 @@ from ..utils import (
class SVTBaseIE(InfoExtractor): class SVTBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['SE']
def _extract_video(self, video_info, video_id): def _extract_video(self, video_info, video_id):
formats = [] formats = []
for vr in video_info['videoReferences']: for vr in video_info['videoReferences']:
@ -38,7 +40,9 @@ class SVTBaseIE(InfoExtractor):
'url': vurl, 'url': vurl,
}) })
if not formats and video_info.get('rights', {}).get('geoBlockedSweden'): if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
self.raise_geo_restricted('This video is only available in Sweden') self.raise_geo_restricted(
'This video is only available in Sweden',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}


@ -179,10 +179,12 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
if m: if m:
return [m.group('url')] return [m.group('url')]
# Are whitespaces ignored in URLs?
# https://github.com/rg3/youtube-dl/issues/12044
matches = re.findall( matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage) r'(?s)<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches: if matches:
return list(zip(*matches))[1] return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
@staticmethod @staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False): def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
@ -306,9 +308,10 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
}, },
}] }]
def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}): def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}, account_id=None):
real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query) real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
entry = self._download_json(real_url, video_id)['entries'][0] entry = self._download_json(real_url, video_id)['entries'][0]
main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else None
formats = [] formats = []
subtitles = {} subtitles = {}
@ -333,7 +336,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
if asset_type in asset_types_query: if asset_type in asset_types_query:
query.update(asset_types_query[asset_type]) query.update(asset_types_query[asset_type])
cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query( cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type) main_smil_url or smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
formats.extend(cur_formats) formats.extend(cur_formats)
subtitles = self._merge_subtitles(subtitles, cur_subtitles) subtitles = self._merge_subtitles(subtitles, cur_subtitles)
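The ThePlatform embed change lets the `src` attribute span several lines (`(?s)`) and then strips all whitespace from the matched URL. A small standalone sketch of that behaviour, using the regex from the diff; the function name `first_theplatform_embed` and the sample markup are hypothetical:

```python
import re

# Regex copied from the hunk; (?s) lets '.' match newlines inside the URL.
EMBED_RE = (
    r'(?s)<(?:iframe|script)[^>]+src=(["\'])'
    r'((?:https?:)?//player\.theplatform\.com/p/.+?)\1')


def first_theplatform_embed(webpage):
    """Return the first embed URL with whitespace removed, or None."""
    matches = re.findall(EMBED_RE, webpage)
    if not matches:
        return None
    # As in the hunk, only the first match is kept.
    return re.sub(r'\s', '', matches[0][1])


# Hypothetical markup with a line break inside the src attribute.
SAMPLE_PAGE = (
    '<iframe src="//player.theplatform.com/p/abc/\n'
    '    select/media/xyz"></iframe>')
print(first_theplatform_embed(SAMPLE_PAGE))
```

One consequence visible in the diff: the previous code returned every matched URL, whereas the new expression returns a one-element list built from the first match.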


@ -3,13 +3,14 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import remove_end from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE): class ThisAVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*' _VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TESTS = [{ _TESTS = [{
# jwplayer
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html', 'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',
'md5': '0480f1ef3932d901f0e0e719f188f19b', 'md5': '0480f1ef3932d901f0e0e719f188f19b',
'info_dict': { 'info_dict': {
@ -20,6 +21,7 @@ class ThisAVIE(JWPlatformBaseIE):
'uploader_id': 'dj7970' 'uploader_id': 'dj7970'
} }
}, { }, {
# html5 media
'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html', 'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html',
'md5': 'ba90c076bd0f80203679e5b60bf523ee', 'md5': 'ba90c076bd0f80203679e5b60bf523ee',
'info_dict': { 'info_dict': {
@ -48,8 +50,12 @@ class ThisAVIE(JWPlatformBaseIE):
}], }],
} }
else: else:
info_dict = self._extract_jwplayer_data( entries = self._parse_html5_media_entries(url, webpage, video_id)
webpage, video_id, require_title=False) if entries:
info_dict = entries[0]
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>', r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>',
webpage, 'uploader name', fatal=False) webpage, 'uploader name', fatal=False)


@ -100,9 +100,13 @@ class TurnerBaseIE(AdobePassIE):
formats.extend(self._extract_smil_formats( formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False)) video_url, video_id, fatal=False))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', video_url, video_id, 'mp4',
m3u8_id=format_id or 'hls', fatal=False)) m3u8_id=format_id or 'hls', fatal=False)
if '/secure/' in video_url and '?hdnea=' in video_url:
for f in m3u8_formats:
f['_seekable'] = False
formats.extend(m3u8_formats)
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
update_url_query(video_url, {'hdcore': '3.7.0'}), update_url_query(video_url, {'hdcore': '3.7.0'}),


@ -24,6 +24,7 @@ class TV4IE(InfoExtractor):
sport/| sport/|
) )
)(?P<id>[0-9]+)''' )(?P<id>[0-9]+)'''
_GEO_COUNTRIES = ['SE']
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650', 'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650',
@ -71,16 +72,12 @@ class TV4IE(InfoExtractor):
'http://www.tv4play.se/player/assets/%s.json' % video_id, 'http://www.tv4play.se/player/assets/%s.json' % video_id,
video_id, 'Downloading video info JSON') video_id, 'Downloading video info JSON')
# If is_geo_restricted is true, it doesn't necessarily mean we can't download it
if info.get('is_geo_restricted'):
self.report_warning('This content might not be available in your country due to licensing restrictions.')
title = info['title'] title = info['title']
subtitles = {} subtitles = {}
formats = [] formats = []
# http formats are linked with unresolvable host # http formats are linked with unresolvable host
for kind in ('hls', ''): for kind in ('hls3', ''):
data = self._download_json( data = self._download_json(
'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id, 'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
video_id, 'Downloading sources JSON', query={ video_id, 'Downloading sources JSON', query={
@ -113,6 +110,10 @@ class TV4IE(InfoExtractor):
'url': manifest_url, 'url': manifest_url,
'ext': 'vtt', 'ext': 'vtt',
}]}) }]})
if not formats and info.get('is_geo_restricted'):
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats) self._sort_formats(formats)
return { return {


@ -0,0 +1,76 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unescapeHTML,
)
class TVN24IE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:[^/]+)\.)?tvn24(?:bis)?\.pl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'http://www.tvn24.pl/wiadomosci-z-kraju,3/oredzie-artura-andrusa,702428.html',
'md5': 'fbdec753d7bc29d96036808275f2130c',
'info_dict': {
'id': '1584444',
'ext': 'mp4',
'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".',
'thumbnail': 're:http://.*[.]jpeg',
}
}, {
'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
'only_matching': True,
}, {
'url': 'http://sport.tvn24.pl/pilka-nozna,105/ligue-1-kamil-glik-rozcial-glowe-monaco-tylko-remisuje-z-bastia,716522.html',
'only_matching': True,
}, {
'url': 'http://tvn24bis.pl/poranek,146,m/gen-koziej-w-tvn24-bis-wracamy-do-czasow-zimnej-wojny,715660.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
def extract_json(attr, name, fatal=True):
return self._parse_json(
self._search_regex(
r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
name, group='json', fatal=fatal) or '{}',
video_id, transform_source=unescapeHTML, fatal=fatal)
quality_data = extract_json('data-quality', 'formats')
formats = []
for format_id, url in quality_data.items():
formats.append({
'url': url,
'format_id': format_id,
'height': int_or_none(format_id.rstrip('p')),
})
self._sort_formats(formats)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_regex(
r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
'thumbnail', group='url')
share_params = extract_json(
'data-share-params', 'share params', fatal=False)
if isinstance(share_params, dict):
video_id = share_params.get('id') or video_id
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
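The TVN24 extractor reads its format list from HTML-escaped JSON stored in a `data-quality` attribute. The following sketch reproduces that idea outside youtube-dl with only the standard library; `extract_attr_json`, `build_formats` and the sample markup are hypothetical, and `html.unescape` stands in for the project's `unescapeHTML` helper:

```python
import html
import json
import re


def extract_attr_json(attr, webpage):
    """Return the JSON stored in an HTML attribute, or {} if it is absent."""
    mobj = re.search(
        r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % re.escape(attr), webpage)
    if not mobj:
        return {}
    # The attribute value is HTML-escaped, so unescape before parsing.
    return json.loads(html.unescape(mobj.group('json')))


def build_formats(quality_data):
    """Turn a {format_id: url} mapping into a list of format dicts."""
    formats = []
    for format_id, url in quality_data.items():
        digits = format_id.rstrip('p')
        formats.append({
            'url': url,
            'format_id': format_id,
            'height': int(digits) if digits.isdigit() else None,
        })
    return formats


# Hypothetical markup, for illustration only.
SAMPLE_HTML = (
    "<div data-quality='{&quot;720p&quot;: "
    "&quot;http://example.invalid/hd.mp4&quot;}'></div>")
print(build_formats(extract_attr_json('data-quality', SAMPLE_HTML)))
```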


@ -1,7 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
get_element_by_class, get_element_by_class,
@ -9,7 +9,7 @@ from ..utils import (
) )
class TVNoeIE(JWPlatformBaseIE): class TVNoeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.tvnoe.cz/video/10362', 'url': 'http://www.tvnoe.cz/video/10362',


@ -0,0 +1,75 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
extract_attributes,
urlencode_postdata,
ExtractorError,
)
class TVPlayerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvplayer\.com/watch/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://tvplayer.com/watch/bbcone',
'info_dict': {
'id': '89',
'ext': 'mp4',
'title': r're:^BBC One [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
# m3u8 download
'skip_download': True,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
current_channel = extract_attributes(self._search_regex(
r'(<div[^>]+class="[^"]*current-channel[^"]*"[^>]*>)',
webpage, 'channel element'))
title = current_channel['data-name']
resource_id = self._search_regex(
r'resourceId\s*=\s*"(\d+)"', webpage, 'resource id')
platform = self._search_regex(
r'platform\s*=\s*"([^"]+)"', webpage, 'platform')
token = self._search_regex(
r'token\s*=\s*"([^"]+)"', webpage, 'token', default='null')
validate = self._search_regex(
r'validate\s*=\s*"([^"]+)"', webpage, 'validate', default='null')
try:
response = self._download_json(
'http://api.tvplayer.com/api/v2/stream/live',
resource_id, headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}, data=urlencode_postdata({
'service': 1,
'platform': platform,
'id': resource_id,
'token': token,
'validate': validate,
}))['tvplayer']['response']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
response = self._parse_json(
e.cause.read().decode(), resource_id)['tvplayer']['response']
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['error']), expected=True)
raise
formats = self._extract_m3u8_formats(response['stream'], resource_id, 'mp4')
self._sort_formats(formats)
return {
'id': resource_id,
'display_id': display_id,
'title': self._live_title(title),
'formats': formats,
'is_live': True,
}


@ -447,7 +447,14 @@ class TwitchHighlightsIE(TwitchVideosBaseIE):
class TwitchStreamIE(TwitchBaseIE): class TwitchStreamIE(TwitchBaseIE):
IE_NAME = 'twitch:stream' IE_NAME = 'twitch:stream'
_VALID_URL = r'%s/(?P<id>[^/#?]+)/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE _VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?twitch\.tv/|
player\.twitch\.tv/\?.*?\bchannel=
)
(?P<id>[^/#?]+)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.twitch.tv/shroomztv', 'url': 'http://www.twitch.tv/shroomztv',
@ -471,8 +478,25 @@ class TwitchStreamIE(TwitchBaseIE):
}, { }, {
'url': 'http://www.twitch.tv/miracle_doto#profile-0', 'url': 'http://www.twitch.tv/miracle_doto#profile-0',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://player.twitch.tv/?channel=lotsofs',
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return (False
if any(ie.suitable(url) for ie in (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE))
else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url): def _real_extract(self, url):
channel_id = self._match_id(url) channel_id = self._match_id(url)
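
The new `_VALID_URL` can be exercised on its own with the standard `re` module. The sketch below copies the pattern from the hunk above and checks it against the test URLs in `_TESTS`; the helper name `channel_id` and the VOD-style URL are hypothetical. Because the pattern no longer ends in `$`, it also matches the prefix of longer Twitch URLs, which is presumably why the `suitable()` override defers to the other Twitch extractors first.

```python
import re

# Pattern copied from the new _VALID_URL in the hunk above.
VALID_URL = r'''(?x)
    https?://
    (?:
        (?:www\.)?twitch\.tv/|
        player\.twitch\.tv/\?.*?\bchannel=
    )
    (?P<id>[^/#?]+)
'''


def channel_id(url):
    """Return the channel id matched at the start of `url`, or None."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


print(channel_id('http://www.twitch.tv/shroomztv'))
print(channel_id('https://player.twitch.tv/?channel=lotsofs'))
# Hypothetical VOD-style URL: the unanchored pattern still matches its prefix.
print(channel_id('https://www.twitch.tv/somechannel/v/123456'))
```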

@ -20,6 +20,7 @@ class Vbox7IE(InfoExtractor):
) )
(?P<id>[\da-fA-F]+) (?P<id>[\da-fA-F]+)
''' '''
_GEO_COUNTRIES = ['BG']
_TESTS = [{ _TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c', 'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf', 'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@ -78,7 +79,7 @@ class Vbox7IE(InfoExtractor):
video_url = video['src'] video_url = video['src']
if '/na.mp4' in video_url: if '/na.mp4' in video_url:
self.raise_geo_restricted() self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
uploader = video.get('uploader') uploader = video.get('uploader')

@ -14,6 +14,7 @@ from ..utils import (
class VGTVIE(XstreamIE): class VGTVIE(XstreamIE):
IE_DESC = 'VGTV, BTTV, FTV, Aftenposten and Aftonbladet' IE_DESC = 'VGTV, BTTV, FTV, Aftenposten and Aftonbladet'
_GEO_BYPASS = False
_HOST_TO_APPNAME = { _HOST_TO_APPNAME = {
'vgtv.no': 'vgtv', 'vgtv.no': 'vgtv',
@ -217,7 +218,8 @@ class VGTVIE(XstreamIE):
properties = try_get( properties = try_get(
data, lambda x: x['streamConfiguration']['properties'], list) data, lambda x: x['streamConfiguration']['properties'], list)
if properties and 'geoblocked' in properties: if properties and 'geoblocked' in properties:
raise self.raise_geo_restricted() raise self.raise_geo_restricted(
countries=[host.rpartition('.')[-1].partition('/')[0].upper()])
self._sort_formats(info['formats']) self._sort_formats(info['formats'])
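
The `countries=` expression in the hunk above derives a country code from the host's top-level domain. A small sketch of that string manipulation in isolation: the function name `country_from_host` is hypothetical, `'vgtv.no'` comes from `_HOST_TO_APPNAME` above, and the host with a trailing path is an assumed input chosen to show why `partition('/')` is needed.

```python
def country_from_host(host):
    """Upper-cased top-level domain of `host`, ignoring any trailing path.

    Same expression as in the diff: take everything after the last dot,
    then cut at the first slash.
    """
    return host.rpartition('.')[-1].partition('/')[0].upper()


print(country_from_host('vgtv.no'))
# Assumed example of a host key that carries a path component.
print(country_from_host('tv.example.se/abtv'))
```

Note that this treats the TLD as a country code, so it is only meaningful for hosts on country-code domains.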

@ -70,10 +70,10 @@ class ViceBaseIE(AdobePassIE):
'url': uplynk_preplay_url, 'url': uplynk_preplay_url,
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': base.get('body'), 'description': base.get('body') or base.get('display_body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'), 'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')), 'duration': int_or_none(video_data.get('video_duration')) or parse_duration(watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')), 'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')), 'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'), 'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')), 'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
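
The timestamp change above passes `1000` as the second argument to `int_or_none`, which suggests `created_at` is reported in milliseconds (the diff does not state this explicitly). The sketch below is a simplified stand-in for that scaling behaviour, not the library's implementation; the name `scaled_int` and the millisecond input are assumptions for illustration.

```python
def scaled_int(value, scale=1):
    """Convert `value` to int and integer-divide by `scale`.

    Returns None for missing or non-numeric input. This is a simplified,
    hypothetical stand-in for the helper used in the diff.
    """
    if value is None or value == '':
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


# Assumed millisecond value; dividing by 1000 gives the test's 'timestamp'.
print(scaled_int('1485474122000', 1000))
print(scaled_int(None, 1000))
```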

@ -7,16 +7,16 @@ from .vice import ViceBaseIE
class VicelandIE(ViceBaseIE): class VicelandIE(ViceBaseIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)' _VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = { _TEST = {
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e', 'url': 'https://www.viceland.com/en_us/video/trapped/588a70d0dba8a16007de7316',
'info_dict': { 'info_dict': {
'id': '57608447973ee7705f6fbd4e', 'id': '588a70d0dba8a16007de7316',
'ext': 'mp4', 'ext': 'mp4',
'title': 'CYBERWAR (Trailer)', 'title': 'TRAPPED (Series Trailer)',
'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.', 'description': 'md5:7a8e95c2b6cd86461502a2845e581ccf',
'age_limit': 14, 'age_limit': 14,
'timestamp': 1466008539, 'timestamp': 1485474122,
'upload_date': '20160615', 'upload_date': '20170126',
'uploader_id': '11', 'uploader_id': '57a204098cb727dec794c6a3',
'uploader': 'Viceland', 'uploader': 'Viceland',
}, },
'params': { 'params': {

@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals

import random
import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    determine_ext,
    float_or_none,
    parse_age_limit,
    qualities,
    try_get,
    unified_timestamp,
    urljoin,
)


class VideoPressIE(InfoExtractor):
    _VALID_URL = r'https?://videopress\.com/embed/(?P<id>[\da-zA-Z]+)'
    _TESTS = [{
        'url': 'https://videopress.com/embed/kUJmAcSf',
        'md5': '706956a6c875873d51010921310e4bc6',
        'info_dict': {
            'id': 'kUJmAcSf',
            'ext': 'mp4',
            'title': 'VideoPress Demo',
            'thumbnail': r're:^https?://.*\.jpg',
            'duration': 634.6,
            'timestamp': 1434983935,
            'upload_date': '20150622',
            'age_limit': 0,
        },
    }, {
        # 17+, requires birth_* params
        'url': 'https://videopress.com/embed/iH3gstfZ',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return re.findall(
            r'<iframe[^>]+src=["\']((?:https?://)?videopress\.com/embed/[\da-zA-Z]+)',
            webpage)

    def _real_extract(self, url):
        video_id = self._match_id(url)

        video = self._download_json(
            'https://public-api.wordpress.com/rest/v1.1/videos/%s' % video_id,
            video_id, query={
                'birth_month': random.randint(1, 12),
                'birth_day': random.randint(1, 31),
                'birth_year': random.randint(1950, 1995),
            })

        title = video['title']

        def base_url(scheme):
            return try_get(
                video, lambda x: x['file_url_base'][scheme], compat_str)

        base_url = base_url('https') or base_url('http')

        QUALITIES = ('std', 'dvd', 'hd')
        quality = qualities(QUALITIES)

        formats = []
        for format_id, f in video['files'].items():
            if not isinstance(f, dict):
                continue
            for ext, path in f.items():
                if ext in ('mp4', 'ogg'):
                    formats.append({
                        'url': urljoin(base_url, path),
                        'format_id': '%s-%s' % (format_id, ext),
                        'ext': determine_ext(path, ext),
                        'quality': quality(format_id),
                    })
        original_url = try_get(video, lambda x: x['original'], compat_str)
        if original_url:
            formats.append({
                'url': original_url,
                'format_id': 'original',
                'quality': len(QUALITIES),
            })
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': title,
            'description': video.get('description'),
            'thumbnail': video.get('poster'),
            'duration': float_or_none(video.get('duration'), 1000),
            'timestamp': unified_timestamp(video.get('upload_date')),
            'age_limit': parse_age_limit(video.get('rating')),
            'formats': formats,
        }
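
The extractor above ranks formats by their position in `QUALITIES` and gives the original upload the value `len(QUALITIES)`, one step above `'hd'`. The sketch below illustrates that ordering with a stand-in ranker, assuming `qualities` maps an id to its index in the tuple and an unknown id to -1; `make_quality_ranker` and the sample format list are hypothetical.

```python
def make_quality_ranker(ordered_ids):
    """Return a function mapping a format id to its index in `ordered_ids`.

    Unknown ids rank at -1. This is a hypothetical stand-in for the
    `qualities` helper, written under the assumption stated above.
    """
    def rank(format_id):
        try:
            return ordered_ids.index(format_id)
        except ValueError:
            return -1
    return rank


QUALITIES = ('std', 'dvd', 'hd')
rank = make_quality_ranker(QUALITIES)

# Made-up format entries; only the 'quality' logic mirrors the extractor.
formats = [
    {'format_id': 'hd-mp4', 'quality': rank('hd')},
    {'format_id': 'original', 'quality': len(QUALITIES)},
    {'format_id': 'std-mp4', 'quality': rank('std')},
]
best_first = sorted(formats, key=lambda f: f['quality'], reverse=True)
print([f['format_id'] for f in best_first])
```

Using `len(QUALITIES)` rather than a hard-coded number keeps the original file on top even if more quality ids are added to the tuple later.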

Some files were not shown because too many files have changed in this diff.