Compare commits

...

146 Commits

Author SHA1 Message Date
ef48a1175d release 2017.02.27 2017-02-27 23:26:07 +07:00
c6184bcf7b [ChangeLog] Actualize 2017-02-27 23:24:03 +07:00
18abb74376 [npo] Relax _VALID_URL for zapp.nl 2017-02-27 23:13:51 +07:00
dbc01fdb6f [hetklokhuis] Fix IE_NAME 2017-02-27 23:10:29 +07:00
f264c62334 [npo] Add support for zapp.nl 2017-02-27 23:10:00 +07:00
0dc5a86a32 [npo] Add support for hetklokhuis.nl (closes #12293) 2017-02-27 22:43:19 +07:00
0e879f432a [youtube:channel] Remove duplicate test 2017-02-27 22:22:43 +07:00
892b47ab6c [scivee] Remove extractor (#9315)
The Wikipedia page is changed from active to down:
https://en.wikipedia.org/w/index.php?title=SciVee&diff=prev&oldid=723161154

Some other interesting bits:

$ nslookup www.scivee.tv
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
www.scivee.tv   canonical name = scivee.rcsb.org.
Name:   scivee.rcsb.org
Address: 132.249.231.211

$ nslookup rcsb.org
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   rcsb.org
Address: 132.249.231.77

Both IPs are from UCSD. I guess it's maintained by a lab and they don't
maintain it anymore.
2017-02-27 21:34:33 +08:00
fdeea72611 [cda] Decode URL (fixes #12255) 2017-02-26 22:05:52 +08:00
xbe
7fd4655256 [crunchyroll] Extract uploader name that's not a link
Provide the Crunchyroll extractor with the ability to extract uploader
names that aren't links. Add a test for this new functionality.
This fixes #12267.
2017-02-26 19:08:10 +08:00
fd5c4aab59 [youtube] Raise GeoRestrictedError 2017-02-26 16:52:40 +07:00
8878789f11 [dailymotion] Raise GeoRestrictedError 2017-02-26 16:52:40 +07:00
a5cf17989b [MDR] Relax _VALID_URL and playerURL matching and update _TESTS
Ref: #12169
2017-02-26 17:24:54 +08:00
b3aec47665 [tvigle] Raise GeoRestrictedError 2017-02-25 23:27:45 +07:00
9d0c08a02c [vevo] Fix videos with the new streams/streamsV3 format (closes #11719) 2017-02-26 00:15:49 +08:00
e498758b9c [freshlive] Fix issues and improve (closes #12175) 2017-02-25 22:56:42 +07:00
5fc8d89361 [freshlive] Add extractor 2017-02-25 22:55:17 +07:00
d374d943f3 [downloader/common] Limit displaying 2 digits after decimal point in sleep interval message 2017-02-25 20:59:04 +07:00
103f8c8d36 [xhamster] Capture and output videoClosed error (#12263) 2017-02-25 20:38:21 +07:00
922ab7840b [etonline] Add extractor (closes #12236) 2017-02-25 20:16:40 +07:00
831217291a [compat] Use try except for compat_numeric_types 2017-02-25 19:44:50 +07:00
db182c63fb [njpwworld] Add new extractor (closes #11561) 2017-02-25 18:44:39 +08:00
eeb0a95684 [extractor/common] Add 'preference' to _parse_html5_media_entries
Some websites, like NJPWorld, put different qualities on different
player pages.
2017-02-25 18:40:05 +08:00
231bcd0b6b [amcnetworks] Relax _VALID_URL (#12127) 2017-02-25 02:51:53 +07:00
204efc8509 release 2017.02.24.1 2017-02-24 21:59:39 +07:00
5d3a51e1b9 [ChangeLog] Actualize 2017-02-24 21:57:39 +07:00
ad3033037c [noco] Modernize 2017-02-24 21:51:56 +07:00
f3bc281239 [noco] Switch login URL to https (closes #12246) 2017-02-24 21:48:34 +07:00
441d7a32e5 [thescene] Extract more metadata 2017-02-24 21:22:29 +07:00
51ed496307 [thescene] Fix extraction (closes #12235) 2017-02-24 22:08:45 +08:00
68f17a9c2d [tubitv] use geo bypass mechanism 2017-02-24 12:27:56 +01:00
39e7277ed1 [openload] fix extraction(closes #10408) 2017-02-24 11:21:58 +01:00
42dcdbe11c [ivi] Raise GeoRestrictedError 2017-02-24 10:54:39 +07:00
6b097cff27 release 2017.02.24 2017-02-24 06:09:15 +07:00
f2f7961820 [ChangeLog] Actualize 2017-02-24 06:07:41 +07:00
be5df5ee31 Suppress help for all deprecated options and print warning when used 2017-02-24 06:04:27 +07:00
f2980fddeb [lynda:course] Add webpage extraction fallback (closes #12238) 2017-02-24 05:01:31 +07:00
0f57447de7 [postprocessor/ffmpeg] Add missing space (closes #12232) 2017-02-24 04:56:58 +07:00
19f3821821 [devscripts/make_lazy_extractors] Fix making lazy extractors on python 3 under Windows 2017-02-24 02:09:51 +07:00
8e1409fd80 [go] sign all uplynk urls and use geo bypass only for free videos(closes #12087)(closes #12210) 2017-02-23 18:42:06 +01:00
050f143c12 [README.md] Clarify sequence types in output template and document numeric string formatting operations 2017-02-23 23:00:13 +07:00
fafc2bf5a9 [options] Deprecate --autonumber-size 2017-02-23 22:11:16 +07:00
b3175982c3 [YoutubeDL] Add more numeric fields for NA substitution in outtmpl 2017-02-23 22:01:57 +07:00
89db639dfe [YoutubeDL] Rewrite outtmpl for playlist_index and autonumber for backward compatibility 2017-02-23 22:01:09 +07:00
d0d9ade486 [YoutubeDL] Add support for string formatting operations in output template 2017-02-23 22:57:53 +08:00
28572a1a0b [compat] Add compat_numeric_types 2017-02-23 22:57:53 +08:00
0f3d41b44d [devscripts/run_tests] Exclude youtube lists tests from core build 2017-02-23 19:48:54 +07:00
d5fd9a3be3 [skylinewebcams] Add extractor (closes #12221) 2017-02-23 18:45:38 +07:00
ada77fa544 [instagram] Add support for multi video posts (closes #12226) 2017-02-23 18:02:04 +07:00
9e03aa75c7 [crunchyroll] extract playlist entries ids 2017-02-23 11:57:18 +01:00
30eaa3a702 [mgtv] fix extraction 2017-02-23 11:57:05 +01:00
c59f703610 [sohu] raise GeoRestrictedError 2017-02-23 11:56:55 +01:00
bc61c80c14 [leeco] raise GeoRestrictedError and use geo bypass mechanism 2017-02-23 11:56:45 +01:00
345b24538b release 2017.02.22 2017-02-22 23:50:42 +07:00
63a29b6118 [ChangeLog] Actualize 2017-02-22 23:45:01 +07:00
b5869560a4 [crunchyroll] Fix descriptions with double quotes (closes #12124) 2017-02-23 00:08:45 +08:00
527ef85fe9 [dailymotion] Make comment count optional (closes #12209)
Not served anymore
2017-02-22 21:49:30 +07:00
58ad6995cd [vidzi] Add test for #12213 2017-02-22 21:29:53 +07:00
a86e416088 [vidzi] Add support for vidzi.cc 2017-02-22 22:28:09 +08:00
71e9577b94 [24video] Add support for 24video.tube (closes #12217) 2017-02-22 21:19:52 +07:00
0d427c8304 [setup] Actualize maintainer info 2017-02-22 01:51:27 +07:00
139d8ac106 [setup] Add python 3.6 classifier 2017-02-22 01:50:34 +07:00
abd29a2ced [crackle] use geo bypass mechanism 2017-02-21 19:37:26 +01:00
31615ac279 [viewster] use geo verification headers 2017-02-21 19:36:39 +01:00
fc320a40d9 Revert "[cbc] use geo bypass mechanism"
This reverts commit 86466a8b6f.
2017-02-21 18:14:55 +01:00
7345d6d465 [tfo] Improve geo restriction detection and use geo bypass mechanism 2017-02-21 17:52:50 +01:00
86466a8b6f [cbc] use geo bypass mechanism 2017-02-21 17:52:50 +01:00
33dc173cdc [telequebec] use geo bypass mechanism 2017-02-21 17:52:50 +01:00
3444844b04 [limelight] extract PlaylistService errors 2017-02-21 17:52:50 +01:00
8c6c88c7da release 2017.02.21 2017-02-21 23:48:24 +07:00
159aaaa9d0 [ChangeLog] Actualize 2017-02-21 23:46:58 +07:00
eea0716cae [extractor/common] Print origin country for fake IP 2017-02-21 23:14:33 +07:00
336a76551b [extractor/common] Do not quit _initialize_geo_bypass on empty countries 2017-02-21 23:09:41 +07:00
dc0a869e5e [extractor/common] Fix typo 2017-02-21 23:05:31 +07:00
e39b5d4ab8 [extractor/common] Allow calling _initialize_geo_bypass from extractors (#11970) 2017-02-21 23:00:43 +07:00
e469ab2528 [ninecninemedia] use geo bypass mechanism 2017-02-21 14:38:00 +01:00
890d44b005 [adobepass] add support for Time Warner Cable(closes #12191) 2017-02-20 19:00:40 +01:00
6926304472 [spankbang] Make uploader optional (closes #12193) 2017-02-21 00:54:43 +07:00
3ccdde8cb7 [extractor/common] Emphasize geo bypass APIs are experimental 2017-02-20 23:21:15 +07:00
da42ff0668 [iprima] Improve geo restriction detection and disable geo bypass 2017-02-20 23:17:19 +07:00
82f662182b [iprima] Modernize 2017-02-20 23:16:14 +07:00
2cc7fcd338 [commonmistakes] Disable UnicodeBOM extractor test for python 3.2 2017-02-20 03:06:52 +07:00
6d4c259765 [svt] PEP 8 2017-02-20 02:25:55 +07:00
c78dd35491 [nrk] PEP 8 2017-02-20 02:25:39 +07:00
8ffb8e63fe [prosiebensat1] Throw ExtractionError on unsupported page type (closes #12180) 2017-02-20 01:00:53 +07:00
983e9b7746 [nrk] Update _API_HOST and relax _VALID_URL 2017-02-20 00:59:31 +07:00
8936f68a0b [travis] Run tests in parallel
[test_download] Print test names in case of network errors

[test_download] Add comments for nose parameters

[test_download] Modify outtmpl to prevent info JSON filename conflicts

Thanks @jaimeMF for the idea.

[travis] Only download tests should be run in parallel
2017-02-19 21:26:35 +08:00
c58b7ffef4 [tv4] Bypass geo restriction and improve detection 2017-02-19 06:25:59 +07:00
f1a78ee4ef [tv4] Switch to hls3 protocol (closes #12177) 2017-02-19 06:16:00 +07:00
de64e23c56 [downloader/ism] Honor HTTP headers when downloading fragments 2017-02-19 04:18:36 +07:00
553f6dbac7 [downloader/dash] Honor HTTP headers when downloading fragments
For example, https://www.oppetarkiv.se/video/1196142/natten-ar-dagens-mor
2017-02-19 04:18:22 +07:00
0aa10994f4 [options] Move geo restriction related options to separate section 2017-02-19 05:10:08 +08:00
4248dad92b Improve geo bypass mechanism
* Rename options to preffixly match with --geo-verification-proxy
* Introduce _GEO_COUNTRIES for extractors
* Implement faking IP right away for sites with known geo restriction
2017-02-19 05:10:08 +08:00
0a840f584c Rename bypass geo restriction options 2017-02-19 05:10:08 +08:00
0016b84e16 Add faked X-Forwarded-For to formats' HTTP headers 2017-02-19 05:10:08 +08:00
18a0defab0 [utils] Make random_ipv4 return unicode string 2017-02-19 05:10:08 +08:00
5d3fbf77d9 [viki] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
80b59020e0 [vgtv] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
71631862f4 [srgssr] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
89cc7fe770 [vbox7] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
04d906eae3 [svt] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
8ab8066cf0 [pbs] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
01b1aa9ff4 [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
ff4007891f [nrk] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
28200e654b [itv] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
e633f21a96 [go] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
d392005a79 [dramafever] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
773f291dcb Add experimental geo restriction bypass mechanism
Based on faking X-Forwarded-For HTTP header
2017-02-19 05:10:08 +08:00
bf5b9d859a [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions 2017-02-19 05:10:08 +08:00
049a0f4d6d [brightcove:legacy] restrict videoPlayer value(closes #12040) 2017-02-18 21:08:40 +01:00
ac33accd96 [options] Mention quoted string literals for --match-filter 2017-02-18 23:59:26 +07:00
e84888b432 [tvn24] Improve extraction (closes #11679) 2017-02-18 23:34:09 +07:00
02d9b82a23 [tvn24] Add extractor 2017-02-18 23:33:49 +07:00
a2e3286676 [thisav] Add support for html5 media (closes #11771) 2017-02-18 20:21:53 +07:00
f75caf059e [metacafe] Improve (closes #10371) 2017-02-18 19:58:25 +07:00
bdabbc220c [metacafe] Bypass family filter
If you don't send this user=ffilter: false cookie, it will 301 redirect you to a page asking about it, and then the title check will fail.
2017-02-18 19:47:33 +07:00
70bcc444a9 [viceland] improve info extraction and update test 2017-02-18 09:52:43 +01:00
28e35f5070 release 2017.02.17 2017-02-17 23:59:56 +07:00
cf3704c132 [ChangeLog] Actualize 2017-02-17 23:48:30 +07:00
2c1f442c2b [options] Add missing spaces 2017-02-17 23:18:26 +07:00
bad4ccdb5d [heise] Improve (closes #9725) 2017-02-17 23:09:40 +07:00
db76c30c6e [heise] Support videos embedded in any article. 2017-02-17 22:55:53 +07:00
c2bde5d081 [ellentv] Improve 2017-02-17 22:45:51 +07:00
90fad0e74c [openload] Fix extraction (closes #12002) 2017-02-17 22:31:16 +07:00
d94badc755 [openload] Semifix extraction (closes #10408)
just updated the code. i don't do much python still i tried to convert my code. lemme know if there is any prob with it
2017-02-17 22:30:05 +07:00
fef51645d6 [theplatform] Recognize URLs with whitespaces (closes #12044) 2017-02-17 23:13:51 +08:00
4cead6a614 [einthusan] Relax _VALID_URL (closes #12141, closes #12159) 2017-02-17 22:02:01 +07:00
a4a554a793 [generic] Try parsing JWPlayer embedded videos (closes #12030) 2017-02-16 23:44:03 +08:00
b898f0a173 [elpais] Fix typo and improve extraction (closes #12139) 2017-02-16 04:57:42 +07:00
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes recent changes in iVysilani. Proper patch should migrate to
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
98 changed files with 2213 additions and 789 deletions

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.14*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.14**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.27*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.27**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.14
[debug] youtube-dl version 2017.02.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

@ -11,8 +11,6 @@ sudo: false
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
before_script:
- chmod +x ./devscripts/run_tests.sh
script: ./devscripts/run_tests.sh
notifications:
email:

149
ChangeLog
@ -1,3 +1,152 @@
version 2017.02.27
Core
* [downloader/common] Limit displaying 2 digits after decimal point in sleep
interval message (#12183)
+ [extractor/common] Add preference to _parse_html5_media_entries
Extractors
+ [npo] Add support for zapp.nl
+ [npo] Add support for hetklokhuis.nl (#12293)
- [scivee] Remove extractor (#9315)
+ [cda] Decode download URL (#12255)
+ [crunchyroll] Improve uploader extraction (#12267)
+ [youtube] Raise GeoRestrictedError
+ [dailymotion] Raise GeoRestrictedError
+ [mdr] Recognize more URL patterns (#12169)
+ [tvigle] Raise GeoRestrictedError
* [vevo] Fix extraction for videos with the new streams/streamsV3 format
(#11719)
+ [freshlive] Add support for freshlive.tv (#12175)
+ [xhamster] Capture and output videoClosed error (#12263)
+ [etonline] Add support for etonline.com (#12236)
+ [njpwworld] Add support for njpwworld.com (#11561)
* [amcnetworks] Relax URL regular expression (#12127)
version 2017.02.24.1
Extractors
* [noco] Modernize
* [noco] Switch login URL to https (#12246)
+ [thescene] Extract more metadata
* [thescene] Fix extraction (#12235)
+ [tubitv] Use geo bypass mechanism
* [openload] Fix extraction (#10408)
+ [ivi] Raise GeoRestrictedError
version 2017.02.24
Core
* [options] Hide deprecated options from --help
* [options] Deprecate --autonumber-size
+ [YoutubeDL] Add support for string formatting operations in output template
(#5185, #5748, #6841, #9929, #9966, #9978, #12189)
Extractors
+ [lynda:course] Add webpage extraction fallback (#12238)
* [go] Sign all uplynk URLs and use geo bypass only for free videos
(#12087, #12210)
+ [skylinewebcams] Add support for skylinewebcams.com (#12221)
+ [instagram] Add support for multi video posts (#12226)
+ [crunchyroll] Extract playlist entries ids
* [mgtv] Fix extraction
+ [sohu] Raise GeoRestrictedError
+ [leeco] Raise GeoRestrictedError and use geo bypass mechanism
version 2017.02.22
Extractors
* [crunchyroll] Fix descriptions with double quotes (#12124)
* [dailymotion] Make comment count optional (#12209)
+ [vidzi] Add support for vidzi.cc (#12213)
+ [24video] Add support for 24video.tube (#12217)
+ [crackle] Use geo bypass mechanism
+ [viewster] Use geo verification headers
+ [tfo] Improve geo restriction detection and use geo bypass mechanism
+ [telequebec] Use geo bypass mechanism
+ [limelight] Extract PlaylistService errors and improve geo restriction
detection
version 2017.02.21
Core
* [extractor/common] Allow calling _initialize_geo_bypass from extractors
(#11970)
+ [adobepass] Add support for Time Warner Cable (#12191)
+ [travis] Run tests in parallel
+ [downloader/ism] Honor HTTP headers when downloading fragments
+ [downloader/dash] Honor HTTP headers when downloading fragments
+ [utils] Add GeoUtils class for working with geo tools and GeoUtils.random_ipv4
+ Add option --geo-bypass-country for explicit geo bypass on behalf of
specified country
+ Add options to control geo bypass mechanism --geo-bypass and --no-geo-bypass
+ Add experimental geo restriction bypass mechanism based on faking
X-Forwarded-For HTTP header
+ [utils] Introduce GeoRestrictedError for geo restricted videos
+ [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions
Extractors
+ [ninecninemedia] Use geo bypass mechanism
* [spankbang] Make uploader optional (#12193)
+ [iprima] Improve geo restriction detection and disable geo bypass
* [iprima] Modernize
* [commonmistakes] Disable UnicodeBOM extractor test for python 3.2
+ [prosiebensat1] Throw ExtractionError on unsupported page type (#12180)
* [nrk] Update _API_HOST and relax _VALID_URL
+ [tv4] Bypass geo restriction and improve detection
* [tv4] Switch to hls3 protocol (#12177)
+ [viki] Improve geo restriction detection
+ [vgtv] Improve geo restriction detection
+ [srgssr] Improve geo restriction detection
+ [vbox7] Improve geo restriction detection and use geo bypass mechanism
+ [svt] Improve geo restriction detection and use geo bypass mechanism
+ [pbs] Improve geo restriction detection and use geo bypass mechanism
+ [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism
+ [nrk] Improve geo restriction detection and use geo bypass mechanism
+ [itv] Improve geo restriction detection and use geo bypass mechanism
+ [go] Improve geo restriction detection and use geo bypass mechanism
+ [dramafever] Improve geo restriction detection and use geo bypass mechanism
* [brightcove:legacy] Restrict videoPlayer value (#12040)
+ [tvn24] Add support for tvn24.pl and tvn24bis.pl (#11679)
+ [thisav] Add support for HTML5 media (#11771)
* [metacafe] Bypass family filter (#10371)
* [viceland] Improve info extraction
version 2017.02.17
Extractors
* [heise] Improve extraction (#9725)
* [ellentv] Improve (#11653)
* [openload] Fix extraction (#10408, #12002)
+ [theplatform] Recognize URLs with whitespaces (#12044)
* [einthusan] Relax URL regular expression (#12141, #12159)
+ [generic] Support complex JWPlayer embedded videos (#12030)
* [elpais] Improve extraction (#12139)
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14
Core

184
README.md
@ -99,11 +99,21 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6
## Geo Restriction:
--geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default
proxy specified by --proxy (or none, if the
option is not present) is used for the
actual downloading.
--geo-bypass Bypass geographic restriction via faking
X-Forwarded-For HTTP header (experimental)
--no-geo-bypass Do not bypass geographic restriction via
faking X-Forwarded-For HTTP header
(experimental)
--geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-1
alpha-2 country code (experimental)
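The idea behind `--geo-bypass` can be illustrated with a short Python sketch. This is not youtube-dl's implementation: the helper names `random_ipv4_in_block` and `with_fake_forwarded_for` are hypothetical, and the address block is a documentation-only placeholder rather than a real country range. It only shows the general approach of picking an address and sending it in the `X-Forwarded-For` HTTP header.

```python
import ipaddress
import random


def random_ipv4_in_block(cidr, rng=random):
    """Return a random IPv4 address (as text) from the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    first = int(network.network_address)
    last = int(network.broadcast_address)
    return str(ipaddress.ip_address(rng.randint(first, last)))


def with_fake_forwarded_for(headers, cidr):
    """Return a copy of headers with a faked X-Forwarded-For entry added."""
    faked = dict(headers)  # leave the caller's dictionary untouched
    faked['X-Forwarded-For'] = random_ipv4_in_block(cidr)
    return faked


# 192.0.2.0/24 is reserved for documentation (TEST-NET-1); it stands in for
# an address block that would belong to the target country (an assumption).
headers = with_fake_forwarded_for({'User-Agent': 'example'}, '192.0.2.0/24')
print(headers['X-Forwarded-For'])
```

Whether a site honors the header is up to the site, so a sketch like this can only show how the request is prepared, not whether the bypass succeeds.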
## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
@ -137,20 +147,22 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys)
to match if the key is present, !key to
check if the key is not present,key >
check if the key is not present, key >
NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare
against a number, and & to require multiple
matches. Values which are not known are
excluded unless you put a question mark (?)
after the operator.For example, to only
match videos that have been liked more than
100 times and disliked less than 50 times
(or the dislike functionality is not
available at the given service), but who
also have a description, use --match-filter
"like_count > 100 & dislike_count <? 50 &
description" .
against a number, key = 'LITERAL' (like
"uploader = 'Mike Smith'", also works with
!=) to match against a string literal and &
to require multiple matches. Values which
are not known are excluded unless you put a
question mark (?) after the operator. For
example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
--no-playlist Download only the video, if the URL refers
to a video and a playlist.
--yes-playlist Download the playlist, if the URL refers to
@ -205,21 +217,11 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--id Use only video ID in file name
-o, --output TEMPLATE Output filename template, see the "OUTPUT
TEMPLATE" for all the info
--autonumber-size NUMBER Specify the number of digits in
%(autonumber)s when it is present in output
filename template or --auto-number option
is given (default is 5)
--autonumber-start NUMBER Specify the start value for %(autonumber)s
(default is 1)
--restrict-filenames Restrict filenames to only ASCII
characters, and avoid "&" and spaces in
filenames
-A, --auto-number [deprecated; use -o
"%(autonumber)s-%(title)s.%(ext)s" ] Number
downloaded files starting from 00000
-t, --title [deprecated] Use title in file name
(default)
-l, --literal [deprecated] Alias of --title
-w, --no-overwrites Do not overwrite files
-c, --continue Force resume of partially downloaded files.
By default, youtube-dl will resume
@ -474,87 +476,89 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a lowercase S. Allowed names are:
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [Python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operation. Allowed names along with sequence type are:
- `id`: Video identifier
- `title`: Video title
- `url`: Video URL
- `ext`: Video filename extension
- `alt_title`: A secondary title of the video
- `display_id`: An alternative identifier for the video
- `uploader`: Full name of the video uploader
- `license`: License name the video is licensed under
- `creator`: The creator of the video
- `release_date`: The date (YYYYMMDD) when the video was released
- `timestamp`: UNIX timestamp of the moment the video became available
- `upload_date`: Video upload date (YYYYMMDD)
- `uploader_id`: Nickname or id of the video uploader
- `location`: Physical location where the video was filmed
- `duration`: Length of the video in seconds
- `view_count`: How many users have watched the video on the platform
- `like_count`: Number of positive ratings of the video
- `dislike_count`: Number of negative ratings of the video
- `repost_count`: Number of reposts of the video
- `average_rating`: Average rating given by users, the scale used depends on the webpage
- `comment_count`: Number of comments on the video
- `age_limit`: Age restriction for the video (years)
- `format`: A human-readable description of the format
- `format_id`: Format code specified by `--format`
- `format_note`: Additional info about the format
- `width`: Width of the video
- `height`: Height of the video
- `resolution`: Textual description of width and height
- `tbr`: Average bitrate of audio and video in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `acodec`: Name of the audio codec in use
- `asr`: Audio sampling rate in Hertz
- `vbr`: Average video bitrate in KBit/s
- `fps`: Frame rate
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `filesize`: The number of bytes, if known in advance
- `filesize_approx`: An estimate for the number of bytes
- `protocol`: The protocol that will be used for the actual download
- `extractor`: Name of the extractor
- `extractor_key`: Key name of the extractor
- `epoch`: Unix epoch when creating the file
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id`: Playlist identifier
- `playlist_title`: Playlist title
- `id` (string): Video identifier
- `title` (string): Video title
- `url` (string): Video URL
- `ext` (string): Video filename extension
- `alt_title` (string): A secondary title of the video
- `display_id` (string): An alternative identifier for the video
- `uploader` (string): Full name of the video uploader
- `license` (string): License name the video is licensed under
- `creator` (string): The creator of the video
- `release_date` (string): The date (YYYYMMDD) when the video was released
- `timestamp` (numeric): UNIX timestamp of the moment the video became available
- `upload_date` (string): Video upload date (YYYYMMDD)
- `uploader_id` (string): Nickname or id of the video uploader
- `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds
- `view_count` (numeric): How many users have watched the video on the platform
- `like_count` (numeric): Number of positive ratings of the video
- `dislike_count` (numeric): Number of negative ratings of the video
- `repost_count` (numeric): Number of reposts of the video
- `average_rating` (numeric): Average rating given by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video
- `age_limit` (numeric): Age restriction for the video (years)
- `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format
- `width` (numeric): Width of the video
- `height` (numeric): Height of the video
- `resolution` (string): Textual description of width and height
- `tbr` (numeric): Average bitrate of audio and video in KBit/s
- `abr` (numeric): Average audio bitrate in KBit/s
- `acodec` (string): Name of the audio codec in use
- `asr` (numeric): Audio sampling rate in Hertz
- `vbr` (numeric): Average video bitrate in KBit/s
- `fps` (numeric): Frame rate
- `vcodec` (string): Name of the video codec in use
- `container` (string): Name of the container format
- `filesize` (numeric): The number of bytes, if known in advance
- `filesize_approx` (numeric): An estimate for the number of bytes
- `protocol` (string): The protocol that will be used for the actual download
- `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch when creating the file
- `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero
- `playlist` (string): Name or id of the playlist that contains the video
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier
- `playlist_title` (string): Playlist title
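Because the special sequences follow Python's `%` string formatting, the difference between string and numeric fields can be tried out directly in Python. The following is a minimal sketch rather than youtube-dl's own code: the `info` dictionary and its values are invented for illustration, and only the field names come from the list above.

```python
# Hypothetical metadata for one video; values are made up for illustration.
info = {
    'id': 'abc123',
    'title': 'Example video',
    'ext': 'mp4',
    'playlist_index': 7,
    'duration': 93.5,
}

# %(NAME)s inserts the value as a string.
plain = '%(title)s-%(id)s.%(ext)s' % info

# %(NAME)05d zero-pads a numeric field to five digits.
padded = '%(playlist_index)05d - %(title)s.%(ext)s' % info

# Other numeric conversions work as well, e.g. one decimal place for a float.
with_duration = '%(title)s (%(duration).1fs).%(ext)s' % info

print(plain)          # Example video-abc123.mp4
print(padded)         # 00007 - Example video.mp4
print(with_duration)  # Example video (93.5s).mp4
```

Numeric conversions such as `05d` only make sense for the fields marked (numeric); applying them to a string field raises a `TypeError` in plain Python.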
Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
- `chapter_number`: Number of the chapter the video belongs to
- `chapter_id`: Id of the chapter the video belongs to
- `chapter` (string): Name or title of the chapter the video belongs to
- `chapter_number` (numeric): Number of the chapter the video belongs to
- `chapter_id` (string): Id of the chapter the video belongs to
Available for the video that is an episode of some series or programme:
- `series`: Title of the series or programme the video episode belongs to
- `season`: Title of the season the video episode belongs to
- `season_number`: Number of the season the video episode belongs to
- `season_id`: Id of the season the video episode belongs to
- `episode`: Title of the video episode
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
- `series` (string): Title of the series or programme the video episode belongs to
- `season` (string): Title of the season the video episode belongs to
- `season_number` (numeric): Number of the season the video episode belongs to
- `season_id` (string): Id of the season the video episode belongs to
- `episode` (string): Title of the video episode
- `episode_number` (numeric): Number of the video episode within a season
- `episode_id` (string): Id of the video episode
Available for the media that is a track or a part of a music album:
- `track`: Title of the track
- `track_number`: Number of the track within an album or a disc
- `track_id`: Id of the track
- `artist`: Artist(s) of the track
- `genre`: Genre(s) of the track
- `album`: Title of the album the track belongs to
- `album_type`: Type of the album
- `album_artist`: List of all artists appeared on the album
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released
- `track` (string): Title of the track
- `track_number` (numeric): Number of the track within an album or a disc
- `track_id` (string): Id of the track
- `artist` (string): Artist(s) of the track
- `genre` (string): Genre(s) of the track
- `album` (string): Title of the album the track belongs to
- `album_type` (string): Type of the album
- `album_artist` (string): List of all artists that appeared on the album
- `disc_number` (numeric): Number of the disc or other physical medium the track belongs to
- `release_year` (numeric): Year (YYYY) when the album was released
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
For example, for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
For numeric sequences you can use numeric-related formatting; for example, `%(view_count)05d` will result in a string with the view count padded with zeros up to 5 characters, as in `00042`.
Output templates can also contain an arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'`, which will result in downloading each video into a directory corresponding to this path template. Any missing directory will be automatically created for you.
To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
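The substitution rules above can be sketched with Python's own `%` string formatting and a mapping that falls back to `NA`. This is only an illustration: the helper name `expand_template` and the sample metadata are assumptions made for this example, not youtube-dl's actual implementation.

```python
import collections


def expand_template(template, info):
    """Expand an output template against a metadata dict (hypothetical helper).

    Fields that are absent or None are rendered as the string 'NA'.
    """
    # Drop None values so that they are treated like missing fields.
    fields = dict((k, v) for k, v in info.items() if v is not None)
    fields = collections.defaultdict(lambda: 'NA', fields)
    return template % fields


info = {
    'title': 'youtube-dl test video',
    'id': 'BaW_jenozKcj',
    'ext': 'mp4',
    'view_count': 42,
    'uploader': None,  # not provided by the (imaginary) extractor
}

print(expand_template('%(title)s-%(id)s.%(ext)s', info))
print(expand_template('%(view_count)05d', info))
print(expand_template('%(uploader)s-%(id)s.%(ext)s', info))
print(expand_template('100%% %(title)s', info))
```

Note that a numeric presentation type such as `05d` only works in this sketch when the field is actually present; the real code has to handle the missing case separately.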

@ -1,6 +1,7 @@
from __future__ import unicode_literals, print_function
from inspect import getsource
import io
import os
from os.path import dirname as dirn
import sys
@ -95,5 +96,5 @@ module_contents.append(
module_src = '\n'.join(module_contents) + '\n'
with open(lazy_extractors_filename, 'wt') as f:
with io.open(lazy_extractors_filename, 'wt', encoding='utf-8') as f:
f.write(module_src)

devscripts/run_tests.sh Normal file → Executable file

@ -1,8 +1,9 @@
#!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter|youtube_lists"
test_set=""
multiprocess_args=""
case "$YTDL_TEST_SET" in
core)
@ -10,10 +11,11 @@ case "$YTDL_TEST_SET" in
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
multiprocess_args="--processes=4 --process-timeout=540"
;;
*)
break
;;
esac
nosetests test --verbose $test_set
nosetests test --verbose $test_set $multiprocess_args

@ -239,6 +239,7 @@
- **ESPN**
- **ESPNArticle**
- **EsriVideo**
- **ETOnline**
- **Europa**
- **EveryonesMixtape**
- **ExpoTV**
@ -274,6 +275,7 @@
- **francetvinfo.fr**
- **Freesound**
- **freespeech.org**
- **FreshLive**
- **Funimation**
- **FunnyOrDie**
- **Fusion**
@ -310,6 +312,7 @@
- **HellPorno**
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **hetklokhuis**
- **hgtv.com:show**
- **HistoricFilms**
- **history:topic**: History.com Topic
@ -511,6 +514,7 @@
- **Nintendo**
- **njoy**: N-JOY
- **njoy:embed**
- **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize**
- **Noco**
- **Normalboots**
@ -546,8 +550,10 @@
- **OktoberfestTV**
- **on.aol.com**
- **OnDemandKorea**
- **onet.pl**
- **onet.tv**
- **onet.tv:channel**
- **OnetMVP**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@ -664,7 +670,6 @@
- **savefrom.net**
- **SBS**: sbs.com.au
- **schooltv**
- **SciVee**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
@ -678,6 +683,7 @@
- **Shared**: shared.sx
- **ShowRoomLive**
- **Sina**
- **SkylineWebcams**
- **skynewsarabia:article**
- **skynewsarabia:video**
- **SkySports**
@ -802,6 +808,7 @@
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **TVN24**
- **TVNoe**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
@ -900,6 +907,7 @@
- **vlive**
- **vlive:channel**
- **Vodlocker**
- **VODPl**
- **VODPlatform**
- **VoiceRepublic**
- **VoxMedia**

@ -107,8 +107,8 @@ setup(
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
maintainer='Sergey M.',
maintainer_email='dstftw@gmail.com',
packages=[
'youtube_dl',
'youtube_dl.extractor', 'youtube_dl.downloader',
@ -130,6 +130,7 @@ setup(
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},

@ -1,4 +1,5 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
@ -525,6 +526,7 @@ class TestYoutubeDL(unittest.TestCase):
'id': '1234',
'ext': 'mp4',
'width': None,
'height': 1080,
}
def fname(templ):
@ -534,16 +536,29 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(fname('%(id)s-%(width)s.%(ext)s'), '1234-NA.mp4')
# Replace missing fields with 'NA'
self.assertEqual(fname('%(uploader_date)s-%(id)s.%(ext)s'), 'NA-1234.mp4')
self.assertEqual(fname('%(height)d.%(ext)s'), '1080.mp4')
self.assertEqual(fname('%(height)6d.%(ext)s'), '  1080.mp4')
self.assertEqual(fname('%(height)-6d.%(ext)s'), '1080  .mp4')
self.assertEqual(fname('%(height)06d.%(ext)s'), '001080.mp4')
self.assertEqual(fname('%(height) 06d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height)   06d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height)0 6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height)0   6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height)   0   6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%%(height)06d.%(ext)s'), '%(height)06d.mp4')
self.assertEqual(fname('%(width)06d.%(ext)s'), 'NA.mp4')
self.assertEqual(fname('%(width)06d.%%(ext)s'), 'NA.%(ext)s')
self.assertEqual(fname('%%(width)06d.%(ext)s'), '%(width)06d.mp4')
def test_format_note(self):
ydl = YoutubeDL()
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^\s*10k$')
}), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({
'fps': 30,
}), '^30fps$')
}), r'^30fps$')
def test_postprocessors(self):
filename = 'post-processor-testfile.mp4'
@ -606,6 +621,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30,
'filesize': 10 * 1024,
'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
}
second = {
'id': '2',
@ -616,6 +633,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo',
'filesize': 5 * 1024,
'playlist_id': '43',
'uploader': "тест 123",
}
videos = [first, second]
@ -656,6 +674,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),

@ -65,6 +65,10 @@ defs = gettestcases()
class TestDownload(unittest.TestCase):
# Parallel testing in nosetests. See
# http://nose.readthedocs.org/en/latest/doc_tests/test_multiprocess/multiprocess.html
_multiprocess_shared_ = True
maxDiff = None
def setUp(self):
@ -73,7 +77,7 @@ class TestDownload(unittest.TestCase):
# Dynamically generate tests
def generator(test_case):
def generator(test_case, tname):
def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
@ -102,6 +106,7 @@ def generator(test_case):
return
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
params.setdefault('skip_download', True)
@ -146,7 +151,7 @@ def generator(test_case):
raise
if try_num == RETRIES:
report_warning('Failed due to network errors, skipping...')
report_warning('%s failed due to network errors, skipping...' % tname)
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
@ -221,12 +226,12 @@ def generator(test_case):
# And add them to TestDownload
for n, test_case in enumerate(defs):
test_method = generator(test_case)
tname = 'test_' + str(test_case['name'])
i = 1
while hasattr(TestDownload, tname):
tname = 'test_%s_%d' % (test_case['name'], i)
i += 1
test_method = generator(test_case, tname)
test_method.__name__ = str(tname)
setattr(TestDownload, test_method.__name__, test_method)
del test_method

@ -33,6 +33,7 @@ from .compat import (
compat_get_terminal_size,
compat_http_client,
compat_kwargs,
compat_numeric_types,
compat_os_name,
compat_str,
compat_tokenize_tokenize,
@ -56,6 +57,8 @@ from .utils import (
ExtractorError,
format_bytes,
formatSeconds,
GeoRestrictedError,
ISO3166Utils,
locked_file,
make_HTTPS_handler,
MaxDownloadsReached,
@ -272,6 +275,12 @@ class YoutubeDL(object):
If it returns None, the video is downloaded.
match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
HTTP header (experimental)
geo_bypass_country:
Two-letter ISO 3166-1 alpha-2 country code that will be used for
explicit geographic restriction bypassing via faking
X-Forwarded-For HTTP header (experimental)
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
@ -319,11 +328,21 @@ class YoutubeDL(object):
self.params.update(params)
self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
def check_deprecated(param, option, suggestion):
if self.params.get(param) is not None:
self.report_warning(
'%s is deprecated. Use %s instead.' % (option, suggestion))
return True
return False
if check_deprecated('cn_verification_proxy', '--cn-verification-proxy', '--geo-verification-proxy'):
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
check_deprecated('autonumber_size', '--autonumber-size', 'output template with %(autonumber)0Nd, where N is the number of digits')
check_deprecated('autonumber', '--auto-number', '-o "%(autonumber)s-%(title)s.%(ext)s"')
check_deprecated('usetitle', '--title', '-o "%(title)s-%(id)s.%(ext)s"')
if params.get('bidi_workaround', False):
try:
import pty
@ -585,10 +604,7 @@ class YoutubeDL(object):
autonumber_size = self.params.get('autonumber_size')
if autonumber_size is None:
autonumber_size = 5
autonumber_templ = '%0' + str(autonumber_size) + 'd'
template_dict['autonumber'] = autonumber_templ % (self.params.get('autonumber_start', 1) - 1 + self._num_downloads)
if template_dict.get('playlist_index') is not None:
template_dict['playlist_index'] = '%0*d' % (len(str(template_dict['n_entries'])), template_dict['playlist_index'])
template_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
@ -601,12 +617,61 @@ class YoutubeDL(object):
compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k == 'id'))
template_dict = dict((k, sanitize(k, v))
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
# For fields playlist_index and autonumber convert all occurrences
# of %(field)s to %(field)0Nd for backward compatibility
field_size_compat_map = {
'playlist_index': len(str(template_dict['n_entries'])),
'autonumber': autonumber_size,
}
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'
mobj = re.search(FIELD_SIZE_COMPAT_RE, outtmpl)
if mobj:
outtmpl = re.sub(
FIELD_SIZE_COMPAT_RE,
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
NUMERIC_FIELDS = set((
'width', 'height', 'tbr', 'abr', 'asr', 'vbr', 'fps', 'filesize', 'filesize_approx',
'upload_year', 'upload_month', 'upload_day',
'duration', 'view_count', 'like_count', 'dislike_count', 'repost_count',
'average_rating', 'comment_count', 'age_limit',
'start_time', 'end_time',
'chapter_number', 'season_number', 'episode_number',
'track_number', 'disc_number', 'release_year',
'playlist_index',
))
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string 'NA' is returned for missing fields. We will patch output
# template for missing fields to meet string presentation type.
for numeric_field in NUMERIC_FIELDS:
if numeric_field not in template_dict:
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
FORMAT_RE = r'''(?x)
(?<!%)
%
\({0}\) # mapping key
(?:[#0\-+ ]+)? # conversion flags (optional)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
[diouxXeEfFgGcrs%] # conversion type
'''
outtmpl = re.sub(
FORMAT_RE.format(numeric_field),
r'%({0})s'.format(numeric_field), outtmpl)
tmpl = compat_expanduser(outtmpl)
filename = tmpl % template_dict
# Temporary fix for #4787
@ -707,6 +772,14 @@ class YoutubeDL(object):
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
except GeoRestrictedError as e:
msg = e.msg
if e.countries:
msg += '\nThis video is available in %s.' % ', '.join(
map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
break
@ -847,8 +920,14 @@ class YoutubeDL(object):
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
@ -1233,6 +1312,11 @@ class YoutubeDL(object):
if cookies:
res['Cookie'] = cookies
if 'X-Forwarded-For' not in res:
x_forwarded_for_ip = info_dict.get('__x_forwarded_for_ip')
if x_forwarded_for_ip:
res['X-Forwarded-For'] = x_forwarded_for_ip
return res
def _calc_cookies(self, info_dict):
@ -1375,6 +1459,9 @@ class YoutubeDL(object):
full_format_info = info_dict.copy()
full_format_info.update(format)
format['http_headers'] = self._calc_headers(full_format_info)
# Remove private housekeeping stuff
if '__x_forwarded_for_ip' in info_dict:
del info_dict['__x_forwarded_for_ip']
# TODO Central sorting goes here
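The two template rewrites in the filename-preparation hunk above can be exercised on their own. The sketch below copies the two regular expressions from the diff; the wrapper functions `pad_compat` and `patch_missing` and the sample data are assumptions introduced for illustration only.

```python
import collections
import re

# Matches %(autonumber)s or %(playlist_index)s unless the % is escaped as %%.
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'

FORMAT_RE = r'''(?x)
    (?<!%)
    %
    \({0}\)             # mapping key
    (?:[#0\-+ ]+)?      # conversion flags (optional)
    (?:\d+)?            # minimum field width (optional)
    (?:\.\d+)?          # precision (optional)
    [hlL]?              # length modifier (optional)
    [diouxXeEfFgGcrs%]  # conversion type
'''


def pad_compat(outtmpl, size):
    """Rewrite %(field)s to %(field)0Nd for autonumber and playlist_index."""
    return re.sub(FIELD_SIZE_COMPAT_RE, r'%%(\1)0%dd' % size, outtmpl)


def patch_missing(outtmpl, field):
    """Force the string presentation type for a missing numeric field."""
    return re.sub(FORMAT_RE.format(field), r'%({0})s'.format(field), outtmpl)


# Sample metadata: 'width' is deliberately absent.
info = collections.defaultdict(lambda: 'NA', {'ext': 'mp4', 'autonumber': 7})

print(pad_compat('%(autonumber)s.%(ext)s', 5) % info)
print(patch_missing('%(width)06d.%(ext)s', 'width') % info)
print(patch_missing('%%(width)06d.%(ext)s', 'width') % info)
```

Without the second rewrite, formatting `'NA'` with `06d` would raise a `TypeError`, which is the failure the comment in the diff describes.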

@ -414,6 +414,11 @@ def _real_main(argv=None):
'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
'config_location': opts.config_location,
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,
}
with YoutubeDL(ydl_opts) as ydl:

@ -2760,6 +2760,12 @@ else:
compat_kwargs = lambda kwargs: kwargs
try:
compat_numeric_types = (int, float, long, complex)
except NameError: # Python 3
compat_numeric_types = (int, float, complex)
if sys.version_info < (2, 7):
def compat_socket_create_connection(address, timeout, source_address=None):
host, port = address
@ -2895,6 +2901,7 @@ __all__ = [
'compat_input',
'compat_itertools_count',
'compat_kwargs',
'compat_numeric_types',
'compat_ord',
'compat_os_name',
'compat_parse_qs',
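To see why a tuple of numeric types is useful here, the following sketch keeps numeric metadata values as numbers (so integer format codes still work) and sanitizes everything else. The `sanitize` stand-in and the `prepare_fields` helper are hypothetical and far simpler than the real filename sanitization.

```python
try:
    compat_numeric_types = (int, float, long, complex)  # Python 2
except NameError:  # Python 3: there is no separate long type
    compat_numeric_types = (int, float, complex)


def sanitize(value):
    """Hypothetical stand-in for filename sanitization."""
    return str(value).replace('/', '_')


def prepare_fields(info):
    """Keep numbers as numbers and sanitize every other scalar value."""
    return dict(
        (k, v if isinstance(v, compat_numeric_types) else sanitize(v))
        for k, v in info.items()
        if v is not None and not isinstance(v, (list, tuple, dict)))


fields = prepare_fields({'title': 'a/b', 'height': 1080, 'width': None})
print('%(title)s-%(height)06d' % fields)
```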

@ -347,7 +347,10 @@ class FileDownloader(object):
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
self.to_screen(
'[download] Sleeping %s seconds...' % (
int(sleep_interval) if sleep_interval.is_integer()
else '%.2f' % sleep_interval))
time.sleep(sleep_interval)
return self.real_download(filename, info_dict)
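The message formatting introduced in this hunk can be checked separately. In the sketch below, `format_sleep_message` is a hypothetical wrapper around the same expression; it assumes a `float` argument, which is what `random.uniform` returns.

```python
def format_sleep_message(sleep_interval):
    """Format the sleep notice; whole numbers are shown without decimals."""
    return '[download] Sleeping %s seconds...' % (
        int(sleep_interval) if sleep_interval.is_integer()
        else '%.2f' % sleep_interval)


print(format_sleep_message(5.0))
print(format_sleep_message(3.14159))
```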

@ -43,7 +43,10 @@ class DashSegmentsFD(FragmentFD):
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')

@ -238,7 +238,10 @@ class IsmFD(FragmentFD):
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')

@ -31,6 +31,11 @@ MSO_INFO = {
'username_field': 'user',
'password_field': 'passwd',
},
'TWC': {
'name': 'Time Warner Cable | Spectrum',
'username_field': 'Ecom_User_ID',
'password_field': 'Ecom_Password',
},
'thr030': {
'name': '3 Rivers Communications'
},

@ -10,7 +10,7 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?[^/]+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
@ -44,6 +44,12 @@ class AMCNetworksIE(ThePlatformIE):
}, {
'url': 'http://www.bbcamerica.com/shows/doctor-who/full-episodes/the-power-of-the-daleks/episode-01-episode-1-color-version',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/mama-june-from-not-to-hot/full-episode/season-01/thin-tervention',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/la-hair/videos/season-05/episode-09-episode-9-2/episode-9-sneak-peek-3',
'only_matching': True,
}]
def _real_extract(self, url):
@ -53,20 +59,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
@ -78,9 +94,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:

@ -1,13 +1,13 @@
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(JWPlatformBaseIE):
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'

@ -191,6 +191,10 @@ class BrightcoveLegacyIE(InfoExtractor):
# These fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
if videoPlayer is not None:
if isinstance(videoPlayer, list):
videoPlayer = videoPlayer[0]
if not (videoPlayer.isdigit() or videoPlayer.startswith('ref:')):
return None
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')
if linkBase is not None:

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import codecs
import re
from .common import InfoExtractor
@ -96,6 +97,10 @@ class CDAIE(InfoExtractor):
if not video or 'file' not in video:
self.report_warning('Unable to extract %s version information' % version)
return
if video['file'].startswith('uggc'):
video['file'] = codecs.decode(video['file'], 'rot_13')
if video['file'].endswith('adc.mp4'):
video['file'] = video['file'].replace('adc.mp4', '.mp4')
f = {
'url': video['file'],
}
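The decoding step added in this hunk relies on ROT13 turning `http` into `uggc`. A minimal sketch of the same two checks follows; the function name `decode_file_url` and the `example.com` URL are assumptions made for the example.

```python
import codecs


def decode_file_url(file_url):
    """Undo the ROT13 obfuscation and strip the 'adc' marker, if present."""
    # 'uggc' is 'http' under ROT13, so this prefix signals an encoded URL.
    if file_url.startswith('uggc'):
        file_url = codecs.decode(file_url, 'rot_13')
    if file_url.endswith('adc.mp4'):
        file_url = file_url.replace('adc.mp4', '.mp4')
    return file_url


# Build an obfuscated sample by applying ROT13 to a made-up URL.
obfuscated = codecs.encode('http://example.com/videoadc.mp4', 'rot_13')
print(obfuscated)
print(decode_file_url(obfuscated))
```

URLs that do not start with `uggc` pass through the first check unchanged, so already-plain URLs are not affected.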

@ -13,6 +13,7 @@ from ..utils import (
float_or_none,
sanitized_Request,
urlencode_postdata,
USER_AGENTS,
)
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {
'id': '61924494876951776',
'id': '61924494877246241',
'ext': 'mp4',
'title': 'Hyde Park Civilizace',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350,
},
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani',
}
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']
title = item['title']
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
playlistpage = self._download_json(req, playlist_id, fatal=False)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
if not playlistpage:
continue
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

@ -6,6 +6,7 @@ import hashlib
import json
import netrc
import os
import random
import re
import socket
import sys
@ -39,7 +40,10 @@ from ..utils import (
ExtractorError,
fix_xml_ampersands,
float_or_none,
GeoRestrictedError,
GeoUtils,
int_or_none,
js_to_json,
parse_iso8601,
RegexNotFoundError,
sanitize_filename,
@ -319,17 +323,34 @@ class InfoExtractor(object):
_real_extract() methods and define a _VALID_URL regexp.
Probably, they should also be added to the list of extractors.
_GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor.
Though it won't disable explicit geo restriction bypass based on
country code provided with geo_bypass_country. (experimental)
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by
geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental)
NB: both these geo attributes are experimental and may change in future
or be completely removed.
Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests.
"""
_ready = False
_downloader = None
_x_forwarded_for_ip = None
_GEO_BYPASS = True
_GEO_COUNTRIES = None
_WORKING = True
def __init__(self, downloader=None):
"""Constructor. Receives an optional downloader."""
self._ready = False
self._x_forwarded_for_ip = None
self.set_downloader(downloader)
@classmethod
@ -358,15 +379,59 @@ class InfoExtractor(object):
def initialize(self):
"""Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES)
if not self._ready:
self._real_initialize()
self._ready = True
def _initialize_geo_bypass(self, countries):
"""
Initialize geo restriction bypass mechanism.
This method is used to initialize geo bypass mechanism based on faking
X-Forwarded-For HTTP header. A random country from provided country list
is selected and a random IP belonging to this country is generated. This
IP will be passed as X-Forwarded-For HTTP header in all subsequent
HTTP requests.
This method will be used for initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES.
You may also manually call it from extractor's code if geo countries
information is not available beforehand (e.g. obtained during
extraction) or for some other reason.
"""
if not self._x_forwarded_for_ip:
country_code = self._downloader.params.get('geo_bypass_country', None)
# If there is no explicit country for geo bypass specified and
# the extractor is known to be geo restricted let's fake IP
# as X-Forwarded-For right away.
if (not country_code and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
countries):
country_code = random.choice(countries)
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._downloader.params.get('verbose', False):
self._downloader.to_stdout(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
def extract(self, url):
"""Extracts URL information and returns it in list of dicts."""
try:
self.initialize()
return self._real_extract(url)
for _ in range(2):
try:
self.initialize()
ie_result = self._real_extract(url)
if self._x_forwarded_for_ip:
ie_result['__x_forwarded_for_ip'] = self._x_forwarded_for_ip
return ie_result
except GeoRestrictedError as e:
if self.__maybe_fake_ip_and_retry(e.countries):
continue
raise
except ExtractorError:
raise
except compat_http_client.IncompleteRead as e:
@ -374,6 +439,21 @@ class InfoExtractor(object):
except (KeyError, StopIteration) as e:
raise ExtractorError('An extractor error has occurred.', cause=e)
def __maybe_fake_ip_and_retry(self, countries):
if (not self._downloader.params.get('geo_bypass_country', None) and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
not self._x_forwarded_for_ip and
countries):
country_code = random.choice(countries)
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._x_forwarded_for_ip:
self.report_warning(
'Video is geo restricted. Retrying extraction with fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
return True
return False
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
self._downloader = downloader
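The retry flow added to `extract()` and `__maybe_fake_ip_and_retry()` can be sketched in isolation. This is a minimal, self-contained sketch and not the youtube-dl implementation: `GeoRestrictedError` is a local stand-in class, `fake_ip_for` is a hypothetical replacement for `GeoUtils.random_ipv4` that only returns addresses from the 203.0.113.0/24 documentation range, and `extract_once` is an assumed callable standing in for `_real_extract`.

```python
import random


class GeoRestrictedError(Exception):
    """Local stand-in for the error raised by raise_geo_restricted()."""

    def __init__(self, msg, countries=None):
        super().__init__(msg)
        self.countries = countries


def fake_ip_for(country_code):
    # Hypothetical helper: a real implementation would pick an address from
    # a block allocated to country_code. This sketch ignores the country and
    # uses the TEST-NET-3 documentation range instead.
    return '203.0.113.%d' % random.randint(1, 254)


def extract_with_geo_retry(extract_once, geo_bypass=True):
    """Call extract_once(fake_ip) at most twice, faking an IP on the retry."""
    fake_ip = None
    for _ in range(2):
        try:
            return extract_once(fake_ip)
        except GeoRestrictedError as e:
            # Retry only once, and only if the error names candidate countries.
            if geo_bypass and fake_ip is None and e.countries:
                fake_ip = fake_ip_for(random.choice(e.countries))
                continue
            raise


def demo_extract(fake_ip):
    # Assumed behaviour for the demo: the "site" rejects any request that
    # carries no forwarded IP.
    if fake_ip is None:
        raise GeoRestrictedError('geo restricted', countries=['US'])
    return {'id': 'demo', '__x_forwarded_for_ip': fake_ip}
```

Calling `extract_with_geo_retry(demo_extract)` fails on the first attempt and succeeds on the second. With `geo_bypass=False` the original error propagates unchanged, which mirrors the bare `raise` in the diff.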
@ -433,6 +513,15 @@ class InfoExtractor(object):
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
# Some sites check X-Forwarded-For HTTP header in order to figure out
# the origin of the client behind proxy. This allows bypassing geo
# restriction by setting this header's value to an IP that belongs to some
# geo unrestricted country. We will do so once we encounter any
# geo restriction error.
if self._x_forwarded_for_ip:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
if urlh is False:
assert not fatal
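The header injection in this hunk amounts to "set `X-Forwarded-For` unless the caller already supplied one". A small sketch of that rule follows. The function name `with_forwarded_for` is made up for illustration, and unlike the diff it returns a copy instead of mutating the caller's dict.

```python
def with_forwarded_for(headers, fake_ip):
    """Return a copy of headers with X-Forwarded-For set unless already present."""
    merged = dict(headers or {})
    # Same case-sensitive membership test as in the diff: an explicitly
    # supplied header always wins over the fake IP.
    if fake_ip and 'X-Forwarded-For' not in merged:
        merged['X-Forwarded-For'] = fake_ip
    return merged
```

Because the check is a plain dict lookup, a header spelled `x-forwarded-for` would not count as present. That matches the diff as written.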
@ -608,10 +697,8 @@ class InfoExtractor(object):
expected=True)
@staticmethod
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction'):
raise ExtractorError(
'%s. You might want to use --proxy to workaround.' % msg,
expected=True)
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction', countries=None):
raise GeoRestrictedError(msg, countries=countries)
# Methods for following #608
@staticmethod
@ -1923,7 +2010,7 @@ class InfoExtractor(object):
})
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None, preference=None):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
@ -1945,7 +2032,8 @@ class InfoExtractor(object):
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(
@ -2073,6 +2161,123 @@ class InfoExtractor(object):
})
return formats
@staticmethod
def _find_jwplayer_data(webpage):
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
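Three pieces of the JWPlayer helpers above rely on regular expressions whose behaviour is easy to check in isolation: locating the `setup(...)` options, deriving a height from a label such as `1080p`, and splitting an RTMP URL into connection URL and play path. The sketch below copies those three patterns from the diff. The function names are illustrative, and the usage below uses plain `json.loads` where the extractor uses `js_to_json`, so it only handles options that are already strict JSON.

```python
import json
import re

# Same pattern as in _find_jwplayer_data, split across two lines.
JWPLAYER_SETUP_RE = (
    r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)'
    r'\.setup\s*\((?P<options>[^)]+)\)')


def find_jwplayer_options(webpage):
    """Return the raw text passed to jwplayer(...).setup(...), or None."""
    mobj = re.search(JWPLAYER_SETUP_RE, webpage)
    return mobj.group('options') if mobj else None


def height_from_label(label):
    """Parse labels like '1080p' or '480P'; anything else yields None."""
    mobj = re.match(r'^(\d{3,})[pP]$', label or '')
    return int(mobj.group(1)) if mobj else None


def split_rtmp_url(source_url):
    """Split at the first 'mp4:', 'mp3:' or 'flv:' marker, keeping the marker."""
    parts = re.split(r'((?:mp4|mp3|flv):)', source_url, maxsplit=1)
    if len(parts) != 3:
        return source_url, None
    rtmp_url, prefix, play_path = parts
    return rtmp_url, prefix + play_path
```

For example, `json.loads(find_jwplayer_options('jwplayer("player").setup({"file": "http://example.com/v.mp4"});'))` yields a dict with a `file` key. Note that `[^)]+` stops at the first closing parenthesis, so options that themselves contain `)` would be truncated by this pattern.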

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import sys
from .common import InfoExtractor
from ..utils import ExtractorError
@ -33,7 +35,9 @@ class UnicodeBOMIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
_TESTS = [{
# Disable test for python 3.2 since BOM is broken in re in this version
# (see https://github.com/rg3/youtube-dl/issues/9751)
_TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True,
}]
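The version guard above depends on how Python compares tuples of different lengths. A short sketch of the same condition follows; `bom_test_disabled` is an illustrative name and takes the version tuple as an argument so it can be checked without the interpreter in question.

```python
def bom_test_disabled(version_info):
    """Mirror of the guard in the diff: (3, 0) < version_info <= (3, 3)."""
    # Tuples compare element by element; when one is a prefix of the other,
    # the longer tuple is the greater one. So (3, 3, 0) > (3, 3), and the
    # guard stops applying from 3.3.0 onwards.
    return (3, 0) < tuple(version_info) <= (3, 3)
```

Under this comparison the test list is emptied for 3.2.x as the comment says. As written it is also emptied for 3.0.x and 3.1.x, while 2.7 and 3.3+ keep the test.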

@ -6,6 +6,7 @@ from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_GEO_COUNTRIES = ['US']
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',

@ -123,7 +123,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': {
'id': '645513',
'ext': 'flv',
'ext': 'mp4',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
@ -192,6 +192,36 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# geo-restricted (US), 18+ maturity wall, non-premium available
'url': 'http://www.crunchyroll.com/cosplay-complex-ova/episode-1-the-birth-of-the-cosplay-club-565617',
'only_matching': True,
}, {
# A description with double quotes
'url': 'http://www.crunchyroll.com/11eyes/episode-1-piros-jszaka-red-night-535080',
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': '11eyes Episode 1 Piros éjszaka - Red Night',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021',
},
'params': {
# Just test metadata extraction
'skip_download': True,
},
}, {
# make sure we can extract an uploader name that's not a link
'url': 'http://www.crunchyroll.com/hakuoki-reimeiroku/episode-1-dawn-of-the-divine-warriors-606899',
'info_dict': {
'id': '606899',
'ext': 'mp4',
'title': 'Hakuoki Reimeiroku Episode 1 Dawn of the Divine Warriors',
'description': 'Ryunosuke was left to die, but Serizawa-san asked him a simple question "Do you want to live?"',
'uploader': 'Geneon Entertainment',
'upload_date': '20120717',
},
'params': {
# just test metadata extraction
'skip_download': True,
},
}]
_FORMAT_IDS = {
@ -362,9 +392,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?"description"\s*:\s*"([^"]+)' % video_id,
webpage, 'description', default=None)
video_description = self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
@ -373,8 +403,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(
r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
'video_uploader', fatal=False)
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
@ -519,11 +550,11 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>',
webpage, 'title')
episode_paths = re.findall(
r'(?s)<li id="showview_videos_media_[0-9]+"[^>]+>.*?<a href="([^"]+)"',
r'(?s)<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"',
webpage)
entries = [
self.url_result('http://www.crunchyroll.com' + ep, 'Crunchyroll')
for ep in episode_paths
self.url_result('http://www.crunchyroll.com' + ep, 'Crunchyroll', ep_id)
for ep_id, ep in episode_paths
]
entries.reverse()
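The playlist change works because `re.findall` returns a list of tuples once the pattern has more than one capture group, so each match can be unpacked into an ID and a path. A self-contained sketch with the pattern from the diff follows. The `SAMPLE` markup is invented for the demonstration and is not taken from the site.

```python
import re

# Pattern from the diff: group 1 is the media ID, group 2 the episode path.
EPISODE_RE = r'(?s)<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"'

# Invented markup, just enough to satisfy the pattern.
SAMPLE = (
    '<li id="showview_videos_media_111" class="x"><a href="/show/episode-1-111"></a></li>'
    '<li id="showview_videos_media_222" class="x"><a href="/show/episode-2-222"></a></li>')


def episode_entries(webpage):
    """Build (id, url) records and reverse them, as the extractor does."""
    entries = [
        {'id': ep_id, 'url': 'http://www.crunchyroll.com' + path}
        for ep_id, path in re.findall(EPISODE_RE, webpage)]
    entries.reverse()
    return entries
```

With a single capture group, `re.findall` would return plain strings, which is why the loop in the diff changes from `for ep in episode_paths` to `for ep_id, ep in episode_paths`.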

@ -66,7 +66,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'uploader_id': 'xijv66',
'age_limit': 0,
'view_count': int,
'comment_count': int,
}
},
# Vevo video
@ -140,7 +139,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
webpage, 'comment count', default=None))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
@ -283,9 +282,14 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}
def _check_error(self, info):
error = info.get('error')
if info.get('error') is not None:
title = error['title']
# See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, info['error']['title']), expected=True)
'%s said: %s' % (self.IE_NAME, title), expected=True)
def _get_subtitles(self, video_id, webpage):
try:

@ -20,6 +20,7 @@ from ..utils import (
class DramaFeverBaseIE(AMPIE):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever'
_GEO_COUNTRIES = ['US', 'CA']
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
@ -116,8 +117,9 @@ class DramaFeverIE(DramaFeverBaseIE):
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
raise ExtractorError(
'Currently unavailable in your country.', expected=True)
self.raise_geo_restricted(
msg='Currently unavailable in your country',
countries=self._GEO_COUNTRIES)
raise
series_id, episode_number = video_id.split('.')

@ -18,8 +18,8 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'info_dict': {
@ -29,7 +29,10 @@ class EinthusanIE(InfoExtractor):
'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):

@ -1,13 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
NO_DEFAULT,
)
from .kaltura import KalturaIE
from ..utils import NO_DEFAULT
class EllenTVIE(InfoExtractor):
@ -65,7 +61,7 @@ class EllenTVIE(InfoExtractor):
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor):
@ -77,14 +73,14 @@ class EllenTVClipsIE(InfoExtractor):
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 7,
'playlist_mincount': 5,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage)
playlist = self._extract_playlist(webpage, playlist_id)
return {
'_type': 'playlist',
@ -93,16 +89,13 @@ class EllenTVClipsIE(InfoExtractor):
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage):
def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try:
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
return self._parse_json('[{' + json_string + '}]', playlist_id)
def _extract_entries(self, playlist):
return [
self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
'Kaltura')
KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist]

@ -39,6 +39,18 @@ class ElPaisIE(InfoExtractor):
'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
'upload_date': '20170127',
},
}, {
'url': 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
'info_dict': {
'id': '1487062137_075943',
'ext': 'mp4',
'title': 'Disyuntivas',
'description': 'md5:a0fb1485c4a6a8a917e6f93878e66218',
'upload_date': '20170214',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
@ -59,14 +71,15 @@ class ElPaisIE(InfoExtractor):
video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex(
r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
webpage, 'thumbnail URL', fatal=False)
webpage, 'thumbnail URL', default=None)
thumbnail = (
None if thumbnail_suffix is None
else prefix + thumbnail_suffix)
else prefix + thumbnail_suffix) or self._og_search_thumbnail(webpage)
title = self._html_search_regex(
(r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
webpage, 'title')
(r"tituloVideo\s*=\s*'([^']+)'",
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
r'<h1[^>]+class="titulo"[^>]*>([^<]+)'),
webpage, 'title', default=None) or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', default=None) or self._html_search_meta(

@ -0,0 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ETOnlineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?etonline\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.etonline.com/tv/211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale/',
'info_dict': {
'id': '211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale',
'title': 'md5:a21ec7d3872ed98335cbd2a046f34ee6',
'description': 'md5:8b94484063f463cca709617c79618ccd',
},
'playlist_count': 2,
}, {
'url': 'http://www.etonline.com/media/video/here_are_the_stars_who_love_bringing_their_moms_as_dates_to_the_oscars-211359/',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911076001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % video_id, 'BrightcoveNew', video_id)
for video_id in re.findall(
r'site\.brightcove\s*\([^,]+,\s*["\'](title_\d+)', webpage)]
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage, fatal=False),
self._og_search_description(webpage))

@ -288,6 +288,7 @@ from .espn import (
ESPNArticleIE,
)
from .esri import EsriVideoIE
from .etonline import ETOnlineIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE
@ -338,6 +339,7 @@ from .francetv import (
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
@ -637,6 +639,7 @@ from .ninecninemedia import (
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
@ -666,6 +669,7 @@ from .npo import (
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
HetKlokhuisIE,
VPROIE,
WNLIE,
)
@ -694,6 +698,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import (
OnetIE,
OnetChannelIE,
OnetMVPIE,
OnetPlIE,
)
from .onionstudios import OnionStudiosIE
from .ooyala import (
@ -833,7 +839,6 @@ from .safari import (
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE
@ -850,6 +855,7 @@ from .shared import (
from .showroomlive import ShowRoomLiveIE
from .sina import SinaIE
from .sixplay import SixPlayIE
from .skylinewebcams import SkylineWebcamsIE
from .skynewsarabia import (
SkyNewsArabiaIE,
SkyNewsArabiaArticleIE,
@ -1007,6 +1013,7 @@ from .tvc import (
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE
from .tvp import (
TVPEmbedIE,
@ -1147,6 +1154,7 @@ from .vlive import (
VLiveChannelIE
)
from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE

@ -0,0 +1,84 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
try_get,
unified_timestamp,
)
class FreshLiveIE(InfoExtractor):
_VALID_URL = r'https?://freshlive\.tv/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://freshlive.tv/satotv/74712',
'md5': '9f0cf5516979c4454ce982df3d97f352',
'info_dict': {
'id': '74712',
'ext': 'mp4',
'title': 'テスト',
'description': 'テスト',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1511,
'timestamp': 1483619655,
'upload_date': '20170105',
'uploader': 'サトTV',
'uploader_id': 'satotv',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
options = self._parse_json(
self._search_regex(
r'window\.__CONTEXT__\s*=\s*({.+?});\s*</script>',
webpage, 'initial context'),
video_id)
info = options['context']['dispatcher']['stores']['ProgramStore']['programs'][video_id]
title = info['title']
if info.get('status') == 'upcoming':
raise ExtractorError('Stream %s is upcoming' % video_id, expected=True)
stream_url = info.get('liveStreamUrl') or info['archiveStreamUrl']
is_live = info.get('liveStreamUrl') is not None
formats = self._extract_m3u8_formats(
stream_url, video_id, ext='mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls')
if is_live:
title = self._live_title(title)
return {
'id': video_id,
'formats': formats,
'title': title,
'description': info.get('description'),
'thumbnail': info.get('thumbnailUrl'),
'duration': int_or_none(info.get('airTime')),
'timestamp': unified_timestamp(info.get('createdAt')),
'uploader': try_get(
info, lambda x: x['channel']['title'], compat_str),
'uploader_id': try_get(
info, lambda x: x['channel']['code'], compat_str),
'uploader_url': try_get(
info, lambda x: x['channel']['permalink'], compat_str),
'view_count': int_or_none(info.get('viewCount')),
'comment_count': int_or_none(info.get('commentCount')),
'tags': info.get('tags', []),
'is_live': is_live,
}

@ -20,6 +20,7 @@ from ..utils import (
float_or_none,
HEADRequest,
is_html,
js_to_json,
orderedSet,
sanitized_Request,
smuggle_url,
@ -961,6 +962,16 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
# Complex jwplayer
{
'url': 'http://www.indiedb.com/games/king-machine/videos',
'info_dict': {
'id': 'videos',
'ext': 'mp4',
'title': 'king machine trailer 1',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
# rtl.nl embed
{
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -1490,7 +1501,12 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
'add_ie': [VideoPressIE.ie_key()],
}
},
{
# ThePlatform embedded with whitespace in URLs
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@ -2488,6 +2504,15 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage)
if jwplayer_data_str:
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
except ExtractorError:
pass
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True

@ -78,40 +78,60 @@ class GoIE(AdobePassIE):
ext = determine_ext(asset_url)
if ext == 'm3u8':
video_type = video_data.get('type')
if video_type == 'lf':
data = {
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}
if video_data.get('accesslevel') == '1':
requestor_id = site_info['requestor_id']
resource = self._get_mvpd_resource(
requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
data.update({
'token': auth,
'token_type': 'ap',
'adobe_requestor_id': requestor_id,
})
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata(data), headers=self.geo_verification_headers())
errors = entitlement.get('errors', {}).get('errors', [])
if errors:
error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey']
data = {
'video_id': video_data['id'],
'video_type': video_type,
'brand': brand,
'device': '001',
}
if video_data.get('accesslevel') == '1':
requestor_id = site_info['requestor_id']
resource = self._get_mvpd_resource(
requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
data.update({
'token': auth,
'token_type': 'ap',
'adobe_requestor_id': requestor_id,
})
else:
self._initialize_geo_bypass(['US'])
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata(data), headers=self.geo_verification_headers())
errors = entitlement.get('errors', {}).get('errors', [])
if errors:
for error in errors:
if error.get('code') == 1002:
self.raise_geo_restricted(
error['message'], countries=['US'])
error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey']
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
else:
formats.append({
f = {
'format_id': format_id,
'url': asset_url,
'ext': ext,
})
}
if re.search(r'(?:/mp4/source/|_source\.mp4)', asset_url):
f.update({
'format_id': ('%s-' % format_id if format_id else '') + 'SOURCE',
'preference': 1,
})
else:
mobj = re.search(r'/(\d+)x(\d+)/', asset_url)
if mobj:
height = int(mobj.group(2))
f.update({
'format_id': ('%s-' % format_id if format_id else '') + '%dP' % height,
'width': int(mobj.group(1)),
'height': height,
})
formats.append(f)
self._sort_formats(formats)
subtitles = {}
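The non-HLS branch above derives `format_id`, `width` and `height` from the asset URL alone. The sketch below reproduces that logic as a standalone function. `describe_asset` is an illustrative name and the example URLs in the note are invented.

```python
import re


def describe_asset(asset_url, format_id=None):
    """Build a format dict the way the non-m3u8 branch in the diff does."""
    prefix = '%s-' % format_id if format_id else ''
    f = {'format_id': format_id, 'url': asset_url}
    if re.search(r'(?:/mp4/source/|_source\.mp4)', asset_url):
        # Source files get a fixed suffix and a raised preference.
        f.update({'format_id': prefix + 'SOURCE', 'preference': 1})
    else:
        # Otherwise look for a "/<width>x<height>/" path segment.
        mobj = re.search(r'/(\d+)x(\d+)/', asset_url)
        if mobj:
            height = int(mobj.group(2))
            f.update({
                'format_id': prefix + '%dP' % height,
                'width': int(mobj.group(1)),
                'height': height,
            })
    return f
```

For a URL such as `http://example.com/mp4/1280x720/a.mp4` with `format_id='abc'`, this yields `format_id` `abc-720P` with width 1280 and height 720. A URL with neither marker keeps the original `format_id` untouched.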

@ -6,59 +6,58 @@ from ..utils import (
determine_ext,
int_or_none,
parse_iso8601,
xpath_text,
)
class HeiseIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:www\.)?heise\.de/video/artikel/
.+?(?P<id>[0-9]+)\.html(?:$|[?#])
'''
_TEST = {
'url': (
'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html'
),
_VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': {
'id': '2404147',
'ext': 'mp4',
'title': (
"Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone"
),
'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
'format_id': 'mp4_720p',
'timestamp': 1411812600,
'upload_date': '20140927',
'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.',
'thumbnail': r're:^https?://.*\.jpe?g$',
'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*/gallery/$',
}
}
}, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
}, {
'url': 'http://www.heise.de/newsticker/meldung/c-t-uplink-Owncloud-Tastaturen-Peilsender-Smartphone-2404251.html?wt_mc=rss.ho.beitrag.atom',
'only_matching': True,
}, {
'url': 'http://www.heise.de/ct/ausgabe/2016-12-Spiele-3214137.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
container_id = self._search_regex(
r'<div class="videoplayerjw".*?data-container="([0-9]+)"',
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID')
sequenz_id = self._search_regex(
r'<div class="videoplayerjw".*?data-sequenz="([0-9]+)"',
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID')
data_url = 'http://www.heise.de/videout/feed?container=%s&sequenz=%s' % (container_id, sequenz_id)
doc = self._download_xml(data_url, video_id)
info = {
'id': video_id,
'thumbnail': self._og_search_thumbnail(webpage),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'description': self._og_search_description(webpage),
}
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
title = self._html_search_meta('fulltitle', webpage)
if title:
info['title'] = title
else:
info['title'] = self._og_search_title(webpage)
doc = self._download_xml(
'http://www.heise.de/videout/feed', video_id, query={
'container': container_id,
'sequenz': sequenz_id,
})
formats = []
for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'):
@ -74,6 +73,18 @@ class HeiseIE(InfoExtractor):
'height': height,
})
self._sort_formats(formats)
info['formats'] = formats
return info
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': (xpath_text(doc, './/{http://rss.jwpcdn.com/}image') or
self._og_search_thumbnail(webpage)),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'formats': formats,
}
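The relaxed `_VALID_URL` can be checked against the test URLs listed in the diff with nothing but the `re` module. In this sketch `match_id` is a simplified stand-in for the extractor's `_match_id`, not the real method.

```python
import re

# New pattern from the diff: any path depth, numeric ID before ".html".
VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'


def match_id(url):
    """Return the numeric article ID, or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

Since the pattern has no trailing anchor, URLs with a query string after `.html` still match, which is what the `newsticker` test case relies on.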

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
get_element_by_attribute,
int_or_none,
@ -50,6 +51,33 @@ class InstagramIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# multi video post
'url': 'https://www.instagram.com/p/BQ0eAlwhDrw/',
'playlist': [{
'info_dict': {
'id': 'BQ0dSaohpPW',
'ext': 'mp4',
'title': 'Video 1',
},
}, {
'info_dict': {
'id': 'BQ0dTpOhuHT',
'ext': 'mp4',
'title': 'Video 2',
},
}, {
'info_dict': {
'id': 'BQ0dT7RBFeF',
'ext': 'mp4',
'title': 'Video 3',
},
}],
'info_dict': {
'id': 'BQ0eAlwhDrw',
'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
},
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@ -113,6 +141,32 @@ class InstagramIE(InfoExtractor):
'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url:
edges = try_get(
media, lambda x: x['edge_sidecar_to_children']['edges'],
list) or []
if edges:
entries = []
for edge_num, edge in enumerate(edges, start=1):
node = try_get(edge, lambda x: x['node'], dict)
if not node:
continue
node_video_url = try_get(node, lambda x: x['video_url'], compat_str)
if not node_video_url:
continue
entries.append({
'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num,
'url': node_video_url,
'thumbnail': node.get('display_url'),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')),
})
return self.playlist_result(
entries, video_id,
'Post by %s' % uploader_id if uploader_id else None,
description)
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
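
The multi-video branch in the hunk above can be tried outside the extractor. The sketch below is a minimal stand-in, not the project's code: `try_get` is a simplified reimplementation of the helper the hunk calls, `collect_video_nodes` is a hypothetical name, and the sample `media` dict is made up for illustration.

```python
def try_get(src, getter, expected_type=None):
    # Simplified stand-in for the try_get helper used in the hunk:
    # run the getter, swallow lookup errors, optionally check the type.
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def collect_video_nodes(media):
    # Hypothetical helper: walk the sidecar edges and keep only video nodes.
    edges = try_get(
        media, lambda x: x['edge_sidecar_to_children']['edges'], list) or []
    entries = []
    for edge_num, edge in enumerate(edges, start=1):
        node = try_get(edge, lambda x: x['node'], dict)
        if not node:
            continue
        video_url = try_get(node, lambda x: x['video_url'], str)
        if not video_url:
            # Image-only nodes are skipped, but edge_num still advances.
            continue
        entries.append({
            'id': node.get('shortcode') or node['id'],
            'title': 'Video %d' % edge_num,
            'url': video_url,
        })
    return entries


# Made-up sample data: one image node followed by one video node.
media = {'edge_sidecar_to_children': {'edges': [
    {'node': {'id': '1', 'display_url': 'https://example.com/a.jpg'}},
    {'node': {'shortcode': 'abc', 'video_url': 'https://example.com/v.mp4'}},
]}}
entries = collect_video_nodes(media)
```

Because the title is derived from the edge position rather than from the number of videos kept, a post that mixes images and videos yields gaps in the numbering.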

View File

@ -8,12 +8,12 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
sanitized_Request,
)
class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
@ -29,6 +29,10 @@ class IPrimaIE(InfoExtractor):
}, {
'url': 'http://play.iprima.cz/particka/particka-92',
'only_matching': True,
}, {
# geo restricted
'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1',
'only_matching': True,
}]
def _real_extract(self, url):
@ -38,11 +42,13 @@ class IPrimaIE(InfoExtractor):
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
req = sanitized_Request(
'http://play.iprima.cz/prehravac/init?_infuse=1'
'&_ts=%s&productId=%s' % (round(time.time()), video_id))
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id, note='Downloading player')
playerpage = self._download_webpage(
'http://play.iprima.cz/prehravac/init',
video_id, note='Downloading player', query={
'_infuse': 1,
'_ts': round(time.time()),
'productId': video_id,
}, headers={'Referer': url})
formats = []
@ -82,7 +88,7 @@ class IPrimaIE(InfoExtractor):
extract_formats(src)
if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
self.raise_geo_restricted()
self.raise_geo_restricted(countries=['CZ'])
self._sort_formats(formats)

View File

@ -24,6 +24,7 @@ from ..utils import (
class ITVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
_GEO_COUNTRIES = ['GB']
_TEST = {
'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053',
'info_dict': {
@ -98,7 +99,11 @@ class ITVIE(InfoExtractor):
headers=headers, data=etree.tostring(req_env))
playlist = xpath_element(resp_env, './/Playlist')
if playlist is None:
fault_code = xpath_text(resp_env, './/faultcode')
fault_string = xpath_text(resp_env, './/faultstring')
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string))
title = xpath_text(playlist, 'EpisodeTitle', fatal=True)
video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)

View File

@ -16,6 +16,8 @@ class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
_TESTS = [
# Single movie
@ -91,7 +93,11 @@ class IviIE(InfoExtractor):
if 'error' in video_json:
error = video_json['error']
if error['origin'] == 'NoRedisValidData':
origin = error['origin']
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(
msg=error['message'], countries=self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError(
'Unable to download video %s: %s' % (video_id, error['message']),

View File

@ -4,139 +4,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
class JWPlatformBaseIE(InfoExtractor):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',

View File

@ -30,7 +30,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
_GEO_COUNTRIES = ['CN']
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
_TESTS = [{
@ -126,10 +126,9 @@ class LeIE(InfoExtractor):
if playstatus['status'] == 0:
flag = playstatus['flag']
if flag == 1:
msg = 'Country %s auth error' % playstatus['country']
self.raise_geo_restricted()
else:
msg = 'Generic error. flag = %d' % flag
raise ExtractorError(msg, expected=True)
raise ExtractorError('Generic error. flag = %d' % flag, expected=True)
def _real_extract(self, url):
media_id = self._match_id(url)

View File

@ -4,11 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
unsmuggle_url,
ExtractorError,
)
@ -20,9 +22,17 @@ class LimelightBaseIE(InfoExtractor):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
if error == 'CountryDisabled':
self.raise_geo_restricted()
raise ExtractorError(error, expected=True)
raise
def _call_api(self, organization_id, item_id, method):
return self._download_json(
@ -213,6 +223,7 @@ class LimelightMediaIE(LimelightBaseIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId',

View File

@ -260,9 +260,24 @@ class LyndaCourseIE(LyndaBaseIE):
course_path = mobj.group('coursepath')
course_id = mobj.group('courseid')
item_template = 'https://www.lynda.com/%s/%%s-4.html' % course_path
course = self._download_json(
'https://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
course_id, 'Downloading course JSON')
course_id, 'Downloading course JSON', fatal=False)
if not course:
webpage = self._download_webpage(url, course_id)
entries = [
self.url_result(
item_template % video_id, ie=LyndaIE.ie_key(),
video_id=video_id)
for video_id in re.findall(
r'data-video-id=["\'](\d+)', webpage)]
return self.playlist_result(
entries, course_id,
self._og_search_title(webpage, fatal=False),
self._og_search_description(webpage))
if course.get('Status') == 'NotFound':
raise ExtractorError(
@ -283,7 +298,7 @@ class LyndaCourseIE(LyndaBaseIE):
if video_id:
entries.append({
'_type': 'url_transparent',
'url': 'https://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'url': item_template % video_id,
'ie_key': LyndaIE.ie_key(),
'chapter': chapter.get('Title'),
'chapter_number': int_or_none(chapter.get('ChapterIndex')),

View File

@ -14,7 +14,7 @@ from ..utils import (
class MDRIE(InfoExtractor):
IE_DESC = 'MDR.DE and KiKA'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+-?(?P<id>\d+)(?:_.+?)?\.html'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z-]+-?(?P<id>\d+)(?:_.+?)?\.html'
_TESTS = [{
# MDR regularly deletes its videos
@ -31,6 +31,7 @@ class MDRIE(InfoExtractor):
'duration': 250,
'uploader': 'MITTELDEUTSCHER RUNDFUNK',
},
'skip': '404 not found',
}, {
'url': 'http://www.kika.de/baumhaus/videos/video19636.html',
'md5': '4930515e36b06c111213e80d1e4aad0e',
@ -41,6 +42,7 @@ class MDRIE(InfoExtractor):
'duration': 134,
'uploader': 'KIKA',
},
'skip': '404 not found',
}, {
'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/videos/video8182.html',
'md5': '5fe9c4dd7d71e3b238f04b8fdd588357',
@ -49,11 +51,21 @@ class MDRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1450950000,
'upload_date': '20151224',
'timestamp': 1482541200,
'upload_date': '20161224',
'duration': 4628,
'uploader': 'KIKA',
},
}, {
# audio with alternative playerURL pattern
'url': 'http://www.mdr.de/kultur/videos-und-audios/audio-radio/operation-mindfuck-robert-wilson100.html',
'info_dict': {
'id': '100',
'ext': 'mp4',
'title': 'Feature: Operation Mindfuck - Robert Anton Wilson',
'duration': 3239,
'uploader': 'MITTELDEUTSCHER RUNDFUNK',
},
}, {
'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html',
'only_matching': True,
@ -71,7 +83,7 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+?-avCustom\.xml)\1',
webpage, 'data url', group='url').replace(r'\/', '/')
doc = self._download_xml(
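
The effect of relaxing `_VALID_URL` can be checked directly with the two patterns and the test URLs from this hunk. This is a small standalone sketch; `match_id` is a hypothetical helper, not the extractor's `_match_id`.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+-?(?P<id>\d+)(?:_.+?)?\.html'
NEW_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z-]+-?(?P<id>\d+)(?:_.+?)?\.html'


def match_id(pattern, url):
    # Return the captured id, or None when the pattern does not match.
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


audio_url = ('http://www.mdr.de/kultur/videos-und-audios/audio-radio/'
             'operation-mindfuck-robert-wilson100.html')
video_url = 'http://www.kika.de/baumhaus/videos/video19636.html'

old_audio = match_id(OLD_VALID_URL, audio_url)
new_audio = match_id(NEW_VALID_URL, audio_url)
old_video = match_id(OLD_VALID_URL, video_url)
new_video = match_id(NEW_VALID_URL, video_url)
```

The old pattern allows at most one hyphen between the slug and the numeric id, so a slug with several hyphens never reaches the digits; adding `-` to the character class lets the whole slug be consumed.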

View File

@ -6,12 +6,12 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
urlencode_postdata,
get_element_by_attribute,
mimetype2ext,
)
@ -50,6 +50,21 @@ class MetacafeIE(InfoExtractor):
},
'skip': 'Page is temporarily unavailable.',
},
# metacafe video with family filter
{
'url': 'http://www.metacafe.com/watch/2155630/adult_art_by_david_hart_156/',
'md5': 'b06082c5079bbdcde677a6291fbdf376',
'info_dict': {
'id': '2155630',
'ext': 'mp4',
'title': 'Adult Art By David Hart 156',
'uploader': '63346',
'description': 'md5:9afac8fc885252201ad14563694040fc',
},
'params': {
'skip_download': True,
},
},
# AnyClip video
{
'url': 'http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/',
@ -112,22 +127,6 @@ class MetacafeIE(InfoExtractor):
def report_disclaimer(self):
self.to_screen('Retrieving disclaimer')
def _confirm_age(self):
# Retrieve disclaimer
self.report_disclaimer()
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
# Confirm age
self.report_age_confirmation()
self._download_webpage(
self._FILTER_POST, None, False, 'Unable to confirm age',
data=urlencode_postdata({
'filters': '0',
'submit': "Continue - I'm over 18",
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
def _real_extract(self, url):
# Extract id and simplified title from URL
video_id, display_id = re.match(self._VALID_URL, url).groups()
@ -143,13 +142,15 @@ class MetacafeIE(InfoExtractor):
if prefix == 'cb':
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# self._confirm_age()
headers = {
# Disable family filter
'Cookie': 'user=%s; ' % compat_urllib_parse_urlencode({'ffilter': False})
}
# AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file
headers = {}
if video_id.startswith('an-'):
headers['Cookie'] = 'flashVersion=0;'
headers['Cookie'] += 'flashVersion=0; '
# Retrieve video webpage to extract further information
webpage = self._download_webpage(url, video_id, headers=headers)
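
The cookie construction in this hunk can be reproduced with the standard library alone. The sketch below uses Python 3's `urllib.parse.urlencode` in place of `compat_urllib_parse_urlencode`, and `build_headers` is a hypothetical name introduced only for illustration.

```python
from urllib.parse import urlencode


def build_headers(video_id):
    # Disable the family filter through the "user" cookie.
    headers = {
        'Cookie': 'user=%s; ' % urlencode({'ffilter': False}),
    }
    # AnyClip ids additionally get the flashVersion cookie appended.
    if video_id.startswith('an-'):
        headers['Cookie'] += 'flashVersion=0; '
    return headers


plain = build_headers('2155630')
anyclip = build_headers('an-dVVXnuY7Jh77J')
```

Appending with `+=` instead of assigning keeps the family-filter cookie in place for AnyClip videos as well.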

View File

@ -2,16 +2,17 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.)?mgtv\.com/(v|b)/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TESTS = [{
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '1bdadcf760a0b90946ca68ee9a2db41a',
'md5': 'b1ffc0fc163152acf6beaa81832c9ee7',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
@ -21,48 +22,45 @@ class MGTVIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
# no tbr extracted from stream_url
'url': 'http://www.mgtv.com/v/1/1/f/3324755.html',
'url': 'http://www.mgtv.com/b/301817/3826653.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
'http://pcweb.api.mgtv.com/player/video', video_id,
query={'video_id': video_id},
headers=self.geo_verification_headers())['data']
info = api_data['info']
title = info['title'].strip()
stream_domain = api_data['stream_domain'][0]
formats = []
for idx, stream in enumerate(api_data['stream']):
stream_url = stream.get('url')
if not stream_url:
stream_path = stream.get('url')
if not stream_path:
continue
format_data = self._download_json(
stream_domain + stream_path, video_id,
note='Download video info for format #%d' % idx)
format_url = format_data.get('info')
if not format_url:
continue
tbr = int_or_none(self._search_regex(
r'(\d+)\.mp4', stream_url, 'tbr', default=None))
def extract_format(stream_url, format_id, idx, query={}):
format_info = self._download_json(
stream_url, video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx),
query=query)
return {
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4',
'tbr': tbr,
}
formats.append(extract_format(
stream_url, 'hls-%d' % tbr if tbr else None, idx * 2))
formats.append(extract_format(stream_url.replace(
'/playlist.m3u8', ''), 'http-%d' % tbr if tbr else None, idx * 2 + 1, {'pno': 1031}))
r'_(\d+)_mp4/', format_url, 'tbr', default=None))
formats.append({
'format_id': compat_str(tbr or idx),
'url': format_url,
'ext': 'mp4',
'tbr': tbr,
'protocol': 'm3u8_native',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'].strip(),
'title': title,
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),

View File

@ -19,6 +19,7 @@ class NineCNineMediaBaseIE(InfoExtractor):
class NineCNineMediaStackIE(NineCNineMediaBaseIE):
IE_NAME = '9c9media:stack'
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)'
def _real_extract(self, url):

View File

@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
get_element_by_class,
urlencode_postdata,
)
class NJPWWorldIE(InfoExtractor):
_VALID_URL = r'https?://njpwworld\.com/p/(?P<id>[a-z0-9_]+)'
IE_DESC = '新日本プロレスワールド'
_NETRC_MACHINE = 'njpwworld'
_TEST = {
'url': 'http://njpwworld.com/p/s_series_00155_1_9/',
'info_dict': {
'id': 's_series_00155_1_9',
'ext': 'mp4',
'title': '第9試合 ランディ・サベージ vs リック・スタイナー',
'tags': list,
},
'params': {
'skip_download': True, # AES-encrypted m3u8
},
'skip': 'Requires login',
}
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
# No authentication to be performed
if not username:
return True
webpage, urlh = self._download_webpage_handle(
'https://njpwworld.com/auth/login', None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}))
# /auth/login will return 302 for successful logins
if urlh.geturl() == 'https://njpwworld.com/auth/login':
self.report_warning('unable to login')
return False
return True
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
formats = []
for player_url, kind in re.findall(r'<a[^>]+href="(/player[^"]+)".+?<img[^>]+src="[^"]+qf_btn_([^".]+)', webpage):
player_url = compat_urlparse.urljoin(url, player_url)
player_page = self._download_webpage(
player_url, video_id, note='Downloading player page')
entries = self._parse_html5_media_entries(
player_url, player_page, video_id, m3u8_id='hls-%s' % kind,
m3u8_entry_protocol='m3u8_native',
preference=2 if 'hq' in kind else 1)
formats.extend(entries[0]['formats'])
self._sort_formats(formats)
post_content = get_element_by_class('post-content', webpage)
tags = re.findall(
r'<li[^>]+class="tag-[^"]+"><a[^>]*>([^<]+)</a></li>', post_content
) if post_content else None
return {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'tags': tags,
}

View File

@ -23,7 +23,7 @@ from ..utils import (
class NocoIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?noco\.tv/emission/|player\.noco\.tv/\?idvideo=)(?P<id>\d+)'
_LOGIN_URL = 'http://noco.tv/do.php'
_LOGIN_URL = 'https://noco.tv/do.php'
_API_URL_TEMPLATE = 'https://api.noco.tv/1.1/%s?ts=%s&tk=%s'
_SUB_LANG_TEMPLATE = '&sub_lang=%s'
_NETRC_MACHINE = 'noco'
@ -69,16 +69,17 @@ class NocoIE(InfoExtractor):
if username is None:
return
login_form = {
'a': 'login',
'cookie': '1',
'username': username,
'password': password,
}
request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
login = self._download_json(request, None, 'Logging in as %s' % username)
login = self._download_json(
self._LOGIN_URL, None, 'Logging in as %s' % username,
data=urlencode_postdata({
'a': 'login',
'cookie': '1',
'username': username,
'password': password,
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})
if 'erreur' in login:
raise ExtractorError('Unable to login: %s' % clean_html(login['erreur']), expected=True)

View File

@ -51,7 +51,8 @@ class NPOIE(NPOBaseIE):
(?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__
omroepwnl\.nl/video/fragment/[^/]+__|
zapp\.nl/[^/]+/[^/]+/
)
)
(?P<id>[^/?#]+)
@ -140,6 +141,18 @@ class NPOIE(NPOBaseIE):
'upload_date': '20150508',
'duration': 462,
},
},
{
'url': 'http://www.zapp.nl/de-bzt-show/gemist/KN_1687547',
'only_matching': True,
},
{
'url': 'http://www.zapp.nl/de-bzt-show/filmpjes/POMS_KN_7315118',
'only_matching': True,
},
{
'url': 'http://www.zapp.nl/beste-vrienden-quiz/extra-video-s/WO_NTR_1067990',
'only_matching': True,
}
]
@ -416,7 +429,21 @@ class NPORadioFragmentIE(InfoExtractor):
}
class SchoolTVIE(InfoExtractor):
class NPODataMidEmbedIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
'url': 'npo:%s' % video_id,
'display_id': display_id
}
class SchoolTVIE(NPODataMidEmbedIE):
IE_NAME = 'schooltv'
_VALID_URL = r'https?://(?:www\.)?schooltv\.nl/video/(?P<id>[^/?#&]+)'
@ -435,17 +462,25 @@ class SchoolTVIE(InfoExtractor):
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
'url': 'npo:%s' % video_id,
'display_id': display_id
class HetKlokhuisIE(NPODataMidEmbedIE):
IE_NAME = 'hetklokhuis'
_VALID_URL = r'https?://(?:www\.)?hetklokhuis\.nl/[^/]+/\d+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://hetklokhuis.nl/tv-uitzending/3471/Zwaartekrachtsgolven',
'info_dict': {
'id': 'VPWON_1260528',
'display_id': 'Zwaartekrachtsgolven',
'ext': 'm4v',
'title': 'Het Klokhuis: Zwaartekrachtsgolven',
'description': 'md5:c94f31fb930d76c2efa4a4a71651dd48',
'upload_date': '20170223',
},
'params': {
'skip_download': True
}
}
class NPOPlaylistBaseIE(NPOIE):

View File

@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
@ -15,24 +14,7 @@ from ..utils import (
class NRKBaseIE(InfoExtractor):
_faked_ip = None
def _download_webpage_handle(self, *args, **kwargs):
# NRK checks X-Forwarded-For HTTP header in order to figure out the
# origin of the client behind proxy. This allows to bypass geo
# restriction by faking this header's value to some Norway IP.
# We will do so once we encounter any geo restriction error.
if self._faked_ip:
# NB: str is intentional
kwargs.setdefault(str('headers'), {})['X-Forwarded-For'] = self._faked_ip
return super(NRKBaseIE, self)._download_webpage_handle(*args, **kwargs)
def _fake_ip(self):
# Use fake IP from 37.191.128.0/17 in order to workaround geo
# restriction
def octet(lb=0, ub=255):
return random.randint(lb, ub)
self._faked_ip = '37.191.%d.%d' % (octet(128), octet())
_GEO_COUNTRIES = ['NO']
def _real_extract(self, url):
video_id = self._match_id(url)
@ -44,8 +26,6 @@ class NRKBaseIE(InfoExtractor):
title = data.get('fullTitle') or data.get('mainTitle') or data['title']
video_id = data.get('id') or video_id
http_headers = {'X-Forwarded-For': self._faked_ip} if self._faked_ip else {}
entries = []
conviva = data.get('convivaStatistics') or {}
@ -90,7 +70,6 @@ class NRKBaseIE(InfoExtractor):
'duration': duration,
'subtitles': subtitles,
'formats': formats,
'http_headers': http_headers,
})
if not entries:
@ -107,19 +86,17 @@ class NRKBaseIE(InfoExtractor):
}]
if not entries:
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type and not self._faked_ip:
self.report_warning(
'Video is geo restricted, trying to fake IP')
self._fake_ip()
return self._real_extract(url)
MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
}
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type:
self.raise_geo_restricted(
msg=MESSAGES.get('ProgramIsGeoBlocked'),
countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, MESSAGES.get(
message_type, message_type)),
@ -188,12 +165,12 @@ class NRKIE(NRKBaseIE):
https?://
(?:
(?:www\.)?nrk\.no/video/PS\*|
v8-psapi\.nrk\.no/mediaelement/
v8[-.]psapi\.nrk\.no/mediaelement/
)
)
(?P<id>[^/?#&]+)
(?P<id>[^?#&]+)
'''
_API_HOST = 'v8.psapi.nrk.no'
_API_HOST = 'v8-psapi.nrk.no'
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
@ -219,6 +196,9 @@ class NRKIE(NRKBaseIE):
}, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,
}, {
'url': 'nrk:clip/7707d5a3-ebe7-434a-87d5-a3ebe7a34a70',
'only_matching': True,
}, {
'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,

View File

@ -1,15 +1,16 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
ExtractorError,
js_to_json,
)
class OnDemandKoreaIE(JWPlatformBaseIE):
class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_GEO_COUNTRIES = ['US', 'CA']
_TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
'info_dict': {
@ -35,7 +36,8 @@ class OnDemandKoreaIE(JWPlatformBaseIE):
if 'msg_block_01.png' in webpage:
self.raise_geo_restricted(
'This content is not available in your region')
msg='This content is not available in your region',
countries=self._GEO_COUNTRIES)
if 'This video is only available to ODK PLUS members.' in webpage:
raise ExtractorError(

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage):
def _extract_from_id(self, video_id, webpage=None):
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
title = (self._og_search_title(
webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
}
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -72,20 +72,25 @@ class OpenloadIE(InfoExtractor):
raise ExtractorError('File not found', expected=True)
ol_id = self._search_regex(
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
'<span[^>]+id="[^"]+"[^>]*>([0-9A-Za-z]+)</span>',
webpage, 'openload ID')
first_three_chars = int(float(ol_id[0:][:3]))
fifth_char = int(float(ol_id[3:5]))
urlcode = ''
num = 5
first_char = int(ol_id[0])
urlcode = []
num = 1
while num < len(ol_id):
urlcode += compat_chr(int(float(ol_id[num:][:3])) +
first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2])))
i = ord(ol_id[num])
key = 0
if i <= 90:
key = i - 65
elif i >= 97:
key = 25 + i - 97
urlcode.append((key, compat_chr(int(ol_id[num + 2:num + 5]) // int(ol_id[num + 1]) - first_char)))
num += 5
video_url = 'https://openload.co/stream/' + urlcode
video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
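The new decoding loop in this hunk can be read as a standalone function. The following is a minimal sketch, assuming the ID has the shape the new code implies: one leading digit, then five-character chunks made of an order letter, a divisor digit and a three-digit number. The name `decode_openload_id` and the sample ID are made up for illustration, and the built-in `chr` stands in for `compat_chr`.

```python
def decode_openload_id(ol_id):
    """Decode an ID laid out as <digit>(<letter><digit><3 digits>)*.

    Hypothetical helper mirroring the loop in the hunk above.
    """
    first_char = int(ol_id[0])
    pieces = []
    num = 1
    while num < len(ol_id):
        i = ord(ol_id[num])
        key = 0
        if i <= 90:        # 'A'..'Z' map to 0..25
            key = i - 65
        elif i >= 97:      # 'a'..'z' map to 25..50, as in the diff
            key = 25 + i - 97
        # three-digit number divided by the divisor digit, minus the offset
        value = chr(int(ol_id[num + 2:num + 5]) // int(ol_id[num + 1]) - first_char)
        pieces.append((key, value))
        num += 5
    # chunks arrive shuffled; the letter key restores their order
    return ''.join(value for _, value in sorted(pieces, key=lambda x: x[0]))


# Made-up ID: offset 3, chunk "B2202" encodes 'b', chunk "A2200" encodes 'a'.
print(decode_openload_id('3B2202A2200'))  # ab
```

Collecting `(key, value)` pairs and sorting them is what lets the chunks appear in any order, which the old string-concatenation version could not handle.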


@ -193,6 +193,8 @@ class PBSIE(InfoExtractor):
)
''' % '|'.join(list(zip(*_STATIONS))[0])
_GEO_COUNTRIES = ['US']
_TESTS = [
{
'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/',
@ -489,11 +491,13 @@ class PBSIE(InfoExtractor):
headers=self.geo_verification_headers())
if redirect_info['status'] == 'error':
message = self._ERRORS.get(
redirect_info['http_code'], redirect_info['message'])
if redirect_info['http_code'] == 403:
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (
self.IE_NAME,
self._ERRORS.get(redirect_info['http_code'], redirect_info['message'])),
expected=True)
'%s said: %s' % (self.IE_NAME, message), expected=True)
format_url = redirect_info.get('url')
if not format_url:
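The error handling added here resolves the message once, then branches on the HTTP code. A minimal sketch of that control flow follows, assuming stand-in exception classes in place of youtube-dl's own; `check_redirect`, its parameters and the sample messages are hypothetical.

```python
class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError."""


class GeoRestrictedError(ExtractorError):
    """Stand-in that also carries the list of allowed countries."""

    def __init__(self, msg, countries=None):
        super().__init__(msg)
        self.countries = countries


def check_redirect(redirect_info, errors, ie_name='pbs', geo_countries=('US',)):
    """Raise a geo error for HTTP 403, a generic error for anything else."""
    if redirect_info['status'] != 'error':
        return
    # Look the message up once so both branches share it.
    message = errors.get(redirect_info['http_code'], redirect_info['message'])
    if redirect_info['http_code'] == 403:
        raise GeoRestrictedError(message, countries=list(geo_countries))
    raise ExtractorError('%s said: %s' % (ie_name, message))
```

Keeping the geo case as a distinct exception type lets a caller react to it separately, for example by retrying with a different apparent location, while other errors stay fatal.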


@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration'))
uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False))


@ -2,27 +2,27 @@
from __future__ import unicode_literals
import itertools
import os
# import os
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
# compat_urllib_parse_unquote,
# compat_urllib_parse_unquote_plus,
# compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
sanitized_Request,
# sanitized_Request,
str_to_int,
)
from ..aes import (
aes_decrypt_text
)
# from ..aes import (
# aes_decrypt_text
# )
class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
req = sanitized_Request(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
def dl_webpage(platform):
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
title = self._html_search_meta(
title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,48 +169,6 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_variables = {}
for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
video_variables[video_variablename] = video_variable
video_urls = []
for encoded_video_url in re.findall(
r'player_quality_[0-9]{3,4}p\s*=(.+?);', webpage):
for varname, varval in video_variables.items():
encoded_video_url = encoded_video_url.replace(varname, varval)
video_urls.append(re.sub(r'[\s+]', '', encoded_video_url))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
height = None
tbr = None
else:
height = int(m.group('height'))
tbr = int(m.group('tbr'))
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'tbr': tbr,
'height': height,
})
self._sort_formats(formats)
page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
webpage, 'page parameters', group='data', default='{}'),
@ -209,6 +180,7 @@ class PornHubIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'title': title,
'thumbnail': thumbnail,
@ -217,7 +189,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'formats': formats,
# 'formats': formats,
'age_limit': 18,
'tags': tags,
'categories': categories,
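The change above replaces the multi-format decryption path with a single URL read from the page served for the `tv` platform. As a rough sketch of the two pieces involved, the snippet below builds the cookie header and applies the `<video src>` regex from the diff to a hand-written HTML fragment; `platform_headers`, `find_video_url` and the sample markup are assumptions for illustration only.

```python
import re

# Regex copied from the diff: quoted src attribute of a <video> tag.
VIDEO_SRC_RE = r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1'


def platform_headers(platform):
    """Cookie header selecting which page variant is served."""
    return {'Cookie': 'age_verified=1; platform=%s' % platform}


def find_video_url(tv_webpage):
    """Return the first <video> src found, or None."""
    mobj = re.search(VIDEO_SRC_RE, tv_webpage)
    return mobj.group('url') if mobj else None


sample = '<video id="p" src="https://example.com/v.mp4?a=1" controls></video>'
print(platform_headers('tv'))
print(find_video_url(sample))
```

The backreference `\1` makes the regex accept either quote style while still stopping at the matching closing quote.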


@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
str_to_int,
)
class PornoXOIE(JWPlatformBaseIE):
class PornoXOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',


@ -424,3 +424,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
return self._extract_clip(url, webpage)
elif page_type == 'playlist':
return self._extract_playlist(url, webpage)
else:
raise ExtractorError(
'Unsupported page type %s' % page_type, expected=True)


@ -2,11 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE):
class RENTVIE(InfoExtractor):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://ren.tv/video/epizod/118577',


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
js_to_json,
get_element_by_class,
@ -11,7 +11,7 @@ from ..utils import (
)
class RudoIE(JWPlatformBaseIE):
class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {


@ -1,57 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class SciVeeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?scivee\.tv/node/(?P<id>\d+)'
_TEST = {
'url': 'http://www.scivee.tv/node/62352',
'md5': 'b16699b74c9e6a120f6772a44960304f',
'info_dict': {
'id': '62352',
'ext': 'mp4',
'title': 'Adam Arkin at the 2014 DOE JGI Genomics of Energy & Environment Meeting',
'description': 'md5:81f1710638e11a481358fab1b11059d7',
},
'skip': 'Not accessible from Travis CI server',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# annotations XML is malformed
annotations = self._download_webpage(
'http://www.scivee.tv/assets/annotations/%s' % video_id, video_id, 'Downloading annotations')
title = self._html_search_regex(r'<title>([^<]+)</title>', annotations, 'title')
description = self._html_search_regex(r'<abstract>([^<]+)</abstract>', annotations, 'abstract', fatal=False)
filesize = int_or_none(self._html_search_regex(
r'<filesize>([^<]+)</filesize>', annotations, 'filesize', fatal=False))
formats = [
{
'url': 'http://www.scivee.tv/assets/audio/%s' % video_id,
'ext': 'mp3',
'format_id': 'audio',
},
{
'url': 'http://www.scivee.tv/assets/video/%s' % video_id,
'ext': 'mp4',
'format_id': 'video',
'filesize': filesize,
},
]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': 'http://www.scivee.tv/assets/videothumb/%s' % video_id,
'formats': formats,
}


@ -1,11 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import js_to_json
class ScreencastOMaticIE(JWPlatformBaseIE):
class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
float_or_none,
parse_iso8601,
@ -14,7 +14,7 @@ from ..utils import (
)
class SendtoNewsIE(JWPlatformBaseIE):
class SendtoNewsIE(InfoExtractor):
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = {


@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class SkylineWebcamsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?skylinewebcams\.com/[^/]+/webcam/(?:[^/]+/)+(?P<id>[^/]+)\.html'
_TEST = {
'url': 'https://www.skylinewebcams.com/it/webcam/italia/lazio/roma/scalinata-piazza-di-spagna-barcaccia.html',
'info_dict': {
'id': 'scalinata-piazza-di-spagna-barcaccia',
'ext': 'mp4',
'title': 're:^Live Webcam Scalinata di Piazza di Spagna - La Barcaccia [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Roma, veduta sulla Scalinata di Piazza di Spagna e sulla Barcaccia',
'is_live': True,
},
'params': {
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
stream_url = self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage,
'stream url', group='url')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
return {
'id': video_id,
'url': stream_url,
'ext': 'mp4',
'title': self._live_title(title),
'description': description,
'is_live': True,
}
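The new extractor relies on one regex to pull an HLS playlist URL out of the player configuration. The sketch below runs that same pattern with plain `re` against a made-up configuration string; `find_stream_url` and the `example.com` URL are illustrative assumptions, not taken from the site.

```python
import re

# Pattern copied from the extractor: a quoted value after "url:" that
# contains ".m3u8", optionally followed by a query string.
STREAM_URL_RE = r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1'


def find_stream_url(webpage):
    """Return the first .m3u8 URL assigned to a `url:` key, or None."""
    mobj = re.search(STREAM_URL_RE, webpage)
    return mobj.group('url') if mobj else None


config = "player({url: '//example.com/live/cam.m3u8?token=abc', autoplay: true})"
print(find_stream_url(config))
```

The optional `(?:https?:)?` prefix means protocol-relative URLs are accepted as well as absolute ones.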


@ -108,12 +108,11 @@ class SohuIE(InfoExtractor):
if vid_data['play'] != 1:
if vid_data.get('status') == 12:
raise ExtractorError(
'Sohu said: There\'s something wrong in the video.',
'%s said: There\'s something wrong in the video.' % self.IE_NAME,
expected=True)
else:
raise ExtractorError(
'Sohu said: The video is only licensed to users in Mainland China.',
expected=True)
self.raise_geo_restricted(
'%s said: The video is only licensed to users in Mainland China.' % self.IE_NAME)
formats_json = {}
for format_id in ('nor', 'high', 'super', 'ori', 'h2644k', 'h2654k'):


@ -23,6 +23,10 @@ class SpankBangIE(InfoExtractor):
# 480p only
'url': 'http://spankbang.com/1vt0/video/solvane+gangbang',
'only_matching': True,
}, {
# no uploader
'url': 'http://spankbang.com/lklg/video/sex+with+anyone+wedding+edition+2',
'only_matching': True,
}]
def _real_extract(self, url):
@ -48,7 +52,7 @@ class SpankBangIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)',
webpage, 'uploader', fatal=False)
webpage, 'uploader', default=None)
age_limit = self._rta_search(webpage)


@ -14,6 +14,8 @@ from ..utils import (
class SRGSSRIE(InfoExtractor):
_VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['CH']
_ERRORS = {
'AGERATING12': 'To protect children under the age of 12, this video is only available between 8 p.m. and 6 a.m.',
@ -40,8 +42,12 @@ class SRGSSRIE(InfoExtractor):
media_id)[media_type.capitalize()]
if media_data.get('block') and media_data['block'] in self._ERRORS:
raise ExtractorError('%s said: %s' % (
self.IE_NAME, self._ERRORS[media_data['block']]), expected=True)
message = self._ERRORS[media_data['block']]
if media_data['block'] == 'GEOBLOCK':
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, message), expected=True)
return media_data


@ -13,6 +13,8 @@ from ..utils import (
class SVTBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['SE']
def _extract_video(self, video_info, video_id):
formats = []
for vr in video_info['videoReferences']:
@ -38,7 +40,9 @@ class SVTBaseIE(InfoExtractor):
'url': vurl,
})
if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
self.raise_geo_restricted('This video is only available in Sweden')
self.raise_geo_restricted(
'This video is only available in Sweden',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
subtitles = {}


@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
smuggle_url,
)
class TeleQuebecIE(InfoExtractor):
@ -28,7 +31,7 @@ class TeleQuebecIE(InfoExtractor):
return {
'_type': 'url_transparent',
'id': media_id,
'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
'url': smuggle_url('limelight:media:' + media_data['streamInfo']['sourceId'], {'geo_countries': ['CA']}),
'title': media_data['title'],
'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),


@ -8,10 +8,12 @@ from ..utils import (
HEADRequest,
ExtractorError,
int_or_none,
clean_html,
)
class TFOIE(InfoExtractor):
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
@ -36,7 +38,9 @@ class TFOIE(InfoExtractor):
'X-tfo-session': self._get_cookies('http://www.tfo.org/')['tfo-session'].value,
})
if infos.get('success') == 0:
raise ExtractorError('%s said: %s' % (self.IE_NAME, infos['msg']), expected=True)
if infos.get('code') == 'ErrGeoBlocked':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError('%s said: %s' % (self.IE_NAME, clean_html(infos['msg'])), expected=True)
video_data = infos['data']
return {


@ -179,10 +179,12 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
if m:
return [m.group('url')]
# Are whitespaces ignored in URLs?
# https://github.com/rg3/youtube-dl/issues/12044
matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
r'(?s)<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches:
return list(zip(*matches))[1]
return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
@staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
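The `(?s)` flag and the `re.sub(r'\s', '', ...)` call work together: the first lets the `src` value span line breaks, the second removes the whitespace that came along with it. A small sketch on a hand-written page follows; `find_theplatform_urls` and the sample markup are hypothetical, and returning an empty list for no match is a simplification of this sketch.

```python
import re

# Pattern from the new line of the hunk; (?s) lets "." cross newlines.
EMBED_RE = (
    r'(?s)<(?:iframe|script)[^>]+src=(["\'])'
    r'((?:https?:)?//player\.theplatform\.com/p/.+?)\1')


def find_theplatform_urls(webpage):
    """Return the first embed URL with all whitespace stripped out."""
    matches = re.findall(EMBED_RE, webpage)
    if matches:
        # Each match is (quote, url); keep the url of the first match.
        return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
    return []


page = '<iframe width="1" src="//player.theplatform.com/p/abc/\n    embed/select?x=1"></iframe>'
print(find_theplatform_urls(page))
```

Note that the new code keeps only the first match, whereas the previous line returned every matched URL.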


@ -3,7 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import qualities
from ..utils import (
int_or_none,
qualities,
)
class TheSceneIE(InfoExtractor):
@ -16,6 +19,11 @@ class TheSceneIE(InfoExtractor):
'ext': 'mp4',
'title': 'Narciso Rodriguez: Spring 2013 Ready-to-Wear',
'display_id': 'narciso-rodriguez-spring-2013-ready-to-wear',
'duration': 127,
'series': 'Style.com Fashion Shows',
'season': 'Ready To Wear Spring 2013',
'tags': list,
'categories': list,
},
}
@ -32,21 +40,29 @@ class TheSceneIE(InfoExtractor):
player = self._download_webpage(player_url, display_id)
info = self._parse_json(
self._search_regex(
r'(?m)var\s+video\s+=\s+({.+?});$', player, 'info json'),
r'(?m)video\s*:\s*({.+?}),$', player, 'info json'),
display_id)
video_id = info['id']
title = info['title']
qualities_order = qualities(('low', 'high'))
formats = [{
'format_id': '{0}-{1}'.format(f['type'].split('/')[0], f['quality']),
'url': f['src'],
'quality': qualities_order(f['quality']),
} for f in info['sources'][0]]
} for f in info['sources']]
self._sort_formats(formats)
return {
'id': info['id'],
'id': video_id,
'display_id': display_id,
'title': info['title'],
'title': title,
'formats': formats,
'thumbnail': info.get('poster_frame'),
'duration': int_or_none(info.get('duration')),
'series': info.get('series_title'),
'season': info.get('season_title'),
'tags': info.get('tags'),
'categories': info.get('categories'),
}
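The format list above ranks sources with `qualities(('low', 'high'))`. The sketch below shows how such a ranking helper can work, assuming a simplified re-implementation of `qualities` (index in the tuple, `-1` for unknown labels) and a plain sort in place of `_sort_formats`; the source entries are invented.

```python
def qualities(quality_ids):
    """Simplified stand-in: rank a label by its position in quality_ids."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1  # unknown labels sort below every known one
    return q


def build_formats(sources):
    """Turn player 'sources' entries into format dicts, worst first."""
    qualities_order = qualities(('low', 'high'))
    formats = [{
        'format_id': '{0}-{1}'.format(f['type'].split('/')[0], f['quality']),
        'url': f['src'],
        'quality': qualities_order(f['quality']),
    } for f in sources]
    # The real extractor calls self._sort_formats; a plain sort is enough here.
    formats.sort(key=lambda f: f['quality'])
    return formats


sources = [
    {'type': 'video/mp4', 'quality': 'high', 'src': 'http://example.com/h.mp4'},
    {'type': 'video/mp4', 'quality': 'low', 'src': 'http://example.com/l.mp4'},
]
for fmt in build_formats(sources):
    print(fmt['format_id'], fmt['quality'])
```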


@ -3,13 +3,14 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE):
class ThisAVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TESTS = [{
# jwplayer
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',
'md5': '0480f1ef3932d901f0e0e719f188f19b',
'info_dict': {
@ -20,6 +21,7 @@ class ThisAVIE(JWPlatformBaseIE):
'uploader_id': 'dj7970'
}
}, {
# html5 media
'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html',
'md5': 'ba90c076bd0f80203679e5b60bf523ee',
'info_dict': {
@ -48,8 +50,12 @@ class ThisAVIE(JWPlatformBaseIE):
}],
}
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if entries:
info_dict = entries[0]
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
uploader = self._html_search_regex(
r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>',
webpage, 'uploader name', fatal=False)


@ -16,6 +16,7 @@ class TubiTvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
_LOGIN_URL = 'http://tubitv.com/login'
_NETRC_MACHINE = 'tubitv'
_GEO_COUNTRIES = ['US']
_TEST = {
'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
'md5': '43ac06be9326f41912dc64ccf7a80320',


@ -24,6 +24,7 @@ class TV4IE(InfoExtractor):
sport/|
)
)(?P<id>[0-9]+)'''
_GEO_COUNTRIES = ['SE']
_TESTS = [
{
'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650',
@ -71,16 +72,12 @@ class TV4IE(InfoExtractor):
'http://www.tv4play.se/player/assets/%s.json' % video_id,
video_id, 'Downloading video info JSON')
# If is_geo_restricted is true, it doesn't necessarily mean we can't download it
if info.get('is_geo_restricted'):
self.report_warning('This content might not be available in your country due to licensing restrictions.')
title = info['title']
subtitles = {}
formats = []
# http formats are linked with unresolvable host
for kind in ('hls', ''):
for kind in ('hls3', ''):
data = self._download_json(
'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
video_id, 'Downloading sources JSON', query={
@ -113,6 +110,10 @@ class TV4IE(InfoExtractor):
'url': manifest_url,
'ext': 'vtt',
}]})
if not formats and info.get('is_geo_restricted'):
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
return {


@ -17,6 +17,9 @@ class TvigleIE(InfoExtractor):
IE_DESC = 'Интернет-телевидение Tvigle.ru'
_VALID_URL = r'https?://(?:www\.)?(?:tvigle\.ru/(?:[^/]+/)+(?P<display_id>[^/]+)/$|cloud\.tvigle\.ru/video/(?P<id>\d+))'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
_TESTS = [
{
'url': 'http://www.tvigle.ru/video/sokrat/',
@ -72,8 +75,13 @@ class TvigleIE(InfoExtractor):
error_message = item.get('errorMessage')
if not videos and error_message:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
if item.get('isGeoBlocked') is True:
self.raise_geo_restricted(
msg=error_message, countries=self._GEO_COUNTRIES)
else:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message),
expected=True)
title = item['title']
description = item.get('description')


@ -0,0 +1,76 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unescapeHTML,
)
class TVN24IE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:[^/]+)\.)?tvn24(?:bis)?\.pl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'http://www.tvn24.pl/wiadomosci-z-kraju,3/oredzie-artura-andrusa,702428.html',
'md5': 'fbdec753d7bc29d96036808275f2130c',
'info_dict': {
'id': '1584444',
'ext': 'mp4',
'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".',
'thumbnail': 're:http://.*[.]jpeg',
}
}, {
'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
'only_matching': True,
}, {
'url': 'http://sport.tvn24.pl/pilka-nozna,105/ligue-1-kamil-glik-rozcial-glowe-monaco-tylko-remisuje-z-bastia,716522.html',
'only_matching': True,
}, {
'url': 'http://tvn24bis.pl/poranek,146,m/gen-koziej-w-tvn24-bis-wracamy-do-czasow-zimnej-wojny,715660.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
def extract_json(attr, name, fatal=True):
return self._parse_json(
self._search_regex(
r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
name, group='json', fatal=fatal) or '{}',
video_id, transform_source=unescapeHTML, fatal=fatal)
quality_data = extract_json('data-quality', 'formats')
formats = []
for format_id, url in quality_data.items():
formats.append({
'url': url,
'format_id': format_id,
'height': int_or_none(format_id.rstrip('p')),
})
self._sort_formats(formats)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_regex(
r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
'thumbnail', group='url')
share_params = extract_json(
'data-share-params', 'share params', fatal=False)
if isinstance(share_params, dict):
video_id = share_params.get('id') or video_id
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
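The `extract_json` helper in this extractor reads HTML-escaped JSON from a data attribute. The following sketch reproduces that idea with the standard library only, assuming Python 3's `html.unescape` in place of `unescapeHTML` and a trimmed-down `int_or_none`; the sample markup and URLs are made up.

```python
import html
import json
import re


def extract_json(attr, webpage):
    """Parse the HTML-escaped JSON stored in a quoted attribute."""
    mobj = re.search(
        r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage)
    if not mobj:
        return {}
    return json.loads(html.unescape(mobj.group('json')))


def int_or_none(v):
    """Trimmed-down stand-in: int(v) or None when v is not numeric."""
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def build_formats(quality_data):
    """Map {'360p': url, ...} to format dicts with a numeric height."""
    return [{
        'url': url,
        'format_id': format_id,
        'height': int_or_none(format_id.rstrip('p')),
    } for format_id, url in quality_data.items()]


page = ('<div class="player" data-quality="{&quot;360p&quot;:'
        '&quot;http://example.com/360.mp4&quot;,&quot;720p&quot;:'
        '&quot;http://example.com/720.mp4&quot;}"></div>')
for fmt in build_formats(extract_json('data-quality', page)):
    print(fmt['format_id'], fmt['height'], fmt['url'])
```

Because the JSON is entity-escaped, the attribute value contains no raw double quotes, which is why the quote-delimited regex can capture it in one piece.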


@ -1,7 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_class,
@ -9,7 +9,7 @@ from ..utils import (
)
class TVNoeIE(JWPlatformBaseIE):
class TVNoeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.tvnoe.cz/video/10362',


@ -12,7 +12,7 @@ from ..utils import (
class TwentyFourVideoIE(InfoExtractor):
IE_NAME = '24video'
_VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex|tube)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.24video.net/video/view/1044982',
@ -37,6 +37,9 @@ class TwentyFourVideoIE(InfoExtractor):
}, {
'url': 'http://www.24video.me/video/view/1044982',
'only_matching': True,
}, {
'url': 'http://www.24video.tube/video/view/2363750',
'only_matching': True,
}]
def _real_extract(self, url):


@ -20,6 +20,7 @@ class Vbox7IE(InfoExtractor):
)
(?P<id>[\da-fA-F]+)
'''
_GEO_COUNTRIES = ['BG']
_TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@ -78,7 +79,7 @@ class Vbox7IE(InfoExtractor):
video_url = video['src']
if '/na.mp4' in video_url:
self.raise_geo_restricted()
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
uploader = video.get('uploader')


@ -17,12 +17,12 @@ from ..utils import (
class VevoBaseIE(InfoExtractor):
def _extract_json(self, webpage, video_id, item):
def _extract_json(self, webpage, video_id):
return self._parse_json(
self._search_regex(
r'window\.__INITIAL_STORE__\s*=\s*({.+?});\s*</script>',
webpage, 'initial store'),
video_id)['default'][item]
video_id)
class VevoIE(VevoBaseIE):
@ -139,6 +139,11 @@ class VevoIE(VevoBaseIE):
# no genres available
'url': 'http://www.vevo.com/watch/INS171400764',
'only_matching': True,
}, {
# Another case available only via the webpage; using streams/streamsV3 formats
# Geo-restricted to Netherlands/Germany
'url': 'http://www.vevo.com/watch/boostee/pop-corn-clip-officiel/FR1A91600909',
'only_matching': True,
}]
_VERSIONS = {
0: 'youtube', # only in AuthenticateVideo videoVersions
@ -193,7 +198,14 @@ class VevoIE(VevoBaseIE):
# https://github.com/rg3/youtube-dl/issues/9366)
if not video_versions:
webpage = self._download_webpage(url, video_id)
video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
json_data = self._extract_json(webpage, video_id)
if 'streams' in json_data.get('default', {}):
video_versions = json_data['default']['streams'][video_id][0]
else:
video_versions = [
value
for key, value in json_data['apollo']['data'].items()
if key.startswith('%s.streams' % video_id)]
uploader = None
artist = None
@ -207,7 +219,7 @@ class VevoIE(VevoBaseIE):
formats = []
for video_version in video_versions:
version = self._VERSIONS.get(video_version['version'])
version = self._VERSIONS.get(video_version.get('version'), 'generic')
version_url = video_version.get('url')
if not version_url:
continue
@ -339,7 +351,7 @@ class VevoPlaylistIE(VevoBaseIE):
if video_id:
return self.url_result('vevo:%s' % video_id, VevoIE.ie_key())
playlists = self._extract_json(webpage, playlist_id, '%ss' % playlist_kind)
playlists = self._extract_json(webpage, playlist_id)['default']['%ss' % playlist_kind]
playlist = (list(playlists.values())[0]
if playlist_kind == 'playlist' else playlists[playlist_id])
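With `_extract_json` now returning the whole store, the caller chooses between the older `default.streams` layout and the newer `apollo.data` layout. A minimal sketch of that selection on hand-built dictionaries follows; `select_video_versions`, the video ID and the exact key names under `apollo.data` are assumptions, since the diff only shows that keys start with `<video_id>.streams`.

```python
def select_video_versions(json_data, video_id):
    """Pick stream entries from either store layout."""
    if 'streams' in json_data.get('default', {}):
        # Older layout: a list of versions nested under the video ID.
        return json_data['default']['streams'][video_id][0]
    # Newer layout: flat keys prefixed with "<video_id>.streams".
    return [
        value
        for key, value in json_data['apollo']['data'].items()
        if key.startswith('%s.streams' % video_id)]


old_store = {'default': {'streams': {'ID1': [[{'version': 2, 'url': 'u'}]]}}}
new_store = {'apollo': {'data': {
    'ID1.streams.0': {'url': 'a'},      # hypothetical key names
    'ID1.streamsV3.0': {'url': 'b'},
    'OTHER': {},
}}}
print(select_video_versions(old_store, 'ID1'))
print(select_video_versions(new_store, 'ID1'))
```

Entries from the newer layout may lack a `version` field, which is presumably why the format loop now falls back to `'generic'` via `video_version.get('version')`.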


@ -14,6 +14,7 @@ from ..utils import (
class VGTVIE(XstreamIE):
IE_DESC = 'VGTV, BTTV, FTV, Aftenposten and Aftonbladet'
_GEO_BYPASS = False
_HOST_TO_APPNAME = {
'vgtv.no': 'vgtv',
@ -217,7 +218,8 @@ class VGTVIE(XstreamIE):
properties = try_get(
data, lambda x: x['streamConfiguration']['properties'], list)
if properties and 'geoblocked' in properties:
raise self.raise_geo_restricted()
raise self.raise_geo_restricted(
countries=[host.rpartition('.')[-1].partition('/')[0].upper()])
self._sort_formats(info['formats'])
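The country passed to `raise_geo_restricted` is derived from the host string itself. The expression can be checked in isolation, as in the sketch below; `country_from_host` is a hypothetical wrapper, and apart from `vgtv.no` the sample hosts are illustrative inputs.

```python
def country_from_host(host):
    """Upper-cased top-level domain of a host that may carry a path."""
    # rpartition('.') keeps what follows the last dot ("se/abtv"),
    # partition('/') then drops any trailing path ("se").
    return host.rpartition('.')[-1].partition('/')[0].upper()


for host in ('vgtv.no', 'tv.aftonbladet.se/abtv'):
    print(host, country_from_host(host))
```

This treats the country-code TLD as the list of allowed countries, which only makes sense for hosts that use one.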


@ -70,10 +70,10 @@ class ViceBaseIE(AdobePassIE):
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'description': base.get('body') or base.get('display_body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'duration': int_or_none(video_data.get('video_duration')) or parse_duration(watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
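Passing `1000` as the second argument to `int_or_none` suggests that `created_at` is delivered in milliseconds. The sketch below uses a simplified stand-in for `int_or_none` that supports only a `scale` divisor; the millisecond input is a hypothetical value chosen for illustration.

```python
def int_or_none(v, scale=1):
    """Simplified stand-in: int(v) // scale, or None if v is unusable."""
    if v is None or v == '':
        return None
    try:
        return int(v) // scale
    except (TypeError, ValueError):
        return None


# A millisecond value scaled down to whole seconds.
print(int_or_none('1485474122000', 1000))
print(int_or_none(None, 1000))
```

The same helper explains the duration change: `int_or_none(...) or parse_duration(...)` uses the numeric field when present and only falls back to parsing a duration string otherwise.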


@ -7,16 +7,16 @@ from .vice import ViceBaseIE
class VicelandIE(ViceBaseIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = {
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
'url': 'https://www.viceland.com/en_us/video/trapped/588a70d0dba8a16007de7316',
'info_dict': {
'id': '57608447973ee7705f6fbd4e',
'id': '588a70d0dba8a16007de7316',
'ext': 'mp4',
'title': 'CYBERWAR (Trailer)',
'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.',
'title': 'TRAPPED (Series Trailer)',
'description': 'md5:7a8e95c2b6cd86461502a2845e581ccf',
'age_limit': 14,
'timestamp': 1466008539,
'upload_date': '20160615',
'uploader_id': '11',
'timestamp': 1485474122,
'upload_date': '20170126',
'uploader_id': '57a204098cb727dec794c6a3',
'uploader': 'Viceland',
},
'params': {


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
decode_packed_codes,
js_to_json,
@ -12,8 +12,8 @@ from ..utils import (
)
class VidziIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
class VidziIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidzi\.(?:tv|cc)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html',
'md5': '4f16c71ca0c8c8635ab6932b5f3f1660',
@ -29,6 +29,9 @@ class VidziIE(JWPlatformBaseIE):
}, {
'url': 'http://vidzi.tv/embed-4z2yb0rzphe9-600x338.html',
'skip_download': True,
}, {
'url': 'http://vidzi.cc/cghql9yq6emu.html',
'skip_download': True,
}]
def _real_extract(self, url):


@ -86,7 +86,9 @@ class ViewsterIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
# Get 'api_token' cookie
self._request_webpage(HEADRequest('http://www.viewster.com/'), video_id)
self._request_webpage(
HEADRequest('http://www.viewster.com/'),
video_id, headers=self.geo_verification_headers())
cookies = self._get_cookies('http://www.viewster.com/')
self._AUTH_TOKEN = compat_urllib_parse_unquote(cookies['api_token'].value)


@ -27,6 +27,7 @@ class VikiBaseIE(InfoExtractor):
_APP_VERSION = '2.2.5.1428709186'
_APP_SECRET = '-$iJ}@p7!G@SyU/je1bEyWg}upLu-6V6-Lg9VD(]siH,r.,m-r|ulZ,U4LC/SeR)'
_GEO_BYPASS = False
_NETRC_MACHINE = 'viki'
_token = None
@ -77,8 +78,11 @@ class VikiBaseIE(InfoExtractor):
def _check_errors(self, data):
for reason, status in data.get('blocking', {}).items():
if status and reason in self._ERRORS:
message = self._ERRORS[reason]
if reason == 'geo':
self.raise_geo_restricted(msg=message)
raise ExtractorError('%s said: %s' % (
self.IE_NAME, self._ERRORS[reason]), expected=True)
self.IE_NAME, message), expected=True)
def _real_initialize(self):
self._login()


@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict


@ -1,10 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from .jwplatform import JWPlatformBaseIE
class WimpIE(JWPlatformBaseIE):
class WimpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.wimp.com/maru-is-exhausted/',


@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
dict_get,
ExtractorError,
int_or_none,
parse_duration,
unified_strdate,
@ -57,6 +58,10 @@ class XHamsterIE(InfoExtractor):
}, {
'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
'only_matching': True,
}, {
# This video is visible for marcoalfa123456's friends only
'url': 'https://it.xhamster.com/movies/7263980/la_mia_vicina.html',
'only_matching': True,
}]
def _real_extract(self, url):
@ -78,6 +83,12 @@ class XHamsterIE(InfoExtractor):
mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (proto, video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
title = self._html_search_regex(
[r'<h1[^>]*>([^<]+)</h1>',
r'<meta[^>]+itemprop=".*?caption.*?"[^>]+content="(.+?)"',


@ -47,7 +47,6 @@ from ..utils import (
unsmuggle_url,
uppercase_escape,
urlencode_postdata,
ISO3166Utils,
)
@ -371,6 +370,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
_SUBTITLE_FORMATS = ('ttml', 'vtt')
_GEO_BYPASS = False
IE_NAME = 'youtube'
_TESTS = [
{
@ -917,7 +918,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# itag 212
'url': '1t24XAntNCY',
'only_matching': True,
}
},
{
# geo restricted to JP
'url': 'sJL6WA-aGkQ',
'only_matching': True,
},
]
def __init__(self, *args, **kwargs):
@ -1376,11 +1382,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if 'token' not in video_info:
if 'reason' in video_info:
if 'The uploader has not made this video available in your country.' in video_info['reason']:
regions_allowed = self._html_search_meta('regionsAllowed', video_webpage, default=None)
if regions_allowed:
raise ExtractorError('YouTube said: This video is available in %s only' % (
', '.join(map(ISO3166Utils.short2full, regions_allowed.split(',')))),
expected=True)
regions_allowed = self._html_search_meta(
'regionsAllowed', video_webpage, default=None)
countries = regions_allowed.split(',') if regions_allowed else None
self.raise_geo_restricted(
msg=video_info['reason'][0], countries=countries)
raise ExtractorError(
'YouTube said: %s' % video_info['reason'][0],
expected=True, video_id=video_id)
@ -2226,7 +2232,7 @@ class YoutubeUserIE(YoutubeChannelIE):
'url': 'https://www.youtube.com/gametrailers',
'only_matching': True,
}, {
# This channel is not available.
# This channel is not available, geo restricted to JP
'url': 'https://www.youtube.com/user/kananishinoSMEJ/videos',
'only_matching': True,
}]


@ -228,17 +228,29 @@ def parseOpts(overrideArguments=None):
action='store_const', const='::', dest='source_address',
help='Make all connections via IPv6',
)
network.add_option(
geo = optparse.OptionGroup(parser, 'Geo Restriction')
geo.add_option(
'--geo-verification-proxy',
dest='geo_verification_proxy', default=None, metavar='URL',
help='Use this proxy to verify the IP address for some geo-restricted sites. '
'The default proxy specified by --proxy (or none, if the option is not present) is used for the actual downloading.'
)
network.add_option(
'The default proxy specified by --proxy (or none, if the option is not present) is used for the actual downloading.')
geo.add_option(
'--cn-verification-proxy',
dest='cn_verification_proxy', default=None, metavar='URL',
help=optparse.SUPPRESS_HELP,
)
help=optparse.SUPPRESS_HELP)
geo.add_option(
'--geo-bypass',
action='store_true', dest='geo_bypass', default=True,
help='Bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)')
geo.add_option(
'--no-geo-bypass',
action='store_false', dest='geo_bypass', default=True,
help='Do not bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)')
geo.add_option(
'--geo-bypass-country', metavar='CODE',
dest='geo_bypass_country', default=None,
help='Force bypass geographic restriction with explicitly provided two-letter ISO 3166-1 alpha-2 country code (experimental)')
selection = optparse.OptionGroup(parser, 'Video Selection')
selection.add_option(
@ -298,14 +310,16 @@ def parseOpts(overrideArguments=None):
metavar='FILTER', dest='match_filter', default=None,
help=(
'Generic video filter. '
'Specify any key (see help for -o for a list of available keys) to'
' match if the key is present, '
'!key to check if the key is not present,'
'Specify any key (see help for -o for a list of available keys) to '
'match if the key is present, '
'!key to check if the key is not present, '
'key > NUMBER (like "comment_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, and '
'& to require multiple matches. '
'Values which are not known are excluded unless you'
' put a question mark (?) after the operator.'
'>=, <, <=, !=, =) to compare against a number, '
'key = \'LITERAL\' (like "uploader = \'Mike Smith\'", also works with !=) '
'to match against a string literal '
'and & to require multiple matches. '
'Values which are not known are excluded unless you '
'put a question mark (?) after the operator. '
'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who '
@ -665,8 +679,8 @@ def parseOpts(overrideArguments=None):
help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
filesystem.add_option(
'--autonumber-size',
dest='autonumber_size', metavar='NUMBER', default=5, type=int,
help='Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given (default is %default)')
dest='autonumber_size', metavar='NUMBER', type=int,
help=optparse.SUPPRESS_HELP)
filesystem.add_option(
'--autonumber-start',
dest='autonumber_start', metavar='NUMBER', default=1, type=int,
@ -678,15 +692,15 @@ def parseOpts(overrideArguments=None):
filesystem.add_option(
'-A', '--auto-number',
action='store_true', dest='autonumber', default=False,
help='[deprecated; use -o "%(autonumber)s-%(title)s.%(ext)s" ] Number downloaded files starting from 00000')
help=optparse.SUPPRESS_HELP)
filesystem.add_option(
'-t', '--title',
action='store_true', dest='usetitle', default=False,
help='[deprecated] Use title in file name (default)')
help=optparse.SUPPRESS_HELP)
filesystem.add_option(
'-l', '--literal', default=False,
action='store_true', dest='usetitle',
help='[deprecated] Alias of --title')
help=optparse.SUPPRESS_HELP)
filesystem.add_option(
'-w', '--no-overwrites',
action='store_true', dest='nooverwrites', default=False,
@ -834,6 +848,7 @@ def parseOpts(overrideArguments=None):
parser.add_option_group(general)
parser.add_option_group(network)
parser.add_option_group(geo)
parser.add_option_group(selection)
parser.add_option_group(downloader)
parser.add_option_group(filesystem)
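
The paired `--geo-bypass` / `--no-geo-bypass` flags in the hunk above both write to the same `dest`, so whichever appears last on the command line wins. A minimal standalone sketch with the standard `optparse` module follows; the bare `parser` here is an assumption for illustration and omits every other option group from `parseOpts`.

```python
import optparse

parser = optparse.OptionParser()
geo = optparse.OptionGroup(parser, 'Geo Restriction')
# Both flags share dest='geo_bypass'; bypassing is enabled by default.
geo.add_option(
    '--geo-bypass',
    action='store_true', dest='geo_bypass', default=True)
geo.add_option(
    '--no-geo-bypass',
    action='store_false', dest='geo_bypass', default=True)
geo.add_option(
    '--geo-bypass-country', metavar='CODE',
    dest='geo_bypass_country', default=None)
parser.add_option_group(geo)

# No flags given: defaults apply.
defaults, _ = parser.parse_args([])
# Explicitly disable bypassing while still naming a country code.
opts, _ = parser.parse_args(['--no-geo-bypass', '--geo-bypass-country', 'JP'])
```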


@ -536,8 +536,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
ext = sub['ext']
if ext == new_ext:
self._downloader.to_screen(
'[ffmpeg] Subtitle file for %s is already in the requested'
'format' % new_ext)
'[ffmpeg] Subtitle file for %s is already in the requested format' % new_ext)
continue
old_file = subtitles_filename(filename, lang, ext)
sub_filenames.append(old_file)


@ -23,6 +23,7 @@ import operator
import os
import pipes
import platform
import random
import re
import socket
import ssl
@ -701,7 +702,12 @@ def bug_reports_message():
return msg
class ExtractorError(Exception):
class YoutubeDLError(Exception):
"""Base exception for YoutubeDL errors."""
pass
class ExtractorError(YoutubeDLError):
"""Error during info extraction."""
def __init__(self, msg, tb=None, expected=False, cause=None, video_id=None):
@ -742,7 +748,19 @@ class RegexNotFoundError(ExtractorError):
pass
class DownloadError(Exception):
class GeoRestrictedError(ExtractorError):
"""Geographic restriction Error exception.
This exception may be thrown when a video is not available from your
geographic location due to geographic restrictions imposed by a website.
"""
def __init__(self, msg, countries=None):
super(GeoRestrictedError, self).__init__(msg, expected=True)
self.msg = msg
self.countries = countries
class DownloadError(YoutubeDLError):
"""Download Error exception.
This exception may be thrown by FileDownloader objects if they are not
@ -756,7 +774,7 @@ class DownloadError(Exception):
self.exc_info = exc_info
class SameFileError(Exception):
class SameFileError(YoutubeDLError):
"""Same File exception.
This exception will be thrown by FileDownloader objects if they detect
@ -765,7 +783,7 @@ class SameFileError(Exception):
pass
class PostProcessingError(Exception):
class PostProcessingError(YoutubeDLError):
"""Post Processing exception.
This exception may be raised by PostProcessor's .run() method to
@ -773,15 +791,16 @@ class PostProcessingError(Exception):
"""
def __init__(self, msg):
super(PostProcessingError, self).__init__(msg)
self.msg = msg
class MaxDownloadsReached(Exception):
class MaxDownloadsReached(YoutubeDLError):
""" --max-downloads limit has been reached. """
pass
class UnavailableVideoError(Exception):
class UnavailableVideoError(YoutubeDLError):
"""Unavailable Format exception.
This exception will be thrown when a video is requested
@ -790,7 +809,7 @@ class UnavailableVideoError(Exception):
pass
class ContentTooShortError(Exception):
class ContentTooShortError(YoutubeDLError):
"""Content Too Short exception.
This exception may be raised by FileDownloader objects when a file they
@ -799,12 +818,15 @@ class ContentTooShortError(Exception):
"""
def __init__(self, downloaded, expected):
super(ContentTooShortError, self).__init__(
'Downloaded {0} bytes, expected {1} bytes'.format(downloaded, expected)
)
# Both in bytes
self.downloaded = downloaded
self.expected = expected
class XAttrMetadataError(Exception):
class XAttrMetadataError(YoutubeDLError):
def __init__(self, code=None, msg='Unknown error'):
super(XAttrMetadataError, self).__init__(msg)
self.code = code
@ -820,7 +842,7 @@ class XAttrMetadataError(Exception):
self.reason = 'NOT_SUPPORTED'
class XAttrUnavailableError(Exception):
class XAttrUnavailableError(YoutubeDLError):
pass
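
With every error class now deriving from `YoutubeDLError`, a caller can catch the geo-restriction case specifically and fall back to the base class for everything else. The sketch below re-declares trimmed-down versions of the classes so that it runs on its own; the `fetch` function and its `regions_allowed` argument are hypothetical and only imitate the way the YouTube hunk splits `regionsAllowed`.

```python
class YoutubeDLError(Exception):
    """Base exception, mirroring the class added in the diff."""


class ExtractorError(YoutubeDLError):
    # Trimmed: the real constructor takes more arguments (tb, cause, video_id).
    def __init__(self, msg, expected=False):
        super(ExtractorError, self).__init__(msg)
        self.msg = msg
        self.expected = expected


class GeoRestrictedError(ExtractorError):
    def __init__(self, msg, countries=None):
        super(GeoRestrictedError, self).__init__(msg, expected=True)
        self.countries = countries


def fetch(regions_allowed):
    """Hypothetical extractor step that always reports a geo restriction."""
    countries = regions_allowed.split(',') if regions_allowed else None
    raise GeoRestrictedError(
        'This video is not available in your country', countries=countries)


try:
    fetch('JP')
except GeoRestrictedError as e:
    # The most specific handler comes first.
    result = ('geo', e.countries)
except YoutubeDLError:
    result = ('other', None)
```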
@ -2383,6 +2405,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*)
)
\s*$
@ -2391,7 +2414,8 @@ def _match_one(filter_part, dct):
if m:
op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or
if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparison value is
# a number we should respect the origin of the original field
# and process comparison value as a string (see
@ -2401,7 +2425,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval')
comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else:
try:
comparison_value = int(m.group('intval'))
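
The quoted-literal branch added to `_match_one` can be tried in isolation. The following is a simplified sketch, not the full function: it keeps only the `quote` / `quotedstrval` groups and the `=` and `!=` operators, and the name `match_quoted` is hypothetical. Missing keys are simply treated as non-matching here, which ignores the `?` (none-inclusive) handling of the real code.

```python
import re

QUOTED_RE = re.compile(r'''
    \s*(?P<key>[a-z_]+)
    \s*(?P<op>!=|=)\s*
    (?P<quote>["\'])
    (?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)
    (?P=quote)
    \s*$
''', re.VERBOSE)


def match_quoted(filter_part, dct):
    """Compare dct[key] against a quoted string literal (simplified)."""
    m = QUOTED_RE.match(filter_part)
    if not m:
        raise ValueError('Invalid filter part %r' % filter_part)
    quote = m.group('quote')
    # Turn an escaped quote (\' or \") back into the bare quote character.
    expected = m.group('quotedstrval').replace('\\' + quote, quote)
    actual = dct.get(m.group('key'))
    if actual is None:
        return False
    if m.group('op') == '=':
        return actual == expected
    return actual != expected


same = match_quoted("uploader = 'Mike Smith'", {'uploader': 'Mike Smith'})
escaped = match_quoted("title = 'It\\'s here'", {'title': "It's here"})
```

The alternative `(?!(?P=quote)|\\).` never matches a backslash, so a backslash can only be consumed together with the character after it; that is what allows an escaped quote inside the literal.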
@ -3013,6 +3040,260 @@ class ISO3166Utils(object):
return cls._country_map.get(code.upper())
class GeoUtils(object):
# Major IPv4 address blocks per country
_country_ip_map = {
'AD': '85.94.160.0/19',
'AE': '94.200.0.0/13',
'AF': '149.54.0.0/17',
'AG': '209.59.64.0/18',
'AI': '204.14.248.0/21',
'AL': '46.99.0.0/16',
'AM': '46.70.0.0/15',
'AO': '105.168.0.0/13',
'AP': '159.117.192.0/21',
'AR': '181.0.0.0/12',
'AS': '202.70.112.0/20',
'AT': '84.112.0.0/13',
'AU': '1.128.0.0/11',
'AW': '181.41.0.0/18',
'AZ': '5.191.0.0/16',
'BA': '31.176.128.0/17',
'BB': '65.48.128.0/17',
'BD': '114.130.0.0/16',
'BE': '57.0.0.0/8',
'BF': '129.45.128.0/17',
'BG': '95.42.0.0/15',
'BH': '37.131.0.0/17',
'BI': '154.117.192.0/18',
'BJ': '137.255.0.0/16',
'BL': '192.131.134.0/24',
'BM': '196.12.64.0/18',
'BN': '156.31.0.0/16',
'BO': '161.56.0.0/16',
'BQ': '161.0.80.0/20',
'BR': '152.240.0.0/12',
'BS': '24.51.64.0/18',
'BT': '119.2.96.0/19',
'BW': '168.167.0.0/16',
'BY': '178.120.0.0/13',
'BZ': '179.42.192.0/18',
'CA': '99.224.0.0/11',
'CD': '41.243.0.0/16',
'CF': '196.32.200.0/21',
'CG': '197.214.128.0/17',
'CH': '85.0.0.0/13',
'CI': '154.232.0.0/14',
'CK': '202.65.32.0/19',
'CL': '152.172.0.0/14',
'CM': '165.210.0.0/15',
'CN': '36.128.0.0/10',
'CO': '181.240.0.0/12',
'CR': '201.192.0.0/12',
'CU': '152.206.0.0/15',
'CV': '165.90.96.0/19',
'CW': '190.88.128.0/17',
'CY': '46.198.0.0/15',
'CZ': '88.100.0.0/14',
'DE': '53.0.0.0/8',
'DJ': '197.241.0.0/17',
'DK': '87.48.0.0/12',
'DM': '192.243.48.0/20',
'DO': '152.166.0.0/15',
'DZ': '41.96.0.0/12',
'EC': '186.68.0.0/15',
'EE': '90.190.0.0/15',
'EG': '156.160.0.0/11',
'ER': '196.200.96.0/20',
'ES': '88.0.0.0/11',
'ET': '196.188.0.0/14',
'EU': '2.16.0.0/13',
'FI': '91.152.0.0/13',
'FJ': '144.120.0.0/16',
'FM': '119.252.112.0/20',
'FO': '88.85.32.0/19',
'FR': '90.0.0.0/9',
'GA': '41.158.0.0/15',
'GB': '25.0.0.0/8',
'GD': '74.122.88.0/21',
'GE': '31.146.0.0/16',
'GF': '161.22.64.0/18',
'GG': '62.68.160.0/19',
'GH': '45.208.0.0/14',
'GI': '85.115.128.0/19',
'GL': '88.83.0.0/19',
'GM': '160.182.0.0/15',
'GN': '197.149.192.0/18',
'GP': '104.250.0.0/19',
'GQ': '105.235.224.0/20',
'GR': '94.64.0.0/13',
'GT': '168.234.0.0/16',
'GU': '168.123.0.0/16',
'GW': '197.214.80.0/20',
'GY': '181.41.64.0/18',
'HK': '113.252.0.0/14',
'HN': '181.210.0.0/16',
'HR': '93.136.0.0/13',
'HT': '148.102.128.0/17',
'HU': '84.0.0.0/14',
'ID': '39.192.0.0/10',
'IE': '87.32.0.0/12',
'IL': '79.176.0.0/13',
'IM': '5.62.80.0/20',
'IN': '117.192.0.0/10',
'IO': '203.83.48.0/21',
'IQ': '37.236.0.0/14',
'IR': '2.176.0.0/12',
'IS': '82.221.0.0/16',
'IT': '79.0.0.0/10',
'JE': '87.244.64.0/18',
'JM': '72.27.0.0/17',
'JO': '176.29.0.0/16',
'JP': '126.0.0.0/8',
'KE': '105.48.0.0/12',
'KG': '158.181.128.0/17',
'KH': '36.37.128.0/17',
'KI': '103.25.140.0/22',
'KM': '197.255.224.0/20',
'KN': '198.32.32.0/19',
'KP': '175.45.176.0/22',
'KR': '175.192.0.0/10',
'KW': '37.36.0.0/14',
'KY': '64.96.0.0/15',
'KZ': '2.72.0.0/13',
'LA': '115.84.64.0/18',
'LB': '178.135.0.0/16',
'LC': '192.147.231.0/24',
'LI': '82.117.0.0/19',
'LK': '112.134.0.0/15',
'LR': '41.86.0.0/19',
'LS': '129.232.0.0/17',
'LT': '78.56.0.0/13',
'LU': '188.42.0.0/16',
'LV': '46.109.0.0/16',
'LY': '41.252.0.0/14',
'MA': '105.128.0.0/11',
'MC': '88.209.64.0/18',
'MD': '37.246.0.0/16',
'ME': '178.175.0.0/17',
'MF': '74.112.232.0/21',
'MG': '154.126.0.0/17',
'MH': '117.103.88.0/21',
'MK': '77.28.0.0/15',
'ML': '154.118.128.0/18',
'MM': '37.111.0.0/17',
'MN': '49.0.128.0/17',
'MO': '60.246.0.0/16',
'MP': '202.88.64.0/20',
'MQ': '109.203.224.0/19',
'MR': '41.188.64.0/18',
'MS': '208.90.112.0/22',
'MT': '46.11.0.0/16',
'MU': '105.16.0.0/12',
'MV': '27.114.128.0/18',
'MW': '105.234.0.0/16',
'MX': '187.192.0.0/11',
'MY': '175.136.0.0/13',
'MZ': '197.218.0.0/15',
'NA': '41.182.0.0/16',
'NC': '101.101.0.0/18',
'NE': '197.214.0.0/18',
'NF': '203.17.240.0/22',
'NG': '105.112.0.0/12',
'NI': '186.76.0.0/15',
'NL': '145.96.0.0/11',
'NO': '84.208.0.0/13',
'NP': '36.252.0.0/15',
'NR': '203.98.224.0/19',
'NU': '49.156.48.0/22',
'NZ': '49.224.0.0/14',
'OM': '5.36.0.0/15',
'PA': '186.72.0.0/15',
'PE': '186.160.0.0/14',
'PF': '123.50.64.0/18',
'PG': '124.240.192.0/19',
'PH': '49.144.0.0/13',
'PK': '39.32.0.0/11',
'PL': '83.0.0.0/11',
'PM': '70.36.0.0/20',
'PR': '66.50.0.0/16',
'PS': '188.161.0.0/16',
'PT': '85.240.0.0/13',
'PW': '202.124.224.0/20',
'PY': '181.120.0.0/14',
'QA': '37.210.0.0/15',
'RE': '139.26.0.0/16',
'RO': '79.112.0.0/13',
'RS': '178.220.0.0/14',
'RU': '5.136.0.0/13',
'RW': '105.178.0.0/15',
'SA': '188.48.0.0/13',
'SB': '202.1.160.0/19',
'SC': '154.192.0.0/11',
'SD': '154.96.0.0/13',
'SE': '78.64.0.0/12',
'SG': '152.56.0.0/14',
'SI': '188.196.0.0/14',
'SK': '78.98.0.0/15',
'SL': '197.215.0.0/17',
'SM': '89.186.32.0/19',
'SN': '41.82.0.0/15',
'SO': '197.220.64.0/19',
'SR': '186.179.128.0/17',
'SS': '105.235.208.0/21',
'ST': '197.159.160.0/19',
'SV': '168.243.0.0/16',
'SX': '190.102.0.0/20',
'SY': '5.0.0.0/16',
'SZ': '41.84.224.0/19',
'TC': '65.255.48.0/20',
'TD': '154.68.128.0/19',
'TG': '196.168.0.0/14',
'TH': '171.96.0.0/13',
'TJ': '85.9.128.0/18',
'TK': '27.96.24.0/21',
'TL': '180.189.160.0/20',
'TM': '95.85.96.0/19',
'TN': '197.0.0.0/11',
'TO': '175.176.144.0/21',
'TR': '78.160.0.0/11',
'TT': '186.44.0.0/15',
'TV': '202.2.96.0/19',
'TW': '120.96.0.0/11',
'TZ': '156.156.0.0/14',
'UA': '93.72.0.0/13',
'UG': '154.224.0.0/13',
'US': '3.0.0.0/8',
'UY': '167.56.0.0/13',
'UZ': '82.215.64.0/18',
'VA': '212.77.0.0/19',
'VC': '24.92.144.0/20',
'VE': '186.88.0.0/13',
'VG': '172.103.64.0/18',
'VI': '146.226.0.0/16',
'VN': '14.160.0.0/11',
'VU': '202.80.32.0/20',
'WF': '117.20.32.0/21',
'WS': '202.4.32.0/19',
'YE': '134.35.0.0/16',
'YT': '41.242.116.0/22',
'ZA': '41.0.0.0/11',
'ZM': '165.56.0.0/13',
'ZW': '41.85.192.0/19',
}
@classmethod
def random_ipv4(cls, code):
block = cls._country_ip_map.get(code.upper())
if not block:
return None
addr, preflen = block.split('/')
addr_min = compat_struct_unpack('!L', socket.inet_aton(addr))[0]
addr_max = addr_min | (0xffffffff >> int(preflen))
return compat_str(socket.inet_ntoa(
compat_struct_pack('!L', random.randint(addr_min, addr_max))))
class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
def __init__(self, proxies=None):
# Set default handlers
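
The address arithmetic in `GeoUtils.random_ipv4` can be reproduced with the standard library alone. This sketch assumes plain `struct` calls behave like the `compat_struct_*` wrappers used in the diff, and uses a one-entry excerpt of the country table; the `headers` dict at the end is only an illustration of where such an address would go, based on the `--geo-bypass` help text.

```python
import random
import socket
import struct

# One-entry excerpt of the country -> CIDR block table, for illustration.
COUNTRY_IP_MAP = {'JP': '126.0.0.0/8'}


def random_ipv4(code, ip_map=COUNTRY_IP_MAP):
    """Return a random dotted-quad address inside the country's block."""
    block = ip_map.get(code.upper())
    if not block:
        return None
    addr, preflen = block.split('/')
    # Network address as an unsigned 32-bit integer (network byte order).
    addr_min = struct.unpack('!L', socket.inet_aton(addr))[0]
    # Setting every host bit gives the highest address in the block.
    addr_max = addr_min | (0xffffffff >> int(preflen))
    return socket.inet_ntoa(
        struct.pack('!L', random.randint(addr_min, addr_max)))


ip = random_ipv4('jp')
# The faked address would be sent in the X-Forwarded-For request header.
headers = {'X-Forwarded-For': ip}
```

Because `random.randint` is inclusive on both ends, the network address and the highest address of the block can both be returned.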


@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.02.14'
__version__ = '2017.02.27'