Compare commits

...

93 Commits

Author SHA1 Message Date
345b24538b release 2017.02.22 2017-02-22 23:50:42 +07:00
63a29b6118 [ChangeLog] Actualize 2017-02-22 23:45:01 +07:00
b5869560a4 [crunchyroll] Fix descriptions with double quotes (closes #12124) 2017-02-23 00:08:45 +08:00
527ef85fe9 [dailymotion] Make comment count optional (closes #12209)
Not served anymore
2017-02-22 21:49:30 +07:00
58ad6995cd [vidzi] Add test for #12213 2017-02-22 21:29:53 +07:00
a86e416088 [vidzi] Add support for vidzi.cc 2017-02-22 22:28:09 +08:00
71e9577b94 [24video] Add support for 24video.tube (closes #12217) 2017-02-22 21:19:52 +07:00
0d427c8304 [setup] Actualize maintainer info 2017-02-22 01:51:27 +07:00
139d8ac106 [setup] Add python 3.6 classifier 2017-02-22 01:50:34 +07:00
abd29a2ced [crackle] use geo bypass mechanism 2017-02-21 19:37:26 +01:00
31615ac279 [viewster] use geo verifcation headers 2017-02-21 19:36:39 +01:00
fc320a40d9 Revert "[cbc] use geo bypass mechanism"
This reverts commit 86466a8b6f.
2017-02-21 18:14:55 +01:00
7345d6d465 [tfo] Improve geo restriction detection and use geo bypass mechanism 2017-02-21 17:52:50 +01:00
86466a8b6f [cbc] use geo bypass mechanism 2017-02-21 17:52:50 +01:00
33dc173cdc [telequebec] use geo bypass mechanism 2017-02-21 17:52:50 +01:00
3444844b04 [limelight] extract PlaylistService errors 2017-02-21 17:52:50 +01:00
8c6c88c7da release 2017.02.21 2017-02-21 23:48:24 +07:00
159aaaa9d0 [ChangeLog] Actualize 2017-02-21 23:46:58 +07:00
eea0716cae [extractor/common] Print origin country for fake IP 2017-02-21 23:14:33 +07:00
336a76551b [extractor/common] Do not quit _initialize_geo_bypass on empty countries 2017-02-21 23:09:41 +07:00
dc0a869e5e [extractor/common] Fix typo 2017-02-21 23:05:31 +07:00
e39b5d4ab8 [extractor/common] Allow calling _initialize_geo_bypass from extractors (#11970) 2017-02-21 23:00:43 +07:00
e469ab2528 [ninecninemedia] use geo bypass mechanism 2017-02-21 14:38:00 +01:00
890d44b005 [adobepass] add support for Time Warner Cable(closes #12191) 2017-02-20 19:00:40 +01:00
6926304472 [spankbang] Make uploader optional (closes #12193) 2017-02-21 00:54:43 +07:00
3ccdde8cb7 [extractor/common] Emphasize geo bypass APIs are experimental 2017-02-20 23:21:15 +07:00
da42ff0668 [iprima] Improve geo restriction detection and disable geo bypass 2017-02-20 23:17:19 +07:00
82f662182b [iprima] Modernize 2017-02-20 23:16:14 +07:00
2cc7fcd338 [commonmistakes] Disable UnicodeBOM extractor test for python 3.2 2017-02-20 03:06:52 +07:00
6d4c259765 [svt] PEP 8 2017-02-20 02:25:55 +07:00
c78dd35491 [nrk] PEP 8 2017-02-20 02:25:39 +07:00
8ffb8e63fe [prosiebensat1] Throw ExtractionError on unsupported page type (closes #12180) 2017-02-20 01:00:53 +07:00
983e9b7746 [nrk] Update _API_HOST and relax _VALID_URL 2017-02-20 00:59:31 +07:00
8936f68a0b [travis] Run tests in parallel
[test_download] Print test names in case of network errors

[test_download] Add comments for nose parameters

[test_download] Modify outtmpl to prevent info JSON filename conflicts

Thanks @jaimeMF for the idea.

[travis] Only download tests should be run in parallel
2017-02-19 21:26:35 +08:00
c58b7ffef4 [tv4] Bypass geo restriction and improve detection 2017-02-19 06:25:59 +07:00
f1a78ee4ef [tv4] Switch to hls3 protocol (closes #12177) 2017-02-19 06:16:00 +07:00
de64e23c56 [downloader/ism] Honor HTTP headers when downloading fragments 2017-02-19 04:18:36 +07:00
553f6dbac7 [downloader/dash] Honor HTTP headers when downloading fragments
For example, https://www.oppetarkiv.se/video/1196142/natten-ar-dagens-mor
2017-02-19 04:18:22 +07:00
0aa10994f4 [options] Move geo restriction related options to separate section 2017-02-19 05:10:08 +08:00
4248dad92b Improve geo bypass mechanism
* Rename options to preffixly match with --geo-verification-proxy
* Introduce _GEO_COUNTRIES for extractors
* Implement faking IP right away for sites with known geo restriction
2017-02-19 05:10:08 +08:00
0a840f584c Rename bypass geo restriction options 2017-02-19 05:10:08 +08:00
0016b84e16 Add faked X-Forwarded-For to formats' HTTP headers 2017-02-19 05:10:08 +08:00
18a0defab0 [utils] Make random_ipv4 return unicode string 2017-02-19 05:10:08 +08:00
5d3fbf77d9 [viki] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
80b59020e0 [vgtv] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
71631862f4 [srgssr] Improve geo restriction detection 2017-02-19 05:10:08 +08:00
89cc7fe770 [vbox7] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
04d906eae3 [svt] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
8ab8066cf0 [pbs] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
01b1aa9ff4 [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
ff4007891f [nrk] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
28200e654b [itv] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
e633f21a96 [go] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
d392005a79 [dramafever] Improve geo restriction detection and use geo bypass mechanism 2017-02-19 05:10:08 +08:00
773f291dcb Add experimental geo restriction bypass mechanism
Based on faking X-Forwarded-For HTTP header
2017-02-19 05:10:08 +08:00
bf5b9d859a [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions 2017-02-19 05:10:08 +08:00
049a0f4d6d [brightcove:legacy] restrict videoPlayer value(closes #12040) 2017-02-18 21:08:40 +01:00
ac33accd96 [options] Mention quoted string literals for --match-filter 2017-02-18 23:59:26 +07:00
e84888b432 [tvn24] Improve extraction (closes #11679) 2017-02-18 23:34:09 +07:00
02d9b82a23 [tvn24] Add extractor 2017-02-18 23:33:49 +07:00
a2e3286676 [thisav] Add support for html5 media (closes #11771) 2017-02-18 20:21:53 +07:00
f75caf059e [metacafe] Improve (closes #10371) 2017-02-18 19:58:25 +07:00
bdabbc220c [metacafe] Bypass family filter
If you don't send this user=ffilter: false cookie, it will 301 redirect you to a page asking about it, and then the title check will fail.
2017-02-18 19:47:33 +07:00
70bcc444a9 [viceland] improve info extraction and update test 2017-02-18 09:52:43 +01:00
28e35f5070 release 2017.02.17 2017-02-17 23:59:56 +07:00
cf3704c132 [ChangeLog] Actualize 2017-02-17 23:48:30 +07:00
2c1f442c2b [options] Add missing spaces 2017-02-17 23:18:26 +07:00
bad4ccdb5d [heise] Improve (closes #9725) 2017-02-17 23:09:40 +07:00
db76c30c6e [heise] Support videos embedded in any article. 2017-02-17 22:55:53 +07:00
c2bde5d081 [ellentv] Improve 2017-02-17 22:45:51 +07:00
90fad0e74c [openload] Fix extraction (closes #12002) 2017-02-17 22:31:16 +07:00
d94badc755 [openload] Semifix extraction (closes #10408)
just updated the code. i don't do much python still i tried to convert my code. lemme know if there is any prob with it
2017-02-17 22:30:05 +07:00
fef51645d6 [theplatform] Recognize URLs with whitespaces (closes #12044) 2017-02-17 23:13:51 +08:00
4cead6a614 [einthusan] Relax _VALID_URL (closes #12141, closes #12159) 2017-02-17 22:02:01 +07:00
a4a554a793 [generic] Try parsing JWPlayer embedded videos (closes #12030) 2017-02-16 23:44:03 +08:00
b898f0a173 [elpais] Fix typo and improve extraction (closes #12139) 2017-02-16 04:57:42 +07:00
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes recent changes in iVysilani. Proper patch should migrate to
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
73 changed files with 1378 additions and 513 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.14*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.14**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.22*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.22**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.14
[debug] youtube-dl version 2017.02.22
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -11,8 +11,6 @@ sudo: false
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
before_script:
- chmod +x ./devscripts/run_tests.sh
script: ./devscripts/run_tests.sh
notifications:
email:

View File

@ -1,3 +1,94 @@
version 2017.02.22
Extractors
* [crunchyroll] Fix descriptions with double quotes (#12124)
* [dailymotion] Make comment count optional (#12209)
+ [vidzi] Add support for vidzi.cc (#12213)
+ [24video] Add support for 24video.tube (#12217)
+ [crackle] Use geo bypass mechanism
+ [viewster] Use geo verification headers
+ [tfo] Improve geo restriction detection and use geo bypass mechanism
+ [telequebec] Use geo bypass mechanism
+ [limelight] Extract PlaylistService errors and improve geo restriction
detection
version 2017.02.21
Core
* [extractor/common] Allow calling _initialize_geo_bypass from extractors
(#11970)
+ [adobepass] Add support for Time Warner Cable (#12191)
+ [travis] Run tests in parallel
+ [downloader/ism] Honor HTTP headers when downloading fragments
+ [downloader/dash] Honor HTTP headers when downloading fragments
+ [utils] Add GeoUtils class for working with geo tools and GeoUtils.random_ipv4
+ Add option --geo-bypass-country for explicit geo bypass on behalf of
specified country
+ Add options to control geo bypass mechanism --geo-bypass and --no-geo-bypass
+ Add experimental geo restriction bypass mechanism based on faking
X-Forwarded-For HTTP header
+ [utils] Introduce GeoRestrictedError for geo restricted videos
+ [utils] Introduce YoutubeDLError base class for all youtube-dl exceptions
Extractors
+ [ninecninemedia] Use geo bypass mechanism
* [spankbang] Make uploader optional (#12193)
+ [iprima] Improve geo restriction detection and disable geo bypass
* [iprima] Modernize
* [commonmistakes] Disable UnicodeBOM extractor test for python 3.2
+ [prosiebensat1] Throw ExtractionError on unsupported page type (#12180)
* [nrk] Update _API_HOST and relax _VALID_URL
+ [tv4] Bypass geo restriction and improve detection
* [tv4] Switch to hls3 protocol (#12177)
+ [viki] Improve geo restriction detection
+ [vgtv] Improve geo restriction detection
+ [srgssr] Improve geo restriction detection
+ [vbox7] Improve geo restriction detection and use geo bypass mechanism
+ [svt] Improve geo restriction detection and use geo bypass mechanism
+ [pbs] Improve geo restriction detection and use geo bypass mechanism
+ [ondemandkorea] Improve geo restriction detection and use geo bypass mechanism
+ [nrk] Improve geo restriction detection and use geo bypass mechanism
+ [itv] Improve geo restriction detection and use geo bypass mechanism
+ [go] Improve geo restriction detection and use geo bypass mechanism
+ [dramafever] Improve geo restriction detection and use geo bypass mechanism
* [brightcove:legacy] Restrict videoPlayer value (#12040)
+ [tvn24] Add support for tvn24.pl and tvn24bis.pl (#11679)
+ [thisav] Add support for HTML5 media (#11771)
* [metacafe] Bypass family filter (#10371)
* [viceland] Improve info extraction
version 2017.02.17
Extractors
* [heise] Improve extraction (#9725)
* [ellentv] Improve (#11653)
* [openload] Fix extraction (#10408, #12002)
+ [theplatform] Recognize URLs with whitespaces (#12044)
* [einthusan] Relax URL regular expression (#12141, #12159)
+ [generic] Support complex JWPlayer embedded videos (#12030)
* [elpais] Improve extraction (#12139)
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14
Core

View File

@ -99,11 +99,21 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6
## Geo Restriction:
--geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default
proxy specified by --proxy (or none, if the
options is not present) is used for the
actual downloading.
--geo-bypass Bypass geographic restriction via faking
X-Forwarded-For HTTP header (experimental)
--no-geo-bypass Do not bypass geographic restriction via
faking X-Forwarded-For HTTP header
(experimental)
--geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-2
country code (experimental)
## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
@ -137,20 +147,22 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys)
to match if the key is present, !key to
check if the key is not present,key >
check if the key is not present, key >
NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare
against a number, and & to require multiple
matches. Values which are not known are
excluded unless you put a question mark (?)
after the operator.For example, to only
match videos that have been liked more than
100 times and disliked less than 50 times
(or the dislike functionality is not
available at the given service), but who
also have a description, use --match-filter
"like_count > 100 & dislike_count <? 50 &
description" .
against a number, key = 'LITERAL' (like
"uploader = 'Mike Smith'", also works with
!=) to match against a string literal and &
to require multiple matches. Values which
are not known are excluded unless you put a
question mark (?) after the operator. For
example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
--no-playlist Download only the video, if the URL refers
to a video and a playlist.
--yes-playlist Download the playlist, if the URL refers to

4
devscripts/run_tests.sh Normal file → Executable file
View File

@ -3,6 +3,7 @@
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
test_set=""
multiprocess_args=""
case "$YTDL_TEST_SET" in
core)
@ -10,10 +11,11 @@ case "$YTDL_TEST_SET" in
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
multiprocess_args="--processes=4 --process-timeout=540"
;;
*)
break
;;
esac
nosetests test --verbose $test_set
nosetests test --verbose $test_set $multiprocess_args

View File

@ -546,8 +546,10 @@
- **OktoberfestTV**
- **on.aol.com**
- **OnDemandKorea**
- **onet.pl**
- **onet.tv**
- **onet.tv:channel**
- **OnetMVP**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@ -802,6 +804,7 @@
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **TVN24**
- **TVNoe**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
@ -900,6 +903,7 @@
- **vlive**
- **vlive:channel**
- **Vodlocker**
- **VODPl**
- **VODPlatform**
- **VoiceRepublic**
- **VoxMedia**

View File

@ -107,8 +107,8 @@ setup(
url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia',
author_email='ytdl@yt-dl.org',
maintainer='Philipp Hagemeister',
maintainer_email='phihag@phihag.de',
maintainer='Sergey M.',
maintainer_email='dstftw@gmail.com',
packages=[
'youtube_dl',
'youtube_dl.extractor', 'youtube_dl.downloader',
@ -130,6 +130,7 @@ setup(
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^\s*10k$')
}), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({
'fps': 30,
}), '^30fps$')
}), r'^30fps$')
def test_postprocessors(self):
filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30,
'filesize': 10 * 1024,
'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
}
second = {
'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo',
'filesize': 5 * 1024,
'playlist_id': '43',
'uploader': "тест 123",
}
videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),

View File

@ -65,6 +65,10 @@ defs = gettestcases()
class TestDownload(unittest.TestCase):
# Parallel testing in nosetests. See
# http://nose.readthedocs.org/en/latest/doc_tests/test_multiprocess/multiprocess.html
_multiprocess_shared_ = True
maxDiff = None
def setUp(self):
@ -73,7 +77,7 @@ class TestDownload(unittest.TestCase):
# Dynamically generate tests
def generator(test_case):
def generator(test_case, tname):
def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
@ -102,6 +106,7 @@ def generator(test_case):
return
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
params.setdefault('skip_download', True)
@ -146,7 +151,7 @@ def generator(test_case):
raise
if try_num == RETRIES:
report_warning('Failed due to network errors, skipping...')
report_warning('%s failed due to network errors, skipping...' % tname)
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
@ -221,12 +226,12 @@ def generator(test_case):
# And add them to TestDownload
for n, test_case in enumerate(defs):
test_method = generator(test_case)
tname = 'test_' + str(test_case['name'])
i = 1
while hasattr(TestDownload, tname):
tname = 'test_%s_%d' % (test_case['name'], i)
i += 1
test_method = generator(test_case, tname)
test_method.__name__ = str(tname)
setattr(TestDownload, test_method.__name__, test_method)
del test_method

View File

@ -56,6 +56,8 @@ from .utils import (
ExtractorError,
format_bytes,
formatSeconds,
GeoRestrictedError,
ISO3166Utils,
locked_file,
make_HTTPS_handler,
MaxDownloadsReached,
@ -272,6 +274,12 @@ class YoutubeDL(object):
If it returns None, the video is downloaded.
match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
HTTP header (experimental)
geo_bypass_country:
Two-letter ISO 3166-2 country code that will be used for
explicit geographic restriction bypassing via faking
X-Forwarded-For HTTP header (experimental)
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
@ -707,6 +715,14 @@ class YoutubeDL(object):
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
except GeoRestrictedError as e:
msg = e.msg
if e.countries:
msg += '\nThis video is available in %s.' % ', '.join(
map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
break
@ -847,8 +863,14 @@ class YoutubeDL(object):
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
@ -1233,6 +1255,11 @@ class YoutubeDL(object):
if cookies:
res['Cookie'] = cookies
if 'X-Forwarded-For' not in res:
x_forwarded_for_ip = info_dict.get('__x_forwarded_for_ip')
if x_forwarded_for_ip:
res['X-Forwarded-For'] = x_forwarded_for_ip
return res
def _calc_cookies(self, info_dict):
@ -1375,6 +1402,9 @@ class YoutubeDL(object):
full_format_info = info_dict.copy()
full_format_info.update(format)
format['http_headers'] = self._calc_headers(full_format_info)
# Remove private housekeeping stuff
if '__x_forwarded_for_ip' in info_dict:
del info_dict['__x_forwarded_for_ip']
# TODO Central sorting goes here

View File

@ -414,6 +414,8 @@ def _real_main(argv=None):
'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
'config_location': opts.config_location,
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
}
with YoutubeDL(ydl_opts) as ydl:

View File

@ -43,7 +43,10 @@ class DashSegmentsFD(FragmentFD):
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')

View File

@ -238,7 +238,10 @@ class IsmFD(FragmentFD):
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
success = ctx['dl'].download(target_filename, {
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')

View File

@ -31,6 +31,11 @@ MSO_INFO = {
'username_field': 'user',
'password_field': 'passwd',
},
'TWC': {
'name': 'Time Warner Cable | Spectrum',
'username_field': 'Ecom_User_ID',
'password_field': 'Ecom_Password',
},
'thr030': {
'name': '3 Rivers Communications'
},

View File

@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:

View File

@ -1,13 +1,13 @@
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
unified_strdate,
clean_html,
)
class ArchiveOrgIE(JWPlatformBaseIE):
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'

View File

@ -191,6 +191,10 @@ class BrightcoveLegacyIE(InfoExtractor):
# These fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
if videoPlayer is not None:
if isinstance(videoPlayer, list):
videoPlayer = videoPlayer[0]
if not (videoPlayer.isdigit() or videoPlayer.startswith('ref:')):
return None
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')
if linkBase is not None:

View File

@ -13,6 +13,7 @@ from ..utils import (
float_or_none,
sanitized_Request,
urlencode_postdata,
USER_AGENTS,
)
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {
'id': '61924494876951776',
'id': '61924494877246241',
'ext': 'mp4',
'title': 'Hyde Park Civilizace',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350,
},
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani',
}
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']
title = item['title']
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
playlistpage = self._download_json(req, playlist_id, fatal=False)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
if not playlistpage:
continue
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

View File

@ -6,6 +6,7 @@ import hashlib
import json
import netrc
import os
import random
import re
import socket
import sys
@ -39,7 +40,10 @@ from ..utils import (
ExtractorError,
fix_xml_ampersands,
float_or_none,
GeoRestrictedError,
GeoUtils,
int_or_none,
js_to_json,
parse_iso8601,
RegexNotFoundError,
sanitize_filename,
@ -319,17 +323,34 @@ class InfoExtractor(object):
_real_extract() methods and define a _VALID_URL regexp.
Probably, they should also be added to the list of extractors.
_GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor.
Though it won't disable explicit geo restriction bypass based on
country code provided with geo_bypass_country. (experimental)
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by
geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental)
NB: both these geo attributes are experimental and may change in future
or be completely removed.
Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests.
"""
_ready = False
_downloader = None
_x_forwarded_for_ip = None
_GEO_BYPASS = True
_GEO_COUNTRIES = None
_WORKING = True
def __init__(self, downloader=None):
"""Constructor. Receives an optional downloader."""
self._ready = False
self._x_forwarded_for_ip = None
self.set_downloader(downloader)
@classmethod
@ -358,15 +379,59 @@ class InfoExtractor(object):
def initialize(self):
"""Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES)
if not self._ready:
self._real_initialize()
self._ready = True
def _initialize_geo_bypass(self, countries):
"""
Initialize geo restriction bypass mechanism.
This method is used to initialize geo bypass mechanism based on faking
X-Forwarded-For HTTP header. A random country from provided country list
is selected and a random IP belonging to this country is generated. This
IP will be passed as X-Forwarded-For HTTP header in all subsequent
HTTP requests.
This method will be used for initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES.
You may also manually call it from extractor's code if geo countries
information is not available beforehand (e.g. obtained during
extraction) or due to some another reason.
"""
if not self._x_forwarded_for_ip:
country_code = self._downloader.params.get('geo_bypass_country', None)
# If there is no explicit country for geo bypass specified and
# the extractor is known to be geo restricted let's fake IP
# as X-Forwarded-For right away.
if (not country_code and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
countries):
country_code = random.choice(countries)
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._downloader.params.get('verbose', False):
self._downloader.to_stdout(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
def extract(self, url):
"""Extracts URL information and returns it in list of dicts."""
try:
self.initialize()
return self._real_extract(url)
for _ in range(2):
try:
self.initialize()
ie_result = self._real_extract(url)
if self._x_forwarded_for_ip:
ie_result['__x_forwarded_for_ip'] = self._x_forwarded_for_ip
return ie_result
except GeoRestrictedError as e:
if self.__maybe_fake_ip_and_retry(e.countries):
continue
raise
except ExtractorError:
raise
except compat_http_client.IncompleteRead as e:
@ -374,6 +439,21 @@ class InfoExtractor(object):
except (KeyError, StopIteration) as e:
raise ExtractorError('An extractor error has occurred.', cause=e)
def __maybe_fake_ip_and_retry(self, countries):
if (not self._downloader.params.get('geo_bypass_country', None) and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
not self._x_forwarded_for_ip and
countries):
country_code = random.choice(countries)
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._x_forwarded_for_ip:
self.report_warning(
'Video is geo restricted. Retrying extraction with fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
return True
return False
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
self._downloader = downloader
@ -433,6 +513,15 @@ class InfoExtractor(object):
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
# Some sites check X-Forwarded-For HTTP header in order to figure out
# the origin of the client behind proxy. This allows bypassing geo
# restriction by faking this header's value to IP that belongs to some
# geo unrestricted country. We will do so once we encounter any
# geo restriction error.
if self._x_forwarded_for_ip:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
if urlh is False:
assert not fatal
@ -608,10 +697,8 @@ class InfoExtractor(object):
expected=True)
@staticmethod
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction'):
raise ExtractorError(
'%s. You might want to use --proxy to workaround.' % msg,
expected=True)
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction', countries=None):
raise GeoRestrictedError(msg, countries=countries)
# Methods for following #608
@staticmethod
@ -2073,6 +2160,123 @@ class InfoExtractor(object):
})
return formats
@staticmethod
def _find_jwplayer_data(webpage):
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()

View File

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import sys
from .common import InfoExtractor
from ..utils import ExtractorError
@ -33,7 +35,9 @@ class UnicodeBOMIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
_TESTS = [{
# Disable test for python 3.2 since BOM is broken in re in this version
# (see https://github.com/rg3/youtube-dl/issues/9751)
_TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True,
}]

View File

@ -6,6 +6,7 @@ from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_GEO_COUNTRIES = ['US']
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',

View File

@ -123,7 +123,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': {
'id': '645513',
'ext': 'flv',
'ext': 'mp4',
'title': 'Wanna be the Strongest in the World Episode 1 An Idol-Wrestler is Born!',
'description': 'md5:2d17137920c64f2f49981a7797d275ef',
'thumbnail': 'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
@ -192,6 +192,21 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# geo-restricted (US), 18+ maturity wall, non-premium available
'url': 'http://www.crunchyroll.com/cosplay-complex-ova/episode-1-the-birth-of-the-cosplay-club-565617',
'only_matching': True,
}, {
# A description with double quotes
'url': 'http://www.crunchyroll.com/11eyes/episode-1-piros-jszaka-red-night-535080',
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': '11eyes Episode 1 Piros éjszaka - Red Night',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021',
},
'params': {
# Just test metadata extraction
'skip_download': True,
},
}]
_FORMAT_IDS = {
@ -362,9 +377,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?"description"\s*:\s*"([^"]+)' % video_id,
webpage, 'description', default=None)
video_description = self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(

View File

@ -66,7 +66,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'uploader_id': 'xijv66',
'age_limit': 0,
'view_count': int,
'comment_count': int,
}
},
# Vevo video
@ -140,7 +139,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
webpage, 'comment count', default=None))
player_v5 = self._search_regex(
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826

View File

@ -20,6 +20,7 @@ from ..utils import (
class DramaFeverBaseIE(AMPIE):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever'
_GEO_COUNTRIES = ['US', 'CA']
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
@ -116,8 +117,9 @@ class DramaFeverIE(DramaFeverBaseIE):
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
raise ExtractorError(
'Currently unavailable in your country.', expected=True)
self.raise_geo_restricted(
msg='Currently unavailable in your country',
countries=self._GEO_COUNTRIES)
raise
series_id, episode_number = video_id.split('.')

View File

@ -18,8 +18,8 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'info_dict': {
@ -29,7 +29,10 @@ class EinthusanIE(InfoExtractor):
'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):

View File

@ -1,13 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
NO_DEFAULT,
)
from .kaltura import KalturaIE
from ..utils import NO_DEFAULT
class EllenTVIE(InfoExtractor):
@ -65,7 +61,7 @@ class EllenTVIE(InfoExtractor):
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura')
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor):
@ -77,14 +73,14 @@ class EllenTVClipsIE(InfoExtractor):
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 7,
'playlist_mincount': 5,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage)
playlist = self._extract_playlist(webpage, playlist_id)
return {
'_type': 'playlist',
@ -93,16 +89,13 @@ class EllenTVClipsIE(InfoExtractor):
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage):
def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try:
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
return self._parse_json('[{' + json_string + '}]', playlist_id)
def _extract_entries(self, playlist):
return [
self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
'Kaltura')
KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist]

View File

@ -39,6 +39,18 @@ class ElPaisIE(InfoExtractor):
'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
'upload_date': '20170127',
},
}, {
'url': 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
'info_dict': {
'id': '1487062137_075943',
'ext': 'mp4',
'title': 'Disyuntivas',
'description': 'md5:a0fb1485c4a6a8a917e6f93878e66218',
'upload_date': '20170214',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
@ -59,14 +71,15 @@ class ElPaisIE(InfoExtractor):
video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex(
r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
webpage, 'thumbnail URL', fatal=False)
webpage, 'thumbnail URL', default=None)
thumbnail = (
None if thumbnail_suffix is None
else prefix + thumbnail_suffix)
else prefix + thumbnail_suffix) or self._og_search_thumbnail(webpage)
title = self._html_search_regex(
(r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
webpage, 'title')
(r"tituloVideo\s*=\s*'([^']+)'",
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
r'<h1[^>]+class="titulo"[^>]*>([^<]+)'),
webpage, 'title', default=None) or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', default=None) or self._html_search_meta(

View File

@ -694,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import (
OnetIE,
OnetChannelIE,
OnetMVPIE,
OnetPlIE,
)
from .onionstudios import OnionStudiosIE
from .ooyala import (
@ -1007,6 +1009,7 @@ from .tvc import (
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE
from .tvp import (
TVPEmbedIE,
@ -1147,6 +1150,7 @@ from .vlive import (
VLiveChannelIE
)
from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE

View File

@ -20,6 +20,7 @@ from ..utils import (
float_or_none,
HEADRequest,
is_html,
js_to_json,
orderedSet,
sanitized_Request,
smuggle_url,
@ -961,6 +962,16 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
# Complex jwplayer
{
'url': 'http://www.indiedb.com/games/king-machine/videos',
'info_dict': {
'id': 'videos',
'ext': 'mp4',
'title': 'king machine trailer 1',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
# rtl.nl embed
{
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -1490,7 +1501,12 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
'add_ie': [VideoPressIE.ie_key()],
}
},
{
# ThePlatform embedded with whitespaces in URLs
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@ -2488,6 +2504,15 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage)
if jwplayer_data_str:
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
except ExtractorError:
pass
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True

View File

@ -37,6 +37,7 @@ class GoIE(AdobePassIE):
}
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_SITE_INFO.keys())
_GEO_COUNTRIES = ['US']
_TESTS = [{
'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
'info_dict': {
@ -101,6 +102,10 @@ class GoIE(AdobePassIE):
video_id, data=urlencode_postdata(data), headers=self.geo_verification_headers())
errors = entitlement.get('errors', {}).get('errors', [])
if errors:
for error in errors:
if error.get('code') == 1002:
self.raise_geo_restricted(
error['message'], countries=self._GEO_COUNTRIES)
error_message = ', '.join([error['message'] for error in errors])
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
asset_url += '?' + entitlement['uplynkData']['sessionKey']

View File

@ -6,59 +6,58 @@ from ..utils import (
determine_ext,
int_or_none,
parse_iso8601,
xpath_text,
)
class HeiseIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:www\.)?heise\.de/video/artikel/
.+?(?P<id>[0-9]+)\.html(?:$|[?#])
'''
_TEST = {
'url': (
'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html'
),
_VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': {
'id': '2404147',
'ext': 'mp4',
'title': (
"Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone"
),
'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
'format_id': 'mp4_720p',
'timestamp': 1411812600,
'upload_date': '20140927',
'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.',
'thumbnail': r're:^https?://.*\.jpe?g$',
'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*/gallery/$',
}
}
}, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
}, {
'url': 'http://www.heise.de/newsticker/meldung/c-t-uplink-Owncloud-Tastaturen-Peilsender-Smartphone-2404251.html?wt_mc=rss.ho.beitrag.atom',
'only_matching': True,
}, {
'url': 'http://www.heise.de/ct/ausgabe/2016-12-Spiele-3214137.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
container_id = self._search_regex(
r'<div class="videoplayerjw".*?data-container="([0-9]+)"',
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID')
sequenz_id = self._search_regex(
r'<div class="videoplayerjw".*?data-sequenz="([0-9]+)"',
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID')
data_url = 'http://www.heise.de/videout/feed?container=%s&sequenz=%s' % (container_id, sequenz_id)
doc = self._download_xml(data_url, video_id)
info = {
'id': video_id,
'thumbnail': self._og_search_thumbnail(webpage),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'description': self._og_search_description(webpage),
}
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
title = self._html_search_meta('fulltitle', webpage)
if title:
info['title'] = title
else:
info['title'] = self._og_search_title(webpage)
doc = self._download_xml(
'http://www.heise.de/videout/feed', video_id, query={
'container': container_id,
'sequenz': sequenz_id,
})
formats = []
for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'):
@ -74,6 +73,18 @@ class HeiseIE(InfoExtractor):
'height': height,
})
self._sort_formats(formats)
info['formats'] = formats
return info
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': (xpath_text(doc, './/{http://rss.jwpcdn.com/}image') or
self._og_search_thumbnail(webpage)),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'formats': formats,
}

View File

@ -8,12 +8,12 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
sanitized_Request,
)
class IPrimaIE(InfoExtractor):
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
@ -29,6 +29,10 @@ class IPrimaIE(InfoExtractor):
}, {
'url': 'http://play.iprima.cz/particka/particka-92',
'only_matching': True,
}, {
# geo restricted
'url': 'http://play.iprima.cz/closer-nove-pripady/closer-nove-pripady-iv-1',
'only_matching': True,
}]
def _real_extract(self, url):
@ -38,11 +42,13 @@ class IPrimaIE(InfoExtractor):
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
req = sanitized_Request(
'http://play.iprima.cz/prehravac/init?_infuse=1'
'&_ts=%s&productId=%s' % (round(time.time()), video_id))
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id, note='Downloading player')
playerpage = self._download_webpage(
'http://play.iprima.cz/prehravac/init',
video_id, note='Downloading player', query={
'_infuse': 1,
'_ts': round(time.time()),
'productId': video_id,
}, headers={'Referer': url})
formats = []
@ -82,7 +88,7 @@ class IPrimaIE(InfoExtractor):
extract_formats(src)
if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
self.raise_geo_restricted()
self.raise_geo_restricted(countries=['CZ'])
self._sort_formats(formats)

View File

@ -24,6 +24,7 @@ from ..utils import (
class ITVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
_GEO_COUNTRIES = ['GB']
_TEST = {
'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053',
'info_dict': {
@ -98,7 +99,11 @@ class ITVIE(InfoExtractor):
headers=headers, data=etree.tostring(req_env))
playlist = xpath_element(resp_env, './/Playlist')
if playlist is None:
fault_code = xpath_text(resp_env, './/faultcode')
fault_string = xpath_text(resp_env, './/faultstring')
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string))
title = xpath_text(playlist, 'EpisodeTitle', fatal=True)
video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)

View File

@ -4,139 +4,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
class JWPlatformBaseIE(InfoExtractor):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',

View File

@ -4,11 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
unsmuggle_url,
ExtractorError,
)
@ -20,9 +22,17 @@ class LimelightBaseIE(InfoExtractor):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
if error == 'CountryDisabled':
self.raise_geo_restricted()
raise ExtractorError(error, expected=True)
raise
def _call_api(self, organization_id, item_id, method):
return self._download_json(
@ -213,6 +223,7 @@ class LimelightMediaIE(LimelightBaseIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId',

View File

@ -6,12 +6,12 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
urlencode_postdata,
get_element_by_attribute,
mimetype2ext,
)
@ -50,6 +50,21 @@ class MetacafeIE(InfoExtractor):
},
'skip': 'Page is temporarily unavailable.',
},
# metacafe video with family filter
{
'url': 'http://www.metacafe.com/watch/2155630/adult_art_by_david_hart_156/',
'md5': 'b06082c5079bbdcde677a6291fbdf376',
'info_dict': {
'id': '2155630',
'ext': 'mp4',
'title': 'Adult Art By David Hart 156',
'uploader': '63346',
'description': 'md5:9afac8fc885252201ad14563694040fc',
},
'params': {
'skip_download': True,
},
},
# AnyClip video
{
'url': 'http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/',
@ -112,22 +127,6 @@ class MetacafeIE(InfoExtractor):
def report_disclaimer(self):
self.to_screen('Retrieving disclaimer')
def _confirm_age(self):
# Retrieve disclaimer
self.report_disclaimer()
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
# Confirm age
self.report_age_confirmation()
self._download_webpage(
self._FILTER_POST, None, False, 'Unable to confirm age',
data=urlencode_postdata({
'filters': '0',
'submit': "Continue - I'm over 18",
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
def _real_extract(self, url):
# Extract id and simplified title from URL
video_id, display_id = re.match(self._VALID_URL, url).groups()
@ -143,13 +142,15 @@ class MetacafeIE(InfoExtractor):
if prefix == 'cb':
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# self._confirm_age()
headers = {
# Disable family filter
'Cookie': 'user=%s; ' % compat_urllib_parse_urlencode({'ffilter': False})
}
# AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file
headers = {}
if video_id.startswith('an-'):
headers['Cookie'] = 'flashVersion=0;'
headers['Cookie'] += 'flashVersion=0; '
# Retrieve video webpage to extract further information
webpage = self._download_webpage(url, video_id, headers=headers)

View File

@ -19,6 +19,7 @@ class NineCNineMediaBaseIE(InfoExtractor):
class NineCNineMediaStackIE(NineCNineMediaBaseIE):
IE_NAME = '9c9media:stack'
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'9c9media:stack:(?P<destination_code>[^:]+):(?P<content_id>\d+):(?P<content_package>\d+):(?P<id>\d+)'
def _real_extract(self, url):

View File

@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
@ -15,24 +14,7 @@ from ..utils import (
class NRKBaseIE(InfoExtractor):
_faked_ip = None
def _download_webpage_handle(self, *args, **kwargs):
# NRK checks X-Forwarded-For HTTP header in order to figure out the
# origin of the client behind proxy. This allows to bypass geo
# restriction by faking this header's value to some Norway IP.
# We will do so once we encounter any geo restriction error.
if self._faked_ip:
# NB: str is intentional
kwargs.setdefault(str('headers'), {})['X-Forwarded-For'] = self._faked_ip
return super(NRKBaseIE, self)._download_webpage_handle(*args, **kwargs)
def _fake_ip(self):
# Use fake IP from 37.191.128.0/17 in order to workaround geo
# restriction
def octet(lb=0, ub=255):
return random.randint(lb, ub)
self._faked_ip = '37.191.%d.%d' % (octet(128), octet())
_GEO_COUNTRIES = ['NO']
def _real_extract(self, url):
video_id = self._match_id(url)
@ -44,8 +26,6 @@ class NRKBaseIE(InfoExtractor):
title = data.get('fullTitle') or data.get('mainTitle') or data['title']
video_id = data.get('id') or video_id
http_headers = {'X-Forwarded-For': self._faked_ip} if self._faked_ip else {}
entries = []
conviva = data.get('convivaStatistics') or {}
@ -90,7 +70,6 @@ class NRKBaseIE(InfoExtractor):
'duration': duration,
'subtitles': subtitles,
'formats': formats,
'http_headers': http_headers,
})
if not entries:
@ -107,19 +86,17 @@ class NRKBaseIE(InfoExtractor):
}]
if not entries:
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type and not self._faked_ip:
self.report_warning(
'Video is geo restricted, trying to fake IP')
self._fake_ip()
return self._real_extract(url)
MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
}
message_type = data.get('messageType', '')
# Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
if 'IsGeoBlocked' in message_type:
self.raise_geo_restricted(
msg=MESSAGES.get('ProgramIsGeoBlocked'),
countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, MESSAGES.get(
message_type, message_type)),
@ -188,12 +165,12 @@ class NRKIE(NRKBaseIE):
https?://
(?:
(?:www\.)?nrk\.no/video/PS\*|
v8-psapi\.nrk\.no/mediaelement/
v8[-.]psapi\.nrk\.no/mediaelement/
)
)
(?P<id>[^/?#&]+)
(?P<id>[^?#&]+)
'''
_API_HOST = 'v8.psapi.nrk.no'
_API_HOST = 'v8-psapi.nrk.no'
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
@ -219,6 +196,9 @@ class NRKIE(NRKBaseIE):
}, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,
}, {
'url': 'nrk:clip/7707d5a3-ebe7-434a-87d5-a3ebe7a34a70',
'only_matching': True,
}, {
'url': 'https://v8-psapi.nrk.no/mediaelement/ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
'only_matching': True,

View File

@ -1,15 +1,16 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
ExtractorError,
js_to_json,
)
class OnDemandKoreaIE(JWPlatformBaseIE):
class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_GEO_COUNTRIES = ['US', 'CA']
_TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
'info_dict': {
@ -35,7 +36,8 @@ class OnDemandKoreaIE(JWPlatformBaseIE):
if 'msg_block_01.png' in webpage:
self.raise_geo_restricted(
'This content is not available in your region')
msg='This content is not available in your region',
countries=self._GEO_COUNTRIES)
if 'This video is only available to ODK PLUS members.' in webpage:
raise ExtractorError(

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage):
def _extract_from_id(self, video_id, webpage=None):
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
title = (self._og_search_title(
webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
}
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -75,17 +75,17 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
webpage, 'openload ID')
first_three_chars = int(float(ol_id[0:][:3]))
fifth_char = int(float(ol_id[3:5]))
urlcode = ''
num = 5
first_two_chars = int(float(ol_id[0:][:2]))
urlcode = []
num = 2
while num < len(ol_id):
urlcode += compat_chr(int(float(ol_id[num:][:3])) +
first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2])))
key = int(float(ol_id[num + 3:][:2]))
urlcode.append((key, compat_chr(int(float(ol_id[num:][:3])) - first_two_chars)))
num += 5
video_url = 'https://openload.co/stream/' + urlcode
video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,

View File

@ -193,6 +193,8 @@ class PBSIE(InfoExtractor):
)
''' % '|'.join(list(zip(*_STATIONS))[0])
_GEO_COUNTRIES = ['US']
_TESTS = [
{
'url': 'http://www.pbs.org/tpt/constitution-usa-peter-sagal/watch/a-more-perfect-union/',
@ -489,11 +491,13 @@ class PBSIE(InfoExtractor):
headers=self.geo_verification_headers())
if redirect_info['status'] == 'error':
message = self._ERRORS.get(
redirect_info['http_code'], redirect_info['message'])
if redirect_info['http_code'] == 403:
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (
self.IE_NAME,
self._ERRORS.get(redirect_info['http_code'], redirect_info['message'])),
expected=True)
'%s said: %s' % (self.IE_NAME, message), expected=True)
format_url = redirect_info.get('url')
if not format_url:

View File

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration'))
uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False))

View File

@ -2,27 +2,27 @@
from __future__ import unicode_literals
import itertools
import os
# import os
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
# compat_urllib_parse_unquote,
# compat_urllib_parse_unquote_plus,
# compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
sanitized_Request,
# sanitized_Request,
str_to_int,
)
from ..aes import (
aes_decrypt_text
)
# from ..aes import (
# aes_decrypt_text
# )
class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
req = sanitized_Request(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
def dl_webpage(platform):
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
title = self._html_search_meta(
title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,48 +169,6 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_variables = {}
for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
video_variables[video_variablename] = video_variable
video_urls = []
for encoded_video_url in re.findall(
r'player_quality_[0-9]{3,4}p\s*=(.+?);', webpage):
for varname, varval in video_variables.items():
encoded_video_url = encoded_video_url.replace(varname, varval)
video_urls.append(re.sub(r'[\s+]', '', encoded_video_url))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
height = None
tbr = None
else:
height = int(m.group('height'))
tbr = int(m.group('tbr'))
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'tbr': tbr,
'height': height,
})
self._sort_formats(formats)
page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
webpage, 'page parameters', group='data', default='{}'),
@ -209,6 +180,7 @@ class PornHubIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'title': title,
'thumbnail': thumbnail,
@ -217,7 +189,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'formats': formats,
# 'formats': formats,
'age_limit': 18,
'tags': tags,
'categories': categories,

View File

@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
str_to_int,
)
class PornoXOIE(JWPlatformBaseIE):
class PornoXOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',

View File

@ -424,3 +424,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
return self._extract_clip(url, webpage)
elif page_type == 'playlist':
return self._extract_playlist(url, webpage)
else:
raise ExtractorError(
'Unsupported page type %s' % page_type, expected=True)

View File

@ -2,11 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE):
class RENTVIE(InfoExtractor):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://ren.tv/video/epizod/118577',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
js_to_json,
get_element_by_class,
@ -11,7 +11,7 @@ from ..utils import (
)
class RudoIE(JWPlatformBaseIE):
class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {

View File

@ -1,11 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import js_to_json
class ScreencastOMaticIE(JWPlatformBaseIE):
class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
float_or_none,
parse_iso8601,
@ -14,7 +14,7 @@ from ..utils import (
)
class SendtoNewsIE(JWPlatformBaseIE):
class SendtoNewsIE(InfoExtractor):
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = {

View File

@ -23,6 +23,10 @@ class SpankBangIE(InfoExtractor):
# 480p only
'url': 'http://spankbang.com/1vt0/video/solvane+gangbang',
'only_matching': True,
}, {
# no uploader
'url': 'http://spankbang.com/lklg/video/sex+with+anyone+wedding+edition+2',
'only_matching': True,
}]
def _real_extract(self, url):
@ -48,7 +52,7 @@ class SpankBangIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)',
webpage, 'uploader', fatal=False)
webpage, 'uploader', default=None)
age_limit = self._rta_search(webpage)

View File

@ -14,6 +14,8 @@ from ..utils import (
class SRGSSRIE(InfoExtractor):
_VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['CH']
_ERRORS = {
'AGERATING12': 'To protect children under the age of 12, this video is only available between 8 p.m. and 6 a.m.',
@ -40,8 +42,12 @@ class SRGSSRIE(InfoExtractor):
media_id)[media_type.capitalize()]
if media_data.get('block') and media_data['block'] in self._ERRORS:
raise ExtractorError('%s said: %s' % (
self.IE_NAME, self._ERRORS[media_data['block']]), expected=True)
message = self._ERRORS[media_data['block']]
if media_data['block'] == 'GEOBLOCK':
self.raise_geo_restricted(
msg=message, countries=self._GEO_COUNTRIES)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, message), expected=True)
return media_data

View File

@ -13,6 +13,8 @@ from ..utils import (
class SVTBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['SE']
def _extract_video(self, video_info, video_id):
formats = []
for vr in video_info['videoReferences']:
@ -38,7 +40,9 @@ class SVTBaseIE(InfoExtractor):
'url': vurl,
})
if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
self.raise_geo_restricted('This video is only available in Sweden')
self.raise_geo_restricted(
'This video is only available in Sweden',
countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
subtitles = {}

View File

@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
smuggle_url,
)
class TeleQuebecIE(InfoExtractor):
@ -28,7 +31,7 @@ class TeleQuebecIE(InfoExtractor):
return {
'_type': 'url_transparent',
'id': media_id,
'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
'url': smuggle_url('limelight:media:' + media_data['streamInfo']['sourceId'], {'geo_countries': ['CA']}),
'title': media_data['title'],
'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),

View File

@ -8,10 +8,12 @@ from ..utils import (
HEADRequest,
ExtractorError,
int_or_none,
clean_html,
)
class TFOIE(InfoExtractor):
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
@ -36,7 +38,9 @@ class TFOIE(InfoExtractor):
'X-tfo-session': self._get_cookies('http://www.tfo.org/')['tfo-session'].value,
})
if infos.get('success') == 0:
raise ExtractorError('%s said: %s' % (self.IE_NAME, infos['msg']), expected=True)
if infos.get('code') == 'ErrGeoBlocked':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError('%s said: %s' % (self.IE_NAME, clean_html(infos['msg'])), expected=True)
video_data = infos['data']
return {

View File

@ -179,10 +179,12 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
if m:
return [m.group('url')]
# Are whitesapces ignored in URLs?
# https://github.com/rg3/youtube-dl/issues/12044
matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
r'(?s)<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches:
return list(zip(*matches))[1]
return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
@staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):

View File

@ -3,13 +3,14 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE):
class ThisAVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TESTS = [{
# jwplayer
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',
'md5': '0480f1ef3932d901f0e0e719f188f19b',
'info_dict': {
@ -20,6 +21,7 @@ class ThisAVIE(JWPlatformBaseIE):
'uploader_id': 'dj7970'
}
}, {
# html5 media
'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html',
'md5': 'ba90c076bd0f80203679e5b60bf523ee',
'info_dict': {
@ -48,8 +50,12 @@ class ThisAVIE(JWPlatformBaseIE):
}],
}
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if entries:
info_dict = entries[0]
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
uploader = self._html_search_regex(
r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>',
webpage, 'uploader name', fatal=False)

View File

@ -24,6 +24,7 @@ class TV4IE(InfoExtractor):
sport/|
)
)(?P<id>[0-9]+)'''
_GEO_COUNTRIES = ['SE']
_TESTS = [
{
'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650',
@ -71,16 +72,12 @@ class TV4IE(InfoExtractor):
'http://www.tv4play.se/player/assets/%s.json' % video_id,
video_id, 'Downloading video info JSON')
# If is_geo_restricted is true, it doesn't necessarily mean we can't download it
if info.get('is_geo_restricted'):
self.report_warning('This content might not be available in your country due to licensing restrictions.')
title = info['title']
subtitles = {}
formats = []
# http formats are linked with unresolvable host
for kind in ('hls', ''):
for kind in ('hls3', ''):
data = self._download_json(
'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
video_id, 'Downloading sources JSON', query={
@ -113,6 +110,10 @@ class TV4IE(InfoExtractor):
'url': manifest_url,
'ext': 'vtt',
}]})
if not formats and info.get('is_geo_restricted'):
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
return {

View File

@ -0,0 +1,76 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unescapeHTML,
)
class TVN24IE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:[^/]+)\.)?tvn24(?:bis)?\.pl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'http://www.tvn24.pl/wiadomosci-z-kraju,3/oredzie-artura-andrusa,702428.html',
'md5': 'fbdec753d7bc29d96036808275f2130c',
'info_dict': {
'id': '1584444',
'ext': 'mp4',
'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".',
'thumbnail': 're:http://.*[.]jpeg',
}
}, {
'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
'only_matching': True,
}, {
'url': 'http://sport.tvn24.pl/pilka-nozna,105/ligue-1-kamil-glik-rozcial-glowe-monaco-tylko-remisuje-z-bastia,716522.html',
'only_matching': True,
}, {
'url': 'http://tvn24bis.pl/poranek,146,m/gen-koziej-w-tvn24-bis-wracamy-do-czasow-zimnej-wojny,715660.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
def extract_json(attr, name, fatal=True):
return self._parse_json(
self._search_regex(
r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
name, group='json', fatal=fatal) or '{}',
video_id, transform_source=unescapeHTML, fatal=fatal)
quality_data = extract_json('data-quality', 'formats')
formats = []
for format_id, url in quality_data.items():
formats.append({
'url': url,
'format_id': format_id,
'height': int_or_none(format_id.rstrip('p')),
})
self._sort_formats(formats)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_regex(
r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
'thumbnail', group='url')
share_params = extract_json(
'data-share-params', 'share params', fatal=False)
if isinstance(share_params, dict):
video_id = share_params.get('id') or video_id
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@ -1,7 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_class,
@ -9,7 +9,7 @@ from ..utils import (
)
class TVNoeIE(JWPlatformBaseIE):
class TVNoeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.tvnoe.cz/video/10362',

View File

@ -12,7 +12,7 @@ from ..utils import (
class TwentyFourVideoIE(InfoExtractor):
IE_NAME = '24video'
_VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex|tube)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.24video.net/video/view/1044982',
@ -37,6 +37,9 @@ class TwentyFourVideoIE(InfoExtractor):
}, {
'url': 'http://www.24video.me/video/view/1044982',
'only_matching': True,
}, {
'url': 'http://www.24video.tube/video/view/2363750',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -20,6 +20,7 @@ class Vbox7IE(InfoExtractor):
)
(?P<id>[\da-fA-F]+)
'''
_GEO_COUNTRIES = ['BG']
_TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@ -78,7 +79,7 @@ class Vbox7IE(InfoExtractor):
video_url = video['src']
if '/na.mp4' in video_url:
self.raise_geo_restricted()
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
uploader = video.get('uploader')

View File

@ -14,6 +14,7 @@ from ..utils import (
class VGTVIE(XstreamIE):
IE_DESC = 'VGTV, BTTV, FTV, Aftenposten and Aftonbladet'
_GEO_BYPASS = False
_HOST_TO_APPNAME = {
'vgtv.no': 'vgtv',
@ -217,7 +218,8 @@ class VGTVIE(XstreamIE):
properties = try_get(
data, lambda x: x['streamConfiguration']['properties'], list)
if properties and 'geoblocked' in properties:
raise self.raise_geo_restricted()
raise self.raise_geo_restricted(
countries=[host.rpartition('.')[-1].partition('/')[0].upper()])
self._sort_formats(info['formats'])

View File

@ -70,10 +70,10 @@ class ViceBaseIE(AdobePassIE):
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body'),
'description': base.get('body') or base.get('display_body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': parse_duration(video_data.get('video_duration') or watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at')),
'duration': int_or_none(video_data.get('video_duration')) or parse_duration(watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),

View File

@ -7,16 +7,16 @@ from .vice import ViceBaseIE
class VicelandIE(ViceBaseIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/[^/]+/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = {
'url': 'https://www.viceland.com/en_us/video/cyberwar-trailer/57608447973ee7705f6fbd4e',
'url': 'https://www.viceland.com/en_us/video/trapped/588a70d0dba8a16007de7316',
'info_dict': {
'id': '57608447973ee7705f6fbd4e',
'id': '588a70d0dba8a16007de7316',
'ext': 'mp4',
'title': 'CYBERWAR (Trailer)',
'description': 'Tapping into the geopolitics of hacking and surveillance, Ben Makuch travels the world to meet with hackers, government officials, and dissidents to investigate the ecosystem of cyberwarfare.',
'title': 'TRAPPED (Series Trailer)',
'description': 'md5:7a8e95c2b6cd86461502a2845e581ccf',
'age_limit': 14,
'timestamp': 1466008539,
'upload_date': '20160615',
'uploader_id': '11',
'timestamp': 1485474122,
'upload_date': '20170126',
'uploader_id': '57a204098cb727dec794c6a3',
'uploader': 'Viceland',
},
'params': {

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from .common import InfoExtractor
from ..utils import (
decode_packed_codes,
js_to_json,
@ -12,8 +12,8 @@ from ..utils import (
)
class VidziIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
class VidziIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidzi\.(?:tv|cc)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html',
'md5': '4f16c71ca0c8c8635ab6932b5f3f1660',
@ -29,6 +29,9 @@ class VidziIE(JWPlatformBaseIE):
}, {
'url': 'http://vidzi.tv/embed-4z2yb0rzphe9-600x338.html',
'skip_download': True,
}, {
'url': 'http://vidzi.cc/cghql9yq6emu.html',
'skip_download': True,
}]
def _real_extract(self, url):

View File

@ -86,7 +86,9 @@ class ViewsterIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
# Get 'api_token' cookie
self._request_webpage(HEADRequest('http://www.viewster.com/'), video_id)
self._request_webpage(
HEADRequest('http://www.viewster.com/'),
video_id, headers=self.geo_verification_headers())
cookies = self._get_cookies('http://www.viewster.com/')
self._AUTH_TOKEN = compat_urllib_parse_unquote(cookies['api_token'].value)

View File

@ -27,6 +27,7 @@ class VikiBaseIE(InfoExtractor):
_APP_VERSION = '2.2.5.1428709186'
_APP_SECRET = '-$iJ}@p7!G@SyU/je1bEyWg}upLu-6V6-Lg9VD(]siH,r.,m-r|ulZ,U4LC/SeR)'
_GEO_BYPASS = False
_NETRC_MACHINE = 'viki'
_token = None
@ -77,8 +78,11 @@ class VikiBaseIE(InfoExtractor):
def _check_errors(self, data):
for reason, status in data.get('blocking', {}).items():
if status and reason in self._ERRORS:
message = self._ERRORS[reason]
if reason == 'geo':
self.raise_geo_restricted(msg=message)
raise ExtractorError('%s said: %s' % (
self.IE_NAME, self._ERRORS[reason]), expected=True)
self.IE_NAME, message), expected=True)
def _real_initialize(self):
self._login()

View File

@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict

View File

@ -1,10 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from .jwplatform import JWPlatformBaseIE
class WimpIE(JWPlatformBaseIE):
class WimpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.wimp.com/maru-is-exhausted/',

View File

@ -228,17 +228,29 @@ def parseOpts(overrideArguments=None):
action='store_const', const='::', dest='source_address',
help='Make all connections via IPv6',
)
network.add_option(
geo = optparse.OptionGroup(parser, 'Geo Restriction')
geo.add_option(
'--geo-verification-proxy',
dest='geo_verification_proxy', default=None, metavar='URL',
help='Use this proxy to verify the IP address for some geo-restricted sites. '
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.'
)
network.add_option(
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.')
geo.add_option(
'--cn-verification-proxy',
dest='cn_verification_proxy', default=None, metavar='URL',
help=optparse.SUPPRESS_HELP,
)
help=optparse.SUPPRESS_HELP)
geo.add_option(
'--geo-bypass',
action='store_true', dest='geo_bypass', default=True,
help='Bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)')
geo.add_option(
'--no-geo-bypass',
action='store_false', dest='geo_bypass', default=True,
help='Do not bypass geographic restriction via faking X-Forwarded-For HTTP header (experimental)')
geo.add_option(
'--geo-bypass-country', metavar='CODE',
dest='geo_bypass_country', default=None,
help='Force bypass geographic restriction with explicitly provided two-letter ISO 3166-2 country code (experimental)')
selection = optparse.OptionGroup(parser, 'Video Selection')
selection.add_option(
@ -298,14 +310,16 @@ def parseOpts(overrideArguments=None):
metavar='FILTER', dest='match_filter', default=None,
help=(
'Generic video filter. '
'Specify any key (see help for -o for a list of available keys) to'
' match if the key is present, '
'!key to check if the key is not present,'
'Specify any key (see help for -o for a list of available keys) to '
'match if the key is present, '
'!key to check if the key is not present, '
'key > NUMBER (like "comment_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, and '
'& to require multiple matches. '
'Values which are not known are excluded unless you'
' put a question mark (?) after the operator.'
'>=, <, <=, !=, =) to compare against a number, '
'key = \'LITERAL\' (like "uploader = \'Mike Smith\'", also works with !=) '
'to match against a string literal '
'and & to require multiple matches. '
'Values which are not known are excluded unless you '
'put a question mark (?) after the operator. '
'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who '
@ -834,6 +848,7 @@ def parseOpts(overrideArguments=None):
parser.add_option_group(general)
parser.add_option_group(network)
parser.add_option_group(geo)
parser.add_option_group(selection)
parser.add_option_group(downloader)
parser.add_option_group(filesystem)

View File

@ -23,6 +23,7 @@ import operator
import os
import pipes
import platform
import random
import re
import socket
import ssl
@ -701,7 +702,12 @@ def bug_reports_message():
return msg
class ExtractorError(Exception):
class YoutubeDLError(Exception):
"""Base exception for YoutubeDL errors."""
pass
class ExtractorError(YoutubeDLError):
"""Error during info extraction."""
def __init__(self, msg, tb=None, expected=False, cause=None, video_id=None):
@ -742,7 +748,19 @@ class RegexNotFoundError(ExtractorError):
pass
class DownloadError(Exception):
class GeoRestrictedError(ExtractorError):
"""Geographic restriction Error exception.
This exception may be thrown when a video is not available from your
geographic location due to geographic restrictions imposed by a website.
"""
def __init__(self, msg, countries=None):
super(GeoRestrictedError, self).__init__(msg, expected=True)
self.msg = msg
self.countries = countries
class DownloadError(YoutubeDLError):
"""Download Error exception.
This exception may be thrown by FileDownloader objects if they are not
@ -756,7 +774,7 @@ class DownloadError(Exception):
self.exc_info = exc_info
class SameFileError(Exception):
class SameFileError(YoutubeDLError):
"""Same File exception.
This exception will be thrown by FileDownloader objects if they detect
@ -765,7 +783,7 @@ class SameFileError(Exception):
pass
class PostProcessingError(Exception):
class PostProcessingError(YoutubeDLError):
"""Post Processing exception.
This exception may be raised by PostProcessor's .run() method to
@ -773,15 +791,16 @@ class PostProcessingError(Exception):
"""
def __init__(self, msg):
super(PostProcessingError, self).__init__(msg)
self.msg = msg
class MaxDownloadsReached(Exception):
class MaxDownloadsReached(YoutubeDLError):
""" --max-downloads limit has been reached. """
pass
class UnavailableVideoError(Exception):
class UnavailableVideoError(YoutubeDLError):
"""Unavailable Format exception.
This exception will be thrown when a video is requested
@ -790,7 +809,7 @@ class UnavailableVideoError(Exception):
pass
class ContentTooShortError(Exception):
class ContentTooShortError(YoutubeDLError):
"""Content Too Short exception.
This exception may be raised by FileDownloader objects when a file they
@ -799,12 +818,15 @@ class ContentTooShortError(Exception):
"""
def __init__(self, downloaded, expected):
super(ContentTooShortError, self).__init__(
'Downloaded {0} bytes, expected {1} bytes'.format(downloaded, expected)
)
# Both in bytes
self.downloaded = downloaded
self.expected = expected
class XAttrMetadataError(Exception):
class XAttrMetadataError(YoutubeDLError):
def __init__(self, code=None, msg='Unknown error'):
super(XAttrMetadataError, self).__init__(msg)
self.code = code
@ -820,7 +842,7 @@ class XAttrMetadataError(Exception):
self.reason = 'NOT_SUPPORTED'
class XAttrUnavailableError(Exception):
class XAttrUnavailableError(YoutubeDLError):
pass
@ -2383,6 +2405,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*)
)
\s*$
@ -2391,7 +2414,8 @@ def _match_one(filter_part, dct):
if m:
op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or
if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparisonvalue is
# a number we should respect the origin of the original field
# and process comparison value as a string (see
@ -2401,7 +2425,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval')
comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else:
try:
comparison_value = int(m.group('intval'))
@ -3013,6 +3040,260 @@ class ISO3166Utils(object):
return cls._country_map.get(code.upper())
class GeoUtils(object):
# Major IPv4 address blocks per country
_country_ip_map = {
'AD': '85.94.160.0/19',
'AE': '94.200.0.0/13',
'AF': '149.54.0.0/17',
'AG': '209.59.64.0/18',
'AI': '204.14.248.0/21',
'AL': '46.99.0.0/16',
'AM': '46.70.0.0/15',
'AO': '105.168.0.0/13',
'AP': '159.117.192.0/21',
'AR': '181.0.0.0/12',
'AS': '202.70.112.0/20',
'AT': '84.112.0.0/13',
'AU': '1.128.0.0/11',
'AW': '181.41.0.0/18',
'AZ': '5.191.0.0/16',
'BA': '31.176.128.0/17',
'BB': '65.48.128.0/17',
'BD': '114.130.0.0/16',
'BE': '57.0.0.0/8',
'BF': '129.45.128.0/17',
'BG': '95.42.0.0/15',
'BH': '37.131.0.0/17',
'BI': '154.117.192.0/18',
'BJ': '137.255.0.0/16',
'BL': '192.131.134.0/24',
'BM': '196.12.64.0/18',
'BN': '156.31.0.0/16',
'BO': '161.56.0.0/16',
'BQ': '161.0.80.0/20',
'BR': '152.240.0.0/12',
'BS': '24.51.64.0/18',
'BT': '119.2.96.0/19',
'BW': '168.167.0.0/16',
'BY': '178.120.0.0/13',
'BZ': '179.42.192.0/18',
'CA': '99.224.0.0/11',
'CD': '41.243.0.0/16',
'CF': '196.32.200.0/21',
'CG': '197.214.128.0/17',
'CH': '85.0.0.0/13',
'CI': '154.232.0.0/14',
'CK': '202.65.32.0/19',
'CL': '152.172.0.0/14',
'CM': '165.210.0.0/15',
'CN': '36.128.0.0/10',
'CO': '181.240.0.0/12',
'CR': '201.192.0.0/12',
'CU': '152.206.0.0/15',
'CV': '165.90.96.0/19',
'CW': '190.88.128.0/17',
'CY': '46.198.0.0/15',
'CZ': '88.100.0.0/14',
'DE': '53.0.0.0/8',
'DJ': '197.241.0.0/17',
'DK': '87.48.0.0/12',
'DM': '192.243.48.0/20',
'DO': '152.166.0.0/15',
'DZ': '41.96.0.0/12',
'EC': '186.68.0.0/15',
'EE': '90.190.0.0/15',
'EG': '156.160.0.0/11',
'ER': '196.200.96.0/20',
'ES': '88.0.0.0/11',
'ET': '196.188.0.0/14',
'EU': '2.16.0.0/13',
'FI': '91.152.0.0/13',
'FJ': '144.120.0.0/16',
'FM': '119.252.112.0/20',
'FO': '88.85.32.0/19',
'FR': '90.0.0.0/9',
'GA': '41.158.0.0/15',
'GB': '25.0.0.0/8',
'GD': '74.122.88.0/21',
'GE': '31.146.0.0/16',
'GF': '161.22.64.0/18',
'GG': '62.68.160.0/19',
'GH': '45.208.0.0/14',
'GI': '85.115.128.0/19',
'GL': '88.83.0.0/19',
'GM': '160.182.0.0/15',
'GN': '197.149.192.0/18',
'GP': '104.250.0.0/19',
'GQ': '105.235.224.0/20',
'GR': '94.64.0.0/13',
'GT': '168.234.0.0/16',
'GU': '168.123.0.0/16',
'GW': '197.214.80.0/20',
'GY': '181.41.64.0/18',
'HK': '113.252.0.0/14',
'HN': '181.210.0.0/16',
'HR': '93.136.0.0/13',
'HT': '148.102.128.0/17',
'HU': '84.0.0.0/14',
'ID': '39.192.0.0/10',
'IE': '87.32.0.0/12',
'IL': '79.176.0.0/13',
'IM': '5.62.80.0/20',
'IN': '117.192.0.0/10',
'IO': '203.83.48.0/21',
'IQ': '37.236.0.0/14',
'IR': '2.176.0.0/12',
'IS': '82.221.0.0/16',
'IT': '79.0.0.0/10',
'JE': '87.244.64.0/18',
'JM': '72.27.0.0/17',
'JO': '176.29.0.0/16',
'JP': '126.0.0.0/8',
'KE': '105.48.0.0/12',
'KG': '158.181.128.0/17',
'KH': '36.37.128.0/17',
'KI': '103.25.140.0/22',
'KM': '197.255.224.0/20',
'KN': '198.32.32.0/19',
'KP': '175.45.176.0/22',
'KR': '175.192.0.0/10',
'KW': '37.36.0.0/14',
'KY': '64.96.0.0/15',
'KZ': '2.72.0.0/13',
'LA': '115.84.64.0/18',
'LB': '178.135.0.0/16',
'LC': '192.147.231.0/24',
'LI': '82.117.0.0/19',
'LK': '112.134.0.0/15',
'LR': '41.86.0.0/19',
'LS': '129.232.0.0/17',
'LT': '78.56.0.0/13',
'LU': '188.42.0.0/16',
'LV': '46.109.0.0/16',
'LY': '41.252.0.0/14',
'MA': '105.128.0.0/11',
'MC': '88.209.64.0/18',
'MD': '37.246.0.0/16',
'ME': '178.175.0.0/17',
'MF': '74.112.232.0/21',
'MG': '154.126.0.0/17',
'MH': '117.103.88.0/21',
'MK': '77.28.0.0/15',
'ML': '154.118.128.0/18',
'MM': '37.111.0.0/17',
'MN': '49.0.128.0/17',
'MO': '60.246.0.0/16',
'MP': '202.88.64.0/20',
'MQ': '109.203.224.0/19',
'MR': '41.188.64.0/18',
'MS': '208.90.112.0/22',
'MT': '46.11.0.0/16',
'MU': '105.16.0.0/12',
'MV': '27.114.128.0/18',
'MW': '105.234.0.0/16',
'MX': '187.192.0.0/11',
'MY': '175.136.0.0/13',
'MZ': '197.218.0.0/15',
'NA': '41.182.0.0/16',
'NC': '101.101.0.0/18',
'NE': '197.214.0.0/18',
'NF': '203.17.240.0/22',
'NG': '105.112.0.0/12',
'NI': '186.76.0.0/15',
'NL': '145.96.0.0/11',
'NO': '84.208.0.0/13',
'NP': '36.252.0.0/15',
'NR': '203.98.224.0/19',
'NU': '49.156.48.0/22',
'NZ': '49.224.0.0/14',
'OM': '5.36.0.0/15',
'PA': '186.72.0.0/15',
'PE': '186.160.0.0/14',
'PF': '123.50.64.0/18',
'PG': '124.240.192.0/19',
'PH': '49.144.0.0/13',
'PK': '39.32.0.0/11',
'PL': '83.0.0.0/11',
'PM': '70.36.0.0/20',
'PR': '66.50.0.0/16',
'PS': '188.161.0.0/16',
'PT': '85.240.0.0/13',
'PW': '202.124.224.0/20',
'PY': '181.120.0.0/14',
'QA': '37.210.0.0/15',
'RE': '139.26.0.0/16',
'RO': '79.112.0.0/13',
'RS': '178.220.0.0/14',
'RU': '5.136.0.0/13',
'RW': '105.178.0.0/15',
'SA': '188.48.0.0/13',
'SB': '202.1.160.0/19',
'SC': '154.192.0.0/11',
'SD': '154.96.0.0/13',
'SE': '78.64.0.0/12',
'SG': '152.56.0.0/14',
'SI': '188.196.0.0/14',
'SK': '78.98.0.0/15',
'SL': '197.215.0.0/17',
'SM': '89.186.32.0/19',
'SN': '41.82.0.0/15',
'SO': '197.220.64.0/19',
'SR': '186.179.128.0/17',
'SS': '105.235.208.0/21',
'ST': '197.159.160.0/19',
'SV': '168.243.0.0/16',
'SX': '190.102.0.0/20',
'SY': '5.0.0.0/16',
'SZ': '41.84.224.0/19',
'TC': '65.255.48.0/20',
'TD': '154.68.128.0/19',
'TG': '196.168.0.0/14',
'TH': '171.96.0.0/13',
'TJ': '85.9.128.0/18',
'TK': '27.96.24.0/21',
'TL': '180.189.160.0/20',
'TM': '95.85.96.0/19',
'TN': '197.0.0.0/11',
'TO': '175.176.144.0/21',
'TR': '78.160.0.0/11',
'TT': '186.44.0.0/15',
'TV': '202.2.96.0/19',
'TW': '120.96.0.0/11',
'TZ': '156.156.0.0/14',
'UA': '93.72.0.0/13',
'UG': '154.224.0.0/13',
'US': '3.0.0.0/8',
'UY': '167.56.0.0/13',
'UZ': '82.215.64.0/18',
'VA': '212.77.0.0/19',
'VC': '24.92.144.0/20',
'VE': '186.88.0.0/13',
'VG': '172.103.64.0/18',
'VI': '146.226.0.0/16',
'VN': '14.160.0.0/11',
'VU': '202.80.32.0/20',
'WF': '117.20.32.0/21',
'WS': '202.4.32.0/19',
'YE': '134.35.0.0/16',
'YT': '41.242.116.0/22',
'ZA': '41.0.0.0/11',
'ZM': '165.56.0.0/13',
'ZW': '41.85.192.0/19',
}
@classmethod
def random_ipv4(cls, code):
block = cls._country_ip_map.get(code.upper())
if not block:
return None
addr, preflen = block.split('/')
addr_min = compat_struct_unpack('!L', socket.inet_aton(addr))[0]
addr_max = addr_min | (0xffffffff >> int(preflen))
return compat_str(socket.inet_ntoa(
compat_struct_pack('!L', random.randint(addr_min, addr_max))))
class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
def __init__(self, proxies=None):
# Set default handlers

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.02.14'
__version__ = '2017.02.22'