Compare commits

...

37 Commits

Author SHA1 Message Date
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes recent changes in iVysilani. Proper patch should migrate to
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
58a65ba852 release 2017.02.14 2017-02-14 01:09:18 +07:00
cedf08ff54 [ChangeLog] Actualize 2017-02-14 01:07:35 +07:00
50de3dbad3 [zdf] Fix extraction (closes #12117) 2017-02-14 01:00:06 +07:00
085f169ffe [xtube] Fix extraction for both kinds of video id (closes #12088) 2017-02-13 23:44:43 +07:00
f6d6ca1db3 [xtube] Improve title extraction 2017-02-13 23:34:14 +07:00
6e5956e6ba [lemonde] Fallback delegate extraction to generic extractor (closes #12115, closes #12116) 2017-02-13 23:17:48 +07:00
50fd3c2c69 Merge branch 'master' of github.com:rg3/youtube-dl 2017-02-13 22:58:50 +07:00
89c6691f9d [bellmedia] accept longer video id(closes #12114) 2017-02-13 15:08:48 +01:00
454e5cdb17 [limelight] add support referer protected videos 2017-02-13 14:29:05 +01:00
1de9f78e71 [travis] Separate builds for core and download 2017-02-13 18:56:05 +08:00
9dad941853 [disney] improve extraction
- add support for more urls
- detect expired videos
- skip Adobe Flash Access protected videos

closes #4975
closes #11000
closes #11882
closes #11936
2017-02-13 11:43:20 +01:00
1e2c3f61fc [travis] Separate builds for core and download 2017-02-13 17:36:13 +07:00
0dac7cbb09 [hotstar] improve extraction(closes #12096)
- extract all qualities
- detect drm protected videos
- extract more metadata
2017-02-12 17:35:24 +01:00
f8514630db [einthusan] Fix extraction (closes #11416)
The old test URLs are no longer valid, so I replace them with the one
from #11416
2017-02-12 20:53:55 +08:00
459818e280 [aenetworks] Add support for lifetimemovieclub.com 2017-02-12 20:18:11 +08:00
6310acf512 [youtube] Fix parsing codecs (closes #12091) 2017-02-12 18:09:53 +07:00
8d38dafbbf ChangeLog: update after #12085 2017-02-12 00:45:37 +08:00
f3915452de Merge pull request #12085 from wiiaboo/python2
utils.py: Workaround TypeError with Python 2.7.13 in Windows
2017-02-12 00:42:43 +08:00
2f49bcd690 utils.py: Workaround TypeError with Python 2.7.13 in Windows
Fixes #11540

Tested with Windows Python 2.7.12 and 2.7.13.
2017-02-11 14:51:28 +00:00
68c22c4c15 [iqiyi] Update _TESTS 2017-02-11 22:27:45 +08:00
27 changed files with 584 additions and 215 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.11*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.16*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.11** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.16**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.11 [debug] youtube-dl version 2017.02.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -8,7 +8,10 @@ python:
- "3.5" - "3.5"
- "3.6" - "3.6"
sudo: false sudo: false
script: nosetests test --verbose env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
script: ./devscripts/run_tests.sh
notifications: notifications:
email: email:
- filippo.valsorda@gmail.com - filippo.valsorda@gmail.com

View File

@ -1,3 +1,40 @@
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14
Core
* TypeError is fixed with Python 2.7.13 on Windows (#11540, #12085)
Extractor
* [zdf] Fix extraction (#12117)
* [xtube] Fix extraction for both kinds of video id (#12088)
* [xtube] Improve title extraction (#12088)
+ [lemonde] Fallback delegate extraction to generic extractor (#12115, #12116)
* [bellmedia] Allow video id longer than 6 characters (#12114)
+ [limelight] Add support for referer protected videos
* [disney] Improve extraction (#4975, #11000, #11882, #11936)
* [hotstar] Improve extraction (#12096)
* [einthusan] Fix extraction (#11416)
+ [aenetworks] Add support for lifetimemovieclub.com (#12097)
* [youtube] Fix parsing codecs (#12091)
version 2017.02.11 version 2017.02.11
Core Core

19
devscripts/run_tests.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
test_set=""
case "$YTDL_TEST_SET" in
core)
test_set="-I test_($DOWNLOAD_TESTS)\.py"
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
;;
*)
break
;;
esac
nosetests test --verbose $test_set

View File

@ -546,8 +546,10 @@
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **OnDemandKorea** - **OnDemandKorea**
- **onet.pl**
- **onet.tv** - **onet.tv**
- **onet.tv:channel** - **onet.tv:channel**
- **OnetMVP**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -900,6 +902,7 @@
- **vlive** - **vlive**
- **vlive:channel** - **vlive:channel**
- **Vodlocker** - **Vodlocker**
- **VODPl**
- **VODPlatform** - **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '') self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'vbr': 10, 'vbr': 10,
}), '^\s*10k$') }), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'fps': 30, 'fps': 30,
}), '^30fps$') }), r'^30fps$')
def test_postprocessors(self): def test_postprocessors(self):
filename = 'post-processor-testfile.mp4' filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42', 'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
} }
second = { second = {
'id': '2', 'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43', 'playlist_id': '43',
'uploader': "тест 123",
} }
videos = [first, second] videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),

View File

@ -23,7 +23,7 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)' _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime|lifetimemovieclub)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': 'a97a65f7e823ae10e9244bc5433d5fe6', 'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
@ -62,11 +62,15 @@ class AENetworksIE(AENetworksBaseIE):
}, { }, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie', 'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = { _DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY', 'history.com': 'HISTORY',
'aetv.com': 'AETV', 'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME', 'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI', 'fyi.tv': 'FYI',
} }

View File

@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url') media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id) r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating'] rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required') auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true': if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id') requestor_id = self._search_regex(
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating) r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource) webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query) media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id) formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats) self._sort_formats(formats)
info.update({ info.update({
'id': video_id, 'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys: if ns_keys:
ns = list(ns_keys)[0] ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show') series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season')) season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle') episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode')) episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number: if season_number:
title = 'Season %d - %s' % (season_number, title) title = 'Season %d - %s' % (season_number, title)
if series: if series:

View File

@ -24,7 +24,7 @@ class BellMediaIE(InfoExtractor):
space space
)\.ca| )\.ca|
much\.com much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})''' )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966', 'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0', 'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -55,6 +55,9 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6', 'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',

View File

@ -13,6 +13,7 @@ from ..utils import (
float_or_none, float_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
USER_AGENTS,
) )
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {
'id': '61924494876951776', 'id': '61924494877246241',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hyde Park Civilizace', 'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a', 'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350, 'duration': 3350,
}, },
@ -114,6 +115,9 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani', 'requestSource': 'iVysilani',
} }
entries = []
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request( req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist', 'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data)) data=urlencode_postdata(data))
@ -121,9 +125,14 @@ class CeskaTelevizeIE(InfoExtractor):
req.add_header('Content-type', 'application/x-www-form-urlencoded') req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1') req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest') req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url) req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id) playlistpage = self._download_json(req, playlist_id, fatal=False)
if not playlistpage:
continue
playlist_url = playlistpage['url'] playlist_url = playlistpage['url']
if playlist_url == 'error_region': if playlist_url == 'error_region':
@ -135,19 +144,38 @@ class CeskaTelevizeIE(InfoExtractor):
playlist_title = self._og_search_title(webpage, default=None) playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None) playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist'] playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist) playlist_len = len(playlist)
entries = [] for num, item in enumerate(playlist):
for item in playlist:
is_live = item.get('type') == 'LIVE' is_live = item.get('type') == 'LIVE'
formats = [] formats = []
for format_id, stream_url in item['streamUrls'].items(): for format_id, stream_url in item.get('streamUrls', {}).items():
formats.extend(self._extract_m3u8_formats( if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native', entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False)) m3u8_id='hls-%s' % format_id, fatal=False)
self._sort_formats(formats) else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId'] item_id = item.get('id') or item['assetId']
title = item['title'] title = item['title']
@ -179,6 +207,9 @@ class CeskaTelevizeIE(InfoExtractor):
'is_live': is_live, 'is_live': is_live,
}) })
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
def _get_subtitles(self, episode_id, subs): def _get_subtitles(self, episode_id, subs):

View File

@ -9,13 +9,15 @@ from ..utils import (
unified_strdate, unified_strdate,
compat_str, compat_str,
determine_ext, determine_ext,
ExtractorError,
) )
class DisneyIE(InfoExtractor): class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})''' https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{ _TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977', 'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
'info_dict': { 'info_dict': {
'id': '545ed1857afee5a0ec239977', 'id': '545ed1857afee5a0ec239977',
@ -28,6 +30,20 @@ class DisneyIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# Grill.burger
'url': 'http://www.starwars.com/video/rogue-one-a-star-wars-story-intro-featurette',
'info_dict': {
'id': '5454e9f4e9804a552e3524c8',
'ext': 'mp4',
'title': '"Intro" Featurette: Rogue One: A Star Wars Story',
'upload_date': '20170104',
'description': 'Go behind-the-scenes of Rogue One: A Star Wars Story in this featurette with Director Gareth Edwards and the cast of the film.',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2', 'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
'only_matching': True, 'only_matching': True,
@ -43,31 +59,55 @@ class DisneyIE(InfoExtractor):
}, { }, {
'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097', 'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/embed/522900d2ced3c565e4cc0677',
'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/videos/contest-of-champions-part-four-clip-1',
'only_matching': True,
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups() domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
grill = re.sub(r'"\s*\+\s*"', '', self._search_regex(
r'Grill\.burger\s*=\s*({.+})\s*:',
webpage, 'grill data'))
page_data = next(s for s in self._parse_json(grill, display_id)['stack'] if s.get('type') == 'video')
video_data = page_data['data'][0]
else:
webpage = self._download_webpage( webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id) 'http://%s/embed/%s' % (domain, video_id), video_id)
video_data = self._parse_json(self._search_regex( page_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video'] r'Disney\.EmbedVideo\s*=\s*({.+});',
webpage, 'embed data'), video_id)
video_data = page_data['video']
for external in video_data.get('externals', []): for external in video_data.get('externals', []):
if external.get('source') == 'vevo': if external.get('source') == 'vevo':
return self.url_result('vevo:' + external['data_id'], 'Vevo') return self.url_result('vevo:' + external['data_id'], 'Vevo')
video_id = video_data['id']
title = video_data['title'] title = video_data['title']
formats = [] formats = []
for flavor in video_data.get('flavors', []): for flavor in video_data.get('flavors', []):
flavor_format = flavor.get('format') flavor_format = flavor.get('format')
flavor_url = flavor.get('url') flavor_url = flavor.get('url')
if not flavor_url or not re.match(r'https?://', flavor_url): if not flavor_url or not re.match(r'https?://', flavor_url) or flavor_format == 'mp4_access':
continue continue
tbr = int_or_none(flavor.get('bitrate')) tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999: if tbr == 99999:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False)) flavor_url, video_id, 'mp4',
m3u8_id=flavor_format, fatal=False))
continue continue
format_id = [] format_id = []
if flavor_format: if flavor_format:
@ -88,6 +128,10 @@ class DisneyIE(InfoExtractor):
'ext': ext, 'ext': ext,
'vcodec': 'none' if (width == 0 and height == 0) else None, 'vcodec': 'none' if (width == 0 and height == 0) else None,
}) })
if not formats and video_data.get('expired'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, page_data['translations']['video_expired']),
expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}

View File

@ -1,67 +1,94 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
remove_start, extract_attributes,
sanitized_Request, ExtractorError,
get_elements_by_class,
urlencode_postdata,
) )
class EinthusanIE(InfoExtractor): class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?einthusan\.com/movies/watch.php\?([^#]*?)id=(?P<id>[0-9]+)' _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[0-9]+)'
_TESTS = [ _TEST = {
{ 'url': 'https://einthusan.tv/movie/watch/9097/',
'url': 'http://www.einthusan.com/movies/watch.php?id=2447', 'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': { 'info_dict': {
'id': '2447', 'id': '9097',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ek Villain', 'title': 'Ae Dil Hai Mushkil',
'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:9d29fc91a7abadd4591fb862fa560d93',
} }
},
{
'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': {
'id': '1671',
'ext': 'mp4',
'title': 'Soodhu Kavvuum',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
} }
},
] # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):
return self._parse_json(base64.b64decode((
encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
).encode('ascii')).decode('utf-8'), video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
request = sanitized_Request(url) webpage = self._download_webpage(url, video_id)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
webpage = self._download_webpage(request, video_id)
title = self._html_search_regex( title = self._html_search_regex(r'<h3>([^<]+)</h3>', webpage, 'title')
r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
webpage, 'title')
video_id = self._search_regex( player_params = extract_attributes(self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id) r'(<section[^>]+id="UIVideoPlayer"[^>]+>)', webpage, 'player parameters'))
m3u8_url = self._download_webpage( page_id = self._html_search_regex(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/' '<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
% video_id, video_id, headers={'Referer': url}) video_data = self._download_json(
formats = self._extract_m3u8_formats( 'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native') data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
'EJOutcomes': player_params['data-ejpingables'],
'NativeHLS': False
}),
'arcVersion': 3,
'appVersion': 59,
'gorilla.csrf.Token': page_id,
}))['Data']
description = self._html_search_meta('description', webpage) if isinstance(video_data, compat_str) and video_data.startswith('/ratelimited/'):
raise ExtractorError(
'Download rate reached. Please try again later.', expected=True)
ej_links = self._decrypt(video_data['EJLinks'], video_id)
formats = []
m3u8_url = ej_links.get('HLSLink')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native'))
mp4_url = ej_links.get('MP4Link')
if mp4_url:
formats.append({
'url': mp4_url,
})
self._sort_formats(formats)
description = get_elements_by_class('synopsis', webpage)[0]
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''', r'''<img[^>]+src=(["'])(?P<url>(?!\1).+?/moviecovers/(?!\1).+?)\1''',
webpage, "thumbnail url", fatal=False) webpage, 'thumbnail url', fatal=False, group='url')
if thumbnail is not None: if thumbnail is not None:
thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..')) thumbnail = compat_urlparse.urljoin(url, thumbnail)
return { return {
'id': video_id, 'id': video_id,

View File

@ -694,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import ( from .onet import (
OnetIE, OnetIE,
OnetChannelIE, OnetChannelIE,
OnetMVPIE,
OnetPlIE,
) )
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .ooyala import ( from .ooyala import (
@ -1147,6 +1149,7 @@ from .vlive import (
VLiveChannelIE VLiveChannelIE
) )
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE from .voxmedia import VoxMediaIE

View File

@ -991,19 +991,6 @@ class GenericIE(InfoExtractor):
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014', 'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
}, },
}, },
# Kaltura embed protected with referrer
{
'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
'info_dict': {
'id': '1_g4fbemnq',
'ext': 'mp4',
'title': 'Violetta - Achter De Schermen - Ruggero',
'description': 'Achter de schermen met Ruggero',
'timestamp': 1435133761,
'upload_date': '20150624',
'uploader_id': 'echojecka',
},
},
# Kaltura embed with single quotes # Kaltura embed with single quotes
{ {
'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY', 'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
@ -2350,8 +2337,9 @@ class GenericIE(InfoExtractor):
'Channel': 'channel', 'Channel': 'channel',
'ChannelList': 'channel_list', 'ChannelList': 'channel_list',
} }
return self.url_result('limelight:%s:%s' % ( return self.url_result(smuggle_url('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2)) lm[mobj.group(1)], mobj.group(2)), {'source_url': url}),
'Limelight%s' % mobj.group(1), mobj.group(2))
mobj = re.search( mobj = re.search(
r'''(?sx) r'''(?sx)
@ -2361,7 +2349,9 @@ class GenericIE(InfoExtractor):
value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32}) value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32})
''', webpage) ''', webpage)
if mobj: if mobj:
return self.url_result('limelight:media:%s' % mobj.group('id')) return self.url_result(smuggle_url(
'limelight:media:%s' % mobj.group('id'),
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
# Look for AdobeTVVideo embeds # Look for AdobeTVVideo embeds
mobj = re.search( mobj = re.search(

View File

@ -34,11 +34,9 @@ class HotStarIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s' def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True, query=None):
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s' json_data = super(HotStarIE, self)._download_json(
url_or_request, video_id, note, fatal=fatal, query=query)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True):
json_data = super(HotStarIE, self)._download_json(url_or_request, video_id, note, fatal=fatal)
if json_data['resultCode'] != 'OK': if json_data['resultCode'] != 'OK':
if fatal: if fatal:
raise ExtractorError(json_data['errorDescription']) raise ExtractorError(json_data['errorDescription'])
@ -48,20 +46,37 @@ class HotStarIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
self._GET_CONTENT_TEMPLATE % video_id, 'http://account.hotstar.com/AVS/besc', video_id, query={
video_id)['contentInfo'][0] 'action': 'GetAggregatedContentDetails',
'channel': 'PCTV',
'contentId': video_id,
})['contentInfo'][0]
title = video_data['episodeTitle']
if video_data.get('encrypted') == 'Y':
raise ExtractorError('This video is DRM protected.', expected=True)
formats = [] formats = []
# PCTV for extracting f4m manifest for f in ('JIO',):
for f in ('TABLET',):
format_data = self._download_json( format_data = self._download_json(
self._GET_CDN_TEMPLATE % (f, video_id, 'VOD'), 'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f, fatal=False) video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data: if format_data:
format_url = format_data['src'] format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url) ext = determine_ext(format_url)
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
# produce broken files # produce broken files
continue continue
@ -75,9 +90,12 @@ class HotStarIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': video_data['episodeTitle'], 'title': title,
'description': video_data.get('description'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')), 'timestamp': int_or_none(video_data.get('broadcastDate')),
'formats': formats, 'formats': formats,
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
} }

View File

@ -173,11 +173,12 @@ class IqiyiIE(InfoExtractor):
} }
}, { }, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html', 'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': '667171934041350c5de3f5015f7f1152', 'md5': 'b7dc800a4004b1b57749d9abae0472da',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb', 'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4', 'ext': 'mp4',
'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇', # This can be either Simplified Chinese or Traditional Chinese
'title': r're:^(?:名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇|名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇)$',
}, },
'skip': 'Geo-restricted to China', 'skip': 'Geo-restricted to China',
}, { }, {

View File

@ -7,20 +7,40 @@ class LemondeIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html', 'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
'md5': '01fb3c92de4c12c573343d63e163d302', 'md5': 'da120c8722d8632eec6ced937536cc98',
'info_dict': { 'info_dict': {
'id': 'lqm3kl', 'id': 'lqm3kl',
'ext': 'mp4', 'ext': 'mp4',
'title': "Comprendre l'affaire Bygmalion en 5 minutes", 'title': "Comprendre l'affaire Bygmalion en 5 minutes",
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 320, 'duration': 309,
'upload_date': '20160119', 'upload_date': '20160119',
'timestamp': 1453194778, 'timestamp': 1453194778,
'uploader_id': '3pmkp', 'uploader_id': '3pmkp',
}, },
}, {
# standard iframe embed
'url': 'http://www.lemonde.fr/les-decodeurs/article/2016/10/18/tout-comprendre-du-ceta-le-petit-cousin-du-traite-transatlantique_5015920_4355770.html',
'info_dict': {
'id': 'uzsxms',
'ext': 'mp4',
'title': "CETA : quelles suites pour l'accord commercial entre l'Europe et le Canada ?",
'thumbnail': r're:^https?://.*\.jpg',
'duration': 325,
'upload_date': '20161021',
'timestamp': 1477044540,
'uploader_id': '3pmkp',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html', 'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
'only_matching': True, 'only_matching': True,
}, {
# YouTube embeds
'url': 'http://www.lemonde.fr/pixels/article/2016/12/09/pourquoi-pewdiepie-superstar-de-youtube-a-menace-de-fermer-sa-chaine_5046649_4408996.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -30,5 +50,9 @@ class LemondeIE(InfoExtractor):
digiteka_url = self._proto_relative_url(self._search_regex( digiteka_url = self._proto_relative_url(self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1', r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
webpage, 'digiteka url', group='url')) webpage, 'digiteka url', group='url', default=None))
if digiteka_url:
return self.url_result(digiteka_url, 'Digiteka') return self.url_result(digiteka_url, 'Digiteka')
return self.url_result(url, 'Generic')

View File

@ -8,6 +8,7 @@ from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
unsmuggle_url,
) )
@ -15,20 +16,23 @@ class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s' _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json' _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
def _call_playlist_service(self, item_id, method, fatal=True): def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json( return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method), self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal) item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
def _call_api(self, organization_id, item_id, method): def _call_api(self, organization_id, item_id, method):
return self._download_json( return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method), self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method) item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method): def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method) pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method) metadata = self._call_api(pc['orgId'], item_id, meta_method)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False) mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata return pc, mobile, metadata
def _extract_info(self, streams, mobile_urls, properties): def _extract_info(self, streams, mobile_urls, properties):
@ -207,10 +211,13 @@ class LimelightMediaIE(LimelightBaseIE):
_API_PATH = 'media' _API_PATH = 'media'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
pc, mobile, metadata = self._extract( pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId', 'getMobilePlaylistByMediaId', 'properties') video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info( return self._extract_info(
pc['playlistItems'][0].get('streams', []), pc['playlistItems'][0].get('streams', []),
@ -247,11 +254,13 @@ class LimelightChannelIE(LimelightBaseIE):
_API_PATH = 'channels' _API_PATH = 'channels'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url) channel_id = self._match_id(url)
pc, mobile, medias = self._extract( pc, mobile, medias = self._extract(
channel_id, 'getPlaylistByChannelId', channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'media') 'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url'))
entries = [ entries = [
self._extract_info( self._extract_info(

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex( return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id') r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage): def _extract_from_id(self, video_id, webpage=None):
response = self._download_json( response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id, 'http://qi.ckm.onetapi.pl/', video_id,
query={ query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {}) meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title'] title = (self._og_search_title(
description = self._og_search_description(webpage, default=None) or meta.get('description') webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght') duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ') timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
} }
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE): class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)' _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv' IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage)) channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage)) channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description) return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration')) 'video:duration', webpage, 'duration'))
uploader = self._search_regex( uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False) r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"', r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False)) webpage, 'upload date', fatal=False))

View File

@ -2,27 +2,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools
import os # import os
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_urllib_parse_unquote, # compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, # compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse, # compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json, js_to_json,
orderedSet, orderedSet,
sanitized_Request, # sanitized_Request,
str_to_int, str_to_int,
) )
from ..aes import ( # from ..aes import (
aes_decrypt_text # aes_decrypt_text
) # )
class PornHubIE(InfoExtractor): class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
req = sanitized_Request( def dl_webpage(platform):
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id) return self._download_webpage(
req.add_header('Cookie', 'age_verified=1') 'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
webpage = self._download_webpage(req, video_id) video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>', r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see # video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
title = self._html_search_meta( title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex( 'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)', (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1', r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,6 +169,7 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count( comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment') r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
"""
video_variables = {} video_variables = {}
for video_variablename, quote, video_variable in re.findall( for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage): r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
@ -197,6 +211,7 @@ class PornHubIE(InfoExtractor):
'height': height, 'height': height,
}) })
self._sort_formats(formats) self._sort_formats(formats)
"""
page_params = self._parse_json(self._search_regex( page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})', r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
@ -209,6 +224,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'uploader': video_uploader, 'uploader': video_uploader,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
@ -217,7 +233,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'dislike_count': dislike_count, 'dislike_count': dislike_count,
'comment_count': comment_count, 'comment_count': comment_count,
'formats': formats, # 'formats': formats,
'age_limit': 18, 'age_limit': 18,
'tags': tags, 'tags': tags,
'categories': categories, 'categories': categories,

View File

@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict

View File

@ -44,6 +44,9 @@ class XTubeIE(InfoExtractor):
}, { }, {
'url': 'xtube:625837', 'url': 'xtube:625837',
'only_matching': True, 'only_matching': True,
}, {
'url': 'xtube:kVTUy_G222_',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -53,11 +56,16 @@ class XTubeIE(InfoExtractor):
if not display_id: if not display_id:
display_id = video_id display_id = video_id
url = 'http://www.xtube.com/video-watch/-%s' % video_id
req = sanitized_Request(url) if video_id.isdigit() and len(video_id) < 11:
req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1') url_pattern = 'http://www.xtube.com/video-watch/-%s'
webpage = self._download_webpage(req, display_id) else:
url_pattern = 'http://www.xtube.com/watch.php?v=%s'
webpage = self._download_webpage(
url_pattern % video_id, display_id, headers={
'Cookie': 'age_verified=1; cookiesAccepted=1',
})
sources = self._parse_json(self._search_regex( sources = self._parse_json(self._search_regex(
r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),', r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
@ -73,7 +81,7 @@ class XTubeIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = self._search_regex( title = self._search_regex(
(r'<h1>(?P<title>[^<]+)</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'), (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title') webpage, 'title', group='title')
description = self._search_regex( description = self._search_regex(
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False) r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)

View File

@ -34,6 +34,7 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_codecs,
parse_duration, parse_duration,
remove_quotes, remove_quotes,
remove_start, remove_start,
@ -1696,15 +1697,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
codecs = mobj.group('val') codecs = mobj.group('val')
break break
if codecs: if codecs:
codecs = codecs.split(',') dct.update(parse_codecs(codecs))
if len(codecs) == 2:
acodec, vcodec = codecs[1], codecs[0]
else:
acodec, vcodec = (codecs[0], 'none') if kind == 'audio' else ('none', codecs[0])
dct.update({
'acodec': acodec,
'vcodec': vcodec,
})
formats.append(dct) formats.append(dct)
elif video_info.get('hlsvp'): elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0] manifest_url = video_info['hlsvp'][0]

View File

@ -20,9 +20,9 @@ from ..utils import (
class ZDFBaseIE(InfoExtractor): class ZDFBaseIE(InfoExtractor):
def _call_api(self, url, player, referrer, video_id): def _call_api(self, url, player, referrer, video_id, item):
return self._download_json( return self._download_json(
url, video_id, 'Downloading JSON content', url, video_id, 'Downloading JSON %s' % item,
headers={ headers={
'Referer': referrer, 'Referer': referrer,
'Api-Auth': 'Bearer %s' % player['apiToken'], 'Api-Auth': 'Bearer %s' % player['apiToken'],
@ -104,7 +104,7 @@ class ZDFIE(ZDFBaseIE):
}) })
formats.append(f) formats.append(f)
def _extract_entry(self, url, content, video_id): def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline'] title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target'] t = content['mainVideoContent']['http://zdf.de/rels/target']
@ -116,7 +116,8 @@ class ZDFIE(ZDFBaseIE):
'http://zdf.de/rels/streams/ptmd-template'].replace( 'http://zdf.de/rels/streams/ptmd-template'].replace(
'{playerId}', 'portal') '{playerId}', 'portal')
ptmd = self._download_json(urljoin(url, ptmd_path), video_id) ptmd = self._call_api(
urljoin(url, ptmd_path), player, url, video_id, 'metadata')
formats = [] formats = []
track_uris = set() track_uris = set()
@ -174,8 +175,9 @@ class ZDFIE(ZDFBaseIE):
} }
def _extract_regular(self, url, player, video_id): def _extract_regular(self, url, player, video_id):
content = self._call_api(player['content'], player, url, video_id) content = self._call_api(
return self._extract_entry(player['content'], content, video_id) player['content'], player, url, video_id, 'content')
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id): def _extract_mobile(self, video_id):
document = self._download_json( document = self._download_json(

View File

@ -1684,6 +1684,11 @@ def setproctitle(title):
libc = ctypes.cdll.LoadLibrary('libc.so.6') libc = ctypes.cdll.LoadLibrary('libc.so.6')
except OSError: except OSError:
return return
except TypeError:
# LoadLibrary in Windows Python 2.7.13 only expects
# a bytestring, but since unicode_literals turns
# every string into a unicode string, it fails.
return
title_bytes = title.encode('utf-8') title_bytes = title.encode('utf-8')
buf = ctypes.create_string_buffer(len(title_bytes)) buf = ctypes.create_string_buffer(len(title_bytes))
buf.value = title_bytes buf.value = title_bytes
@ -2378,6 +2383,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s* \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?: (?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)| (?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*) (?P<strval>(?![0-9.])[a-z0-9A-Z]*)
) )
\s*$ \s*$
@ -2386,7 +2392,8 @@ def _match_one(filter_part, dct):
if m: if m:
op = COMPARISON_OPERATORS[m.group('op')] op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key')) actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparisonvalue is # If the original field is a string and matching comparisonvalue is
# a number we should respect the origin of the original field # a number we should respect the origin of the original field
# and process comparison value as a string (see # and process comparison value as a string (see
@ -2396,7 +2403,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='): if m.group('op') not in ('=', '!='):
raise ValueError( raise ValueError(
'Operator %s does not support string values!' % m.group('op')) 'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval') comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else: else:
try: try:
comparison_value = int(m.group('intval')) comparison_value = int(m.group('intval'))

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.02.11' __version__ = '2017.02.16'