Compare commits

...

29 Commits

Author SHA1 Message Date
28e35f5070 release 2017.02.17 2017-02-17 23:59:56 +07:00
cf3704c132 [ChangeLog] Actualize 2017-02-17 23:48:30 +07:00
2c1f442c2b [options] Add missing spaces 2017-02-17 23:18:26 +07:00
bad4ccdb5d [heise] Improve (closes #9725) 2017-02-17 23:09:40 +07:00
db76c30c6e [heise] Support videos embedded in any article. 2017-02-17 22:55:53 +07:00
c2bde5d081 [ellentv] Improve 2017-02-17 22:45:51 +07:00
90fad0e74c [openload] Fix extraction (closes #12002) 2017-02-17 22:31:16 +07:00
d94badc755 [openload] Semifix extraction (closes #10408)
Just updated the code. I don't do much Python, but I tried to convert my code. Let me know if there is any problem with it.
2017-02-17 22:30:05 +07:00
fef51645d6 [theplatform] Recognize URLs with whitespaces (closes #12044) 2017-02-17 23:13:51 +08:00
4cead6a614 [einthusan] Relax _VALID_URL (closes #12141, closes #12159) 2017-02-17 22:02:01 +07:00
a4a554a793 [generic] Try parsing JWPlayer embedded videos (closes #12030) 2017-02-16 23:44:03 +08:00
b898f0a173 [elpais] Fix typo and improve extraction (closes #12139) 2017-02-16 04:57:42 +07:00
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes extraction broken by recent changes in iVysilani. A proper patch should migrate to the
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
37 changed files with 565 additions and 365 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.14*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.17*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.14** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.17**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.14 [debug] youtube-dl version 2017.02.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -11,8 +11,6 @@ sudo: false
env: env:
- YTDL_TEST_SET=core - YTDL_TEST_SET=core
- YTDL_TEST_SET=download - YTDL_TEST_SET=download
before_script:
- chmod +x ./devscripts/run_tests.sh
script: ./devscripts/run_tests.sh script: ./devscripts/run_tests.sh
notifications: notifications:
email: email:

View File

@ -1,3 +1,33 @@
version 2017.02.17
Extractors
* [heise] Improve extraction (#9725)
* [ellentv] Improve (#11653)
* [openload] Fix extraction (#10408, #12002)
+ [theplatform] Recognize URLs with whitespaces (#12044)
* [einthusan] Relax URL regular expression (#12141, #12159)
+ [generic] Support complex JWPlayer embedded videos (#12030)
* [elpais] Improve extraction (#12139)
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14 version 2017.02.14
Core Core

View File

@ -137,13 +137,13 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--match-filter FILTER Generic video filter. Specify any key (see --match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys) help for -o for a list of available keys)
to match if the key is present, !key to to match if the key is present, !key to
check if the key is not present,key > check if the key is not present, key >
NUMBER (like "comment_count > 12", also NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare works with >=, <, <=, !=, =) to compare
against a number, and & to require multiple against a number, and & to require multiple
matches. Values which are not known are matches. Values which are not known are
excluded unless you put a question mark (?) excluded unless you put a question mark (?)
after the operator.For example, to only after the operator. For example, to only
match videos that have been liked more than match videos that have been liked more than
100 times and disliked less than 50 times 100 times and disliked less than 50 times
(or the dislike functionality is not (or the dislike functionality is not

devscripts/run_tests.sh Normal file → Executable file (0 lines changed; mode change only)
View File

View File

@ -546,8 +546,10 @@
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **OnDemandKorea** - **OnDemandKorea**
- **onet.pl**
- **onet.tv** - **onet.tv**
- **onet.tv:channel** - **onet.tv:channel**
- **OnetMVP**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -900,6 +902,7 @@
- **vlive** - **vlive**
- **vlive:channel** - **vlive:channel**
- **Vodlocker** - **Vodlocker**
- **VODPl**
- **VODPlatform** - **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '') self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'vbr': 10, 'vbr': 10,
}), '^\s*10k$') }), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'fps': 30, 'fps': 30,
}), '^30fps$') }), r'^30fps$')
def test_postprocessors(self): def test_postprocessors(self):
filename = 'post-processor-testfile.mp4' filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42', 'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
} }
second = { second = {
'id': '2', 'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43', 'playlist_id': '43',
'uploader': "тест 123",
} }
videos = [first, second] videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),

View File

@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url') media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id) r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating'] rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required') auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true': if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id') requestor_id = self._search_regex(
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating) r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource) webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query) media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id) formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats) self._sort_formats(formats)
info.update({ info.update({
'id': video_id, 'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys: if ns_keys:
ns = list(ns_keys)[0] ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show') series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season')) season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle') episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode')) episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number: if season_number:
title = 'Season %d - %s' % (season_number, title) title = 'Season %d - %s' % (season_number, title)
if series: if series:

View File

@ -1,13 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
clean_html, clean_html,
) )
class ArchiveOrgIE(JWPlatformBaseIE): class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org' IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos' IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$' _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'

View File

@ -13,6 +13,7 @@ from ..utils import (
float_or_none, float_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
USER_AGENTS,
) )
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {
'id': '61924494876951776', 'id': '61924494877246241',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hyde Park Civilizace', 'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a', 'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350, 'duration': 3350,
}, },
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani', 'requestSource': 'iVysilani',
} }
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = [] entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId'] for user_agent in (None, USER_AGENTS['Safari']):
title = item['title'] req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration')) req.add_header('Content-type', 'application/x-www-form-urlencoded')
thumbnail = item.get('previewImageUrl') req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {} playlistpage = self._download_json(req, playlist_id, fatal=False)
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1: if not playlistpage:
final_title = playlist_title or title continue
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({ playlist_url = playlistpage['url']
'id': item_id, if playlist_url == 'error_region':
'title': final_title, raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail, req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
'duration': duration, req.add_header('Referer', url)
'formats': formats,
'subtitles': subtitles, playlist_title = self._og_search_title(webpage, default=None)
'is_live': is_live, playlist_description = self._og_search_description(webpage, default=None)
})
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

View File

@ -40,6 +40,7 @@ from ..utils import (
fix_xml_ampersands, fix_xml_ampersands,
float_or_none, float_or_none,
int_or_none, int_or_none,
js_to_json,
parse_iso8601, parse_iso8601,
RegexNotFoundError, RegexNotFoundError,
sanitize_filename, sanitize_filename,
@ -2073,6 +2074,123 @@ class InfoExtractor(object):
}) })
return formats return formats
@staticmethod
def _find_jwplayer_data(webpage):
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()
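The JWPlayer helpers (_find_jwplayer_data, _extract_jwplayer_data, _parse_jwplayer_data) now live on InfoExtractor itself, so the extractors further down drop the JWPlatformBaseIE base class. A minimal sketch of how a site extractor can use the relocated helper; the class name and URL pattern are illustrative only, not part of this change:

# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor


class SomeSiteIE(InfoExtractor):
    # Hypothetical extractor; only the _extract_jwplayer_data() call reflects
    # the helper added to common.py above.
    _VALID_URL = r'https?://(?:www\.)?somesite\.example/video/(?P<id>[0-9a-zA-Z]+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        # Finds the jwplayer("...").setup({...}) options blob in the page and
        # turns it into an info dict with formats and subtitles.
        return self._extract_jwplayer_data(webpage, video_id)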

View File

@ -18,8 +18,8 @@ from ..utils import (
class EinthusanIE(InfoExtractor): class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[0-9]+)' _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/', 'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035', 'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'info_dict': { 'info_dict': {
@ -29,7 +29,10 @@ class EinthusanIE(InfoExtractor):
'description': 'md5:33ef934c82a671a94652a9b4e54d931b', 'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
} }
} }, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id): def _decrypt(self, encrypted_data, video_id):

View File

@ -1,13 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from .kaltura import KalturaIE
ExtractorError, from ..utils import NO_DEFAULT
NO_DEFAULT,
)
class EllenTVIE(InfoExtractor): class EllenTVIE(InfoExtractor):
@ -65,7 +61,7 @@ class EllenTVIE(InfoExtractor):
if partner_id and kaltura_id: if partner_id and kaltura_id:
break break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura') return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor): class EllenTVClipsIE(InfoExtractor):
@ -77,14 +73,14 @@ class EllenTVClipsIE(InfoExtractor):
'id': 'meryl-streep-vanessa-hudgens', 'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens', 'title': 'Meryl Streep, Vanessa Hudgens',
}, },
'playlist_mincount': 7, 'playlist_mincount': 5,
} }
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage) playlist = self._extract_playlist(webpage, playlist_id)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -93,16 +89,13 @@ class EllenTVClipsIE(InfoExtractor):
'entries': self._extract_entries(playlist) 'entries': self._extract_entries(playlist)
} }
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json') json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try: return self._parse_json('[{' + json_string + '}]', playlist_id)
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
def _extract_entries(self, playlist): def _extract_entries(self, playlist):
return [ return [
self.url_result( self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']), 'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
'Kaltura') KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist] for item in playlist]

View File

@ -39,6 +39,18 @@ class ElPaisIE(InfoExtractor):
'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas', 'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
'upload_date': '20170127', 'upload_date': '20170127',
}, },
}, {
'url': 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
'info_dict': {
'id': '1487062137_075943',
'ext': 'mp4',
'title': 'Disyuntivas',
'description': 'md5:a0fb1485c4a6a8a917e6f93878e66218',
'upload_date': '20170214',
},
'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -59,14 +71,15 @@ class ElPaisIE(InfoExtractor):
video_url = prefix + video_suffix video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex( thumbnail_suffix = self._search_regex(
r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
webpage, 'thumbnail URL', fatal=False) webpage, 'thumbnail URL', default=None)
thumbnail = ( thumbnail = (
None if thumbnail_suffix is None None if thumbnail_suffix is None
else prefix + thumbnail_suffix) else prefix + thumbnail_suffix) or self._og_search_thumbnail(webpage)
title = self._html_search_regex( title = self._html_search_regex(
(r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title', (r"tituloVideo\s*=\s*'([^']+)'",
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'), r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
webpage, 'title') r'<h1[^>]+class="titulo"[^>]*>([^<]+)'),
webpage, 'title', default=None) or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">', r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', default=None) or self._html_search_meta( webpage, 'upload date', default=None) or self._html_search_meta(

View File

@ -694,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import ( from .onet import (
OnetIE, OnetIE,
OnetChannelIE, OnetChannelIE,
OnetMVPIE,
OnetPlIE,
) )
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .ooyala import ( from .ooyala import (
@ -1147,6 +1149,7 @@ from .vlive import (
VLiveChannelIE VLiveChannelIE
) )
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE from .voxmedia import VoxMediaIE

View File

@ -20,6 +20,7 @@ from ..utils import (
float_or_none, float_or_none,
HEADRequest, HEADRequest,
is_html, is_html,
js_to_json,
orderedSet, orderedSet,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
@ -961,6 +962,16 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
} }
}, },
# Complex jwplayer
{
'url': 'http://www.indiedb.com/games/king-machine/videos',
'info_dict': {
'id': 'videos',
'ext': 'mp4',
'title': 'king machine trailer 1',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
# rtl.nl embed # rtl.nl embed
{ {
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen', 'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -1490,7 +1501,12 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [VideoPressIE.ie_key()], 'add_ie': [VideoPressIE.ie_key()],
} },
{
# ThePlatform embedded with whitespaces in URLs
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -2488,6 +2504,15 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats']) self._sort_formats(entry['formats'])
return self.playlist_result(entries) return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage)
if jwplayer_data_str:
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
except ExtractorError:
pass
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
return True return True

View File

@ -6,59 +6,58 @@ from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
xpath_text,
) )
class HeiseIE(InfoExtractor): class HeiseIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
https?://(?:www\.)?heise\.de/video/artikel/ _TESTS = [{
.+?(?P<id>[0-9]+)\.html(?:$|[?#]) 'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'''
_TEST = {
'url': (
'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html'
),
'md5': 'ffed432483e922e88545ad9f2f15d30e', 'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': { 'info_dict': {
'id': '2404147', 'id': '2404147',
'ext': 'mp4', 'ext': 'mp4',
'title': ( 'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
"Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone"
),
'format_id': 'mp4_720p', 'format_id': 'mp4_720p',
'timestamp': 1411812600, 'timestamp': 1411812600,
'upload_date': '20140927', 'upload_date': '20140927',
'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.', 'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:^https?://.*/gallery/$',
} }
} }, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
}, {
'url': 'http://www.heise.de/newsticker/meldung/c-t-uplink-Owncloud-Tastaturen-Peilsender-Smartphone-2404251.html?wt_mc=rss.ho.beitrag.atom',
'only_matching': True,
}, {
'url': 'http://www.heise.de/ct/ausgabe/2016-12-Spiele-3214137.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
container_id = self._search_regex( container_id = self._search_regex(
r'<div class="videoplayerjw".*?data-container="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID') webpage, 'container ID')
sequenz_id = self._search_regex( sequenz_id = self._search_regex(
r'<div class="videoplayerjw".*?data-sequenz="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID') webpage, 'sequenz ID')
data_url = 'http://www.heise.de/videout/feed?container=%s&sequenz=%s' % (container_id, sequenz_id)
doc = self._download_xml(data_url, video_id)
info = { title = self._html_search_meta('fulltitle', webpage, default=None)
'id': video_id, if not title or title == "c't":
'thumbnail': self._og_search_thumbnail(webpage), title = self._search_regex(
'timestamp': parse_iso8601( r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
self._html_search_meta('date', webpage)), webpage, 'title')
'description': self._og_search_description(webpage),
}
title = self._html_search_meta('fulltitle', webpage) doc = self._download_xml(
if title: 'http://www.heise.de/videout/feed', video_id, query={
info['title'] = title 'container': container_id,
else: 'sequenz': sequenz_id,
info['title'] = self._og_search_title(webpage) })
formats = [] formats = []
for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'): for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'):
@ -74,6 +73,18 @@ class HeiseIE(InfoExtractor):
'height': height, 'height': height,
}) })
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats
return info description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': (xpath_text(doc, './/{http://rss.jwpcdn.com/}image') or
self._og_search_thumbnail(webpage)),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'formats': formats,
}

View File

@ -4,139 +4,9 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
class JWPlatformBaseIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',

View File

@ -1,14 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
) )
class OnDemandKoreaIE(JWPlatformBaseIE): class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html', 'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex( return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id') r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage): def _extract_from_id(self, video_id, webpage=None):
response = self._download_json( response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id, 'http://qi.ckm.onetapi.pl/', video_id,
query={ query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {}) meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title'] title = (self._og_search_title(
description = self._og_search_description(webpage, default=None) or meta.get('description') webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght') duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ') timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
} }
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE): class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)' _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv' IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage)) channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage)) channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description) return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -75,17 +75,17 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>', '<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
webpage, 'openload ID') webpage, 'openload ID')
first_three_chars = int(float(ol_id[0:][:3])) first_two_chars = int(float(ol_id[0:][:2]))
fifth_char = int(float(ol_id[3:5])) urlcode = []
urlcode = '' num = 2
num = 5
while num < len(ol_id): while num < len(ol_id):
urlcode += compat_chr(int(float(ol_id[num:][:3])) + key = int(float(ol_id[num + 3:][:2]))
first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2]))) urlcode.append((key, compat_chr(int(float(ol_id[num:][:3])) - first_two_chars)))
num += 5 num += 5
video_url = 'https://openload.co/stream/' + urlcode video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
title = self._og_search_title(webpage, default=None) or self._search_regex( title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage, r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
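For reference, the decoding the new openload code performs can be read in isolation: the hidden span holds a long digit string whose first two digits are a global offset, followed by 5-digit groups made of a shifted character code (3 digits) and a position key (2 digits). A standalone paraphrase of that logic, with chr() standing in for compat_chr(); the site changes this obfuscation often, so treat it as a snapshot:

def decode_openload_id(ol_id):
    # First two digits: offset subtracted from every character code.
    offset = int(ol_id[:2])
    pairs = []
    num = 2
    while num < len(ol_id):
        # 3 digits of shifted code point, then 2 digits of position key.
        code = int(ol_id[num:num + 3]) - offset
        key = int(ol_id[num + 3:num + 5])
        pairs.append((key, chr(code)))
        num += 5
    # Reassemble the stream path in key order.
    return ''.join(c for _, c in sorted(pairs, key=lambda kv: kv[0]))

# video_url = 'https://openload.co/stream/' + decode_openload_id(ol_id)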

View File

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration')) 'video:duration', webpage, 'duration'))
uploader = self._search_regex( uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False) r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"', r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False)) webpage, 'upload date', fatal=False))

View File

@ -2,27 +2,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools
import os # import os
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_urllib_parse_unquote, # compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, # compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse, # compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json, js_to_json,
orderedSet, orderedSet,
sanitized_Request, # sanitized_Request,
str_to_int, str_to_int,
) )
from ..aes import ( # from ..aes import (
aes_decrypt_text # aes_decrypt_text
) # )
class PornHubIE(InfoExtractor): class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
req = sanitized_Request( def dl_webpage(platform):
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id) return self._download_webpage(
req.add_header('Cookie', 'age_verified=1') 'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
webpage = self._download_webpage(req, video_id) video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>', r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see # video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
title = self._html_search_meta( title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex( 'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)', (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1', r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,48 +169,6 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count( comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment') r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_variables = {}
for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
video_variables[video_variablename] = video_variable
video_urls = []
for encoded_video_url in re.findall(
r'player_quality_[0-9]{3,4}p\s*=(.+?);', webpage):
for varname, varval in video_variables.items():
encoded_video_url = encoded_video_url.replace(varname, varval)
video_urls.append(re.sub(r'[\s+]', '', encoded_video_url))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
height = None
tbr = None
else:
height = int(m.group('height'))
tbr = int(m.group('tbr'))
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'tbr': tbr,
'height': height,
})
self._sort_formats(formats)
page_params = self._parse_json(self._search_regex( page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})', r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
webpage, 'page parameters', group='data', default='{}'), webpage, 'page parameters', group='data', default='{}'),
@ -209,6 +180,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'uploader': video_uploader, 'uploader': video_uploader,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
@ -217,7 +189,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'dislike_count': dislike_count, 'dislike_count': dislike_count,
'comment_count': comment_count, 'comment_count': comment_count,
'formats': formats, # 'formats': formats,
'age_limit': 18, 'age_limit': 18,
'tags': tags, 'tags': tags,
'categories': categories, 'categories': categories,

View File

@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
str_to_int, str_to_int,
) )
class PornoXOIE(JWPlatformBaseIE): class PornoXOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html', 'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',

View File

@ -2,11 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE): class RENTVIE(InfoExtractor):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)' _VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://ren.tv/video/epizod/118577', 'url': 'http://ren.tv/video/epizod/118577',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
get_element_by_class, get_element_by_class,
@ -11,7 +11,7 @@ from ..utils import (
) )
class RudoIE(JWPlatformBaseIE): class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {

View File

@ -1,11 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import js_to_json from ..utils import js_to_json
class ScreencastOMaticIE(JWPlatformBaseIE): class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl', 'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
@ -14,7 +14,7 @@ from ..utils import (
) )
class SendtoNewsIE(JWPlatformBaseIE): class SendtoNewsIE(InfoExtractor):
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)' _VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = { _TEST = {

View File

@ -179,10 +179,12 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
if m: if m:
return [m.group('url')] return [m.group('url')]
# Are whitespaces ignored in URLs?
# https://github.com/rg3/youtube-dl/issues/12044
matches = re.findall( matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage) r'(?s)<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches: if matches:
return list(zip(*matches))[1] return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
@staticmethod @staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False): def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):

View File

@ -3,11 +3,11 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import remove_end from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE): class ThisAVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*' _VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TESTS = [{ _TESTS = [{
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html', 'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',

View File

@ -1,7 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
get_element_by_class, get_element_by_class,
@ -9,7 +9,7 @@ from ..utils import (
) )
class TVNoeIE(JWPlatformBaseIE): class TVNoeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.tvnoe.cz/video/10362', 'url': 'http://www.tvnoe.cz/video/10362',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
decode_packed_codes, decode_packed_codes,
js_to_json, js_to_json,
@ -12,7 +12,7 @@ from ..utils import (
) )
class VidziIE(JWPlatformBaseIE): class VidziIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html', 'url': 'http://vidzi.tv/cghql9yq6emu.html',

View File

@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict

View File

@ -1,10 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE from .youtube import YoutubeIE
from .jwplatform import JWPlatformBaseIE
class WimpIE(JWPlatformBaseIE): class WimpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.wimp.com/maru-is-exhausted/', 'url': 'http://www.wimp.com/maru-is-exhausted/',

View File

@ -298,14 +298,14 @@ def parseOpts(overrideArguments=None):
metavar='FILTER', dest='match_filter', default=None, metavar='FILTER', dest='match_filter', default=None,
help=( help=(
'Generic video filter. ' 'Generic video filter. '
'Specify any key (see help for -o for a list of available keys) to' 'Specify any key (see help for -o for a list of available keys) to '
' match if the key is present, ' 'match if the key is present, '
'!key to check if the key is not present,' '!key to check if the key is not present, '
'key > NUMBER (like "comment_count > 12", also works with ' 'key > NUMBER (like "comment_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, and ' '>=, <, <=, !=, =) to compare against a number, and '
'& to require multiple matches. ' '& to require multiple matches. '
'Values which are not known are excluded unless you' 'Values which are not known are excluded unless you '
' put a question mark (?) after the operator.' 'put a question mark (?) after the operator. '
'For example, to only match videos that have been liked more than ' 'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike ' '100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who ' 'functionality is not available at the given service), but who '

View File

@ -2383,6 +2383,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s* \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?: (?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)| (?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*) (?P<strval>(?![0-9.])[a-z0-9A-Z]*)
) )
\s*$ \s*$
@ -2391,7 +2392,8 @@ def _match_one(filter_part, dct):
if m: if m:
op = COMPARISON_OPERATORS[m.group('op')] op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key')) actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparisonvalue is # If the original field is a string and matching comparisonvalue is
# a number we should respect the origin of the original field # a number we should respect the origin of the original field
# and process comparison value as a string (see # and process comparison value as a string (see
@ -2401,7 +2403,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='): if m.group('op') not in ('=', '!='):
raise ValueError( raise ValueError(
'Operator %s does not support string values!' % m.group('op')) 'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval') comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else: else:
try: try:
comparison_value = int(m.group('intval')) comparison_value = int(m.group('intval'))
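With the quotedstrval alternative added above, --match-filter values may contain spaces and non-ASCII text by quoting them, and a quote character inside the literal is escaped with a backslash. A short sketch mirroring the new tests, assuming match_filter_func's existing contract (it returns None when the info dict matches and a skip-reason string otherwise):

# coding: utf-8
from __future__ import unicode_literals

from youtube_dl.utils import match_filter_func

f = match_filter_func('uploader = "變態妍字幕版 太妍 тест" & like_count > 100')
assert f({'uploader': '變態妍字幕版 太妍 тест', 'like_count': 150}) is None   # match
assert f({'uploader': 'someone else', 'like_count': 150}) is not None          # skip reason

# A quote inside the literal is backslash-escaped:
g = match_filter_func(r"creator = 'тест \' 123 \' тест--'")
assert g({'creator': "тест ' 123 ' тест--"}) is None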

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.02.14' __version__ = '2017.02.17'