Compare commits

...

49 Commits

Author SHA1 Message Date
28e35f5070 release 2017.02.17 2017-02-17 23:59:56 +07:00
cf3704c132 [ChangeLog] Actualize 2017-02-17 23:48:30 +07:00
2c1f442c2b [options] Add missing spaces 2017-02-17 23:18:26 +07:00
bad4ccdb5d [heise] Improve (closes #9725) 2017-02-17 23:09:40 +07:00
db76c30c6e [heise] Support videos embedded in any article. 2017-02-17 22:55:53 +07:00
c2bde5d081 [ellentv] Improve 2017-02-17 22:45:51 +07:00
90fad0e74c [openload] Fix extraction (closes #12002) 2017-02-17 22:31:16 +07:00
d94badc755 [openload] Semifix extraction (closes #10408)
just updated the code. i don't do much python still i tried to convert my code. lemme know if there is any prob with it
2017-02-17 22:30:05 +07:00
fef51645d6 [theplatform] Recognize URLs with whitespaces (closes #12044) 2017-02-17 23:13:51 +08:00
4cead6a614 [einthusan] Relax _VALID_URL (closes #12141, closes #12159) 2017-02-17 22:02:01 +07:00
a4a554a793 [generic] Try parsing JWPlayer embedded videos (closes #12030) 2017-02-16 23:44:03 +08:00
b898f0a173 [elpais] Fix typo and improve extraction (closes #12139) 2017-02-16 04:57:42 +07:00
2480b056c1 release 2017.02.16 2017-02-16 00:10:04 +07:00
3aa25395aa [ChangeLog] Actualize 2017-02-16 00:08:56 +07:00
eafaeb226a [ceskatelevize] Lower priority for audio description sources (#12119) 2017-02-16 00:04:15 +07:00
de4d378c0c [ceskatelevize] Prefix format ids 2017-02-15 23:38:00 +07:00
099cfdb770 [devscripts/run_tests.sh] Change permission for script to 755 2017-02-16 00:28:31 +08:00
398dea3210 [test_YoutubeDL] Fix invalid escape sequences 2017-02-15 23:20:46 +07:00
db13c16ef8 [utils] Add support for quoted string literals in --match-filter (closes #8050, closes #12142, closes #12144) 2017-02-15 23:12:10 +07:00
1bd05345ea [amcnetworks] fix extraction(closes #12127) 2017-02-15 14:19:18 +01:00
3021cf83b7 [pinkbike] Fix uploader extraction (closes #12054) 2017-02-15 02:08:32 +07:00
04a741232f [onetpl] Add support for businessinsider.com.pl and plejada.pl 2017-02-15 01:23:55 +07:00
43a3d9edfc [onetpl] Add support for onet.pl (closes #10507) 2017-02-15 01:14:06 +07:00
d31aa74fdb [onetmvp] Add shortcut extractor 2017-02-15 00:58:18 +07:00
6092ccd058 [vodpl] Make more robust and add another test (closes #12122) 2017-02-15 00:52:31 +07:00
22ce9ad2bd [vod.pl] Add new extractor 2017-02-15 00:48:08 +07:00
9a372f14b4 [pornhub] Extract video URL from tv platform site (#12007, #12129) 2017-02-14 23:52:41 +07:00
5cb2d36c82 [ceskatelevize] Extract DASH formats (closes #12119, closes #12133) 2017-02-14 22:57:38 +07:00
fcca0d53a8 [ceskatelevize] Quick fix to revert to using old HLS-based playlist
This fixes recent changes in iVysilani. Proper patch should migrate to
MPEG-DASH version, which is now the default.
2017-02-14 22:25:37 +07:00
58a65ba852 release 2017.02.14 2017-02-14 01:09:18 +07:00
cedf08ff54 [ChangeLog] Actualize 2017-02-14 01:07:35 +07:00
50de3dbad3 [zdf] Fix extraction (closes #12117) 2017-02-14 01:00:06 +07:00
085f169ffe [xtube] Fix extraction for both kinds of video id (closes #12088) 2017-02-13 23:44:43 +07:00
f6d6ca1db3 [xtube] Improve title extraction 2017-02-13 23:34:14 +07:00
6e5956e6ba [lemonde] Fallback delegate extraction to generic extractor (closes #12115, closes #12116) 2017-02-13 23:17:48 +07:00
50fd3c2c69 Merge branch 'master' of github.com:rg3/youtube-dl 2017-02-13 22:58:50 +07:00
89c6691f9d [bellmedia] accept longer video id(closes #12114) 2017-02-13 15:08:48 +01:00
454e5cdb17 [limelight] add support referer protected videos 2017-02-13 14:29:05 +01:00
1de9f78e71 [travis] Separate builds for core and download 2017-02-13 18:56:05 +08:00
9dad941853 [disney] improve extraction
- add support for more urls
- detect expired videos
- skip Adobe Flash Access protected videos

closes #4975
closes #11000
closes #11882
closes #11936
2017-02-13 11:43:20 +01:00
1e2c3f61fc [travis] Separate builds for core and download 2017-02-13 17:36:13 +07:00
0dac7cbb09 [hotstar] improve extraction(closes #12096)
- extract all qualities
- detect drm protected videos
- extract more metadata
2017-02-12 17:35:24 +01:00
f8514630db [einthusan] Fix extraction (closes #11416)
The old test URLs are no longer valid, so I replace them with the one
from #11416
2017-02-12 20:53:55 +08:00
459818e280 [aenetworks] Add support for lifetimemovieclub.com 2017-02-12 20:18:11 +08:00
6310acf512 [youtube] Fix parsing codecs (closes #12091) 2017-02-12 18:09:53 +07:00
8d38dafbbf ChangeLog: update after #12085 2017-02-12 00:45:37 +08:00
f3915452de Merge pull request #12085 from wiiaboo/python2
utils.py: Workaround TypeError with Python 2.7.13 in Windows
2017-02-12 00:42:43 +08:00
2f49bcd690 utils.py: Workaround TypeError with Python 2.7.13 in Windows
Fixes #11540

Tested with Windows Python 2.7.12 and 2.7.13.
2017-02-11 14:51:28 +00:00
68c22c4c15 [iqiyi] Update _TESTS 2017-02-11 22:27:45 +08:00
47 changed files with 850 additions and 479 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.11*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.17*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.11** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.17**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.11 [debug] youtube-dl version 2017.02.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -8,7 +8,10 @@ python:
- "3.5" - "3.5"
- "3.6" - "3.6"
sudo: false sudo: false
script: nosetests test --verbose env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
script: ./devscripts/run_tests.sh
notifications: notifications:
email: email:
- filippo.valsorda@gmail.com - filippo.valsorda@gmail.com

View File

@ -1,3 +1,52 @@
version 2017.02.17
Extractors
* [heise] Improve extraction (#9725)
* [ellentv] Improve (#11653)
* [openload] Fix extraction (#10408, #12002)
+ [theplatform] Recognize URLs with whitespaces (#12044)
* [einthusan] Relax URL regular expression (#12141, #12159)
+ [generic] Support complex JWPlayer embedded videos (#12030)
* [elpais] Improve extraction (#12139)
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14
Core
* TypeError is fixed with Python 2.7.13 on Windows (#11540, #12085)
Extractor
* [zdf] Fix extraction (#12117)
* [xtube] Fix extraction for both kinds of video id (#12088)
* [xtube] Improve title extraction (#12088)
+ [lemonde] Fallback delegate extraction to generic extractor (#12115, #12116)
* [bellmedia] Allow video id longer than 6 characters (#12114)
+ [limelight] Add support for referer protected videos
* [disney] Improve extraction (#4975, #11000, #11882, #11936)
* [hotstar] Improve extraction (#12096)
* [einthusan] Fix extraction (#11416)
+ [aenetworks] Add support for lifetimemovieclub.com (#12097)
* [youtube] Fix parsing codecs (#12091)
version 2017.02.11 version 2017.02.11
Core Core

View File

@ -137,13 +137,13 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--match-filter FILTER Generic video filter. Specify any key (see --match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys) help for -o for a list of available keys)
to match if the key is present, !key to to match if the key is present, !key to
check if the key is not present,key > check if the key is not present, key >
NUMBER (like "comment_count > 12", also NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare works with >=, <, <=, !=, =) to compare
against a number, and & to require multiple against a number, and & to require multiple
matches. Values which are not known are matches. Values which are not known are
excluded unless you put a question mark (?) excluded unless you put a question mark (?)
after the operator.For example, to only after the operator. For example, to only
match videos that have been liked more than match videos that have been liked more than
100 times and disliked less than 50 times 100 times and disliked less than 50 times
(or the dislike functionality is not (or the dislike functionality is not

19
devscripts/run_tests.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
test_set=""
case "$YTDL_TEST_SET" in
core)
test_set="-I test_($DOWNLOAD_TESTS)\.py"
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
;;
*)
break
;;
esac
nosetests test --verbose $test_set

View File

@ -546,8 +546,10 @@
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **OnDemandKorea** - **OnDemandKorea**
- **onet.pl**
- **onet.tv** - **onet.tv**
- **onet.tv:channel** - **onet.tv:channel**
- **OnetMVP**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -900,6 +902,7 @@
- **vlive** - **vlive**
- **vlive:channel** - **vlive:channel**
- **Vodlocker** - **Vodlocker**
- **VODPl**
- **VODPlatform** - **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '') self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'vbr': 10, 'vbr': 10,
}), '^\s*10k$') }), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({ assertRegexpMatches(self, ydl._format_note({
'fps': 30, 'fps': 30,
}), '^30fps$') }), r'^30fps$')
def test_postprocessors(self): def test_postprocessors(self):
filename = 'post-processor-testfile.mp4' filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42', 'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
} }
second = { second = {
'id': '2', 'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43', 'playlist_id': '43',
'uploader': "тест 123",
} }
videos = [first, second] videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),

View File

@ -23,7 +23,7 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)' _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime|lifetimemovieclub)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': 'a97a65f7e823ae10e9244bc5433d5fe6', 'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
@ -62,11 +62,15 @@ class AENetworksIE(AENetworksBaseIE):
}, { }, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie', 'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = { _DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY', 'history.com': 'HISTORY',
'aetv.com': 'AETV', 'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME', 'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI', 'fyi.tv': 'FYI',
} }

View File

@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url') media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id) r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating'] rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required') auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true': if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id') requestor_id = self._search_regex(
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating) r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource) webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query) media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id) formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats) self._sort_formats(formats)
info.update({ info.update({
'id': video_id, 'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys: if ns_keys:
ns = list(ns_keys)[0] ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show') series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season')) season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle') episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode')) episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number: if season_number:
title = 'Season %d - %s' % (season_number, title) title = 'Season %d - %s' % (season_number, title)
if series: if series:

View File

@ -1,13 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
clean_html, clean_html,
) )
class ArchiveOrgIE(JWPlatformBaseIE): class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org' IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos' IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$' _VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'

View File

@ -24,7 +24,7 @@ class BellMediaIE(InfoExtractor):
space space
)\.ca| )\.ca|
much\.com much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})''' )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966', 'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0', 'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -55,6 +55,9 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6', 'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',

View File

@ -13,6 +13,7 @@ from ..utils import (
float_or_none, float_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
USER_AGENTS,
) )
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {
'id': '61924494876951776', 'id': '61924494877246241',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hyde Park Civilizace', 'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a', 'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350, 'duration': 3350,
}, },
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani', 'requestSource': 'iVysilani',
} }
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = [] entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId'] for user_agent in (None, USER_AGENTS['Safari']):
title = item['title'] req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration')) req.add_header('Content-type', 'application/x-www-form-urlencoded')
thumbnail = item.get('previewImageUrl') req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {} playlistpage = self._download_json(req, playlist_id, fatal=False)
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1: if not playlistpage:
final_title = playlist_title or title continue
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({ playlist_url = playlistpage['url']
'id': item_id, if playlist_url == 'error_region':
'title': final_title, raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail, req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
'duration': duration, req.add_header('Referer', url)
'formats': formats,
'subtitles': subtitles, playlist_title = self._og_search_title(webpage, default=None)
'is_live': is_live, playlist_description = self._og_search_description(webpage, default=None)
})
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

View File

@ -40,6 +40,7 @@ from ..utils import (
fix_xml_ampersands, fix_xml_ampersands,
float_or_none, float_or_none,
int_or_none, int_or_none,
js_to_json,
parse_iso8601, parse_iso8601,
RegexNotFoundError, RegexNotFoundError,
sanitize_filename, sanitize_filename,
@ -2073,6 +2074,123 @@ class InfoExtractor(object):
}) })
return formats return formats
@staticmethod
def _find_jwplayer_data(webpage):
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
now = datetime.datetime.now() now = datetime.datetime.now()

View File

@ -9,13 +9,15 @@ from ..utils import (
unified_strdate, unified_strdate,
compat_str, compat_str,
determine_ext, determine_ext,
ExtractorError,
) )
class DisneyIE(InfoExtractor): class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})''' https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{ _TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977', 'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
'info_dict': { 'info_dict': {
'id': '545ed1857afee5a0ec239977', 'id': '545ed1857afee5a0ec239977',
@ -28,6 +30,20 @@ class DisneyIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# Grill.burger
'url': 'http://www.starwars.com/video/rogue-one-a-star-wars-story-intro-featurette',
'info_dict': {
'id': '5454e9f4e9804a552e3524c8',
'ext': 'mp4',
'title': '"Intro" Featurette: Rogue One: A Star Wars Story',
'upload_date': '20170104',
'description': 'Go behind-the-scenes of Rogue One: A Star Wars Story in this featurette with Director Gareth Edwards and the cast of the film.',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2', 'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
'only_matching': True, 'only_matching': True,
@ -43,31 +59,55 @@ class DisneyIE(InfoExtractor):
}, { }, {
'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097', 'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/embed/522900d2ced3c565e4cc0677',
'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/videos/contest-of-champions-part-four-clip-1',
'only_matching': True,
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups() domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage( if not video_id:
'http://%s/embed/%s' % (domain, video_id), video_id) webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(self._search_regex( grill = re.sub(r'"\s*\+\s*"', '', self._search_regex(
r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video'] r'Grill\.burger\s*=\s*({.+})\s*:',
webpage, 'grill data'))
page_data = next(s for s in self._parse_json(grill, display_id)['stack'] if s.get('type') == 'video')
video_data = page_data['data'][0]
else:
webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id)
page_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo\s*=\s*({.+});',
webpage, 'embed data'), video_id)
video_data = page_data['video']
for external in video_data.get('externals', []): for external in video_data.get('externals', []):
if external.get('source') == 'vevo': if external.get('source') == 'vevo':
return self.url_result('vevo:' + external['data_id'], 'Vevo') return self.url_result('vevo:' + external['data_id'], 'Vevo')
video_id = video_data['id']
title = video_data['title'] title = video_data['title']
formats = [] formats = []
for flavor in video_data.get('flavors', []): for flavor in video_data.get('flavors', []):
flavor_format = flavor.get('format') flavor_format = flavor.get('format')
flavor_url = flavor.get('url') flavor_url = flavor.get('url')
if not flavor_url or not re.match(r'https?://', flavor_url): if not flavor_url or not re.match(r'https?://', flavor_url) or flavor_format == 'mp4_access':
continue continue
tbr = int_or_none(flavor.get('bitrate')) tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999: if tbr == 99999:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False)) flavor_url, video_id, 'mp4',
m3u8_id=flavor_format, fatal=False))
continue continue
format_id = [] format_id = []
if flavor_format: if flavor_format:
@ -88,6 +128,10 @@ class DisneyIE(InfoExtractor):
'ext': ext, 'ext': ext,
'vcodec': 'none' if (width == 0 and height == 0) else None, 'vcodec': 'none' if (width == 0 and height == 0) else None,
}) })
if not formats and video_data.get('expired'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, page_data['translations']['video_expired']),
expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}

View File

@ -1,67 +1,97 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
remove_start, extract_attributes,
sanitized_Request, ExtractorError,
get_elements_by_class,
urlencode_postdata,
) )
class EinthusanIE(InfoExtractor): class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?einthusan\.com/movies/watch.php\?([^#]*?)id=(?P<id>[0-9]+)' _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [ _TESTS = [{
{ 'url': 'https://einthusan.tv/movie/watch/9097/',
'url': 'http://www.einthusan.com/movies/watch.php?id=2447', 'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'md5': 'd71379996ff5b7f217eca034c34e3461', 'info_dict': {
'info_dict': { 'id': '9097',
'id': '2447', 'ext': 'mp4',
'ext': 'mp4', 'title': 'Ae Dil Hai Mushkil',
'title': 'Ek Villain', 'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:9d29fc91a7abadd4591fb862fa560d93', }
} }, {
}, 'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
{ 'only_matching': True,
'url': 'http://www.einthusan.com/movies/watch.php?id=1671', }]
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': { # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
'id': '1671', def _decrypt(self, encrypted_data, video_id):
'ext': 'mp4', return self._parse_json(base64.b64decode((
'title': 'Soodhu Kavvuum', encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
'thumbnail': r're:^https?://.*\.jpg$', ).encode('ascii')).decode('utf-8'), video_id)
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
request = sanitized_Request(url) webpage = self._download_webpage(url, video_id)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
webpage = self._download_webpage(request, video_id)
title = self._html_search_regex( title = self._html_search_regex(r'<h3>([^<]+)</h3>', webpage, 'title')
r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
webpage, 'title')
video_id = self._search_regex( player_params = extract_attributes(self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id) r'(<section[^>]+id="UIVideoPlayer"[^>]+>)', webpage, 'player parameters'))
m3u8_url = self._download_webpage( page_id = self._html_search_regex(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/' '<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
% video_id, video_id, headers={'Referer': url}) video_data = self._download_json(
formats = self._extract_m3u8_formats( 'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native') data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
'EJOutcomes': player_params['data-ejpingables'],
'NativeHLS': False
}),
'arcVersion': 3,
'appVersion': 59,
'gorilla.csrf.Token': page_id,
}))['Data']
description = self._html_search_meta('description', webpage) if isinstance(video_data, compat_str) and video_data.startswith('/ratelimited/'):
raise ExtractorError(
'Download rate reached. Please try again later.', expected=True)
ej_links = self._decrypt(video_data['EJLinks'], video_id)
formats = []
m3u8_url = ej_links.get('HLSLink')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native'))
mp4_url = ej_links.get('MP4Link')
if mp4_url:
formats.append({
'url': mp4_url,
})
self._sort_formats(formats)
description = get_elements_by_class('synopsis', webpage)[0]
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''', r'''<img[^>]+src=(["'])(?P<url>(?!\1).+?/moviecovers/(?!\1).+?)\1''',
webpage, "thumbnail url", fatal=False) webpage, 'thumbnail url', fatal=False, group='url')
if thumbnail is not None: if thumbnail is not None:
thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..')) thumbnail = compat_urlparse.urljoin(url, thumbnail)
return { return {
'id': video_id, 'id': video_id,

View File

@ -1,13 +1,9 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from .kaltura import KalturaIE
ExtractorError, from ..utils import NO_DEFAULT
NO_DEFAULT,
)
class EllenTVIE(InfoExtractor): class EllenTVIE(InfoExtractor):
@ -65,7 +61,7 @@ class EllenTVIE(InfoExtractor):
if partner_id and kaltura_id: if partner_id and kaltura_id:
break break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura') return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor): class EllenTVClipsIE(InfoExtractor):
@ -77,14 +73,14 @@ class EllenTVClipsIE(InfoExtractor):
'id': 'meryl-streep-vanessa-hudgens', 'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens', 'title': 'Meryl Streep, Vanessa Hudgens',
}, },
'playlist_mincount': 7, 'playlist_mincount': 5,
} }
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage) playlist = self._extract_playlist(webpage, playlist_id)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -93,16 +89,13 @@ class EllenTVClipsIE(InfoExtractor):
'entries': self._extract_entries(playlist) 'entries': self._extract_entries(playlist)
} }
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json') json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try: return self._parse_json('[{' + json_string + '}]', playlist_id)
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)
def _extract_entries(self, playlist): def _extract_entries(self, playlist):
return [ return [
self.url_result( self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']), 'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
'Kaltura') KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist] for item in playlist]

View File

@ -39,6 +39,18 @@ class ElPaisIE(InfoExtractor):
'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas', 'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
'upload_date': '20170127', 'upload_date': '20170127',
}, },
}, {
'url': 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
'info_dict': {
'id': '1487062137_075943',
'ext': 'mp4',
'title': 'Disyuntivas',
'description': 'md5:a0fb1485c4a6a8a917e6f93878e66218',
'upload_date': '20170214',
},
'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -59,14 +71,15 @@ class ElPaisIE(InfoExtractor):
video_url = prefix + video_suffix video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex( thumbnail_suffix = self._search_regex(
r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
webpage, 'thumbnail URL', fatal=False) webpage, 'thumbnail URL', default=None)
thumbnail = ( thumbnail = (
None if thumbnail_suffix is None None if thumbnail_suffix is None
else prefix + thumbnail_suffix) else prefix + thumbnail_suffix) or self._og_search_thumbnail(webpage)
title = self._html_search_regex( title = self._html_search_regex(
(r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title', (r"tituloVideo\s*=\s*'([^']+)'",
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'), r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
webpage, 'title') r'<h1[^>]+class="titulo"[^>]*>([^<]+)'),
webpage, 'title', default=None) or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">', r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', default=None) or self._html_search_meta( webpage, 'upload date', default=None) or self._html_search_meta(

View File

@ -694,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import ( from .onet import (
OnetIE, OnetIE,
OnetChannelIE, OnetChannelIE,
OnetMVPIE,
OnetPlIE,
) )
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .ooyala import ( from .ooyala import (
@ -1147,6 +1149,7 @@ from .vlive import (
VLiveChannelIE VLiveChannelIE
) )
from .vodlocker import VodlockerIE from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE from .voxmedia import VoxMediaIE

View File

@ -20,6 +20,7 @@ from ..utils import (
float_or_none, float_or_none,
HEADRequest, HEADRequest,
is_html, is_html,
js_to_json,
orderedSet, orderedSet,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
@ -961,6 +962,16 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
} }
}, },
# Complex jwplayer
{
'url': 'http://www.indiedb.com/games/king-machine/videos',
'info_dict': {
'id': 'videos',
'ext': 'mp4',
'title': 'king machine trailer 1',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
# rtl.nl embed # rtl.nl embed
{ {
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen', 'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -991,19 +1002,6 @@ class GenericIE(InfoExtractor):
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014', 'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
}, },
}, },
# Kaltura embed protected with referrer
{
'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
'info_dict': {
'id': '1_g4fbemnq',
'ext': 'mp4',
'title': 'Violetta - Achter De Schermen - Ruggero',
'description': 'Achter de schermen met Ruggero',
'timestamp': 1435133761,
'upload_date': '20150624',
'uploader_id': 'echojecka',
},
},
# Kaltura embed with single quotes # Kaltura embed with single quotes
{ {
'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY', 'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
@ -1503,7 +1501,12 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': [VideoPressIE.ie_key()], 'add_ie': [VideoPressIE.ie_key()],
} },
{
# ThePlatform embedded with whitespaces in URLs
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -2350,8 +2353,9 @@ class GenericIE(InfoExtractor):
'Channel': 'channel', 'Channel': 'channel',
'ChannelList': 'channel_list', 'ChannelList': 'channel_list',
} }
return self.url_result('limelight:%s:%s' % ( return self.url_result(smuggle_url('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2)) lm[mobj.group(1)], mobj.group(2)), {'source_url': url}),
'Limelight%s' % mobj.group(1), mobj.group(2))
mobj = re.search( mobj = re.search(
r'''(?sx) r'''(?sx)
@ -2361,7 +2365,9 @@ class GenericIE(InfoExtractor):
value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32}) value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32})
''', webpage) ''', webpage)
if mobj: if mobj:
return self.url_result('limelight:media:%s' % mobj.group('id')) return self.url_result(smuggle_url(
'limelight:media:%s' % mobj.group('id'),
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
# Look for AdobeTVVideo embeds # Look for AdobeTVVideo embeds
mobj = re.search( mobj = re.search(
@ -2498,6 +2504,15 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats']) self._sort_formats(entry['formats'])
return self.playlist_result(entries) return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage)
if jwplayer_data_str:
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
except ExtractorError:
pass
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
return True return True

View File

@ -6,59 +6,58 @@ from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
xpath_text,
) )
class HeiseIE(InfoExtractor): class HeiseIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
https?://(?:www\.)?heise\.de/video/artikel/ _TESTS = [{
.+?(?P<id>[0-9]+)\.html(?:$|[?#]) 'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'''
_TEST = {
'url': (
'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html'
),
'md5': 'ffed432483e922e88545ad9f2f15d30e', 'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': { 'info_dict': {
'id': '2404147', 'id': '2404147',
'ext': 'mp4', 'ext': 'mp4',
'title': ( 'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
"Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone"
),
'format_id': 'mp4_720p', 'format_id': 'mp4_720p',
'timestamp': 1411812600, 'timestamp': 1411812600,
'upload_date': '20140927', 'upload_date': '20140927',
'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.', 'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:^https?://.*/gallery/$',
} }
} }, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
}, {
'url': 'http://www.heise.de/newsticker/meldung/c-t-uplink-Owncloud-Tastaturen-Peilsender-Smartphone-2404251.html?wt_mc=rss.ho.beitrag.atom',
'only_matching': True,
}, {
'url': 'http://www.heise.de/ct/ausgabe/2016-12-Spiele-3214137.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
container_id = self._search_regex( container_id = self._search_regex(
r'<div class="videoplayerjw".*?data-container="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID') webpage, 'container ID')
sequenz_id = self._search_regex( sequenz_id = self._search_regex(
r'<div class="videoplayerjw".*?data-sequenz="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID') webpage, 'sequenz ID')
data_url = 'http://www.heise.de/videout/feed?container=%s&sequenz=%s' % (container_id, sequenz_id)
doc = self._download_xml(data_url, video_id)
info = { title = self._html_search_meta('fulltitle', webpage, default=None)
'id': video_id, if not title or title == "c't":
'thumbnail': self._og_search_thumbnail(webpage), title = self._search_regex(
'timestamp': parse_iso8601( r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
self._html_search_meta('date', webpage)), webpage, 'title')
'description': self._og_search_description(webpage),
}
title = self._html_search_meta('fulltitle', webpage) doc = self._download_xml(
if title: 'http://www.heise.de/videout/feed', video_id, query={
info['title'] = title 'container': container_id,
else: 'sequenz': sequenz_id,
info['title'] = self._og_search_title(webpage) })
formats = [] formats = []
for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'): for source_node in doc.findall('.//{http://rss.jwpcdn.com/}source'):
@ -74,6 +73,18 @@ class HeiseIE(InfoExtractor):
'height': height, 'height': height,
}) })
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats
return info description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': (xpath_text(doc, './/{http://rss.jwpcdn.com/}image') or
self._og_search_thumbnail(webpage)),
'timestamp': parse_iso8601(
self._html_search_meta('date', webpage)),
'formats': formats,
}

View File

@ -34,11 +34,9 @@ class HotStarIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s' def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True, query=None):
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s' json_data = super(HotStarIE, self)._download_json(
url_or_request, video_id, note, fatal=fatal, query=query)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True):
json_data = super(HotStarIE, self)._download_json(url_or_request, video_id, note, fatal=fatal)
if json_data['resultCode'] != 'OK': if json_data['resultCode'] != 'OK':
if fatal: if fatal:
raise ExtractorError(json_data['errorDescription']) raise ExtractorError(json_data['errorDescription'])
@ -48,20 +46,37 @@ class HotStarIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json( video_data = self._download_json(
self._GET_CONTENT_TEMPLATE % video_id, 'http://account.hotstar.com/AVS/besc', video_id, query={
video_id)['contentInfo'][0] 'action': 'GetAggregatedContentDetails',
'channel': 'PCTV',
'contentId': video_id,
})['contentInfo'][0]
title = video_data['episodeTitle']
if video_data.get('encrypted') == 'Y':
raise ExtractorError('This video is DRM protected.', expected=True)
formats = [] formats = []
# PCTV for extracting f4m manifest for f in ('JIO',):
for f in ('TABLET',):
format_data = self._download_json( format_data = self._download_json(
self._GET_CDN_TEMPLATE % (f, video_id, 'VOD'), 'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f, fatal=False) video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data: if format_data:
format_url = format_data['src'] format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url) ext = determine_ext(format_url)
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
# produce broken files # produce broken files
continue continue
@ -75,9 +90,12 @@ class HotStarIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': video_data['episodeTitle'], 'title': title,
'description': video_data.get('description'), 'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')), 'timestamp': int_or_none(video_data.get('broadcastDate')),
'formats': formats, 'formats': formats,
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
} }

View File

@ -173,11 +173,12 @@ class IqiyiIE(InfoExtractor):
} }
}, { }, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html', 'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': '667171934041350c5de3f5015f7f1152', 'md5': 'b7dc800a4004b1b57749d9abae0472da',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb', 'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4', 'ext': 'mp4',
'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇', # This can be either Simplified Chinese or Traditional Chinese
'title': r're:^(?:名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇|名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇)$',
}, },
'skip': 'Geo-restricted to China', 'skip': 'Geo-restricted to China',
}, { }, {

View File

@ -4,139 +4,9 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
urljoin,
)
class JWPlatformBaseIE(InfoExtractor): class JWPlatformIE(InfoExtractor):
@staticmethod
def _find_jwplayer_data(webpage):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
if 'sources' not in video_data:
video_data['sources'] = [video_data]
this_video_id = video_id or video_data['mediaid']
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
if base_url:
source_url = compat_urlparse.urljoin(base_url, source_url)
source_type = source.get('type') or ''
ext = mimetype2ext(source_type) or determine_ext(source_url)
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
'url': source_url,
'vcodec': 'none',
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entries.append({
'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
})
if len(entries) == 1:
return entries[0]
else:
return self.playlist_result(entries)
class JWPlatformIE(JWPlatformBaseIE):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',

View File

@ -7,20 +7,40 @@ class LemondeIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html', 'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
'md5': '01fb3c92de4c12c573343d63e163d302', 'md5': 'da120c8722d8632eec6ced937536cc98',
'info_dict': { 'info_dict': {
'id': 'lqm3kl', 'id': 'lqm3kl',
'ext': 'mp4', 'ext': 'mp4',
'title': "Comprendre l'affaire Bygmalion en 5 minutes", 'title': "Comprendre l'affaire Bygmalion en 5 minutes",
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 320, 'duration': 309,
'upload_date': '20160119', 'upload_date': '20160119',
'timestamp': 1453194778, 'timestamp': 1453194778,
'uploader_id': '3pmkp', 'uploader_id': '3pmkp',
}, },
}, {
# standard iframe embed
'url': 'http://www.lemonde.fr/les-decodeurs/article/2016/10/18/tout-comprendre-du-ceta-le-petit-cousin-du-traite-transatlantique_5015920_4355770.html',
'info_dict': {
'id': 'uzsxms',
'ext': 'mp4',
'title': "CETA : quelles suites pour l'accord commercial entre l'Europe et le Canada ?",
'thumbnail': r're:^https?://.*\.jpg',
'duration': 325,
'upload_date': '20161021',
'timestamp': 1477044540,
'uploader_id': '3pmkp',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html', 'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
'only_matching': True, 'only_matching': True,
}, {
# YouTube embeds
'url': 'http://www.lemonde.fr/pixels/article/2016/12/09/pourquoi-pewdiepie-superstar-de-youtube-a-menace-de-fermer-sa-chaine_5046649_4408996.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -30,5 +50,9 @@ class LemondeIE(InfoExtractor):
digiteka_url = self._proto_relative_url(self._search_regex( digiteka_url = self._proto_relative_url(self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1', r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
webpage, 'digiteka url', group='url')) webpage, 'digiteka url', group='url', default=None))
return self.url_result(digiteka_url, 'Digiteka')
if digiteka_url:
return self.url_result(digiteka_url, 'Digiteka')
return self.url_result(url, 'Generic')

View File

@ -8,6 +8,7 @@ from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
unsmuggle_url,
) )
@ -15,20 +16,23 @@ class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s' _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json' _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
def _call_playlist_service(self, item_id, method, fatal=True): def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json( return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method), self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal) item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
def _call_api(self, organization_id, item_id, method): def _call_api(self, organization_id, item_id, method):
return self._download_json( return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method), self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method) item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method): def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method) pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method) metadata = self._call_api(pc['orgId'], item_id, meta_method)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False) mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata return pc, mobile, metadata
def _extract_info(self, streams, mobile_urls, properties): def _extract_info(self, streams, mobile_urls, properties):
@ -207,10 +211,13 @@ class LimelightMediaIE(LimelightBaseIE):
_API_PATH = 'media' _API_PATH = 'media'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url) video_id = self._match_id(url)
pc, mobile, metadata = self._extract( pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId', 'getMobilePlaylistByMediaId', 'properties') video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info( return self._extract_info(
pc['playlistItems'][0].get('streams', []), pc['playlistItems'][0].get('streams', []),
@ -247,11 +254,13 @@ class LimelightChannelIE(LimelightBaseIE):
_API_PATH = 'channels' _API_PATH = 'channels'
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url) channel_id = self._match_id(url)
pc, mobile, medias = self._extract( pc, mobile, medias = self._extract(
channel_id, 'getPlaylistByChannelId', channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'media') 'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url'))
entries = [ entries = [
self._extract_info( self._extract_info(

View File

@ -1,14 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
) )
class OnDemandKoreaIE(JWPlatformBaseIE): class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html', 'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex( return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id') r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage): def _extract_from_id(self, video_id, webpage=None):
response = self._download_json( response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id, 'http://qi.ckm.onetapi.pl/', video_id,
query={ query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {}) meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title'] title = (self._og_search_title(
description = self._og_search_description(webpage, default=None) or meta.get('description') webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght') duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ') timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
} }
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE): class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)' _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv' IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage)) channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage)) channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description) return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -75,17 +75,17 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>', '<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
webpage, 'openload ID') webpage, 'openload ID')
first_three_chars = int(float(ol_id[0:][:3])) first_two_chars = int(float(ol_id[0:][:2]))
fifth_char = int(float(ol_id[3:5])) urlcode = []
urlcode = '' num = 2
num = 5
while num < len(ol_id): while num < len(ol_id):
urlcode += compat_chr(int(float(ol_id[num:][:3])) + key = int(float(ol_id[num + 3:][:2]))
first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2]))) urlcode.append((key, compat_chr(int(float(ol_id[num:][:3])) - first_two_chars)))
num += 5 num += 5
video_url = 'https://openload.co/stream/' + urlcode video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
title = self._og_search_title(webpage, default=None) or self._search_regex( title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage, r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,

View File

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration')) 'video:duration', webpage, 'duration'))
uploader = self._search_regex( uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False) r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex( upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"', r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False)) webpage, 'upload date', fatal=False))

View File

@ -2,27 +2,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools
import os # import os
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_urllib_parse_unquote, # compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, # compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse, # compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json, js_to_json,
orderedSet, orderedSet,
sanitized_Request, # sanitized_Request,
str_to_int, str_to_int,
) )
from ..aes import ( # from ..aes import (
aes_decrypt_text # aes_decrypt_text
) # )
class PornHubIE(InfoExtractor): class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
req = sanitized_Request( def dl_webpage(platform):
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id) return self._download_webpage(
req.add_header('Cookie', 'age_verified=1') 'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
webpage = self._download_webpage(req, video_id) video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>', r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see # video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore. # on that anymore.
title = self._html_search_meta( title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex( 'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)', (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1', r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,48 +169,6 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count( comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment') r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_variables = {}
for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
video_variables[video_variablename] = video_variable
video_urls = []
for encoded_video_url in re.findall(
r'player_quality_[0-9]{3,4}p\s*=(.+?);', webpage):
for varname, varval in video_variables.items():
encoded_video_url = encoded_video_url.replace(varname, varval)
video_urls.append(re.sub(r'[\s+]', '', encoded_video_url))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
height = None
tbr = None
else:
height = int(m.group('height'))
tbr = int(m.group('tbr'))
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'tbr': tbr,
'height': height,
})
self._sort_formats(formats)
page_params = self._parse_json(self._search_regex( page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})', r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
webpage, 'page parameters', group='data', default='{}'), webpage, 'page parameters', group='data', default='{}'),
@ -209,6 +180,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'uploader': video_uploader, 'uploader': video_uploader,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
@ -217,7 +189,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count, 'like_count': like_count,
'dislike_count': dislike_count, 'dislike_count': dislike_count,
'comment_count': comment_count, 'comment_count': comment_count,
'formats': formats, # 'formats': formats,
'age_limit': 18, 'age_limit': 18,
'tags': tags, 'tags': tags,
'categories': categories, 'categories': categories,

View File

@ -2,13 +2,13 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
str_to_int, str_to_int,
) )
class PornoXOIE(JWPlatformBaseIE): class PornoXOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html' _VALID_URL = r'https?://(?:www\.)?pornoxo\.com/videos/(?P<id>\d+)/(?P<display_id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html', 'url': 'http://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary.html',

View File

@ -2,11 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_str from ..compat import compat_str
class RENTVIE(JWPlatformBaseIE): class RENTVIE(InfoExtractor):
_VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)' _VALID_URL = r'(?:rentv:|https?://(?:www\.)?ren\.tv/(?:player|video/epizod)/)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://ren.tv/video/epizod/118577', 'url': 'http://ren.tv/video/epizod/118577',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
get_element_by_class, get_element_by_class,
@ -11,7 +11,7 @@ from ..utils import (
) )
class RudoIE(JWPlatformBaseIE): class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {

View File

@ -1,11 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import js_to_json from ..utils import js_to_json
class ScreencastOMaticIE(JWPlatformBaseIE): class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl', 'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
@ -14,7 +14,7 @@ from ..utils import (
) )
class SendtoNewsIE(JWPlatformBaseIE): class SendtoNewsIE(InfoExtractor):
_VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)' _VALID_URL = r'https?://embed\.sendtonews\.com/player2/embedplayer\.php\?.*\bSC=(?P<id>[0-9A-Za-z-]+)'
_TEST = { _TEST = {

View File

@ -179,10 +179,12 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
if m: if m:
return [m.group('url')] return [m.group('url')]
# Are whitesapces ignored in URLs?
# https://github.com/rg3/youtube-dl/issues/12044
matches = re.findall( matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage) r'(?s)<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches: if matches:
return list(zip(*matches))[1] return [re.sub(r'\s', '', list(zip(*matches))[1][0])]
@staticmethod @staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False): def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):

View File

@ -3,11 +3,11 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import remove_end from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE): class ThisAVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*' _VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TESTS = [{ _TESTS = [{
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html', 'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',

View File

@ -1,7 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
get_element_by_class, get_element_by_class,
@ -9,7 +9,7 @@ from ..utils import (
) )
class TVNoeIE(JWPlatformBaseIE): class TVNoeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.tvnoe.cz/video/10362', 'url': 'http://www.tvnoe.cz/video/10362',

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re import re
from .jwplatform import JWPlatformBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
decode_packed_codes, decode_packed_codes,
js_to_json, js_to_json,
@ -12,7 +12,7 @@ from ..utils import (
) )
class VidziIE(JWPlatformBaseIE): class VidziIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html', 'url': 'http://vidzi.tv/cghql9yq6emu.html',

View File

@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict

View File

@ -1,10 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE from .youtube import YoutubeIE
from .jwplatform import JWPlatformBaseIE
class WimpIE(JWPlatformBaseIE): class WimpIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.wimp.com/maru-is-exhausted/', 'url': 'http://www.wimp.com/maru-is-exhausted/',

View File

@ -44,6 +44,9 @@ class XTubeIE(InfoExtractor):
}, { }, {
'url': 'xtube:625837', 'url': 'xtube:625837',
'only_matching': True, 'only_matching': True,
}, {
'url': 'xtube:kVTUy_G222_',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -53,11 +56,16 @@ class XTubeIE(InfoExtractor):
if not display_id: if not display_id:
display_id = video_id display_id = video_id
url = 'http://www.xtube.com/video-watch/-%s' % video_id
req = sanitized_Request(url) if video_id.isdigit() and len(video_id) < 11:
req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1') url_pattern = 'http://www.xtube.com/video-watch/-%s'
webpage = self._download_webpage(req, display_id) else:
url_pattern = 'http://www.xtube.com/watch.php?v=%s'
webpage = self._download_webpage(
url_pattern % video_id, display_id, headers={
'Cookie': 'age_verified=1; cookiesAccepted=1',
})
sources = self._parse_json(self._search_regex( sources = self._parse_json(self._search_regex(
r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),', r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
@ -73,7 +81,7 @@ class XTubeIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = self._search_regex( title = self._search_regex(
(r'<h1>(?P<title>[^<]+)</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'), (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title') webpage, 'title', group='title')
description = self._search_regex( description = self._search_regex(
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False) r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)

View File

@ -34,6 +34,7 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_codecs,
parse_duration, parse_duration,
remove_quotes, remove_quotes,
remove_start, remove_start,
@ -1696,15 +1697,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
codecs = mobj.group('val') codecs = mobj.group('val')
break break
if codecs: if codecs:
codecs = codecs.split(',') dct.update(parse_codecs(codecs))
if len(codecs) == 2:
acodec, vcodec = codecs[1], codecs[0]
else:
acodec, vcodec = (codecs[0], 'none') if kind == 'audio' else ('none', codecs[0])
dct.update({
'acodec': acodec,
'vcodec': vcodec,
})
formats.append(dct) formats.append(dct)
elif video_info.get('hlsvp'): elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0] manifest_url = video_info['hlsvp'][0]

View File

@ -20,9 +20,9 @@ from ..utils import (
class ZDFBaseIE(InfoExtractor): class ZDFBaseIE(InfoExtractor):
def _call_api(self, url, player, referrer, video_id): def _call_api(self, url, player, referrer, video_id, item):
return self._download_json( return self._download_json(
url, video_id, 'Downloading JSON content', url, video_id, 'Downloading JSON %s' % item,
headers={ headers={
'Referer': referrer, 'Referer': referrer,
'Api-Auth': 'Bearer %s' % player['apiToken'], 'Api-Auth': 'Bearer %s' % player['apiToken'],
@ -104,7 +104,7 @@ class ZDFIE(ZDFBaseIE):
}) })
formats.append(f) formats.append(f)
def _extract_entry(self, url, content, video_id): def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline'] title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target'] t = content['mainVideoContent']['http://zdf.de/rels/target']
@ -116,7 +116,8 @@ class ZDFIE(ZDFBaseIE):
'http://zdf.de/rels/streams/ptmd-template'].replace( 'http://zdf.de/rels/streams/ptmd-template'].replace(
'{playerId}', 'portal') '{playerId}', 'portal')
ptmd = self._download_json(urljoin(url, ptmd_path), video_id) ptmd = self._call_api(
urljoin(url, ptmd_path), player, url, video_id, 'metadata')
formats = [] formats = []
track_uris = set() track_uris = set()
@ -174,8 +175,9 @@ class ZDFIE(ZDFBaseIE):
} }
def _extract_regular(self, url, player, video_id): def _extract_regular(self, url, player, video_id):
content = self._call_api(player['content'], player, url, video_id) content = self._call_api(
return self._extract_entry(player['content'], content, video_id) player['content'], player, url, video_id, 'content')
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id): def _extract_mobile(self, video_id):
document = self._download_json( document = self._download_json(

View File

@ -298,14 +298,14 @@ def parseOpts(overrideArguments=None):
metavar='FILTER', dest='match_filter', default=None, metavar='FILTER', dest='match_filter', default=None,
help=( help=(
'Generic video filter. ' 'Generic video filter. '
'Specify any key (see help for -o for a list of available keys) to' 'Specify any key (see help for -o for a list of available keys) to '
' match if the key is present, ' 'match if the key is present, '
'!key to check if the key is not present,' '!key to check if the key is not present, '
'key > NUMBER (like "comment_count > 12", also works with ' 'key > NUMBER (like "comment_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, and ' '>=, <, <=, !=, =) to compare against a number, and '
'& to require multiple matches. ' '& to require multiple matches. '
'Values which are not known are excluded unless you' 'Values which are not known are excluded unless you '
' put a question mark (?) after the operator.' 'put a question mark (?) after the operator. '
'For example, to only match videos that have been liked more than ' 'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike ' '100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who ' 'functionality is not available at the given service), but who '

View File

@ -1684,6 +1684,11 @@ def setproctitle(title):
libc = ctypes.cdll.LoadLibrary('libc.so.6') libc = ctypes.cdll.LoadLibrary('libc.so.6')
except OSError: except OSError:
return return
except TypeError:
# LoadLibrary in Windows Python 2.7.13 only expects
# a bytestring, but since unicode_literals turns
# every string into a unicode string, it fails.
return
title_bytes = title.encode('utf-8') title_bytes = title.encode('utf-8')
buf = ctypes.create_string_buffer(len(title_bytes)) buf = ctypes.create_string_buffer(len(title_bytes))
buf.value = title_bytes buf.value = title_bytes
@ -2378,6 +2383,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s* \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?: (?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)| (?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*) (?P<strval>(?![0-9.])[a-z0-9A-Z]*)
) )
\s*$ \s*$
@ -2386,7 +2392,8 @@ def _match_one(filter_part, dct):
if m: if m:
op = COMPARISON_OPERATORS[m.group('op')] op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key')) actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparisonvalue is # If the original field is a string and matching comparisonvalue is
# a number we should respect the origin of the original field # a number we should respect the origin of the original field
# and process comparison value as a string (see # and process comparison value as a string (see
@ -2396,7 +2403,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='): if m.group('op') not in ('=', '!='):
raise ValueError( raise ValueError(
'Operator %s does not support string values!' % m.group('op')) 'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval') comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else: else:
try: try:
comparison_value = int(m.group('intval')) comparison_value = int(m.group('intval'))

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.02.11' __version__ = '2017.02.17'