Compare commits

...

52 Commits

Author SHA1 Message Date
b56e41a701 release 2017.04.02 2017-04-02 02:39:15 +07:00
a76c25146a [ChangeLog] Actualize 2017-04-02 02:37:18 +07:00
361f293ab8 [rai] Skip not found content item id 2017-04-02 02:24:13 +07:00
b8d8cced9b [rai] Improve extraction (closes #11790)
* Fix georestriction detection
* Detect live streams
+ Extract relinker metadata
* Improve ContentItem detection
+ Extract series metadata
* Fix tests
2017-04-02 02:14:42 +07:00
51342717cd [rai] Fix extraction 2017-04-02 02:10:53 +07:00
48ab554feb [vrv] add support for series pages 2017-04-01 18:09:36 +01:00
a6f3a162f3 [limelight] improve extraction for audio only formats 2017-04-01 15:35:39 +01:00
91399b2fcc [funimation] fix extraction(closes #10696)(#11773) 2017-04-01 13:33:04 +01:00
eecea00d36 [xfileshare] Add support for vidabc.com (closes #12589) 2017-04-01 18:56:35 +07:00
2cd668ee59 [xfileshare] Improve extraction and extract hls formats 2017-04-01 18:55:48 +07:00
ca77b92f94 [crunchyroll] pass geo verifcation proxy 2017-04-01 09:33:23 +01:00
e97fc8d6b8 [cwtv] extract ISM formats 2017-04-01 07:50:24 +01:00
be61efdf17 [tvplay] Bypass geo restriction 2017-04-01 07:26:40 +01:00
77c8ebe631 [vrv] Add new extractor 2017-03-31 23:29:23 +01:00
7453999580 [packtpub] Add extractor (closes #12610) 2017-04-01 00:25:27 +07:00
1640eb0961 [YoutubeDL] Return early when extraction of url_transparent fails 2017-03-31 23:57:35 +07:00
3e943cfe09 [generic] pass base_url to _parse_jwplayer_data 2017-03-31 14:54:06 +01:00
82be732b17 [adn] Add new extractor 2017-03-31 12:24:23 +01:00
639e5b2a84 [allocine] Extract more metadata 2017-03-29 04:43:12 +07:00
128244657b [allocine] Fix extraction 2017-03-29 05:23:20 +08:00
12ee65ea0d [options] Mention ISM for --fragment-retries and --skip-unavailable-fragments 2017-03-28 23:35:48 +07:00
aea1dccbd0 [openload] fix extractor 2017-03-29 00:00:09 +08:00
9e691da067 release 2017.03.26 2017-03-26 08:11:40 +07:00
82eefd0be0 [ChangeLog] Actualize 2017-03-26 23:39:12 +07:00
f7923a4c39 [ChangeLog] Update after #12307 2017-03-26 22:07:12 +08:00
cc63259d18 Merge pull request #12307 from rndusr/fix/str-item-assignment
Fix "'str' object does not support item assignment"
2017-03-26 21:51:09 +08:00
2bfaf89b6c [downloader/hls] move check for m3u8 live streams to get_suitable_downloader 2017-03-25 23:07:05 +01:00
4f06c1c9fc Merge branch 'master' of github.com-rndusr:rg3/youtube-dl into fix/str-item-assignment 2017-03-25 21:36:59 +01:00
942b44a052 [test_compat] Do not use dash in env variables' names 2017-03-26 03:24:25 +07:00
a426ef6d78 [test_utils] Do not use dash in env variables' names 2017-03-26 03:22:48 +07:00
41c5e60dd5 [test_utils] Fix expand_path tests 2017-03-26 03:07:56 +07:00
d212c93d16 [pluralsight] PEP 8 2017-03-26 02:34:25 +07:00
15495cf3e5 [franceculture] PEP 8 2017-03-26 02:32:46 +07:00
5b7cc56b05 [atresplayer] PEP 8 2017-03-26 02:32:14 +07:00
590bc6f6a1 Use expand_path where appropriate (closes #12556) 2017-03-26 02:31:16 +07:00
51098426b8 [utils] Introduce expand_path 2017-03-26 02:30:10 +07:00
c73e330e7a _find_jwplayer_data() returns dict or None
This simplifies code for callers of `_find_jwplayer_data()` which no longer have
to run `_parse_json()` on the return value.

It also makes sure that `_find_jwplayer_data()` returns either a `dict` or
`None` and nothing else.
2017-03-25 19:38:30 +01:00
fb4fc44928 [downloader/hls] immediately delegate downloading to ffmpeg in case live stream 2017-03-25 19:38:23 +01:00
03486dbb01 Add test for JWPlayer where config is passed as variable 2017-03-25 19:37:45 +01:00
51ef4919df [afreecatv] Fix extraction (closes #12179) 2017-03-26 01:32:07 +08:00
d66d43c554 [atvat] Add new extractor(closes #5325) 2017-03-25 18:13:58 +01:00
610a6d1053 [atresplayer] Do not extract ISM formats
As per @remitamine: the ISM downloader does not support videos served from wowza servers(it will produce broken files)
2017-03-25 21:40:54 +07:00
c6c22e984d [test_download] Print additional IEs in summary output 2017-03-25 22:36:40 +08:00
d97729c83a [fox] remove unused import 2017-03-25 14:28:53 +01:00
7aa0ee321b [fox] Add metadata extraction
Add series, season number, episode number and episode.
2017-03-25 21:12:25 +08:00
e8e4cc5a6a [generic] Replace LazyYT test with skiplagged
discourse.ubuntu.com has gone away, repalce with skiplagged.com.
Be nice to have a non-frontpage URL that might be more stable,
though I don't have one. Maybe this should move to html
in test/test_InfoExtractor.py?
2017-03-25 19:53:32 +07:00
c7301e677b [atresplayer] Extract DASH and ISM formats 2017-03-25 18:03:46 +07:00
048086920b [atresplayer] Extract HD manifest 2017-03-25 17:52:04 +07:00
1088d76da6 [atresplayer] Fix login error detection 2017-03-25 17:47:35 +07:00
31a1214076 [franceculture] fix extraction(closes #12547) 2017-03-25 07:04:48 +01:00
d0ba55871e [youtube] Improve _VALID_URLs (closes #12538) 2017-03-25 01:18:33 +07:00
54b960f340 [generic] Do not follow redirects to the same URL 2017-03-24 00:45:24 +07:00
43 changed files with 1332 additions and 528 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.03.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.03.24** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.02**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.03.24 [debug] youtube-dl version 2017.04.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -1,3 +1,48 @@
version 2017.04.02
Core
[YoutubeDL] Return early when extraction of url_transparent fails
Extractors
* [rai] Fix and improve extraction (#11790)
+ [vrv] Add support for series pages
* [limelight] Improve extraction for audio only formats
* [funimation] Fix extraction (#10696, #11773)
+ [xfileshare] Add support for vidabc.com (#12589)
+ [xfileshare] Improve extraction and extract hls formats
+ [crunchyroll] Pass geo verifcation proxy
+ [cwtv] Extract ISM formats
+ [tvplay] Bypass geo restriction
+ [vrv] Add support for vrv.co
+ [packtpub] Add support for packtpub.com (#12610)
+ [generic] Pass base_url to _parse_jwplayer_data
+ [adn] Add support for animedigitalnetwork.fr (#4866)
+ [allocine] Extract more metadata
* [allocine] Fix extraction (#12592)
* [openload] Fix extraction
version 2017.03.26
Core
* Don't raise an error if JWPlayer config data is not a Javascript object
literal. _find_jwplayer_data now returns a dict rather than an str. (#12307)
* Expand environment variables for options representing paths (#12556)
+ [utils] Introduce expand_path
* [downloader/hls] Delegate downloading to ffmpeg immediately for live streams
Extractors
* [afreecatv] Fix extraction (#12179)
+ [atvat] Add support for atv.at (#5325)
+ [fox] Add metadata extraction (#12391)
+ [atresplayer] Extract DASH formats
+ [atresplayer] Extract HD manifest (#12548)
* [atresplayer] Fix login error detection (#12548)
* [franceculture] Fix extraction (#12547)
* [youtube] Improve URL regular expression (#12538)
* [generic] Do not follow redirects to the same URL
version 2017.03.24 version 2017.03.24
Extractors Extractors

View File

@ -181,10 +181,10 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
-R, --retries RETRIES Number of retries (default is 10), or -R, --retries RETRIES Number of retries (default is 10), or
"infinite". "infinite".
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH and hlsnative is 10), or "infinite" (DASH, hlsnative and
only) ISM)
--skip-unavailable-fragments Skip unavailable fragments (DASH and --skip-unavailable-fragments Skip unavailable fragments (DASH, hlsnative
hlsnative only) and ISM)
--abort-on-unavailable-fragment Abort downloading when some fragment is not --abort-on-unavailable-fragment Abort downloading when some fragment is not
available available
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)

View File

@ -28,6 +28,7 @@
- **acast** - **acast**
- **acast:channel** - **acast:channel**
- **AddAnime** - **AddAnime**
- **ADN**: Anime Digital Network
- **AdobeTV** - **AdobeTV**
- **AdobeTVChannel** - **AdobeTVChannel**
- **AdobeTVShow** - **AdobeTVShow**
@ -67,6 +68,7 @@
- **arte.tv:playlist** - **arte.tv:playlist**
- **AtresPlayer** - **AtresPlayer**
- **ATTTechChannel** - **ATTTechChannel**
- **ATVAt**
- **AudiMedia** - **AudiMedia**
- **AudioBoom** - **AudioBoom**
- **audiomack** - **audiomack**
@ -571,6 +573,8 @@
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV - **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
@ -628,7 +632,7 @@
- **radiofrance** - **radiofrance**
- **RadioJavan** - **RadioJavan**
- **Rai** - **Rai**
- **RaiTV** - **RaiPlay**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBullTV** - **RedBullTV**
@ -925,6 +929,8 @@
- **vpro**: npo.nl and ntr.nl - **vpro**: npo.nl and ntr.nl
- **Vrak** - **Vrak**
- **VRT** - **VRT**
- **vrv**
- **vrv:series**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VVVVID** - **VVVVID**
@ -952,7 +958,7 @@
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑 - **xiami:album**: 虾米音乐 - 专辑

View File

@ -27,11 +27,11 @@ from youtube_dl.compat import (
class TestCompat(unittest.TestCase): class TestCompat(unittest.TestCase):
def test_compat_getenv(self): def test_compat_getenv(self):
test_str = 'тест' test_str = 'тест'
compat_setenv('YOUTUBE-DL-TEST', test_str) compat_setenv('YOUTUBE_DL_COMPAT_GETENV', test_str)
self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str) self.assertEqual(compat_getenv('YOUTUBE_DL_COMPAT_GETENV'), test_str)
def test_compat_setenv(self): def test_compat_setenv(self):
test_var = 'YOUTUBE-DL-TEST' test_var = 'YOUTUBE_DL_COMPAT_SETENV'
test_str = 'тест' test_str = 'тест'
compat_setenv(test_var, test_str) compat_setenv(test_var, test_str)
compat_getenv(test_var) compat_getenv(test_var)

View File

@ -71,6 +71,18 @@ class TestDownload(unittest.TestCase):
maxDiff = None maxDiff = None
def __str__(self):
"""Identify each test with the `add_ie` attribute, if available."""
def strclass(cls):
"""From 2.7's unittest; 2.6 had _strclass so we can't import it."""
return '%s.%s' % (cls.__module__, cls.__name__)
add_ie = getattr(self, self._testMethodName).add_ie
return '%s (%s)%s:' % (self._testMethodName,
strclass(self.__class__),
' [%s]' % add_ie if add_ie else '')
def setUp(self): def setUp(self):
self.defs = defs self.defs = defs
@ -233,6 +245,8 @@ for n, test_case in enumerate(defs):
i += 1 i += 1
test_method = generator(test_case, tname) test_method = generator(test_case, tname)
test_method.__name__ = str(tname) test_method.__name__ = str(tname)
ie_list = test_case.get('add_ie')
test_method.add_ie = ie_list and ','.join(ie_list)
setattr(TestDownload, test_method.__name__, test_method) setattr(TestDownload, test_method.__name__, test_method)
del test_method del test_method

View File

@ -56,6 +56,7 @@ from youtube_dl.utils import (
read_batch_urls, read_batch_urls,
sanitize_filename, sanitize_filename,
sanitize_path, sanitize_path,
expand_path,
prepend_extension, prepend_extension,
replace_extension, replace_extension,
remove_start, remove_start,
@ -95,6 +96,8 @@ from youtube_dl.utils import (
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_chr, compat_chr,
compat_etree_fromstring, compat_etree_fromstring,
compat_getenv,
compat_setenv,
compat_urlparse, compat_urlparse,
compat_parse_qs, compat_parse_qs,
) )
@ -214,6 +217,18 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('./abc'), 'abc') self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc') self.assertEqual(sanitize_path('./../abc'), '..\\abc')
def test_expand_path(self):
def env(var):
return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
compat_setenv('YOUTUBE_DL_EXPATH_PATH', 'expanded')
self.assertEqual(expand_path(env('YOUTUBE_DL_EXPATH_PATH')), 'expanded')
self.assertEqual(expand_path(env('HOME')), compat_getenv('HOME'))
self.assertEqual(expand_path('~'), compat_getenv('HOME'))
self.assertEqual(
expand_path('~/%s' % env('YOUTUBE_DL_EXPATH_PATH')),
'%s/expanded' % compat_getenv('HOME'))
def test_prepend_extension(self): def test_prepend_extension(self):
self.assertEqual(prepend_extension('abc.ext', 'temp'), 'abc.temp.ext') self.assertEqual(prepend_extension('abc.ext', 'temp'), 'abc.temp.ext')
self.assertEqual(prepend_extension('abc.ext', 'temp', 'ext'), 'abc.temp.ext') self.assertEqual(prepend_extension('abc.ext', 'temp', 'ext'), 'abc.temp.ext')

View File

@ -29,7 +29,6 @@ import random
from .compat import ( from .compat import (
compat_basestring, compat_basestring,
compat_cookiejar, compat_cookiejar,
compat_expanduser,
compat_get_terminal_size, compat_get_terminal_size,
compat_http_client, compat_http_client,
compat_kwargs, compat_kwargs,
@ -54,6 +53,7 @@ from .utils import (
encode_compat_str, encode_compat_str,
encodeFilename, encodeFilename,
error_to_compat_str, error_to_compat_str,
expand_path,
ExtractorError, ExtractorError,
format_bytes, format_bytes,
formatSeconds, formatSeconds,
@ -672,7 +672,7 @@ class YoutubeDL(object):
FORMAT_RE.format(numeric_field), FORMAT_RE.format(numeric_field),
r'%({0})s'.format(numeric_field), outtmpl) r'%({0})s'.format(numeric_field), outtmpl)
tmpl = compat_expanduser(outtmpl) tmpl = expand_path(outtmpl)
filename = tmpl % template_dict filename = tmpl % template_dict
# Temporary fix for #4787 # Temporary fix for #4787
# 'Treat' all problem characters by passing filename through preferredencoding # 'Treat' all problem characters by passing filename through preferredencoding
@ -837,6 +837,12 @@ class YoutubeDL(object):
ie_result['url'], ie_key=ie_result.get('ie_key'), ie_result['url'], ie_key=ie_result.get('ie_key'),
extra_info=extra_info, download=False, process=False) extra_info=extra_info, download=False, process=False)
# extract_info may return None when ignoreerrors is enabled and
# extraction failed with an error, don't crash and return early
# in this case
if not info:
return info
force_properties = dict( force_properties = dict(
(k, v) for k, v in ie_result.items() if v is not None) (k, v) for k, v in ie_result.items() if v is not None)
for f in ('_type', 'url', 'ie_key'): for f in ('_type', 'url', 'ie_key'):
@ -2170,7 +2176,7 @@ class YoutubeDL(object):
if opts_cookiefile is None: if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar() self.cookiejar = compat_cookiejar.CookieJar()
else: else:
opts_cookiefile = compat_expanduser(opts_cookiefile) opts_cookiefile = expand_path(opts_cookiefile)
self.cookiejar = compat_cookiejar.MozillaCookieJar( self.cookiejar = compat_cookiejar.MozillaCookieJar(
opts_cookiefile) opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK): if os.access(opts_cookiefile, os.R_OK):

View File

@ -16,7 +16,6 @@ from .options import (
parseOpts, parseOpts,
) )
from .compat import ( from .compat import (
compat_expanduser,
compat_getpass, compat_getpass,
compat_shlex_split, compat_shlex_split,
workaround_optparse_bug9161, workaround_optparse_bug9161,
@ -26,6 +25,7 @@ from .utils import (
decodeOption, decodeOption,
DEFAULT_OUTTMPL, DEFAULT_OUTTMPL,
DownloadError, DownloadError,
expand_path,
match_filter_func, match_filter_func,
MaxDownloadsReached, MaxDownloadsReached,
preferredencoding, preferredencoding,
@ -88,7 +88,7 @@ def _real_main(argv=None):
batchfd = sys.stdin batchfd = sys.stdin
else: else:
batchfd = io.open( batchfd = io.open(
compat_expanduser(opts.batchfile), expand_path(opts.batchfile),
'r', encoding='utf-8', errors='ignore') 'r', encoding='utf-8', errors='ignore')
batch_urls = read_batch_urls(batchfd) batch_urls = read_batch_urls(batchfd)
if opts.verbose: if opts.verbose:
@ -238,7 +238,7 @@ def _real_main(argv=None):
any_getting = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json any_getting = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_printing = opts.print_json any_printing = opts.print_json
download_archive_fn = compat_expanduser(opts.download_archive) if opts.download_archive is not None else opts.download_archive download_archive_fn = expand_path(opts.download_archive) if opts.download_archive is not None else opts.download_archive
# PostProcessors # PostProcessors
postprocessors = [] postprocessors = []
@ -449,7 +449,7 @@ def _real_main(argv=None):
try: try:
if opts.load_info_filename is not None: if opts.load_info_filename is not None:
retcode = ydl.download_with_info_file(compat_expanduser(opts.load_info_filename)) retcode = ydl.download_with_info_file(expand_path(opts.load_info_filename))
else: else:
retcode = ydl.download(all_urls) retcode = ydl.download(all_urls)
except MaxDownloadsReached: except MaxDownloadsReached:

View File

@ -8,8 +8,11 @@ import re
import shutil import shutil
import traceback import traceback
from .compat import compat_expanduser, compat_getenv from .compat import compat_getenv
from .utils import write_json_file from .utils import (
expand_path,
write_json_file,
)
class Cache(object): class Cache(object):
@ -21,7 +24,7 @@ class Cache(object):
if res is None: if res is None:
cache_root = compat_getenv('XDG_CACHE_HOME', '~/.cache') cache_root = compat_getenv('XDG_CACHE_HOME', '~/.cache')
res = os.path.join(cache_root, 'youtube-dl') res = os.path.join(cache_root, 'youtube-dl')
return compat_expanduser(res) return expand_path(res)
def _get_cache_fn(self, section, key, dtype): def _get_cache_fn(self, section, key, dtype):
assert re.match(r'^[a-zA-Z0-9_.-]+$', section), \ assert re.match(r'^[a-zA-Z0-9_.-]+$', section), \

View File

@ -43,6 +43,9 @@ def get_suitable_downloader(info_dict, params={}):
if ed.can_download(info_dict): if ed.can_download(info_dict):
return ed return ed
if protocol.startswith('m3u8') and info_dict.get('is_live'):
return FFmpegFD
if protocol == 'm3u8' and params.get('hls_prefer_native') is True: if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
return HlsFD return HlsFD

136
youtube_dl/extractor/adn.py Normal file
View File

@ -0,0 +1,136 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
import os
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import compat_ord
from ..utils import (
bytes_to_intlist,
ExtractorError,
float_or_none,
intlist_to_bytes,
srt_subtitles_timecode,
strip_or_none,
)
class ADNIE(InfoExtractor):
IE_DESC = 'Anime Digital Network'
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': 'e497370d847fd79d9d4c74be55575c7a',
'info_dict': {
'id': '7778',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Épisode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
}
}
def _get_subtitles(self, sub_path, video_id):
if not sub_path:
return None
enc_subtitles = self._download_webpage(
'http://animedigitalnetwork.fr/' + sub_path,
video_id, fatal=False)
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\xb5@\xcfq\xa3\x98"N\xe4\xf3\x12\x98}}\x16\xd8'),
bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
dec_subtitles[:-compat_ord(dec_subtitles[-1])],
None, fatal=False)
if not subtitles_json:
return None
subtitles = {}
for sub_lang, sub in subtitles_json.items():
srt = ''
for num, current in enumerate(sub):
start, end, text = (
float_or_none(current.get('startTime')),
float_or_none(current.get('endTime')),
current.get('text'))
if start is None or end is None or text is None:
continue
srt += os.linesep.join(
(
'%d' % num,
'%s --> %s' % (
srt_subtitles_timecode(start),
srt_subtitles_timecode(end)),
text,
os.linesep,
))
if sub_lang == 'vostf':
sub_lang = 'fr'
subtitles.setdefault(sub_lang, []).extend([{
'ext': 'json',
'data': json.dumps(sub),
}, {
'ext': 'srt',
'data': srt,
}])
return subtitles
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_config = self._parse_json(self._search_regex(
r'playerConfig\s*=\s*({.+});', webpage, 'player config'), video_id)
video_info = {}
video_info_str = self._search_regex(
r'videoInfo\s*=\s*({.+});', webpage,
'video info', fatal=False)
if video_info_str:
video_info = self._parse_json(
video_info_str, video_id, fatal=False) or {}
options = player_config.get('options') or {}
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
formats = []
for format_id, qualities in links.items():
for load_balancer_url in qualities.values():
load_balancer_data = self._download_json(
load_balancer_url, video_id, fatal=False) or {}
m3u8_url = load_balancer_data.get('location')
if not m3u8_url:
continue
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False)
if format_id == 'vf':
for f in m3u8_formats:
f['language'] = 'fr'
formats.extend(m3u8_formats)
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': strip_or_none(metas.get('summary') or video_info.get('resume')),
'thumbnail': video_info.get('image'),
'formats': formats,
'subtitles': self.extract_subtitles(player_config.get('subtitles'), video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'),
'series': video_info.get('playlistTitle'),
}

View File

@ -4,15 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_xpath
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
update_url_query,
xpath_element,
xpath_text, xpath_text,
) )
@ -43,7 +38,8 @@ class AfreecaTVIE(InfoExtractor):
'uploader': 'dailyapril', 'uploader': 'dailyapril',
'uploader_id': 'dailyapril', 'uploader_id': 'dailyapril',
'upload_date': '20160503', 'upload_date': '20160503',
} },
'skip': 'Video is gone',
}, { }, {
'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867', 'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867',
'info_dict': { 'info_dict': {
@ -71,6 +67,19 @@ class AfreecaTVIE(InfoExtractor):
'upload_date': '20160502', 'upload_date': '20160502',
}, },
}], }],
'skip': 'Video is gone',
}, {
'url': 'http://vod.afreecatv.com/PLAYER/STATION/18650793',
'info_dict': {
'id': '18650793',
'ext': 'flv',
'uploader': '윈아디',
'uploader_id': 'badkids',
'title': '오늘은 다르다! 쏘님의 우월한 위아래~ 댄스리액션!',
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, { }, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652', 'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True, 'only_matching': True,
@ -90,40 +99,33 @@ class AfreecaTVIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
parsed_url = compat_urllib_parse_urlparse(url)
info_url = compat_urlparse.urlunparse(parsed_url._replace(
netloc='afbbs.afreecatv.com:8080',
path='/api/video/get_video_info.php'))
video_xml = self._download_xml( video_xml = self._download_xml(
update_url_query(info_url, {'nTitleNo': video_id}), video_id) 'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, query={'nTitleNo': video_id})
if xpath_element(video_xml, './track/video/file') is None: video_element = video_xml.findall(compat_xpath('./track/video'))[1]
if video_element is None or video_element.text is None:
raise ExtractorError('Specified AfreecaTV video does not exist', raise ExtractorError('Specified AfreecaTV video does not exist',
expected=True) expected=True)
title = xpath_text(video_xml, './track/title', 'title') video_url_raw = video_element.text
app, playpath = video_url_raw.split('mp4:')
title = xpath_text(video_xml, './track/title', 'title', fatal=True)
uploader = xpath_text(video_xml, './track/nickname', 'uploader') uploader = xpath_text(video_xml, './track/nickname', 'uploader')
uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id') uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
duration = int_or_none(xpath_text(video_xml, './track/duration', duration = int_or_none(xpath_text(video_xml, './track/duration',
'duration')) 'duration'))
thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail') thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
entries = [] return {
for i, video_file in enumerate(video_xml.findall('./track/video/file')):
video_key = self.parse_video_key(video_file.get('key', ''))
if not video_key:
continue
entries.append({
'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
'title': title,
'upload_date': video_key.get('upload_date'),
'duration': int_or_none(video_file.get('duration')),
'url': video_file.text,
})
info = {
'id': video_id, 'id': video_id,
'url': app,
'ext': 'flv',
'play_path': 'mp4:' + playpath,
'rtmp_live': True, # downloading won't end without this
'title': title, 'title': title,
'uploader': uploader, 'uploader': uploader,
'uploader_id': uploader_id, 'uploader_id': uploader_id,
@ -131,20 +133,6 @@ class AfreecaTVIE(InfoExtractor):
'thumbnail': thumbnail, 'thumbnail': thumbnail,
} }
if len(entries) > 1:
info['_type'] = 'multi_video'
info['entries'] = entries
elif len(entries) == 1:
info['url'] = entries[0]['url']
info['upload_date'] = entries[0].get('upload_date')
else:
raise ExtractorError(
'No files found for the specified AfreecaTV video, either'
' the URL is incorrect or the video has been made private.',
expected=True)
return info
class AfreecaTVGlobalIE(AfreecaTVIE): class AfreecaTVGlobalIE(AfreecaTVIE):
IE_NAME = 'afreecatv:global' IE_NAME = 'afreecatv:global'

View File

@ -2,9 +2,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
remove_end, int_or_none,
qualities, qualities,
remove_end,
try_get,
unified_timestamp,
url_basename, url_basename,
) )
@ -22,6 +26,10 @@ class AllocineIE(InfoExtractor):
'title': 'Astérix - Le Domaine des Dieux Teaser VF', 'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884', 'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:http://.*\.jpg',
'duration': 39,
'timestamp': 1404273600,
'upload_date': '20140702',
'view_count': int,
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html', 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html',
@ -33,6 +41,10 @@ class AllocineIE(InfoExtractor):
'title': 'Planes 2 Bande-annonce VF', 'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway', 'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:http://.*\.jpg',
'duration': 69,
'timestamp': 1385659800,
'upload_date': '20131128',
'view_count': int,
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html', 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
@ -44,6 +56,10 @@ class AllocineIE(InfoExtractor):
'title': 'Dragons 2 - Bande annonce finale VF', 'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a', 'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
'thumbnail': r're:http://.*\.jpg', 'thumbnail': r're:http://.*\.jpg',
'duration': 144,
'timestamp': 1397589900,
'upload_date': '20140415',
'view_count': int,
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/video-19550147/', 'url': 'http://www.allocine.fr/video/video-19550147/',
@ -69,34 +85,37 @@ class AllocineIE(InfoExtractor):
r'data-model="([^"]+)"', webpage, 'data model', default=None) r'data-model="([^"]+)"', webpage, 'data model', default=None)
if model: if model:
model_data = self._parse_json(model, display_id) model_data = self._parse_json(model, display_id)
video = model_data['videos'][0]
for video_url in model_data['sources'].values(): title = video['title']
for video_url in video['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2] video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
'url': video_url, 'url': video_url,
}) })
duration = int_or_none(video.get('duration'))
title = model_data['title'] view_count = int_or_none(video.get('view_count'))
timestamp = unified_timestamp(try_get(
video, lambda x: x['added_at']['date'], compat_str))
else: else:
video_id = display_id video_id = display_id
media_data = self._download_json( media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id) 'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
title = remove_end(
self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title').strip(),
' - AlloCiné')
for key, value in media_data['video'].items(): for key, value in media_data['video'].items():
if not key.endswith('Path'): if not key.endswith('Path'):
continue continue
format_id = key[:-len('Path')] format_id = key[:-len('Path')]
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
'url': value, 'url': value,
}) })
duration, view_count, timestamp = [None] * 3
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats) self._sort_formats(formats)
@ -104,7 +123,10 @@ class AllocineIE(InfoExtractor):
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'formats': formats,
} }

View File

@ -93,8 +93,7 @@ class ArkenaIE(InfoExtractor):
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None)) exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
if kind == 'm3u8' or 'm3u8' in exts: if kind == 'm3u8' or 'm3u8' in exts:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4', f_url, video_id, 'mp4', 'm3u8_native',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=kind, fatal=False, live=is_live)) m3u8_id=kind, fatal=False, live=is_live))
elif kind == 'flash' or 'f4m' in exts: elif kind == 'flash' or 'f4m' in exts:
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(

View File

@ -90,7 +90,8 @@ class AtresPlayerIE(InfoExtractor):
request, None, 'Logging in as %s' % username) request, None, 'Logging in as %s' % username)
error = self._html_search_regex( error = self._html_search_regex(
r'(?s)<ul class="list_error">(.+?)</ul>', response, 'error', default=None) r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>',
response, 'error', default=None)
if error: if error:
raise ExtractorError( raise ExtractorError(
'Unable to login: %s' % error, expected=True) 'Unable to login: %s' % error, expected=True)
@ -155,13 +156,17 @@ class AtresPlayerIE(InfoExtractor):
if format_id == 'token' or not video_url.startswith('http'): if format_id == 'token' or not video_url.startswith('http'):
continue continue
if 'geodeswowsmpra3player' in video_url: if 'geodeswowsmpra3player' in video_url:
f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0] # f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path) # f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
# this videos are protected by DRM, the f4m downloader doesn't support them # this videos are protected by DRM, the f4m downloader doesn't support them
continue continue
else: video_url_hd = video_url.replace('free_es', 'es')
f4m_url = video_url[:-9] + '/manifest.f4m' formats.extend(self._extract_f4m_formats(
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False)) video_url_hd[:-9] + '/manifest.f4m', video_id, f4m_id='hds',
fatal=False))
formats.extend(self._extract_mpd_formats(
video_url_hd[:-9] + '/manifest.mpd', video_id, mpd_id='dash',
fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
path_data = player.get('pathData') path_data = player.get('pathData')

View File

@ -0,0 +1,73 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
unescapeHTML,
)
class ATVAtIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?atv\.at/(?:[^/]+/){2}(?P<id>[dv]\d+)'
_TESTS = [{
'url': 'http://atv.at/aktuell/di-210317-2005-uhr/v1698449/',
'md5': 'c3b6b975fb3150fc628572939df205f2',
'info_dict': {
'id': '1698447',
'ext': 'mp4',
'title': 'DI, 21.03.17 | 20:05 Uhr 1/1',
}
}, {
'url': 'http://atv.at/aktuell/meinrad-knapp/d8416/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="([^"]+)"',
webpage, 'player data')), display_id)['config']['initial_video']
video_id = video_data['id']
video_title = video_data['title']
parts = []
for part in video_data.get('parts', []):
part_id = part['id']
part_title = part['title']
formats = []
for source in part.get('sources', []):
source_url = source.get('src')
if not source_url:
continue
ext = determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, part_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': source.get('delivery'),
'url': source_url,
})
self._sort_formats(formats)
parts.append({
'id': part_id,
'title': part_title,
'thumbnail': part.get('preview_image_url'),
'duration': int_or_none(part.get('duration')),
'is_live': part.get('is_livestream'),
'formats': formats,
})
return {
'_type': 'multi_video',
'id': video_id,
'title': video_title,
'entries': parts,
}

View File

@ -160,8 +160,7 @@ class CeskaTelevizeIE(InfoExtractor):
for format_id, stream_url in item.get('streamUrls', {}).items(): for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url: if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats( stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', stream_url, playlist_id, 'mp4', 'm3u8_native',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False) m3u8_id='hls-%s' % format_id, fatal=False)
else: else:
stream_formats = self._extract_mpd_formats( stream_formats = self._extract_mpd_formats(

View File

@ -2169,18 +2169,24 @@ class InfoExtractor(object):
}) })
return formats return formats
@staticmethod def _find_jwplayer_data(self, webpage, video_id=None, transform_source=js_to_json):
def _find_jwplayer_data(webpage):
mobj = re.search( mobj = re.search(
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)', r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage) webpage)
if mobj: if mobj:
return mobj.group('options') try:
jwplayer_data = self._parse_json(mobj.group('options'),
video_id=video_id,
transform_source=transform_source)
except ExtractorError:
pass
else:
if isinstance(jwplayer_data, dict):
return jwplayer_data
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs): def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json( jwplayer_data = self._find_jwplayer_data(
self._find_jwplayer_data(webpage), video_id, webpage, video_id, transform_source=js_to_json)
transform_source=js_to_json)
return self._parse_jwplayer_data( return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs) jwplayer_data, video_id, *args, **kwargs)

View File

@ -390,7 +390,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
else: else:
webpage_url = 'http://www.' + mobj.group('url') webpage_url = 'http://www.' + mobj.group('url')
webpage = self._download_webpage(self._add_skip_wall(webpage_url), video_id, 'Downloading webpage') webpage = self._download_webpage(
self._add_skip_wall(webpage_url), video_id,
headers=self.geo_verification_headers())
note_m = self._html_search_regex( note_m = self._html_search_regex(
r'<div class="showmedia-trailer-notice">(.+?)</div>', r'<div class="showmedia-trailer-notice">(.+?)</div>',
webpage, 'trailer-notice', default='') webpage, 'trailer-notice', default='')
@ -565,7 +567,9 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
show_id = self._match_id(url) show_id = self._match_id(url)
webpage = self._download_webpage(self._add_skip_wall(url), show_id) webpage = self._download_webpage(
self._add_skip_wall(url), show_id,
headers=self.geo_verification_headers())
title = self._html_search_regex( title = self._html_search_regex(
r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>', r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>',
webpage, 'title') webpage, 'title')

View File

@ -82,6 +82,11 @@ class CWTVIE(InfoExtractor):
'url': quality_url, 'url': quality_url,
'tbr': tbr, 'tbr': tbr,
}) })
video_metadata = video_data['assetFields']
ism_url = video_metadata.get('smoothStreamingUrl')
if ism_url:
formats.extend(self._extract_ism_formats(
ism_url, video_id, ism_id='mss', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = [{ thumbnails = [{
@ -90,8 +95,6 @@ class CWTVIE(InfoExtractor):
'height': image.get('height'), 'height': image.get('height'),
} for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None } for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None
video_metadata = video_data['assetFields']
subtitles = { subtitles = {
'en': [{ 'en': [{
'url': video_metadata['UnicornCcUrl'], 'url': video_metadata['UnicornCcUrl'],

View File

@ -19,6 +19,7 @@ from .acast import (
ACastChannelIE, ACastChannelIE,
) )
from .addanime import AddAnimeIE from .addanime import AddAnimeIE
from .adn import ADNIE
from .adobetv import ( from .adobetv import (
AdobeTVIE, AdobeTVIE,
AdobeTVShowIE, AdobeTVShowIE,
@ -71,6 +72,7 @@ from .arte import (
) )
from .atresplayer import AtresPlayerIE from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE from .atttechchannel import ATTTechChannelIE
from .atvat import ATVAtIE
from .audimedia import AudiMediaIE from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE from .audiomack import AudiomackIE, AudiomackAlbumIE
@ -727,6 +729,10 @@ from .orf import (
ORFFM4IE, ORFFM4IE,
ORFIPTVIE, ORFIPTVIE,
) )
from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .pandatv import PandaTVIE from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
@ -796,7 +802,7 @@ from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE from .radiofrance import RadioFranceIE
from .rai import ( from .rai import (
RaiTVIE, RaiPlayIE,
RaiIE, RaiIE,
) )
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
@ -1176,6 +1182,10 @@ from .voxmedia import VoxMediaIE
from .vporn import VpornIE from .vporn import VpornIE
from .vrt import VRTIE from .vrt import VRTIE
from .vrak import VrakIE from .vrak import VrakIE
from .vrv import (
VRVIE,
VRVSeriesIE,
)
from .medialaan import MedialaanIE from .medialaan import MedialaanIE
from .vube import VubeIE from .vube import VubeIE
from .vuclip import VuClipIE from .vuclip import VuClipIE

View File

@ -54,7 +54,7 @@ class EyedoTVIE(InfoExtractor):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': self._extract_m3u8_formats( 'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8' if is_live else 'm3u8_native'), m3u8_url, video_id, 'mp4', 'm3u8_native'),
'description': xpath_text(video_data, _add_ns('Description')), 'description': xpath_text(video_data, _add_ns('Description')),
'duration': parse_duration(xpath_text(video_data, _add_ns('Duration'))), 'duration': parse_duration(xpath_text(video_data, _add_ns('Duration'))),
'uploader': xpath_text(video_data, _add_ns('Createur')), 'uploader': xpath_text(video_data, _add_ns('Createur')),

View File

@ -47,9 +47,12 @@ class FOXIE(AdobePassIE):
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating) resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource) query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
return { info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}), 'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'id': video_id, 'id': video_id,
} })
return info

View File

@ -4,7 +4,8 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
unified_strdate, extract_attributes,
int_or_none,
) )
@ -19,6 +20,7 @@ class FranceCultureIE(InfoExtractor):
'title': 'Rendez-vous au pays des geeks', 'title': 'Rendez-vous au pays des geeks',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20140301', 'upload_date': '20140301',
'timestamp': 1393642916,
'vcodec': 'none', 'vcodec': 'none',
} }
} }
@ -28,30 +30,34 @@ class FranceCultureIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_url = self._search_regex( video_data = extract_attributes(self._search_regex(
r'(?s)<div[^>]+class="[^"]*?title-zone-diffusion[^"]*?"[^>]*>.*?<button[^>]+data-asset-source="([^"]+)"', r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
webpage, 'video path') webpage, 'video data'))
title = self._og_search_title(webpage) video_url = video_data['data-asset-source']
title = video_data.get('data-asset-title') or self._og_search_title(webpage)
upload_date = unified_strdate(self._search_regex( description = self._html_search_regex(
'(?s)<div[^>]+class="date"[^>]*>.*?<span[^>]+class="inner"[^>]*>([^<]+)<', r'(?s)<div[^>]+class="intro"[^>]*>.*?<h2>(.+?)</h2>',
webpage, 'upload date', fatal=False)) webpage, 'description', default=None)
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+data-dejavu-src="([^"]+)"', r'(?s)<figure[^>]+itemtype="https://schema.org/ImageObject"[^>]*>.*?<img[^>]+(?:data-dejavu-)?src="([^"]+)"',
webpage, 'thumbnail', fatal=False) webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'(?s)<div id="emission".*?<span class="author">(.*?)</span>', r'(?s)<span class="author">(.*?)</span>',
webpage, 'uploader', default=None) webpage, 'uploader', default=None)
vcodec = 'none' if determine_ext(video_url.lower()) == 'mp3' else None ext = determine_ext(video_url.lower())
return { return {
'id': display_id, 'id': display_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url, 'url': video_url,
'title': title, 'title': title,
'description': description,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'vcodec': vcodec, 'ext': ext,
'vcodec': 'none' if ext == 'mp3' else None,
'uploader': uploader, 'uploader': uploader,
'upload_date': upload_date, 'timestamp': int_or_none(video_data.get('data-asset-created-date')),
'duration': int_or_none(video_data.get('data-duration')),
} }

View File

@ -56,9 +56,8 @@ class FreshLiveIE(InfoExtractor):
is_live = info.get('liveStreamUrl') is not None is_live = info.get('liveStreamUrl') is not None
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
stream_url, video_id, ext='mp4', stream_url, video_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native', 'm3u8_native', m3u8_id='hls')
m3u8_id='hls')
if is_live: if is_live:
title = self._live_title(title) title = self._live_title(title)

View File

@ -7,9 +7,9 @@ from ..compat import (
compat_urllib_parse_unquote_plus, compat_urllib_parse_unquote_plus,
) )
from ..utils import ( from ..utils import (
clean_html,
determine_ext, determine_ext,
int_or_none, int_or_none,
js_to_json,
sanitized_Request, sanitized_Request,
ExtractorError, ExtractorError,
urlencode_postdata urlencode_postdata
@ -17,34 +17,26 @@ from ..utils import (
class FunimationIE(InfoExtractor): class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation\.com/shows/[^/]+/videos/(?:official|promotional)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation' _NETRC_MACHINE = 'funimation'
_TESTS = [{ _TESTS = [{
'url': 'http://www.funimation.com/shows/air/videos/official/breeze', 'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'info_dict': { 'info_dict': {
'id': '658', 'id': '91144',
'display_id': 'breeze',
'ext': 'mp4',
'title': 'Air - 1 - Breeze',
'description': 'md5:1769f43cd5fc130ace8fd87232207892',
'thumbnail': r're:https?://.*\.jpg',
},
'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
}, {
'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
'info_dict': {
'id': '31128',
'display_id': 'role-play', 'display_id': 'role-play',
'ext': 'mp4', 'ext': 'mp4',
'title': '.hack//SIGN - 1 - Role Play', 'title': '.hack//SIGN - Role Play',
'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd', 'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
'thumbnail': r're:https?://.*\.jpg', 'thumbnail': r're:https?://.*\.jpg',
}, },
'skip': 'Access without user interaction is forbidden by CloudFlare', 'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview', 'url': 'https://www.funimation.com/shows/attack-on-titan-junior-high/broadcast-dub-preview/',
'info_dict': { 'info_dict': {
'id': '9635', 'id': '9635',
'display_id': 'broadcast-dub-preview', 'display_id': 'broadcast-dub-preview',
@ -54,25 +46,13 @@ class FunimationIE(InfoExtractor):
'thumbnail': r're:https?://.*\.(?:jpg|png)', 'thumbnail': r're:https?://.*\.(?:jpg|png)',
}, },
'skip': 'Access without user interaction is forbidden by CloudFlare', 'skip': 'Access without user interaction is forbidden by CloudFlare',
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}] }]
_LOGIN_URL = 'http://www.funimation.com/login' _LOGIN_URL = 'http://www.funimation.com/login'
def _download_webpage(self, *args, **kwargs):
try:
return super(FunimationIE, self)._download_webpage(*args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
response = ee.cause.read()
if b'>Please complete the security check to access<' in response:
raise ExtractorError(
'Access to funimation.com is blocked by CloudFlare. '
'Please browse to http://www.funimation.com/, solve '
'the reCAPTCHA, export browser cookies to a text file,'
' and then try again with --cookies YOUR_COOKIE_FILE.',
expected=True)
raise
def _extract_cloudflare_session_ua(self, url): def _extract_cloudflare_session_ua(self, url):
ci_session_cookie = self._get_cookies(url).get('ci_session') ci_session_cookie = self._get_cookies(url).get('ci_session')
if ci_session_cookie: if ci_session_cookie:
@ -114,119 +94,74 @@ class FunimationIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
def _search_kane(name):
return self._search_regex(
r"KANE_customdimensions\.%s\s*=\s*'([^']+)';" % name,
webpage, name, default=None)
title_data = self._parse_json(self._search_regex(
r'TITLE_DATA\s*=\s*({[^}]+})',
webpage, 'title data', default=''),
display_id, js_to_json, fatal=False) or {}
video_id = title_data.get('id') or self._search_regex([
r"KANE_customdimensions.videoID\s*=\s*'(\d+)';",
r'<iframe[^>]+src="/player/(\d+)"',
], webpage, 'video_id', default=None)
if not video_id:
player_url = self._html_search_meta([
'al:web:url',
'og:video:url',
'og:video:secure_url',
], webpage, fatal=True)
video_id = self._search_regex(r'/player/(\d+)', player_url, 'video id')
title = episode = title_data.get('title') or _search_kane('videoTitle') or self._og_search_title(webpage)
series = _search_kane('showName')
if series:
title = '%s - %s' % (series, title)
description = self._html_search_meta(['description', 'og:description'], webpage, fatal=True)
try:
sources = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/source/catalog/video/%s/signed/' % video_id,
video_id)['items']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read(), video_id)['errors'][0]
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error.get('detail') or error.get('title')), expected=True)
raise
errors = []
formats = [] formats = []
for source in sources:
ERRORS_MAP = { source_url = source.get('src')
'ERROR_MATURE_CONTENT_LOGGED_IN': 'matureContentLoggedIn', if not source_url:
'ERROR_MATURE_CONTENT_LOGGED_OUT': 'matureContentLoggedOut',
'ERROR_SUBSCRIPTION_LOGGED_OUT': 'subscriptionLoggedOut',
'ERROR_VIDEO_EXPIRED': 'videoExpired',
'ERROR_TERRITORY_UNAVAILABLE': 'territoryUnavailable',
'SVODBASIC_SUBSCRIPTION_IN_PLAYER': 'basicSubscription',
'SVODNON_SUBSCRIPTION_IN_PLAYER': 'nonSubscription',
'ERROR_PLAYER_NOT_RESPONDING': 'playerNotResponding',
'ERROR_UNABLE_TO_CONNECT_TO_CDN': 'unableToConnectToCDN',
'ERROR_STREAM_NOT_FOUND': 'streamNotFound',
}
USER_AGENTS = (
# PC UA is served with m3u8 that provides some bonus lower quality formats
('pc', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'),
# Mobile UA allows to extract direct links and also does not fail when
# PC UA fails with hulu error (e.g.
# http://www.funimation.com/shows/hacksign/videos/official/role-play)
('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
)
user_agent = self._extract_cloudflare_session_ua(url)
if user_agent:
USER_AGENTS = ((None, user_agent),)
for kind, user_agent in USER_AGENTS:
request = sanitized_Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(
request, display_id,
'Downloading %s webpage' % kind if kind else 'Downloading webpage')
playlist = self._parse_json(
self._search_regex(
r'var\s+playersData\s*=\s*(\[.+?\]);\n',
webpage, 'players data'),
display_id)[0]['playlist']
items = next(item['items'] for item in playlist if item.get('items'))
item = next(item for item in items if item.get('itemAK') == display_id)
error_messages = {}
video_error_messages = self._search_regex(
r'var\s+videoErrorMessages\s*=\s*({.+?});\n',
webpage, 'error messages', default=None)
if video_error_messages:
error_messages_json = self._parse_json(video_error_messages, display_id, fatal=False)
if error_messages_json:
for _, error in error_messages_json.items():
type_ = error.get('type')
description = error.get('description')
content = error.get('content')
if type_ == 'text' and description and content:
error_message = ERRORS_MAP.get(description)
if error_message:
error_messages[error_message] = content
for video in item.get('videoSet', []):
auth_token = video.get('authToken')
if not auth_token:
continue continue
funimation_id = video.get('FUNImationID') or video.get('videoId') source_type = source.get('videoType') or determine_ext(source_url)
preference = 1 if video.get('languageMode') == 'dub' else 0 if source_type == 'm3u8':
if not auth_token.startswith('?'):
auth_token = '?%s' % auth_token
for quality, height in (('sd', 480), ('hd', 720), ('hd1080', 1080)):
format_url = video.get('%sUrl' % quality)
if not format_url:
continue
if not format_url.startswith(('http', '//')):
errors.append(format_url)
continue
if determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url + auth_token, display_id, 'mp4', entry_protocol='m3u8_native', source_url, video_id, 'mp4',
preference=preference, m3u8_id='%s-hls' % funimation_id, fatal=False)) m3u8_id='hls', fatal=False))
else: else:
tbr = int_or_none(self._search_regex(
r'-(\d+)[Kk]', format_url, 'tbr', default=None))
formats.append({ formats.append({
'url': format_url + auth_token, 'format_id': source_type,
'format_id': '%s-http-%dp' % (funimation_id, height), 'url': source_url,
'height': height,
'tbr': tbr,
'preference': preference,
}) })
if not formats and errors:
raise ExtractorError(
'%s returned error: %s'
% (self.IE_NAME, clean_html(error_messages.get(errors[0], errors[0]))),
expected=True)
self._sort_formats(formats) self._sort_formats(formats)
title = item['title']
artist = item.get('artist')
if artist:
title = '%s - %s' % (artist, title)
description = self._og_search_description(webpage) or item.get('description')
thumbnail = self._og_search_thumbnail(webpage) or item.get('posterUrl')
video_id = item.get('itemId') or display_id
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'description': description,
'thumbnail': thumbnail, 'thumbnail': self._og_search_thumbnail(webpage),
'series': series,
'season_number': int_or_none(title_data.get('seasonNum') or _search_kane('season')),
'episode_number': int_or_none(title_data.get('episodeNum')),
'episode': episode,
'season_id': title_data.get('seriesId'),
'formats': formats, 'formats': formats,
} }

View File

@ -902,12 +902,13 @@ class GenericIE(InfoExtractor):
}, },
# LazyYT # LazyYT
{ {
'url': 'http://discourse.ubuntu.com/t/unity-8-desktop-mode-windows-on-mir/1986', 'url': 'https://skiplagged.com/',
'info_dict': { 'info_dict': {
'id': '1986', 'id': 'skiplagged',
'title': 'Unity 8 desktop-mode windows on Mir! - Ubuntu Discourse', 'title': 'Skiplagged: The smart way to find cheap flights',
}, },
'playlist_mincount': 2, 'playlist_mincount': 1,
'add_ie': ['Youtube'],
}, },
# Cinchcast embed # Cinchcast embed
{ {
@ -990,6 +991,20 @@ class GenericIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{
# JWPlayer config passed as variable
'url': 'http://www.txxx.com/videos/3326530/ariele/',
'info_dict': {
'id': '3326530_hq',
'ext': 'mp4',
'title': 'ARIELE | Tube Cup',
'uploader': 'www.txxx.com',
'age_limit': 18,
},
'params': {
'skip_download': True,
}
},
# rtl.nl embed # rtl.nl embed
{ {
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen', 'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@ -2549,18 +2564,14 @@ class GenericIE(InfoExtractor):
self._sort_formats(entry['formats']) self._sort_formats(entry['formats'])
return self.playlist_result(entries) return self.playlist_result(entries)
jwplayer_data_str = self._find_jwplayer_data(webpage) jwplayer_data = self._find_jwplayer_data(
if jwplayer_data_str: webpage, video_id, transform_source=js_to_json)
try: if jwplayer_data:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
info = self._parse_jwplayer_data( info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False) jwplayer_data, video_id, require_title=False, base_url=url)
if not info.get('title'): if not info.get('title'):
info['title'] = video_title info['title'] = video_title
return info return info
except ExtractorError:
pass
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
@ -2635,11 +2646,14 @@ class GenericIE(InfoExtractor):
found = re.search(REDIRECT_REGEX, refresh_header) found = re.search(REDIRECT_REGEX, refresh_header)
if found: if found:
new_url = compat_urlparse.urljoin(url, unescapeHTML(found.group(1))) new_url = compat_urlparse.urljoin(url, unescapeHTML(found.group(1)))
if new_url != url:
self.report_following_redirect(new_url) self.report_following_redirect(new_url)
return { return {
'_type': 'url', '_type': 'url',
'url': new_url, 'url': new_url,
} }
else:
found = None
if not found: if not found:
# twitter:player is a https URL to iframe player that may or may not # twitter:player is a https URL to iframe player that may or may not

View File

@ -62,13 +62,21 @@ class LimelightBaseIE(InfoExtractor):
fmt = { fmt = {
'url': stream_url, 'url': stream_url,
'abr': float_or_none(stream.get('audioBitRate')), 'abr': float_or_none(stream.get('audioBitRate')),
'vbr': float_or_none(stream.get('videoBitRate')),
'fps': float_or_none(stream.get('videoFrameRate')), 'fps': float_or_none(stream.get('videoFrameRate')),
'width': int_or_none(stream.get('videoWidthInPixels')),
'height': int_or_none(stream.get('videoHeightInPixels')),
'ext': ext, 'ext': ext,
} }
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url) width = int_or_none(stream.get('videoWidthInPixels'))
height = int_or_none(stream.get('videoHeightInPixels'))
vbr = float_or_none(stream.get('videoBitRate'))
if width or height or vbr:
fmt.update({
'width': width,
'height': height,
'vbr': vbr,
})
else:
fmt['vcodec'] = 'none'
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', stream_url)
if rtmp: if rtmp:
format_id = 'rtmp' format_id = 'rtmp'
if stream.get('videoBitRate'): if stream.get('videoBitRate'):

View File

@ -119,7 +119,8 @@ class LivestreamIE(InfoExtractor):
m3u8_url = video_data.get('m3u8_url') m3u8_url = video_data.get('m3u8_url')
if m3u8_url: if m3u8_url:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
f4m_url = video_data.get('f4m_url') f4m_url = video_data.get('f4m_url')
if f4m_url: if f4m_url:
@ -158,11 +159,11 @@ class LivestreamIE(InfoExtractor):
if smil_url: if smil_url:
formats.extend(self._extract_smil_formats(smil_url, broadcast_id)) formats.extend(self._extract_smil_formats(smil_url, broadcast_id))
entry_protocol = 'm3u8' if is_live else 'm3u8_native'
m3u8_url = stream_info.get('m3u8_url') m3u8_url = stream_info.get('m3u8_url')
if m3u8_url: if m3u8_url:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, broadcast_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=False)) m3u8_url, broadcast_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
rtsp_url = stream_info.get('rtsp_url') rtsp_url = stream_info.get('rtsp_url')
if rtsp_url: if rtsp_url:
@ -276,7 +277,7 @@ class LivestreamOriginalIE(InfoExtractor):
'view_count': view_count, 'view_count': view_count,
} }
def _extract_video_formats(self, video_data, video_id, entry_protocol): def _extract_video_formats(self, video_data, video_id):
formats = [] formats = []
progressive_url = video_data.get('progressiveUrl') progressive_url = video_data.get('progressiveUrl')
@ -289,7 +290,8 @@ class LivestreamOriginalIE(InfoExtractor):
m3u8_url = video_data.get('httpUrl') m3u8_url = video_data.get('httpUrl')
if m3u8_url: if m3u8_url:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=False)) m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
rtsp_url = video_data.get('rtspUrl') rtsp_url = video_data.get('rtspUrl')
if rtsp_url: if rtsp_url:
@ -340,11 +342,10 @@ class LivestreamOriginalIE(InfoExtractor):
} }
video_data = self._download_json(stream_url, content_id) video_data = self._download_json(stream_url, content_id)
is_live = video_data.get('isLive') is_live = video_data.get('isLive')
entry_protocol = 'm3u8' if is_live else 'm3u8_native'
info.update({ info.update({
'id': content_id, 'id': content_id,
'title': self._live_title(info['title']) if is_live else info['title'], 'title': self._live_title(info['title']) if is_live else info['title'],
'formats': self._extract_video_formats(video_data, content_id, entry_protocol), 'formats': self._extract_video_formats(video_data, content_id),
'is_live': is_live, 'is_live': is_live,
}) })
return info return info

View File

@ -75,51 +75,38 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9A-Za-z]+)</span>', '<span[^>]+id="[^"]+"[^>]*>([0-9A-Za-z]+)</span>',
webpage, 'openload ID') webpage, 'openload ID')
video_url_chars = [] decoded = ''
a = ol_id[0:24]
first_char = ord(ol_id[0]) b = []
key = first_char - 55 for i in range(0, len(a), 8):
maxKey = max(2, key) b.append(int(a[i:i + 8] or '0', 16))
key = min(maxKey, len(ol_id) - 38) ol_id = ol_id[24:]
t = ol_id[key:key + 36] j = 0
k = 0
hashMap = {} while j < len(ol_id):
v = ol_id.replace(t, '') c = 128
h = 0 d = 0
e = 0
while h < len(t): f = 0
f = t[h:h + 3] _more = True
i = int(f, 8) while _more:
hashMap[h / 3] = i if j + 1 >= len(ol_id):
h += 3 c = 143
f = int(ol_id[j:j + 2] or '0', 16)
h = 0 j += 2
H = 0 d += (f & 127) << e
while h < len(v): e += 7
B = '' _more = f >= c
C = '' g = d ^ b[k % 3]
if len(v) >= h + 2: for i in range(4):
B = v[h:h + 2] char_dec = (g >> 8 * i) & (c + 127)
if len(v) >= h + 3: char = compat_chr(char_dec)
C = v[h:h + 3] if char != '#':
i = int(B, 16) decoded += char
h += 2 k += 1
if H % 3 == 0:
i = int(C, 8)
h += 1
elif H % 2 == 0 and H != 0 and ord(v[H - 1]) < 60:
i = int(C, 10)
h += 1
index = H % 7
A = hashMap[index]
i ^= 213
i ^= A
video_url_chars.append(compat_chr(i))
H += 1
video_url = 'https://openload.co/stream/%s?mime=true' video_url = 'https://openload.co/stream/%s?mime=true'
video_url = video_url % (''.join(video_url_chars)) video_url = video_url % decoded
title = self._og_search_title(webpage, default=None) or self._search_regex( title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage, r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,

View File

@ -0,0 +1,138 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
remove_end,
strip_or_none,
unified_timestamp,
urljoin,
)
class PacktPubBaseIE(InfoExtractor):
_PACKT_BASE = 'https://www.packtpub.com'
_MAPT_REST = '%s/mapt-rest' % _PACKT_BASE
class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_TEST = {
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
'md5': '1e74bd6cfd45d7d07666f4684ef58f70',
'info_dict': {
'id': '20530',
'ext': 'mp4',
'title': 'Project Intro',
'thumbnail': r're:(?i)^https?://.*\.jpg',
'timestamp': 1490918400,
'upload_date': '20170331',
},
}
def _handle_error(self, response):
if response.get('status') != 'success':
raise ExtractorError(
'% said: %s' % (self.IE_NAME, response['message']),
expected=True)
def _download_json(self, *args, **kwargs):
response = super(PacktPubIE, self)._download_json(*args, **kwargs)
self._handle_error(response)
return response
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_id, chapter_id, video_id = mobj.group(
'course_id', 'chapter_id', 'id')
video = self._download_json(
'%s/users/me/products/%s/chapters/%s/sections/%s'
% (self._MAPT_REST, course_id, chapter_id, video_id), video_id,
'Downloading JSON video')['data']
content = video.get('content')
if not content:
raise ExtractorError('This video is locked', expected=True)
video_url = content['file']
metadata = self._download_json(
'%s/products/%s/chapters/%s/sections/%s/metadata'
% (self._MAPT_REST, course_id, chapter_id, video_id),
video_id)['data']
title = metadata['pageTitle']
course_title = metadata.get('title')
if course_title:
title = remove_end(title, ' - %s' % course_title)
timestamp = unified_timestamp(metadata.get('publicationDate'))
thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'timestamp': timestamp,
}
class PacktPubCourseIE(PacktPubBaseIE):
_VALID_URL = r'(?P<url>https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<id>\d+))'
_TEST = {
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215',
'info_dict': {
'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]',
},
'playlist_count': 90,
}
@classmethod
def suitable(cls, url):
return False if PacktPubIE.suitable(url) else super(
PacktPubCourseIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
url, course_id = mobj.group('url', 'id')
course = self._download_json(
'%s/products/%s/metadata' % (self._MAPT_REST, course_id),
course_id)['data']
entries = []
for chapter_num, chapter in enumerate(course['tableOfContents'], 1):
if chapter.get('type') != 'chapter':
continue
children = chapter.get('children')
if not isinstance(children, list):
continue
chapter_info = {
'chapter': chapter.get('title'),
'chapter_number': chapter_num,
'chapter_id': chapter.get('id'),
}
for section in children:
if section.get('type') != 'section':
continue
section_url = section.get('seoUrl')
if not isinstance(section_url, compat_str):
continue
entry = {
'_type': 'url_transparent',
'url': urljoin(url + '/', section_url),
'title': strip_or_none(section.get('title')),
'description': clean_html(section.get('summary')),
'ie_key': PacktPubIE.ie_key(),
}
entry.update(chapter_info)
entries.append(entry)
return self.playlist_result(entries, course_id, course.get('title'))

View File

@ -169,11 +169,10 @@ class PluralsightIE(PluralsightBaseIE):
collection = course['modules'] collection = course['modules']
module, clip = None, None clip = None
for module_ in collection: for module_ in collection:
if name in (module_.get('moduleName'), module_.get('name')): if name in (module_.get('moduleName'), module_.get('name')):
module = module_
for clip_ in module_.get('clips', []): for clip_ in module_.get('clips', []):
clip_index = clip_.get('clipIndex') clip_index = clip_.get('clipIndex')
if clip_index is None: if clip_index is None:

View File

@ -1,23 +1,40 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
determine_ext,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
GeoRestrictedError,
int_or_none, int_or_none,
parse_duration, parse_duration,
strip_or_none,
try_get,
unified_strdate, unified_strdate,
unified_timestamp,
update_url_query, update_url_query,
urljoin,
xpath_text, xpath_text,
) )
class RaiBaseIE(InfoExtractor): class RaiBaseIE(InfoExtractor):
def _extract_relinker_formats(self, relinker_url, video_id): _UUID_RE = r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
_GEO_COUNTRIES = ['IT']
_GEO_BYPASS = False
def _extract_relinker_info(self, relinker_url, video_id):
formats = [] formats = []
geoprotection = None
is_live = None
duration = None
for platform in ('mon', 'flash', 'native'): for platform in ('mon', 'flash', 'native'):
relinker = self._download_xml( relinker = self._download_xml(
@ -27,9 +44,27 @@ class RaiBaseIE(InfoExtractor):
query={'output': 45, 'pl': platform}, query={'output': 45, 'pl': platform},
headers=self.geo_verification_headers()) headers=self.geo_verification_headers())
media_url = find_xpath_attr(relinker, './url', 'type', 'content').text if not geoprotection:
geoprotection = xpath_text(
relinker, './geoprotection', default=None) == 'Y'
if not is_live:
is_live = xpath_text(
relinker, './is_live', default=None) == 'Y'
if not duration:
duration = parse_duration(xpath_text(
relinker, './duration', default=None))
url_elem = find_xpath_attr(relinker, './url', 'type', 'content')
if url_elem is None:
continue
media_url = url_elem.text
# This does not imply geo restriction (e.g.
# http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html)
if media_url == 'http://download.rai.it/video_no_available.mp4': if media_url == 'http://download.rai.it/video_no_available.mp4':
self.raise_geo_restricted() continue
ext = determine_ext(media_url) ext = determine_ext(media_url)
if (ext == 'm3u8' and platform != 'mon') or (ext == 'f4m' and platform != 'flash'): if (ext == 'm3u8' and platform != 'mon') or (ext == 'f4m' and platform != 'flash'):
@ -53,35 +88,225 @@ class RaiBaseIE(InfoExtractor):
'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http', 'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http',
}) })
return formats if not formats and geoprotection is True:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
def _extract_from_content_id(self, content_id, base_url): return dict((k, v) for k, v in {
'is_live': is_live,
'duration': duration,
'formats': formats,
}.items() if v is not None)
class RaiPlayIE(RaiBaseIE):
_VALID_URL = r'(?P<url>https?://(?:www\.)?raiplay\.it/.+?-(?P<id>%s)\.html)' % RaiBaseIE._UUID_RE
_TESTS = [{
'url': 'http://www.raiplay.it/video/2016/10/La-Casa-Bianca-e06118bb-59a9-4636-b914-498e4cfd2c66.html?source=twitter',
'md5': '340aa3b7afb54bfd14a8c11786450d76',
'info_dict': {
'id': 'e06118bb-59a9-4636-b914-498e4cfd2c66',
'ext': 'mp4',
'title': 'La Casa Bianca',
'alt_title': 'S2016 - Puntata del 23/10/2016',
'description': 'md5:a09d45890850458077d1f68bb036e0a5',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Rai 3',
'creator': 'Rai 3',
'duration': 3278,
'timestamp': 1477764300,
'upload_date': '20161029',
'series': 'La Casa Bianca',
'season': '2016',
},
}, {
'url': 'http://www.raiplay.it/video/2014/04/Report-del-07042014-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
'md5': '8970abf8caf8aef4696e7b1f2adfc696',
'info_dict': {
'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
'ext': 'mp4',
'title': 'Report del 07/04/2014',
'alt_title': 'S2013/14 - Puntata del 07/04/2014',
'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Rai 5',
'creator': 'Rai 5',
'duration': 6160,
'series': 'Report',
'season_number': 5,
'season': '2013/14',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.raiplay.it/video/2016/11/gazebotraindesi-efebe701-969c-4593-92f3-285f0d1ce750.html?',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
url, video_id = mobj.group('url', 'id')
media = self._download_json(
'%s?json' % url, video_id, 'Downloading video JSON')
title = media['name']
video = media['video']
relinker_info = self._extract_relinker_info(video['contentUrl'], video_id)
self._sort_formats(relinker_info['formats'])
thumbnails = []
if 'images' in media:
for _, value in media.get('images').items():
if value:
thumbnails.append({
'url': value.replace('[RESOLUTION]', '600x400')
})
timestamp = unified_timestamp(try_get(
media, lambda x: x['availabilities'][0]['start'], compat_str))
info = {
'id': video_id,
'title': title,
'alt_title': media.get('subtitle'),
'description': media.get('description'),
'uploader': media.get('channel'),
'creator': media.get('editor'),
'duration': parse_duration(video.get('duration')),
'timestamp': timestamp,
'thumbnails': thumbnails,
'series': try_get(
media, lambda x: x['isPartOf']['name'], compat_str),
'season_number': int_or_none(try_get(
media, lambda x: x['isPartOf']['numeroStagioni'])),
'season': media.get('stagione') or None,
}
info.update(relinker_info)
return info
class RaiIE(RaiBaseIE):
_VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE
_TESTS = [{
# var uniquename = "ContentItem-..."
# data-id="ContentItem-..."
'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
'info_dict': {
'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
'ext': 'mp4',
'title': 'TG PRIMO TEMPO',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1758,
'upload_date': '20140612',
}
}, {
# with ContentItem in many metas
'url': 'http://www.rainews.it/dl/rainews/media/Weekend-al-cinema-da-Hollywood-arriva-il-thriller-di-Tate-Taylor-La-ragazza-del-treno-1632c009-c843-4836-bb65-80c33084a64b.html',
'info_dict': {
'id': '1632c009-c843-4836-bb65-80c33084a64b',
'ext': 'mp4',
'title': 'Weekend al cinema, da Hollywood arriva il thriller di Tate Taylor "La ragazza del treno"',
'description': 'I film in uscita questa settimana.',
'thumbnail': r're:^https?://.*\.png$',
'duration': 833,
'upload_date': '20161103',
}
}, {
# with ContentItem in og:url
'url': 'http://www.rai.it/dl/RaiTV/programmi/media/ContentItem-efb17665-691c-45d5-a60c-5301333cbb0c.html',
'md5': '11959b4e44fa74de47011b5799490adf',
'info_dict': {
'id': 'efb17665-691c-45d5-a60c-5301333cbb0c',
'ext': 'mp4',
'title': 'TG1 ore 20:00 del 03/11/2016',
'description': 'TG1 edizione integrale ore 20:00 del giorno 03/11/2016',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2214,
'upload_date': '20161103',
}
}, {
# drawMediaRaiTV(...)
'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
'md5': '2dd727e61114e1ee9c47f0da6914e178',
'info_dict': {
'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
'ext': 'mp4',
'title': 'Il pacco',
'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20141221',
},
}, {
# initEdizione('ContentItem-...'
'url': 'http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined',
'info_dict': {
'id': 'c2187016-8484-4e3a-8ac8-35e475b07303',
'ext': 'mp4',
'title': r're:TG1 ore \d{2}:\d{2} del \d{2}/\d{2}/\d{4}',
'duration': 2274,
'upload_date': '20170401',
},
'skip': 'Changes daily',
}, {
# HDS live stream with only relinker URL
'url': 'http://www.rai.tv/dl/RaiTV/dirette/PublishingBlock-1912dbbf-3f96-44c3-b4cf-523681fbacbc.html?channel=EuroNews',
'info_dict': {
'id': '1912dbbf-3f96-44c3-b4cf-523681fbacbc',
'ext': 'flv',
'title': 'EuroNews',
},
'params': {
'skip_download': True,
},
}, {
# HLS live stream with ContentItem in og:url
'url': 'http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html',
'info_dict': {
'id': '3156f2f2-dc70-4953-8e2f-70d7489d4ce9',
'ext': 'mp4',
'title': 'La diretta di Rainews24',
},
'params': {
'skip_download': True,
},
}]
def _extract_from_content_id(self, content_id, url):
media = self._download_json( media = self._download_json(
'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % content_id, 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % content_id,
content_id, 'Downloading video JSON') content_id, 'Downloading video JSON')
title = media['name'].strip()
media_type = media['type']
if 'Audio' in media_type:
relinker_info = {
'formats': {
'format_id': media.get('formatoAudio'),
'url': media['audioUrl'],
'ext': media.get('formatoAudio'),
}
}
elif 'Video' in media_type:
relinker_info = self._extract_relinker_info(media['mediaUri'], content_id)
else:
raise ExtractorError('not a media file')
self._sort_formats(relinker_info['formats'])
thumbnails = [] thumbnails = []
for image_type in ('image', 'image_medium', 'image_300'): for image_type in ('image', 'image_medium', 'image_300'):
thumbnail_url = media.get(image_type) thumbnail_url = media.get(image_type)
if thumbnail_url: if thumbnail_url:
thumbnails.append({ thumbnails.append({
'url': compat_urlparse.urljoin(base_url, thumbnail_url), 'url': compat_urlparse.urljoin(url, thumbnail_url),
}) })
formats = []
media_type = media['type']
if 'Audio' in media_type:
formats.append({
'format_id': media.get('formatoAudio'),
'url': media['audioUrl'],
'ext': media.get('formatoAudio'),
})
elif 'Video' in media_type:
formats.extend(self._extract_relinker_formats(media['mediaUri'], content_id))
self._sort_formats(formats)
else:
raise ExtractorError('not a media file')
subtitles = {} subtitles = {}
captions = media.get('subtitlesUrl') captions = media.get('subtitlesUrl')
if captions: if captions:
@ -94,174 +319,90 @@ class RaiBaseIE(InfoExtractor):
'url': captions, 'url': captions,
}] }]
return { info = {
'id': content_id, 'id': content_id,
'title': media['name'], 'title': title,
'description': media.get('desc'), 'description': strip_or_none(media.get('desc')),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'uploader': media.get('author'), 'uploader': media.get('author'),
'upload_date': unified_strdate(media.get('date')), 'upload_date': unified_strdate(media.get('date')),
'duration': parse_duration(media.get('length')), 'duration': parse_duration(media.get('length')),
'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
} }
info.update(relinker_info)
class RaiTVIE(RaiBaseIE): return info
_VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+(?:media|ondemand)/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
_TESTS = [
{
'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
'md5': '8970abf8caf8aef4696e7b1f2adfc696',
'info_dict': {
'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
'ext': 'mp4',
'title': 'Report del 07/04/2014',
'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
'upload_date': '20140407',
'duration': 6160,
'thumbnail': r're:^https?://.*\.jpg$',
}
},
{
# no m3u8 stream
'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
# HDS download, MD5 is unstable
'info_dict': {
'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
'ext': 'flv',
'title': 'TG PRIMO TEMPO',
'upload_date': '20140612',
'duration': 1758,
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'Geo-restricted to Italy',
},
{
'url': 'http://www.rainews.it/dl/rainews/media/state-of-the-net-Antonella-La-Carpia-regole-virali-7aafdea9-0e5d-49d5-88a6-7e65da67ae13.html',
'md5': '35cf7c229f22eeef43e48b5cf923bef0',
'info_dict': {
'id': '7aafdea9-0e5d-49d5-88a6-7e65da67ae13',
'ext': 'mp4',
'title': 'State of the Net, Antonella La Carpia: regole virali',
'description': 'md5:b0ba04a324126903e3da7763272ae63c',
'upload_date': '20140613',
},
'skip': 'Error 404',
},
{
'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-b4a49761-e0cc-4b14-8736-2729f6f73132-tg2.html',
'info_dict': {
'id': 'b4a49761-e0cc-4b14-8736-2729f6f73132',
'ext': 'mp4',
'title': 'Alluvione in Sardegna e dissesto idrogeologico',
'description': 'Edizione delle ore 20:30 ',
},
'skip': 'invalid urls',
},
{
'url': 'http://www.ilcandidato.rai.it/dl/ray/media/Il-Candidato---Primo-episodio-Le-Primarie-28e5525a-b495-45e8-a7c3-bc48ba45d2b6.html',
'md5': 'e57493e1cb8bc7c564663f363b171847',
'info_dict': {
'id': '28e5525a-b495-45e8-a7c3-bc48ba45d2b6',
'ext': 'mp4',
'title': 'Il Candidato - Primo episodio: "Le Primarie"',
'description': 'md5:364b604f7db50594678f483353164fb8',
'upload_date': '20140923',
'duration': 386,
'thumbnail': r're:^https?://.*\.jpg$',
}
},
]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self._extract_from_content_id(video_id, url)
class RaiIE(RaiBaseIE):
_VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
_TESTS = [
{
'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
'md5': '2dd727e61114e1ee9c47f0da6914e178',
'info_dict': {
'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
'ext': 'mp4',
'title': 'Il pacco',
'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
'upload_date': '20141221',
},
},
{
# Direct relinker URL
'url': 'http://www.rai.tv/dl/RaiTV/dirette/PublishingBlock-1912dbbf-3f96-44c3-b4cf-523681fbacbc.html?channel=EuroNews',
# HDS live stream, MD5 is unstable
'info_dict': {
'id': '1912dbbf-3f96-44c3-b4cf-523681fbacbc',
'ext': 'flv',
'title': 'EuroNews',
},
'skip': 'Geo-restricted to Italy',
},
{
# Embedded content item ID
'url': 'http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined',
'md5': '84c1135ce960e8822ae63cec34441d63',
'info_dict': {
'id': '0960e765-62c8-474a-ac4b-7eb3e2be39c8',
'ext': 'mp4',
'title': 'TG1 ore 20:00 del 02/07/2016',
'upload_date': '20160702',
},
},
{
'url': 'http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html',
# HDS live stream, MD5 is unstable
'info_dict': {
'id': '3156f2f2-dc70-4953-8e2f-70d7489d4ce9',
'ext': 'flv',
'title': 'La diretta di Rainews24',
},
},
]
@classmethod
def suitable(cls, url):
return False if RaiTVIE.suitable(url) else super(RaiIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
iframe_url = self._search_regex( content_item_id = None
[r'<iframe[^>]+src="([^"]*/dl/[^"]+\?iframe\b[^"]*)"',
r'drawMediaRaiTV\(["\'](.+?)["\']'],
webpage, 'iframe', default=None)
if iframe_url:
if not iframe_url.startswith('http'):
iframe_url = compat_urlparse.urljoin(url, iframe_url)
return self.url_result(iframe_url)
content_item_url = self._html_search_meta(
('og:url', 'og:video', 'og:video:secure_url', 'twitter:url',
'twitter:player', 'jsonlink'), webpage, default=None)
if content_item_url:
content_item_id = self._search_regex( content_item_id = self._search_regex(
r'initEdizione\((?P<q1>[\'"])ContentItem-(?P<content_id>[^\'"]+)(?P=q1)', r'ContentItem-(%s)' % self._UUID_RE, content_item_url,
webpage, 'content item ID', group='content_id', default=None) 'content item id', default=None)
if content_item_id:
return self._extract_from_content_id(content_item_id, url)
relinker_url = compat_urlparse.urljoin(url, self._search_regex( if not content_item_id:
r'(?:var\s+videoURL|mediaInfo\.mediaUri)\s*=\s*(?P<q1>[\'"])(?P<url>(https?:)?//mediapolis\.rai\.it/relinker/relinkerServlet\.htm\?cont=\d+)(?P=q1)', content_item_id = self._search_regex(
webpage, 'relinker URL', group='url')) r'''(?x)
formats = self._extract_relinker_formats(relinker_url, video_id) (?:
self._sort_formats(formats) (?:initEdizione|drawMediaRaiTV)\(|
<(?:[^>]+\bdata-id|var\s+uniquename)=
)
(["\'])
(?:(?!\1).)*\bContentItem-(?P<id>%s)
''' % self._UUID_RE,
webpage, 'content item id', default=None, group='id')
content_item_ids = set()
if content_item_id:
content_item_ids.add(content_item_id)
if video_id not in content_item_ids:
content_item_ids.add(video_id)
for content_item_id in content_item_ids:
try:
return self._extract_from_content_id(content_item_id, url)
except GeoRestrictedError:
raise
except ExtractorError:
pass
relinker_url = self._search_regex(
r'''(?x)
(?:
var\s+videoURL|
mediaInfo\.mediaUri
)\s*=\s*
([\'"])
(?P<url>
(?:https?:)?
//mediapolis(?:vod)?\.rai\.it/relinker/relinkerServlet\.htm\?
(?:(?!\1).)*\bcont=(?:(?!\1).)+)\1
''',
webpage, 'relinker URL', group='url')
relinker_info = self._extract_relinker_info(
urljoin(url, relinker_url), video_id)
self._sort_formats(relinker_info['formats'])
title = self._search_regex( title = self._search_regex(
r'var\s+videoTitolo\s*=\s*([\'"])(?P<title>[^\'"]+)\1', r'var\s+videoTitolo\s*=\s*([\'"])(?P<title>[^\'"]+)\1',
webpage, 'title', group='title', default=None) or self._og_search_title(webpage) webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
return { info = {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats,
} }
info.update(relinker_info)
return info

View File

@ -31,9 +31,8 @@ class TVNoeIE(InfoExtractor):
r'<iframe[^>]+src="([^"]+)"', webpage, 'iframe URL') r'<iframe[^>]+src="([^"]+)"', webpage, 'iframe URL')
ifs_page = self._download_webpage(iframe_url, video_id) ifs_page = self._download_webpage(iframe_url, video_id)
jwplayer_data = self._parse_json( jwplayer_data = self._find_jwplayer_data(
self._find_jwplayer_data(ifs_page), ifs_page, video_id, transform_source=js_to_json)
video_id, transform_source=js_to_json)
info_dict = self._parse_jwplayer_data( info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=iframe_url) jwplayer_data, video_id, require_title=False, base_url=iframe_url)

View File

@ -225,7 +225,11 @@ class TVPlayIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
geo_country = self._search_regex(
r'https?://[^/]+\.([a-z]{2})', url,
'geo country', default=None)
if geo_country:
self._initialize_geo_bypass([geo_country.upper()])
video = self._download_json( video = self._download_json(
'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON') 'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')

View File

@ -432,8 +432,7 @@ class VKIE(VKBaseIE):
}) })
elif format_id == 'hls': elif format_id == 'hls':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', format_url, video_id, 'mp4', 'm3u8_native',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=format_id, fatal=False, live=is_live)) m3u8_id=format_id, fatal=False, live=is_live))
elif format_id == 'rtmp': elif format_id == 'rtmp':
formats.append({ formats.append({

191
youtube_dl/extractor/vrv.py Normal file
View File

@ -0,0 +1,191 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
import hashlib
import hmac
import random
import string
import time
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
)
class VRVBaseIE(InfoExtractor):
_API_DOMAIN = None
_API_PARAMS = {}
_CMS_SIGNING = {}
def _call_api(self, path, video_id, note, data=None):
base_url = self._API_DOMAIN + '/core/' + path
encoded_query = compat_urllib_parse_urlencode({
'oauth_consumer_key': self._API_PARAMS['oAuthKey'],
'oauth_nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'oauth_signature_method': 'HMAC-SHA1',
'oauth_timestamp': int(time.time()),
'oauth_version': '1.0',
})
headers = self.geo_verification_headers()
if data:
data = json.dumps(data).encode()
headers['Content-Type'] = 'application/json'
method = 'POST' if data else 'GET'
base_string = '&'.join([method, compat_urlparse.quote(base_url, ''), compat_urlparse.quote(encoded_query, '')])
oauth_signature = base64.b64encode(hmac.new(
(self._API_PARAMS['oAuthSecret'] + '&').encode('ascii'),
base_string.encode(), hashlib.sha1).digest()).decode()
encoded_query += '&oauth_signature=' + compat_urlparse.quote(oauth_signature, '')
return self._download_json(
'?'.join([base_url, encoded_query]), video_id,
note='Downloading %s JSON metadata' % note, headers=headers, data=data)
def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING:
self._CMS_SIGNING = self._call_api('index', video_id, 'CMS Signing')['cms_signing']
return self._download_json(
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
def _set_api_params(self, webpage, video_id):
if not self._API_PARAMS:
self._API_PARAMS = self._parse_json(self._search_regex(
r'window\.__APP_CONFIG__\s*=\s*({.+?})</script>',
webpage, 'api config'), video_id)['cxApiParams']
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
def _get_cms_resource(self, resource_key, video_id):
return self._call_api(
'cms_resource', video_id, 'resource path', data={
'resource_key': resource_key,
})['__links__']['cms_resource']['href']
class VRVIE(VRVBaseIE):
IE_NAME = 'vrv'
_VALID_URL = r'https?://(?:www\.)?vrv\.co/watch/(?P<id>[A-Z0-9]+)'
_TEST = {
'url': 'https://vrv.co/watch/GR9PNZ396/Hidden-America-with-Jonah-Ray:BOSTON-WHERE-THE-PAST-IS-THE-PRESENT',
'info_dict': {
'id': 'GR9PNZ396',
'ext': 'mp4',
'title': 'BOSTON: WHERE THE PAST IS THE PRESENT',
'description': 'md5:4ec8844ac262ca2df9e67c0983c6b83f',
'uploader_id': 'seeso',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id,
headers=self.geo_verification_headers())
media_resource = self._parse_json(self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})</script>',
webpage, 'inital state'), video_id).get('watch', {}).get('mediaResource') or {}
video_data = media_resource.get('json')
if not video_data:
self._set_api_params(webpage, video_id)
episode_path = self._get_cms_resource(
'cms:/episodes/' + video_id, video_id)
video_data = self._call_cms(episode_path, video_id, 'video')
title = video_data['title']
streams_json = media_resource.get('streams', {}).get('json', {})
if not streams_json:
self._set_api_params(webpage, video_id)
streams_path = video_data['__links__']['streams']['href']
streams_json = self._call_cms(streams_path, video_id, 'streams')
audio_locale = streams_json.get('audio_locale')
formats = []
for stream_id, stream in streams_json.get('streams', {}).get('adaptive_hls', {}).items():
stream_url = stream.get('url')
if not stream_url:
continue
stream_id = stream_id or audio_locale
m3u8_formats = self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id=stream_id,
note='Downloading %s m3u8 information' % stream_id,
fatal=False)
if audio_locale:
for f in m3u8_formats:
f['language'] = audio_locale
formats.extend(m3u8_formats)
self._sort_formats(formats)
thumbnails = []
for thumbnail in video_data.get('images', {}).get('thumbnails', []):
thumbnail_url = thumbnail.get('source')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video_data.get('description'),
'duration': float_or_none(video_data.get('duration_ms'), 1000),
'uploader_id': video_data.get('channel_id'),
'series': video_data.get('series_title'),
'season': video_data.get('season_title'),
'season_number': int_or_none(video_data.get('season_number')),
'season_id': video_data.get('season_id'),
'episode': title,
'episode_number': int_or_none(video_data.get('episode_number')),
'episode_id': video_data.get('production_episode_id'),
}
class VRVSeriesIE(VRVBaseIE):
IE_NAME = 'vrv:series'
_VALID_URL = r'https?://(?:www\.)?vrv\.co/series/(?P<id>[A-Z0-9]+)'
_TEST = {
'url': 'https://vrv.co/series/G68VXG3G6/The-Perfect-Insider',
'info_dict': {
'id': 'G68VXG3G6',
},
'playlist_mincount': 11,
}
def _real_extract(self, url):
series_id = self._match_id(url)
webpage = self._download_webpage(
url, series_id,
headers=self.geo_verification_headers())
self._set_api_params(webpage, series_id)
seasons_path = self._get_cms_resource(
'cms:/seasons?series_id=' + series_id, series_id)
seasons_data = self._call_cms(seasons_path, series_id, 'seasons')
entries = []
for season in seasons_data.get('items', []):
episodes_path = season['__links__']['season/episodes']['href']
episodes = self._call_cms(episodes_path, series_id, 'episodes')
for episode in episodes.get('items', []):
episode_id = episode['id']
entries.append(self.url_result(
'https://vrv.co/watch/' + episode_id,
'VRV', episode_id, episode.get('title')))
return self.playlist_result(entries, series_id)

View File

@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
decode_packed_codes, decode_packed_codes,
determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
NO_DEFAULT, NO_DEFAULT,
@ -26,6 +27,7 @@ class XFileShareIE(InfoExtractor):
('vidto.me', 'Vidto'), ('vidto.me', 'Vidto'),
('streamin.to', 'Streamin.To'), ('streamin.to', 'Streamin.To'),
('xvidstage.com', 'XVIDSTAGE'), ('xvidstage.com', 'XVIDSTAGE'),
('vidabc.com', 'Vid ABC'),
) )
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1]) IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
@ -95,6 +97,16 @@ class XFileShareIE(InfoExtractor):
# removed by administrator # removed by administrator
'url': 'http://xvidstage.com/amfy7atlkx25', 'url': 'http://xvidstage.com/amfy7atlkx25',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://vidabc.com/i8ybqscrphfv',
'info_dict': {
'id': 'i8ybqscrphfv',
'ext': 'mp4',
'title': 're:Beauty and the Beast 2017',
},
'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -133,31 +145,45 @@ class XFileShareIE(InfoExtractor):
webpage, 'title', default=None) or self._og_search_title( webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or video_id).strip() webpage, default=None) or video_id).strip()
def extract_video_url(default=NO_DEFAULT): def extract_formats(default=NO_DEFAULT):
return self._search_regex( urls = []
(r'file\s*:\s*(["\'])(?P<url>http.+?)\1,', for regex in (
r'file_link\s*=\s*(["\'])(?P<url>http.+?)\1', r'file\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http.+?)\2\)', r'file_link\s*=\s*(["\'])(?P<url>http(?:(?!\1).)+)\1',
r'<embed[^>]+src=(["\'])(?P<url>http.+?)\1'), r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http(?:(?!\2).)+)\2\)',
webpage, 'file url', default=default, group='url') r'<embed[^>]+src=(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1'):
for mobj in re.finditer(regex, webpage):
video_url = mobj.group('url')
if video_url not in urls:
urls.append(video_url)
formats = []
for video_url in urls:
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': video_url,
'format_id': 'sd',
})
if not formats and default is not NO_DEFAULT:
return default
self._sort_formats(formats)
return formats
video_url = extract_video_url(default=None) formats = extract_formats(default=None)
if not video_url: if not formats:
webpage = decode_packed_codes(self._search_regex( webpage = decode_packed_codes(self._search_regex(
r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))", r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))",
webpage, 'packed code')) webpage, 'packed code'))
video_url = extract_video_url() formats = extract_formats()
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None) r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
formats = [{
'format_id': 'sd',
'url': video_url,
'quality': 1,
}]
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,

View File

@ -59,6 +59,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
# If True it will raise an error if no login info is provided # If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = False _LOGIN_REQUIRED = False
_PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL)[0-9A-Za-z-_]{10,}'
def _set_language(self): def _set_language(self):
self._set_cookie( self._set_cookie(
'.youtube.com', 'PREF', 'f1=50000000&hl=en', '.youtube.com', 'PREF', 'f1=50000000&hl=en',
@ -265,9 +267,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
) )
)? # all until now is optional -> you can pass the naked ID )? # all until now is optional -> you can pass the naked ID
([0-9A-Za-z_-]{11}) # here is it! the YouTube video ID ([0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?!.*?\blist=) # combined list/video URLs are handled by the playlist IE (?!.*?\blist=
(?:
%(playlist_id)s| # combined list/video URLs are handled by the playlist IE
WL # WL are handled by the watch later IE
)
)
(?(1).+)? # if we found the ID, everything can follow (?(1).+)? # if we found the ID, everything can follow
$""" $""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_NEXT_URL_RE = r'[\?&]next_url=([^&]+)' _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
_formats = { _formats = {
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'}, '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@ -924,6 +931,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'sJL6WA-aGkQ', 'url': 'sJL6WA-aGkQ',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'https://www.youtube.com/watch?v=MuAGGZNfUkU&list=RDMM',
'only_matching': True,
},
] ]
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
@ -1864,8 +1875,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
) )
.* .*
| |
((?:PL|LL|EC|UU|FL|RD|UL|TL)[0-9A-Za-z-_]{10,}) (%(playlist_id)s)
)""" )""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&disable_polymer=true' _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&disable_polymer=true'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?' _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
IE_NAME = 'youtube:playlist' IE_NAME = 'youtube:playlist'

View File

@ -459,11 +459,11 @@ def parseOpts(overrideArguments=None):
downloader.add_option( downloader.add_option(
'--fragment-retries', '--fragment-retries',
dest='fragment_retries', metavar='RETRIES', default=10, dest='fragment_retries', metavar='RETRIES', default=10,
help='Number of retries for a fragment (default is %default), or "infinite" (DASH and hlsnative only)') help='Number of retries for a fragment (default is %default), or "infinite" (DASH, hlsnative and ISM)')
downloader.add_option( downloader.add_option(
'--skip-unavailable-fragments', '--skip-unavailable-fragments',
action='store_true', dest='skip_unavailable_fragments', default=True, action='store_true', dest='skip_unavailable_fragments', default=True,
help='Skip unavailable fragments (DASH and hlsnative only)') help='Skip unavailable fragments (DASH, hlsnative and ISM)')
downloader.add_option( downloader.add_option(
'--abort-on-unavailable-fragment', '--abort-on-unavailable-fragment',
action='store_false', dest='skip_unavailable_fragments', action='store_false', dest='skip_unavailable_fragments',

View File

@ -39,6 +39,7 @@ from .compat import (
compat_basestring, compat_basestring,
compat_chr, compat_chr,
compat_etree_fromstring, compat_etree_fromstring,
compat_expanduser,
compat_html_entities, compat_html_entities,
compat_html_entities_html5, compat_html_entities_html5,
compat_http_client, compat_http_client,
@ -539,6 +540,11 @@ def sanitized_Request(url, *args, **kwargs):
return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs) return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
def expand_path(s):
"""Expand shell variables and ~"""
return os.path.expandvars(compat_expanduser(s))
def orderedSet(iterable): def orderedSet(iterable):
""" Remove all duplicates from the input iterable """ """ Remove all duplicates from the input iterable """
res = [] res = []

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.03.24' __version__ = '2017.04.02'