Compare commits


68 Commits

Author SHA1 Message Date
92747e664a release 2016.06.26 2016-06-26 21:15:24 +07:00
f1f336322d [msn] Fix extraction (Closes #8960, closes #9542) 2016-06-26 21:10:05 +07:00
bf8dd79045 [extractor/common] Fix sorting with custom field preference 2016-06-26 21:09:07 +07:00
c6781156aa [MSN] add new extractor 2016-06-26 21:07:59 +07:00
f484c5fa25 [vidbit] Improve (Closes #9759) 2016-06-26 16:59:28 +07:00
88d9f6c0c4 [utils] Add support for name list in _html_search_meta 2016-06-26 16:57:14 +07:00
3c9c088f9c [Vidbit] Add new extractor 2016-06-26 16:52:52 +07:00
fc3996bfe1 [iqiyi] Remove codes for debugging 2016-06-26 15:45:41 +08:00
5b6ad8630c [iqiyi] Partially fix IqiyiIE
Use the HTML5 API. Only low-resolution formats available

Related: #9839

Thanks @zhangn1985 for the overall algorithm (soimort/you-get#1224)
2016-06-26 15:18:32 +08:00
30105f4ac0 [le] Move urshift() to utils.py 2016-06-26 15:17:26 +08:00
1143535d76 [utils] Add urshift()
Used in IqiyiIE and LeIE
2016-06-26 15:16:49 +08:00
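For reference, urshift() is a 32-bit unsigned right shift, which Python has no operator for; a minimal sketch of such a helper (the real one lives in youtube_dl/utils.py):

    def urshift(val, n):
        # Negative ints are mapped to their 32-bit two's-complement
        # unsigned value before shifting.
        return val >> n if val >= 0 else (val + 0x100000000) >> n

    # urshift(3, 1) == 1
    # urshift(-3, 1) == 2147483646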
7d52c052ef [generic] Fix test_Generic_76
Broken: https://travis-ci.org/rg3/youtube-dl/jobs/140251658
2016-06-26 11:56:27 +08:00
a2406fce3c Fix misspelling 2016-06-26 01:28:55 +07:00
3b34ab538c [svtplay] Extend _VALID_URL (#9900) 2016-06-26 00:29:53 +07:00
ac782306f1 [iqiyi] Mark broken 2016-06-26 00:25:41 +07:00
0c00e889f3 Credit @JakubAdamWieczorek for #9813 2016-06-25 23:35:57 +07:00
ce96ed05f4 [polskieradio] Add test with video 2016-06-25 23:31:21 +07:00
0463b77a1f [polskieradio] Improve extraction (Closes #9813) 2016-06-25 23:19:18 +07:00
2d185706ea [polskieradio] Add support for Polskie Radio.
Polskie Radio is the main Polish state-funded radio broadcasting service.
2016-06-25 23:19:18 +07:00
b72b44318c [utils] Add strip_or_none 2016-06-25 23:19:18 +07:00
46f59e89ea [utils] Add unified_timestamp 2016-06-25 23:19:18 +07:00
b4241e308e release 2016.06.25 2016-06-25 03:03:20 +07:00
3d4b08dfc7 [setup.py] Add file version information and quotes consistency (Closes #9878) 2016-06-25 02:50:12 +07:00
be49068d65 [youtube] Fix and skip some tests 2016-06-24 22:47:19 +07:00
525cedb971 [youtube] Relax URL expansion in description 2016-06-24 22:37:13 +07:00
de3c7fe0d4 [youtube] Fix 141 format tests 2016-06-24 22:27:55 +07:00
896cc72750 [mixcloud] View count and like count may be absent
Closes #9874
2016-06-24 17:26:12 +08:00
c1ff6e1ad0 [vimeo:review] Fix extraction for password-protected videos
Closes #9853
2016-06-24 16:48:37 +08:00
fee70322d7 [appletrailers] correct thumbnail fallback 2016-06-23 19:03:34 +01:00
8065d6c55f [dcn] extend _VALID_URL for awaan.ae and extract all available formats 2016-06-23 17:22:15 +01:00
494172d2e5 [appletrailers] extract info from an alternative source if available (closes #8422) 2016-06-23 15:49:42 +01:00
6e3c2047f8 [tvp] extract all formats and detect errors 2016-06-23 04:36:16 +01:00
011bd3221b release 2016.06.23.1 2016-06-23 09:42:56 +07:00
b46eabecd3 [jsinterp] Relax JS function regex (Closes #9863) 2016-06-23 09:41:34 +07:00
0437307a41 [nbc:nbcnews] improve extraction and add msnbc to the extractor 2016-06-23 01:36:19 +01:00
22b7ac13ef [tf1] fix wat id extraction(closes #9862) 2016-06-23 00:14:34 +01:00
96f88e91b7 release 2016.06.23 2016-06-23 04:29:34 +07:00
3331a4644d [vk] Remove unused import 2016-06-23 04:27:10 +07:00
adf1921dc1 [xnxx] Improve _VALID_URL (Closes #9858) 2016-06-23 04:26:49 +07:00
97674f0419 [xnxx] Replace test 2016-06-23 04:24:00 +07:00
73843ae8ac [xnxx] fix url regex
The pattern has changed from "video123412" to "video-o8xa19".
The changes maintain backwards compatibility with old-style URLs.
2016-06-23 04:19:55 +07:00
f2bb8c036a [vk] Modernize 2016-06-23 04:18:43 +07:00
75ca6bcee2 [vk] Workaround buggy new.vk.com Set-Cookie headers 2016-06-23 04:17:13 +07:00
089657ed1f [vimeo:album] Add paged example URL 2016-06-23 02:00:03 +07:00
b5eab86c24 [vimeo:album] Improve _VALID_URL 2016-06-23 01:56:58 +07:00
c8e3e0974b [vimeo:channel] Improve playlist extraction 2016-06-23 01:28:36 +07:00
dfc8f46e1c [vimeo:channel] Add video id to url_result
This will allow us to decide much faster that we don't want an already archived video,
and will avoid having to download webpages for each video that has already been downloaded,
thus significantly speeding up the archival of channels that have no new content.
2016-06-23 01:26:27 +07:00
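The idea, sketched (simplified; the list of scraped ids is assumed):

    # Passing an explicit video id to url_result() lets the
    # --download-archive check reject known videos before their
    # webpages are ever requested.
    entries = [
        self.url_result(
            'https://vimeo.com/%s' % video_id,
            VimeoIE.ie_key(), video_id=video_id)
        for video_id in video_ids  # ids scraped from the channel listing
    ]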
c143ddce5d [vimeo] Override original URL only when necessary 2016-06-23 00:51:36 +07:00
169d836feb lazy-extractors: Fix after commit 6e6b9f600f
The problem was in the following code:

    class ArteTVPlus7IE(ArteTVBaseIE):

        ...

        @classmethod
        def suitable(cls, url):
            return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)

And its subclasses like ArteTVCinemaIE.

Since in the lazy_extractors.py file ArteTVCinemaIE was not a subclass of ArteTVPlus7IE, super(ArteTVPlus7IE, cls) failed.

To fix it we have to make it a subclass. Since the order of _ALL_CLASSES is arbitrary, we must sort them so that the base classes are defined first. We also must add base classes like YoutubeBaseInfoExtractor.
2016-06-22 19:20:50 +02:00
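The failure mode is plain Python behaviour and reproduces outside youtube-dl (illustrative names):

    class Base(object):
        @classmethod
        def suitable(cls, url):
            return True

    class Unrelated(object):
        @classmethod
        def suitable(cls, url):
            # cls is not a subclass of Base, so super() rejects it
            return super(Base, cls).suitable(url)

    Unrelated.suitable('http://example.com/')
    # TypeError: super(type, obj): obj must be an instance or subtype of type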
6ae938b295 [Vine] Extract view count 2016-06-22 23:57:35 +07:00
cf40fdf5c1 release 2016.06.22 2016-06-22 23:43:24 +07:00
23bdae0955 [svt] Various improvements
+ [svt:play] Add fallback path looking for video id and fix extraction for oppetarkiv
* [svt:base] Detect geo restriction
* [svt:base] Extract series related metadata
2016-06-22 23:36:07 +07:00
ca74c90bf5 Fix issue downloading Facebook videos
youtube-dl expects the format items to be returned as a list,
but when there's only one item Facebook returns a dict instead;
this change wraps the dict in a list where necessary.
2016-06-22 12:52:15 +01:00
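The same defensive pattern as a generic, hypothetical helper (the actual two-line fix is in the facebook.py hunk below):

    def ensure_list(value):
        # APIs sometimes collapse a one-element list into a bare dict;
        # normalizing restores the shape iteration code expects.
        if isinstance(value, dict):
            return [value]
        return value or []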
7cfc1e2a10 [gametrailers] Remove extractor
gametrailers closed (see http://www.polygon.com/2016/2/8/10944452/gametrailers-shuts-down-after-13-year-run)
2016-06-21 22:31:41 +07:00
1ac5705f62 [gamespot] extract all formats 2016-06-21 13:37:57 +01:00
e4f90ea0a7 [svt] Fix extraction for SVTPlay (closes #9809) 2016-06-21 17:55:53 +08:00
cdfc187cd5 [cbs] Remove unused import 2016-06-20 22:40:33 +07:00
feef925f49 [streamcloud] Capture error message (#9840) 2016-06-20 22:40:22 +07:00
19e2d1cdea release 2016.06.20 2016-06-20 20:50:01 +07:00
8369a4fe76 [downloader/hls] Simplify and carry long lines 2016-06-20 21:55:17 +07:00
1f749b6658 Revert "[jsinterp] Avoid double key lookup for setting new key"
This reverts commit 7c05097633.
2016-06-20 13:29:13 +02:00
819707920a [cbs] fix _VALID_URL 2016-06-19 23:55:19 +01:00
43518503a6 [cbs,cbsnews,cbssports] reduce requests while extracting all formats 2016-06-19 23:40:00 +01:00
5839d556e4 [theplatform] reduce requests for theplatform feed info extraction 2016-06-19 23:37:05 +01:00
6c83e583b3 [radiojavan] PEP8
E275 is added in pycodestyle 2.6

See https://github.com/PyCQA/pycodestyle/pull/491
2016-06-19 13:32:08 +08:00
6aeb64b673 Merge pull request #8201 from remitamine/hls-aes
[downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader
2016-06-19 13:25:08 +08:00
6cd64b6806 [foxsports] extract http formats 2016-06-19 05:45:48 +01:00
e154c65128 [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader 2016-06-19 01:01:40 +01:00
42 changed files with 1332 additions and 890 deletions

View File

@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.19.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.26*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.19.1**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.26**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.06.19.1
+[debug] youtube-dl version 2016.06.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@ -175,3 +175,4 @@ Tomáš Čech
 Déstin Reed
 Roman Tsiupa
 Artur Krysiak
+Jakub Adam Wieczorek

View File

@ -14,15 +14,17 @@ if os.path.exists(lazy_extractors_filename):
     os.remove(lazy_extractors_filename)

 from youtube_dl.extractor import _ALL_CLASSES
-from youtube_dl.extractor.common import InfoExtractor
+from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor

 with open('devscripts/lazy_load_template.py', 'rt') as f:
     module_template = f.read()

-module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
+module_contents = [
+    module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
+    'class LazyLoadSearchExtractor(LazyLoadExtractor):\n    pass\n']

 ie_template = '''
-class {name}(LazyLoadExtractor):
+class {name}({bases}):
     _VALID_URL = {valid_url!r}
     _module = '{module}'
 '''
@ -34,10 +36,20 @@ make_valid_template = '''
 '''

+def get_base_name(base):
+    if base is InfoExtractor:
+        return 'LazyLoadExtractor'
+    elif base is SearchInfoExtractor:
+        return 'LazyLoadSearchExtractor'
+    else:
+        return base.__name__
+
+
 def build_lazy_ie(ie, name):
     valid_url = getattr(ie, '_VALID_URL', None)
     s = ie_template.format(
         name=name,
+        bases=', '.join(map(get_base_name, ie.__bases__)),
         valid_url=valid_url,
         module=ie.__module__)
     if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
@ -47,12 +59,35 @@ def build_lazy_ie(ie, name):
         s += make_valid_template.format(valid_url=ie._make_valid_url())
     return s

+# find the correct sorting and add the required base classes so that subclasses
+# can be correctly created
+classes = _ALL_CLASSES[:-1]
+ordered_cls = []
+while classes:
+    for c in classes[:]:
+        bases = set(c.__bases__) - set((object, InfoExtractor, SearchInfoExtractor))
+        stop = False
+        for b in bases:
+            if b not in classes and b not in ordered_cls:
+                if b.__name__ == 'GenericIE':
+                    exit()
+                classes.insert(0, b)
+                stop = True
+        if stop:
+            break
+        if all(b in ordered_cls for b in bases):
+            ordered_cls.append(c)
+            classes.remove(c)
+            break
+ordered_cls.append(_ALL_CLASSES[-1])
+
 names = []
-for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
-    name = ie.ie_key() + 'IE'
+for ie in ordered_cls:
+    name = ie.__name__
     src = build_lazy_ie(ie, name)
     module_contents.append(src)
-    names.append(name)
+    if ie in _ALL_CLASSES:
+        names.append(name)

 module_contents.append(
     '_ALL_CLASSES = [{0}]'.format(', '.join(names)))
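To make the effect concrete, the generated lazy_extractors.py now looks roughly like this (illustrative excerpt, not actual generated output):

    class LazyLoadExtractor(object):
        _module = None
        # the real template also carries ie_key(), __new__() and the
        # copied InfoExtractor.suitable() classmethod

    class LazyLoadSearchExtractor(LazyLoadExtractor):
        pass

    # Base classes are emitted before their subclasses, so super() calls
    # inside copied suitable() implementations resolve correctly.
    class ArteTVBaseIE(LazyLoadExtractor):
        _VALID_URL = None
        _module = 'youtube_dl.extractor.arte'

    class ArteTVPlus7IE(ArteTVBaseIE):
        _VALID_URL = 'https?://...'  # placeholder, not the real pattern
        _module = 'youtube_dl.extractor.arte'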

View File

@ -248,7 +248,6 @@
 - **Gamersyde**
 - **GameSpot**
 - **GameStar**
-- **Gametrailers**
 - **Gazeta**
 - **GDCVault**
 - **generic**: Generic downloader that works on some sites
@ -386,7 +385,7 @@
 - **MovieFap**
 - **Moviezine**
 - **MPORA**
-- **MSNBC**
+- **MSN**
 - **MTV**
 - **mtv.de**
 - **mtviggy.com**
@ -503,6 +502,7 @@
 - **plus.google**: Google Plus
 - **pluzz.francetv.fr**
 - **podomatic**
+- **PolskieRadio**
 - **PornHd**
 - **PornHub**
 - **PornHubPlaylist**
@ -738,6 +738,7 @@
 - **vh1.com**
 - **Vice**
 - **ViceShow**
+- **Vidbit**
 - **Viddler**
 - **video.google:search**: Google Video search
 - **video.mit.edu**

View File

@ -21,25 +21,37 @@ try:
     import py2exe
 except ImportError:
     if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
-        print("Cannot import py2exe", file=sys.stderr)
+        print('Cannot import py2exe', file=sys.stderr)
         exit(1)

 py2exe_options = {
-    "bundle_files": 1,
-    "compressed": 1,
-    "optimize": 2,
-    "dist_dir": '.',
-    "dll_excludes": ['w9xpopen.exe', 'crypt32.dll'],
+    'bundle_files': 1,
+    'compressed': 1,
+    'optimize': 2,
+    'dist_dir': '.',
+    'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
 }

+# Get the version from youtube_dl/version.py without importing the package
+exec(compile(open('youtube_dl/version.py').read(),
+             'youtube_dl/version.py', 'exec'))
+
+DESCRIPTION = 'YouTube video downloader'
+LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
+
 py2exe_console = [{
-    "script": "./youtube_dl/__main__.py",
-    "dest_base": "youtube-dl",
+    'script': './youtube_dl/__main__.py',
+    'dest_base': 'youtube-dl',
+    'version': __version__,
+    'description': DESCRIPTION,
+    'comments': LONG_DESCRIPTION,
+    'product_name': 'youtube-dl',
+    'product_version': __version__,
 }]

 py2exe_params = {
     'console': py2exe_console,
-    'options': {"py2exe": py2exe_options},
+    'options': {'py2exe': py2exe_options},
     'zipfile': None
 }
@ -72,7 +84,7 @@ else:
     params['scripts'] = ['bin/youtube-dl']

 class build_lazy_extractors(Command):
-    description = "Build the extractor lazy loading module"
+    description = 'Build the extractor lazy loading module'
     user_options = []

     def initialize_options(self):
@ -87,16 +99,11 @@ class build_lazy_extractors(Command):
             dry_run=self.dry_run,
         )

-# Get the version from youtube_dl/version.py without importing the package
-exec(compile(open('youtube_dl/version.py').read(),
-             'youtube_dl/version.py', 'exec'))
-
 setup(
     name='youtube_dl',
     version=__version__,
-    description='YouTube video downloader',
-    long_description='Small command-line program to download videos from'
-                     ' YouTube.com and other video sites.',
+    description=DESCRIPTION,
+    long_description=LONG_DESCRIPTION,
     url='https://github.com/rg3/youtube-dl',
     author='Ricardo Garcia',
     author_email='ytdl@yt-dl.org',
@ -112,17 +119,17 @@ setup(
     # test_requires = ['nosetest'],

     classifiers=[
-        "Topic :: Multimedia :: Video",
-        "Development Status :: 5 - Production/Stable",
-        "Environment :: Console",
-        "License :: Public Domain",
-        "Programming Language :: Python :: 2.6",
-        "Programming Language :: Python :: 2.7",
-        "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.2",
-        "Programming Language :: Python :: 3.3",
-        "Programming Language :: Python :: 3.4",
-        "Programming Language :: Python :: 3.5",
+        'Topic :: Multimedia :: Video',
+        'Development Status :: 5 - Production/Stable',
+        'Environment :: Console',
+        'License :: Public Domain',
+        'Programming Language :: Python :: 2.6',
+        'Programming Language :: Python :: 2.7',
+        'Programming Language :: Python :: 3',
+        'Programming Language :: Python :: 3.2',
+        'Programming Language :: Python :: 3.3',
+        'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.5',
     ],

     cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@ -11,7 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from test.helper import FakeYDL
 from youtube_dl.extractor.common import InfoExtractor
 from youtube_dl.extractor import YoutubeIE, get_info_extractor
-from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
+from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError

 class TestIE(InfoExtractor):
@ -66,6 +66,11 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._html_search_meta('d', html), '4')
         self.assertEqual(ie._html_search_meta('e', html), '5')
         self.assertEqual(ie._html_search_meta('f', html), '6')
+        self.assertEqual(ie._html_search_meta(('a', 'b', 'c'), html), '1')
+        self.assertEqual(ie._html_search_meta(('c', 'b', 'a'), html), '3')
+        self.assertEqual(ie._html_search_meta(('z', 'x', 'c'), html), '3')
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
+        self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)

     def test_download_json(self):
         uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')

View File

@ -60,11 +60,13 @@ from youtube_dl.utils import (
     timeconvert,
     unescapeHTML,
     unified_strdate,
+    unified_timestamp,
     unsmuggle_url,
     uppercase_escape,
     lowercase_escape,
     url_basename,
     urlencode_postdata,
+    urshift,
     update_url_query,
     version_tuple,
     xpath_with_ns,
@ -283,8 +285,28 @@ class TestUtil(unittest.TestCase):
             '20150202')
         self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
         self.assertEqual(unified_strdate('25-09-2014'), '20140925')
+        self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
         self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)

+    def test_unified_timestamps(self):
+        self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
+        self.assertEqual(unified_timestamp('8/7/2009'), 1247011200)
+        self.assertEqual(unified_timestamp('Dec 14, 2012'), 1355443200)
+        self.assertEqual(unified_timestamp('2012/10/11 01:56:38 +0000'), 1349920598)
+        self.assertEqual(unified_timestamp('1968 12 10'), -33436800)
+        self.assertEqual(unified_timestamp('1968-12-10'), -33436800)
+        self.assertEqual(unified_timestamp('28/01/2014 21:00:00 +0100'), 1390939200)
+        self.assertEqual(
+            unified_timestamp('11/26/2014 11:30:00 AM PST', day_first=False),
+            1417001400)
+        self.assertEqual(
+            unified_timestamp('2/2/2015 6:47:40 PM', day_first=False),
+            1422902860)
+        self.assertEqual(unified_timestamp('Feb 14th 2016 5:45PM'), 1455471900)
+        self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
+        self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
+        self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
+
     def test_determine_ext(self):
         self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
         self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
@ -959,5 +981,9 @@ The first line
         self.assertRaises(ValueError, encode_base_n, 0, 70)
         self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)

+    def test_urshift(self):
+        self.assertEqual(urshift(3, 1), 1)
+        self.assertEqual(urshift(-3, 1), 2147483646)
+

 if __name__ == '__main__':
     unittest.main()

View File

@ -2,14 +2,24 @@ from __future__ import unicode_literals

 import os.path
 import re
+import binascii
+try:
+    from Crypto.Cipher import AES
+    can_decrypt_frag = True
+except ImportError:
+    can_decrypt_frag = False

 from .fragment import FragmentFD
 from .external import FFmpegFD

-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urlparse,
+    compat_struct_pack,
+)
 from ..utils import (
     encodeFilename,
     sanitize_open,
+    parse_m3u8_attributes,
 )
@ -21,7 +31,7 @@ class HlsFD(FragmentFD):
     @staticmethod
     def can_download(manifest):
         UNSUPPORTED_FEATURES = (
-            r'#EXT-X-KEY:METHOD=(?!NONE)',  # encrypted streams [1]
+            r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
             r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
             # Live streams heuristic does not always work (e.g. geo restricted to Germany
@ -39,7 +49,9 @@ class HlsFD(FragmentFD):
             # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
             # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
         )
-        return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
+        check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
+        check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        return all(check_results)

     def real_download(self, filename, info_dict):
         man_url = info_dict['url']
@ -57,36 +69,60 @@ class HlsFD(FragmentFD):
                 fd.add_progress_hook(ph)
             return fd.real_download(filename, info_dict)

-        fragment_urls = []
+        total_frags = 0
         for line in s.splitlines():
             line = line.strip()
             if line and not line.startswith('#'):
-                segment_url = (
-                    line
-                    if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(man_url, line))
-                fragment_urls.append(segment_url)
-                # We only download the first fragment during the test
-                if self.params.get('test', False):
-                    break
+                total_frags += 1

         ctx = {
             'filename': filename,
-            'total_frags': len(fragment_urls),
+            'total_frags': total_frags,
         }

         self._prepare_and_start_frag_download(ctx)

+        i = 0
+        media_sequence = 0
+        decrypt_info = {'METHOD': 'NONE'}
         frags_filenames = []
-        for i, frag_url in enumerate(fragment_urls):
-            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
-            success = ctx['dl'].download(frag_filename, {'url': frag_url})
-            if not success:
-                return False
-            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
-            ctx['dest_stream'].write(down.read())
-            down.close()
-            frags_filenames.append(frag_sanitized)
+        for line in s.splitlines():
+            line = line.strip()
+            if line:
+                if not line.startswith('#'):
+                    frag_url = (
+                        line
+                        if re.match(r'^https?://', line)
+                        else compat_urlparse.urljoin(man_url, line))
+                    frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
+                    success = ctx['dl'].download(frag_filename, {'url': frag_url})
+                    if not success:
+                        return False
+                    down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+                    frag_content = down.read()
+                    down.close()
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    ctx['dest_stream'].write(frag_content)
+                    frags_filenames.append(frag_sanitized)
+                    # We only download the first fragment during the test
+                    if self.params.get('test', False):
+                        break
+                    i += 1
+                    media_sequence += 1
+                elif line.startswith('#EXT-X-KEY'):
+                    decrypt_info = parse_m3u8_attributes(line[11:])
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        if 'IV' in decrypt_info:
+                            decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:])
+                        if not re.match(r'^https?://', decrypt_info['URI']):
+                            decrypt_info['URI'] = compat_urlparse.urljoin(
+                                man_url, decrypt_info['URI'])
+                        decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
+                elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
+                    media_sequence = int(line[22:])

         self._finish_frag_download(ctx)
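Pulled out of the downloader above, the decryption step amounts to the following standalone sketch (assuming PyCrypto, which the try/except import guards for):

    import binascii
    import struct

    from Crypto.Cipher import AES

    def decrypt_fragment(frag_content, key, iv_attr, media_sequence):
        # Per the HLS draft, a key tag without an IV attribute implies the
        # fragment's media sequence number as a 16-byte big-endian integer.
        if iv_attr:
            iv = binascii.unhexlify(iv_attr[2:])  # strip the '0x' prefix
        else:
            iv = struct.pack('>8xq', media_sequence)
        return AES.new(key, AES.MODE_CBC, iv).decrypt(frag_content)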

View File

@ -7,6 +7,8 @@ from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_duration,
unified_strdate,
) )
@ -16,7 +18,8 @@ class AppleTrailersIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/', 'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
'info_dict': { 'info_dict': {
'id': 'manofsteel', 'id': '5111',
'title': 'Man of Steel',
}, },
'playlist': [ 'playlist': [
{ {
@ -70,6 +73,15 @@ class AppleTrailersIE(InfoExtractor):
'id': 'blackthorn', 'id': 'blackthorn',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
'expected_warnings': ['Unable to download JSON metadata'],
}, {
# json data only available from http://trailers.apple.com/trailers/feeds/data/15881.json
'url': 'http://trailers.apple.com/trailers/fox/kungfupanda3/',
'info_dict': {
'id': '15881',
'title': 'Kung Fu Panda 3',
},
'playlist_mincount': 4,
}, { }, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/', 'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True, 'only_matching': True,
@ -85,6 +97,45 @@ class AppleTrailersIE(InfoExtractor):
movie = mobj.group('movie') movie = mobj.group('movie')
uploader_id = mobj.group('company') uploader_id = mobj.group('company')
webpage = self._download_webpage(url, movie)
film_id = self._search_regex(r"FilmId\s*=\s*'(\d+)'", webpage, 'film id')
film_data = self._download_json(
'http://trailers.apple.com/trailers/feeds/data/%s.json' % film_id,
film_id, fatal=False)
if film_data:
entries = []
for clip in film_data.get('clips', []):
clip_title = clip['title']
formats = []
for version, version_data in clip.get('versions', {}).items():
for size, size_data in version_data.get('sizes', {}).items():
src = size_data.get('src')
if not src:
continue
formats.append({
'format_id': '%s-%s' % (version, size),
'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
'width': int_or_none(size_data.get('width')),
'height': int_or_none(size_data.get('height')),
'language': version[:2],
})
self._sort_formats(formats)
entries.append({
'id': movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', clip_title).lower(),
'formats': formats,
'title': clip_title,
'thumbnail': clip.get('screen') or clip.get('thumb'),
'duration': parse_duration(clip.get('runtime') or clip.get('faded')),
'upload_date': unified_strdate(clip.get('posted')),
'uploader_id': uploader_id,
})
page_data = film_data.get('page', {})
return self.playlist_result(entries, film_id, page_data.get('movie_title'))
playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc') playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')
def fix_html(s): def fix_html(s):

View File

@ -1,17 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re from .theplatform import ThePlatformFeedIE
from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
xpath_text,
xpath_element,
int_or_none, int_or_none,
find_xpath_attr, find_xpath_attr,
) )
class CBSBaseIE(ThePlatformIE): class CBSBaseIE(ThePlatformFeedIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'): def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL') closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return { return {
@ -21,9 +17,22 @@ class CBSBaseIE(ThePlatformIE):
}] }]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else [] } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info(
'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
'series': entry.get('cbs$SeriesTitle'),
'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
'episode': entry.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
}, {
'StreamPack': {
'manifest': 'm3u',
}
})
class CBSIE(CBSBaseIE): class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))' _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@ -38,25 +47,7 @@ class CBSIE(CBSBaseIE):
'upload_date': '20131127', 'upload_date': '20131127',
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
}, },
'params': { 'expected_warnings': ['Failed to download m3u8 information'],
# rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}, {
'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
'info_dict': {
'id': 'WWF_5KqY3PK1',
'display_id': 'st-vincent',
'ext': 'flv',
'title': 'Live on Letterman - St. Vincent',
'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
'duration': 3221,
},
'params': {
# rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US', '_skip': 'Blocked outside the US',
}, { }, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/', 'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@ -68,44 +59,5 @@ class CBSIE(CBSBaseIE):
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url): def _real_extract(self, url):
content_id, display_id = re.match(self._VALID_URL, url).groups() content_id = self._match_id(url)
if not content_id: return self._extract_video_info('byGuid=%s' % content_id, content_id)
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
pid = xpath_text(item, 'pid')
if not pid:
continue
tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info

View File

@ -30,9 +30,12 @@ class CBSNewsIE(CBSBaseIE):
{ {
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/', 'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': { 'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack', 'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack', 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '19700101',
'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205, 'duration': 205,
'subtitles': { 'subtitles': {
@ -58,30 +61,8 @@ class CBSNewsIE(CBSBaseIE):
webpage, 'video JSON info'), video_id) webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed') guid = item['mpxRefId']
duration = item.get('duration') return self._extract_video_info('byGuid=%s' % guid, guid)
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
pid = item.get('media' + format_id)
if not pid:
continue
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):

View File

@ -1,30 +1,28 @@
 from __future__ import unicode_literals

-import re
-
-from .common import InfoExtractor
+from .cbs import CBSBaseIE

-class CBSSportsIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+class CBSSportsIE(CBSBaseIE):
+    _VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'

-    _TEST = {
-        'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+    _TESTS = [{
+        'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
         'info_dict': {
-            'id': '_d5_GbO8p1sT',
-            'ext': 'flv',
-            'title': 'US Open flashbacks: 1990s',
-            'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+            'id': '708337219968',
+            'ext': 'mp4',
+            'title': 'Ben Simmons the next LeBron? Not so fast',
+            'description': 'md5:854294f627921baba1f4b9a990d87197',
+            'timestamp': 1466293740,
+            'upload_date': '20160618',
+            'uploader': 'CBSI-NEW',
         },
-    }
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        section = mobj.group('section')
-        video_id = mobj.group('id')
-
-        all_videos = self._download_json(
-            'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
-            video_id)
-
-        # The json file contains the info of all the videos in the section
-        video_info = next(v for v in all_videos if v['pcid'] == video_id)
-
-        return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
+        video_id = self._match_id(url)
+        return self._extract_video_info('byId=%s' % video_id, video_id)

View File

@ -53,6 +53,7 @@ from ..utils import (
     mimetype2ext,
     update_Request,
     update_url_query,
+    parse_m3u8_attributes,
 )
@ -748,10 +749,12 @@
         return self._og_search_property('url', html, **kargs)

     def _html_search_meta(self, name, html, display_name=None, fatal=False, **kwargs):
+        if not isinstance(name, (list, tuple)):
+            name = [name]
         if display_name is None:
-            display_name = name
+            display_name = name[0]
         return self._html_search_regex(
-            self._meta_regex(name),
+            [self._meta_regex(n) for n in name],
             html, display_name, fatal=fatal, group='content', **kwargs)

     def _dc_search_uploader(self, html):
@ -875,7 +878,11 @@
                 f['ext'] = determine_ext(f['url'])
             if isinstance(field_preference, (list, tuple)):
-                return tuple(f.get(field) if f.get(field) is not None else -1 for field in field_preference)
+                return tuple(
+                    f.get(field)
+                    if f.get(field) is not None
+                    else ('' if field == 'format_id' else -1)
+                    for field in field_preference)

             preference = f.get('preference')
             if preference is None:
@ -1150,23 +1157,11 @@
         }]

         last_info = None
         last_media = None
-        kv_rex = re.compile(
-            r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
         for line in m3u8_doc.splitlines():
             if line.startswith('#EXT-X-STREAM-INF:'):
-                last_info = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_info[m.group('key')] = v
+                last_info = parse_m3u8_attributes(line)
             elif line.startswith('#EXT-X-MEDIA:'):
-                last_media = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_media[m.group('key')] = v
+                last_media = parse_m3u8_attributes(line)
             elif line.startswith('#') or not line.strip():
                 continue
             else:
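With the name-list support above in place, an extractor can probe several meta names in priority order, e.g. (hypothetical meta names):

    upload_date = unified_strdate(self._html_search_meta(
        ('uploadDate', 'datePublished', 'date'),
        webpage, 'upload date', fatal=False))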

View File

@ -20,7 +20,7 @@ from ..utils import (
class DCNIE(InfoExtractor): class DCNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url): def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups() show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
@ -55,30 +55,32 @@ class DCNBaseIE(InfoExtractor):
'is_live': is_live, 'is_live': is_live,
} }
def _extract_video_formats(self, webpage, video_id, entry_protocol): def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
formats = [] formats = []
m3u8_url = self._html_search_regex( format_url_base = 'http' + self._html_search_regex(
r'file\s*:\s*"([^"]+)', webpage, 'm3u8 url', fatal=False) [
if m3u8_url: r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
formats.extend(self._extract_m3u8_formats( r'<a[^>]+href="rtsp(://[^"]+)"'
m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=None)) ], webpage, 'format url')
# TODO: Current DASH formats are broken - $Time$ pattern in
rtsp_url = self._search_regex( # <SegmentTemplate> not implemented yet
r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False) # formats.extend(self._extract_mpd_formats(
if rtsp_url: # format_url_base + '/manifest.mpd',
formats.append({ # video_id, mpd_id='dash', fatal=False))
'url': rtsp_url, formats.extend(self._extract_m3u8_formats(
'format_id': 'rtsp', format_url_base + '/playlist.m3u8', video_id, 'mp4',
}) m3u8_entry_protocol, m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
format_url_base + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return formats return formats
class DCNVideoIE(DCNBaseIE): class DCNVideoIE(DCNBaseIE):
IE_NAME = 'dcn:video' IE_NAME = 'dcn:video'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/[^/]+|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375', 'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'info_dict': 'info_dict':
{ {
@ -94,7 +96,10 @@ class DCNVideoIE(DCNBaseIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
} }, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -120,7 +125,7 @@ class DCNVideoIE(DCNBaseIE):
class DCNLiveIE(DCNBaseIE): class DCNLiveIE(DCNBaseIE):
IE_NAME = 'dcn:live' IE_NAME = 'dcn:live'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?live/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
def _real_extract(self, url): def _real_extract(self, url):
channel_id = self._match_id(url) channel_id = self._match_id(url)
@ -147,7 +152,7 @@ class DCNLiveIE(DCNBaseIE):
class DCNSeasonIE(InfoExtractor): class DCNSeasonIE(InfoExtractor):
IE_NAME = 'dcn:season' IE_NAME = 'dcn:season'
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))' _VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = { _TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A', 'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
'info_dict': 'info_dict':

View File

@ -285,7 +285,6 @@ from .gameone import (
 from .gamersyde import GamersydeIE
 from .gamespot import GameSpotIE
 from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
 from .gazeta import GazetaIE
 from .gdcvault import GDCVaultIE
 from .generic import GenericIE
@ -455,6 +454,7 @@ from .motherless import MotherlessIE
 from .motorsport import MotorsportIE
 from .movieclips import MovieClipsIE
 from .moviezine import MoviezineIE
+from .msn import MSNIE
 from .mtv import (
     MTVIE,
     MTVServicesEmbeddedIE,
@ -481,7 +481,6 @@ from .nbc import (
     NBCNewsIE,
     NBCSportsIE,
     NBCSportsVPlayerIE,
-    MSNBCIE,
 )
 from .ndr import (
     NDRIE,
@ -608,6 +607,7 @@ from .pluralsight import (
     PluralsightCourseIE,
 )
 from .podomatic import PodomaticIE
+from .polskieradio import PolskieRadioIE
 from .porn91 import Porn91IE
 from .pornhd import PornHdIE
 from .pornhub import (
@ -917,6 +917,7 @@ from .vice import (
     ViceIE,
     ViceShowIE,
 )
+from .vidbit import VidbitIE
 from .viddler import ViddlerIE
 from .videodetective import VideoDetectiveIE
 from .videofyme import VideofyMeIE

View File

@ -239,6 +239,8 @@ class FacebookIE(InfoExtractor):

         formats = []
         for format_id, f in video_data.items():
+            if f and isinstance(f, dict):
+                f = [f]
             if not f or not isinstance(f, list):
                 continue
             for quality in ('sd', 'hd'):

View File

@ -1,7 +1,10 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)

 class FoxSportsIE(InfoExtractor):
@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):
     _TEST = {
         'url': 'http://www.foxsports.com/video?vid=432609859715',
+        'md5': 'b49050e955bebe32c301972e4012ac17',
         'info_dict': {
-            'id': 'gA0bHB3Ladz3',
-            'ext': 'flv',
+            'id': 'i0qKWsk3qJaM',
+            'ext': 'mp4',
             'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
             'description': 'Courtney Lee talks about Memphis being focused.',
+            'upload_date': '20150423',
+            'timestamp': 1429761109,
+            'uploader': 'NEWA-FNG-FOXSPORTS',
         },
         'add_ie': ['ThePlatform'],
     }
@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
                 r"data-player-config='([^']+)'", webpage, 'data player config'),
             video_id)

-        return self.url_result(smuggle_url(
-            config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
+        return self.url_result(smuggle_url(update_url_query(
+            config['releaseURL'], {
+                'mbr': 'true',
+                'switch': 'http',
+            }), {'force_smil_url': True}))

View File

@ -1,19 +1,19 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .once import OnceIE
from ..compat import ( from ..compat import (
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urlparse,
) )
from ..utils import ( from ..utils import (
unescapeHTML, unescapeHTML,
url_basename,
dict_get,
) )
class GameSpotIE(InfoExtractor): class GameSpotIE(OnceIE):
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?' _VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/', 'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
@ -39,29 +39,73 @@ class GameSpotIE(InfoExtractor):
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex( data_video_json = self._search_regex(
r'data-video=["\'](.*?)["\']', webpage, 'data video') r'data-video=["\'](.*?)["\']', webpage, 'data video')
data_video = json.loads(unescapeHTML(data_video_json)) data_video = self._parse_json(unescapeHTML(data_video_json), page_id)
streams = data_video['videoStreams'] streams = data_video['videoStreams']
manifest_url = None
formats = [] formats = []
f4m_url = streams.get('f4m_stream') f4m_url = streams.get('f4m_stream')
if f4m_url is not None: if f4m_url:
# Transform the manifest url to a link to the mp4 files manifest_url = f4m_url
# they are used in mobile devices. formats.extend(self._extract_f4m_formats(
f4m_path = compat_urlparse.urlparse(f4m_url).path f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
QUALITIES_RE = r'((,\d+)+,?)' m3u8_url = streams.get('m3u8_stream')
qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',') if m3u8_url:
http_path = f4m_path[1:].split('/', 1)[1] manifest_url = m3u8_url
http_template = re.sub(QUALITIES_RE, r'%s', http_path) m3u8_formats = self._extract_m3u8_formats(
http_template = http_template.replace('.csmil/manifest.f4m', '') m3u8_url, page_id, 'mp4', 'm3u8_native',
http_template = compat_urlparse.urljoin( m3u8_id='hls', fatal=False)
'http://video.gamespotcdn.com/', http_template) formats.extend(m3u8_formats)
for q in qualities: progressive_url = dict_get(
formats.append({ streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
'url': http_template % q, if progressive_url and manifest_url:
'ext': 'mp4', qualities_basename = self._search_regex(
'format_id': q, '/([^/]+)\.csmil/',
}) manifest_url, 'qualities basename', default=None)
else: if qualities_basename:
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(
QUALITIES_RE, qualities_basename,
'qualities', default=None)
if qualities:
qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
qualities.sort()
http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
http_url_basename = url_basename(progressive_url)
if m3u8_formats:
self._sort_formats(m3u8_formats)
m3u8_formats = list(filter(
lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
m3u8_formats))
if len(qualities) == len(m3u8_formats):
for q, m3u8_format in zip(qualities, m3u8_formats):
f = m3u8_format.copy()
f.update({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(f)
else:
for q in qualities:
formats.append({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'ext': 'mp4',
'format_id': 'http-%d' % q,
'tbr': q,
})
onceux_json = self._search_regex(
r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None)
if onceux_json:
onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
if onceux_url:
formats.extend(self._extract_once_formats(re.sub(
r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', '')))
if not formats:
for quality in ['sd', 'hd']: for quality in ['sd', 'hd']:
# It's actually a link to a flv file # It's actually a link to a flv file
flv_url = streams.get('f4m_{0}'.format(quality)) flv_url = streams.get('f4m_{0}'.format(quality))
@ -71,6 +115,7 @@ class GameSpotIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'format_id': quality, 'format_id': quality,
}) })
self._sort_formats(formats)
return { return {
'id': data_video['guid'], 'id': data_video['guid'],

View File

@ -1,62 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_age_limit,
url_basename,
)
class GametrailersIE(InfoExtractor):
_VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
_TEST = {
'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
'info_dict': {
'id': '2983958',
'ext': 'mp4',
'display_id': '116437-Just-Cause-3-Review',
'title': 'Just Cause 3 - Review',
'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<title>(.+?)\|', webpage, 'title').strip()
embed_url = self._proto_relative_url(
self._search_regex(
r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
'embed url'),
scheme='http:')
video_id = url_basename(embed_url)
embed_page = self._download_webpage(embed_url, video_id)
embed_vars_json = self._search_regex(
r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
'embed vars')
info = self._parse_json(embed_vars_json, video_id)
formats = []
for media in info['media']:
if media['mediaPurpose'] == 'play':
formats.append({
'url': media['uri'],
'height': media['height'],
'width:': media['width'],
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': info.get('thumbUri'),
'description': self._og_search_description(webpage),
'duration': int_or_none(info.get('videoLengthInSeconds')),
'age_limit': parse_age_limit(info.get('audienceRating')),
}

View File

@ -1091,12 +1091,17 @@ class GenericIE(InfoExtractor):
         # Dailymotion Cloud video
         {
             'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
-            'md5': '49444254273501a64675a7e68c502681',
+            'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
             'info_dict': {
-                'id': '5585de919473990de4bee11b',
+                'id': 'x2uy8t3',
                 'ext': 'mp4',
-                'title': 'Le débat',
+                'title': 'Sauvons les abeilles ! - Le débat',
+                'description': 'md5:d9082128b1c5277987825d684939ca26',
                 'thumbnail': 're:^https?://.*\.jpe?g$',
+                'timestamp': 1434970506,
+                'upload_date': '20150622',
+                'uploader': 'Public Sénat',
+                'uploader_id': 'xa9gza',
             }
         },
         # OnionStudios embed

View File

@ -1,30 +1,25 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import binascii
import hashlib import hashlib
import itertools import itertools
import math import math
import os
import random
import re import re
import time import time
import uuid
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs,
compat_str, compat_str,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
decode_packed_codes, decode_packed_codes,
ExtractorError, ExtractorError,
intlist_to_bytes,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
remove_start, remove_start,
sanitized_Request, urshift,
urlencode_postdata,
url_basename,
) )
@@ -171,70 +166,21 @@ class IqiyiIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
-        'md5': '2cb594dc2781e6c941a110d8f358118b',
+        'md5': '470a6c160618577166db1a7aac5a3606',
         'info_dict': {
             'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
+            'ext': 'mp4',
             'title': '美国德州空中惊现奇异云团 酷似UFO',
-            'ext': 'f4v',
         }
     }, {
         'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
+        'md5': 'f09f0a6a59b2da66a26bf4eda669a4cc',
         'info_dict': {
             'id': 'e3f585b550a280af23c98b6cb2be19fb',
-            'title': '名侦探柯南第752集',
-        },
-        'playlist': [{
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part1',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part2',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part3',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part4',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part5',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part6',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part7',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }, {
-            'info_dict': {
-                'id': 'e3f585b550a280af23c98b6cb2be19fb_part8',
-                'ext': 'f4v',
-                'title': '名侦探柯南第752集',
-            },
-        }],
-        'params': {
-            'skip_download': True,
+            'ext': 'mp4',
+            'title': '名侦探柯南 国语版',
         },
+        'skip': 'Geo-restricted to China',
     }, {
         'url': 'http://www.iqiyi.com/w_19rt6o8t9p.html',
         'only_matching': True,
@@ -287,13 +233,6 @@ class IqiyiIE(InfoExtractor):
         ('10', 'h1'),
     ]

-    AUTH_API_ERRORS = {
-        # No preview available (不允许试看鉴权失败)
-        'Q00505': 'This video requires a VIP account',
-        # End of preview time (试看结束鉴权失败)
-        'Q00506': 'Needs a VIP account for full video',
-    }
-
     def _real_initialize(self):
         self._login()
@@ -352,177 +291,101 @@ class IqiyiIE(InfoExtractor):
         return True

-    def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
-        auth_params = {
-            # version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
-            'version': '2.0',
-            'platform': 'b6c13e26323c537d',
-            'aid': tvid,
-            'tvid': tvid,
-            'uid': '',
-            'deviceId': _uuid,
-            'playType': 'main',  # XXX: always main?
-            'filename': os.path.splitext(url_basename(api_video_url))[0],
-        }
-        qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
-        for key, val in qd_items.items():
-            auth_params[key] = val[0]
-        auth_req = sanitized_Request(
-            'http://api.vip.iqiyi.com/services/ckn.action',
-            urlencode_postdata(auth_params))
-        # iQiyi server throws HTTP 405 error without the following header
-        auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        auth_result = self._download_json(
-            auth_req, video_id,
-            note='Downloading video authentication JSON',
-            errnote='Unable to download video authentication JSON')
-        code = auth_result.get('code')
-        msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
-        if code == 'Q00506':
-            if do_report_warning:
-                self.report_warning(msg)
-            return False
-        if 'data' not in auth_result:
-            if msg is not None:
-                raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
-            raise ExtractorError('Unexpected error from Iqiyi auth API')
-        return auth_result['data']
-
-    def construct_video_urls(self, data, video_id, _uuid, tvid):
-        def do_xor(x, y):
-            a = y % 3
-            if a == 1:
-                return x ^ 121
-            if a == 2:
-                return x ^ 72
-            return x ^ 103
-
-        def get_encode_code(l):
-            a = 0
-            b = l.split('-')
-            c = len(b)
-            s = ''
-            for i in range(c - 1, -1, -1):
-                a = do_xor(int(b[c - i - 1], 16), i)
-                s += chr(a)
-            return s[::-1]
-
-        def get_path_key(x, format_id, segment_index):
-            mg = ')(*&^flash@#$%a'
-            tm = self._download_json(
-                'http://data.video.qiyi.com/t?tn=' + str(random.random()), video_id,
-                note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
-            )['t']
-            t = str(int(math.floor(int(tm) / (600.0))))
-            return md5_text(t + mg + x)
-
-        video_urls_dict = {}
-        need_vip_warning_report = True
-        for format_item in data['vp']['tkl'][0]['vs']:
-            if 0 < int(format_item['bid']) <= 10:
-                format_id = self.get_format(format_item['bid'])
-            else:
-                continue
-
-            video_urls = []
-
-            video_urls_info = format_item['fs']
-            if not format_item['fs'][0]['l'].startswith('/'):
-                t = get_encode_code(format_item['fs'][0]['l'])
-                if t.endswith('mp4'):
-                    video_urls_info = format_item['flvs']
-
-            for segment_index, segment in enumerate(video_urls_info):
-                vl = segment['l']
-                if not vl.startswith('/'):
-                    vl = get_encode_code(vl)
-                is_vip_video = '/vip/' in vl
-                filesize = segment['b']
-                base_url = data['vp']['du'].split('/')
-                if not is_vip_video:
-                    key = get_path_key(
-                        vl.split('/')[-1].split('.')[0], format_id, segment_index)
-                    base_url.insert(-1, key)
-                base_url = '/'.join(base_url)
-                param = {
-                    'su': _uuid,
-                    'qyid': uuid.uuid4().hex,
-                    'client': '',
-                    'z': '',
-                    'bt': '',
-                    'ct': '',
-                    'tn': str(int(time.time()))
-                }
-                api_video_url = base_url + vl
-                if is_vip_video:
-                    api_video_url = api_video_url.replace('.f4v', '.hml')
-                    auth_result = self._authenticate_vip_video(
-                        api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
-                    if auth_result is False:
-                        need_vip_warning_report = False
-                        break
-                    param.update({
-                        't': auth_result['t'],
-                        # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
-                        'cid': 'afbe8fd3d73448c9',
-                        'vid': video_id,
-                        'QY00001': auth_result['u'],
-                    })
-                api_video_url += '?' if '?' not in api_video_url else '&'
-                api_video_url += compat_urllib_parse_urlencode(param)
-                js = self._download_json(
-                    api_video_url, video_id,
-                    note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
-                video_url = js['l']
-                video_urls.append(
-                    (video_url, filesize))
-
-            video_urls_dict[format_id] = video_urls
-        return video_urls_dict
-
-    def get_format(self, bid):
-        matched_format_ids = [_format_id for _bid, _format_id in self._FORMATS_MAP if _bid == str(bid)]
-        return matched_format_ids[0] if len(matched_format_ids) else None
-
-    def get_bid(self, format_id):
-        matched_bids = [_bid for _bid, _format_id in self._FORMATS_MAP if _format_id == format_id]
-        return matched_bids[0] if len(matched_bids) else None
-
-    def get_raw_data(self, tvid, video_id, enc_key, _uuid):
-        tm = str(int(time.time()))
-        tail = tm + tvid
-        param = {
-            'key': 'fvip',
-            'src': md5_text('youtube-dl'),
-            'tvId': tvid,
-            'vid': video_id,
-            'vinfo': 1,
-            'tm': tm,
-            'enc': md5_text(enc_key + tail),
-            'qyid': _uuid,
-            'tn': random.random(),
-            # In iQiyi's flash player, um is set to 1 if there's a logged user
-            # Some 1080P formats are only available with a logged user.
-            # Here force um=1 to trick the iQiyi server
-            'um': 1,
-            'authkey': md5_text(md5_text('') + tail),
-            'k_tag': 1,
-        }
-        api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
-            compat_urllib_parse_urlencode(param)
-        raw_data = self._download_json(api_url, video_id)
-        return raw_data
-
-    def get_enc_key(self, video_id):
-        # TODO: automatic key extraction
-        # last update at 2016-01-22 for Zombie::bite
-        enc_key = '4a1caba4b4465345366f28da7c117d20'
-        return enc_key
+    @staticmethod
+    def _gen_sc(tvid, timestamp):
+        M = [1732584193, -271733879]
+        M.extend([~M[0], ~M[1]])
+        I_table = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21]
+        C_base = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8388608, 432]
+
+        def L(n, t):
+            if t is None:
+                t = 0
+            return trunc(((n >> 1) + (t >> 1) << 1) + (n & 1) + (t & 1))
+
+        def trunc(n):
+            n = n % 0x100000000
+            if n > 0x7fffffff:
+                n -= 0x100000000
+            return n
+
+        def transform(string, mod):
+            num = int(string, 16)
+            return (num >> 8 * (i % 4) & 255 ^ i % mod) << ((a & 3) << 3)
+
+        C = list(C_base)
+        o = list(M)
+        k = str(timestamp - 7)
+        for i in range(13):
+            a = i
+            C[a >> 2] |= ord(k[a]) << 8 * (a % 4)
+        for i in range(16):
+            a = i + 13
+            start = (i >> 2) * 8
+            r = '03967743b643f66763d623d637e30733'
+            C[a >> 2] |= transform(''.join(reversed(r[start:start + 8])), 7)
+        for i in range(16):
+            a = i + 29
+            start = (i >> 2) * 8
+            r = '7038766939776a32776a32706b337139'
+            C[a >> 2] |= transform(r[start:start + 8], 1)
+        for i in range(9):
+            a = i + 45
+            if i < len(tvid):
+                C[a >> 2] |= ord(tvid[i]) << 8 * (a % 4)
+        for a in range(64):
+            i = a
+            I = i >> 4
+            C_index = [i, 5 * i + 1, 3 * i + 5, 7 * i][I] % 16 + urshift(a, 6)
+            m = L(L(o[0], [
+                trunc(o[1] & o[2]) | trunc(~o[1] & o[3]),
+                trunc(o[3] & o[1]) | trunc(~o[3] & o[2]),
+                o[1] ^ o[2] ^ o[3],
+                o[2] ^ trunc(o[1] | ~o[3])
+            ][I]), L(
+                trunc(int(abs(math.sin(i + 1)) * 4294967296)),
+                C[C_index] if C_index < len(C) else None))
+            I = I_table[4 * I + i % 4]
+            o = [o[3],
+                 L(o[1], trunc(trunc(m << I) | urshift(m, 32 - I))),
+                 o[1],
+                 o[2]]
+        new_M = [L(o[0], M[0]), L(o[1], M[1]), L(o[2], M[2]), L(o[3], M[3])]
+        s = [new_M[a >> 3] >> (1 ^ a & 7) * 4 & 15 for a in range(32)]
+        return binascii.hexlify(intlist_to_bytes(s))[1::2].decode('ascii')
+
+    def get_raw_data(self, tvid, video_id):
+        tm = int(time.time() * 1000)
+        sc = self._gen_sc(tvid, tm)
+        params = {
+            'platForm': 'h5',
+            'rate': 1,
+            'tvid': tvid,
+            'vid': video_id,
+            'cupid': 'qc_100001_100186',
+            'type': 'mp4',
+            'nolimit': 0,
+            'agenttype': 13,
+            'src': 'd846d0c32d664d32b6b54ea48997a589',
+            'sc': sc,
+            't': tm - 7,
+            '__jsT': None,
+        }
+
+        headers = {}
+        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
+        if cn_verification_proxy:
+            headers['Ytdl-request-proxy'] = cn_verification_proxy
+        return self._download_json(
+            'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
+            video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
+            query=params, headers=headers)

     def _extract_playlist(self, webpage):
         PAGE_SIZE = 50
@@ -571,58 +434,27 @@ class IqiyiIE(InfoExtractor):
             r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
         video_id = self._search_regex(
             r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
-        _uuid = uuid.uuid4().hex
-
-        enc_key = self.get_enc_key(video_id)
-
-        raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid)
-
-        if raw_data['code'] != 'A000000':
-            raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
-
-        data = raw_data['data']
-
-        title = data['vi']['vn']
-
-        # generate video_urls_dict
-        video_urls_dict = self.construct_video_urls(
-            data, video_id, _uuid, tvid)
-
-        # construct info
-        entries = []
-        for format_id in video_urls_dict:
-            video_urls = video_urls_dict[format_id]
-            for i, video_url_info in enumerate(video_urls):
-                if len(entries) < i + 1:
-                    entries.append({'formats': []})
-                entries[i]['formats'].append(
-                    {
-                        'url': video_url_info[0],
-                        'filesize': video_url_info[-1],
-                        'format_id': format_id,
-                        'preference': int(self.get_bid(format_id))
-                    }
-                )
-
-        for i in range(len(entries)):
-            self._sort_formats(entries[i]['formats'])
-            entries[i].update(
-                {
-                    'id': '%s_part%d' % (video_id, i + 1),
-                    'title': title,
-                }
-            )
-
-        if len(entries) > 1:
-            info = {
-                '_type': 'multi_video',
-                'id': video_id,
-                'title': title,
-                'entries': entries,
-            }
-        else:
-            info = entries[0]
-            info['id'] = video_id
-            info['title'] = title
-
-        return info
+
+        for _ in range(5):
+            raw_data = self.get_raw_data(tvid, video_id)
+
+            if raw_data['code'] != 'A00000':
+                if raw_data['code'] == 'A00111':
+                    self.raise_geo_restricted()
+                raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
+
+            data = raw_data['data']
+
+            # iQiYi sometimes returns Ads
+            if not isinstance(data['playInfo'], dict):
+                self._sleep(5, video_id)
+                continue
+
+            title = data['playInfo']['an']
+            break
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': data['m3u'],
+        }


@@ -23,6 +23,7 @@ from ..utils import (
     sanitized_Request,
     str_or_none,
     url_basename,
+    urshift,
 )
@@ -74,15 +75,11 @@ class LeIE(InfoExtractor):
         'only_matching': True,
     }]

-    @staticmethod
-    def urshift(val, n):
-        return val >> n if val >= 0 else (val + 0x100000000) >> n
-
     # ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
     def ror(self, param1, param2):
         _loc3_ = 0
         while _loc3_ < param2:
-            param1 = self.urshift(param1, 1) + ((param1 & 1) << 31)
+            param1 = urshift(param1, 1) + ((param1 & 1) << 31)
             _loc3_ += 1
         return param1
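urshift(), promoted to utils here, emulates JavaScript's unsigned right shift (>>>) on 32-bit values, something Python has to spell out because its integers are unbounded and signed. Its behaviour, using the same body that was just removed from LeIE:

def urshift(val, n):
    return val >> n if val >= 0 else (val + 0x100000000) >> n

# -1 is 0xffffffff as an unsigned 32-bit value, so shifting right by 28 leaves 0xf
assert urshift(-1, 28) == 0xf
assert urshift(16, 2) == 4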


@@ -102,11 +102,11 @@ class MixcloudIE(InfoExtractor):
         description = self._og_search_description(webpage)
         like_count = parse_count(self._search_regex(
             r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', fatal=False))
+            webpage, 'like count', default=None))
         view_count = str_to_int(self._search_regex(
             [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
              r'/listeners/?">([0-9,.]+)</a>'],
-            webpage, 'play count', fatal=False))
+            webpage, 'play count', default=None))

         return {
             'id': track_id,
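The point of the change: with fatal=False, _search_regex still prints a warning when nothing matches, while an explicit default keeps a miss silent, which suits counters that Mixcloud pages may legitimately omit. A toy model of those failure modes (stand-alone names, simplified from the real method):

import re

_NO_DEFAULT = object()

def search_regex(pattern, text, name, default=_NO_DEFAULT, fatal=True):
    mobj = re.search(pattern, text)
    if mobj:
        return mobj.group(1)
    if default is not _NO_DEFAULT:
        return default  # silent: absence is expected
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    print('WARNING: unable to extract %s' % name)  # noisy
    return None

print(search_regex(r'(\d+) plays', 'page without counters', 'play count', default=None))  # -> None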

youtube_dl/extractor/msn.py (new file)

@@ -0,0 +1,119 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
unescapeHTML,
)
class MSNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE',
'md5': '8442f66c116cbab1ff7098f986983458',
'info_dict': {
'id': 'BBqQYNE',
'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message',
'ext': 'mp4',
'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message',
'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25',
'duration': 104,
'uploader': 'CBS Entertainment',
'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v',
},
}, {
'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
'only_matching': True,
}, {
'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH',
'only_matching': True,
}, {
# geo restricted
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
video = self._parse_json(
self._search_regex(
r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1',
webpage, 'video data', default='{}', group='data'),
display_id, transform_source=unescapeHTML)
if not video:
error = unescapeHTML(self._search_regex(
r'data-error=(["\'])(?P<error>.+?)\1',
webpage, 'error', group='error'))
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
title = video['title']
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
# .ism is not yet supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
if ext == 'ism':
continue
if 'm3u8' in format_url:
# m3u8_native should not be used here until
# https://github.com/rg3/youtube-dl/issues/9913 is fixed
m3u8_formats = self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False)
# Despite metadata in m3u8 all video+audio formats are
# actually video-only (no audio)
for f in m3u8_formats:
if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
f['acodec'] = 'none'
formats.extend(m3u8_formats)
else:
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': 'http',
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
})
self._sort_formats(formats)
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if compat_str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
}
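The acodec override above is easiest to see on fabricated format dicts: the site's HLS manifests advertise muxed audio that is not actually there, so any variant claiming both codecs is downgraded to video-only before sorting:

m3u8_formats = [
    {'format_id': 'hls-1000', 'acodec': 'mp4a.40.2', 'vcodec': 'avc1.4d401f'},
    {'format_id': 'hls-audio', 'acodec': 'mp4a.40.2', 'vcodec': 'none'},
]
for f in m3u8_formats:
    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
        f['acodec'] = 'none'
print([f['acodec'] for f in m3u8_formats])  # -> ['none', 'mp4a.40.2']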


@@ -9,10 +9,6 @@ from ..utils import (
     lowercase_escape,
     smuggle_url,
     unescapeHTML,
-    update_url_query,
-    int_or_none,
-    HEADRequest,
-    parse_iso8601,
 )
@@ -192,9 +188,9 @@ class CSNNEIE(InfoExtractor):

 class NBCNewsIE(ThePlatformIE):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/
         (?:video/.+?/(?P<id>\d+)|
-            ([^/]+/)*(?P<display_id>[^/?]+))
+            ([^/]+/)*(?:.*-)?(?P<mpx_id>[^/?]+))
         '''

     _TESTS = [
@@ -216,13 +212,16 @@ class NBCNewsIE(ThePlatformIE):
                 'ext': 'mp4',
                 'title': 'How Twitter Reacted To The Snowden Interview',
                 'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
+                'uploader': 'NBCU-NEWS',
+                'timestamp': 1401363060,
+                'upload_date': '20140529',
             },
         },
         {
            'url': 'http://www.nbcnews.com/feature/dateline-full-episodes/full-episode-family-business-n285156',
            'md5': 'fdbf39ab73a72df5896b6234ff98518a',
            'info_dict': {
-               'id': 'Wjf9EDR3A_60',
+               'id': '529953347624',
                'ext': 'mp4',
                'title': 'FULL EPISODE: Family Business',
                'description': 'md5:757988edbaae9d7be1d585eb5d55cc04',
@@ -237,6 +236,9 @@ class NBCNewsIE(ThePlatformIE):
                 'ext': 'mp4',
                 'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                 'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
+                'timestamp': 1423104900,
+                'uploader': 'NBCU-NEWS',
+                'upload_date': '20150205',
             },
         },
         {
@@ -245,10 +247,12 @@ class NBCNewsIE(ThePlatformIE):
             'info_dict': {
                 'id': '529953347624',
                 'ext': 'mp4',
-                'title': 'Volkswagen U.S. Chief: We \'Totally Screwed Up\'',
-                'description': 'md5:d22d1281a24f22ea0880741bb4dd6301',
+                'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
+                'description': 'md5:c8be487b2d80ff0594c005add88d8351',
+                'upload_date': '20150922',
+                'timestamp': 1442917800,
+                'uploader': 'NBCU-NEWS',
             },
-            'expected_warnings': ['http-6000 is not available']
         },
         {
             'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
@@ -260,6 +264,22 @@ class NBCNewsIE(ThePlatformIE):
                 'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
                 'upload_date': '20160420',
                 'timestamp': 1461152093,
+                'uploader': 'NBCU-NEWS',
+            },
+        },
+        {
+            'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
+            'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
+            'info_dict': {
+                'id': '314487875924',
+                'ext': 'mp4',
+                'title': 'The chaotic GOP immigration vote',
+                'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
+                'thumbnail': 're:^https?://.*\.jpg$',
+                'timestamp': 1406937606,
+                'upload_date': '20140802',
+                'uploader': 'NBCU-NEWS',
+                'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
             },
         },
         {
@@ -290,105 +310,28 @@ class NBCNewsIE(ThePlatformIE):
             }
         else:
             # "feature" and "nightly-news" pages use theplatform.com
-            display_id = mobj.group('display_id')
-            webpage = self._download_webpage(url, display_id)
-            info = None
-            bootstrap_json = self._search_regex(
-                [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
-                 r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
-                webpage, 'bootstrap json', default=None)
-            bootstrap = self._parse_json(
-                bootstrap_json, display_id, transform_source=unescapeHTML)
-            if 'results' in bootstrap:
-                info = bootstrap['results'][0]['video']
-            elif 'video' in bootstrap:
-                info = bootstrap['video']
-            else:
-                info = bootstrap
-            video_id = info['mpxId']
-            title = info['title']
-
-            subtitles = {}
-            caption_links = info.get('captionLinks')
-            if caption_links:
-                for (sub_key, sub_ext) in (('smpte-tt', 'ttml'), ('web-vtt', 'vtt'), ('srt', 'srt')):
-                    sub_url = caption_links.get(sub_key)
-                    if sub_url:
-                        subtitles.setdefault('en', []).append({
-                            'url': sub_url,
-                            'ext': sub_ext,
-                        })
-
-            formats = []
-            for video_asset in info['videoAssets']:
-                video_url = video_asset.get('publicUrl')
-                if not video_url:
-                    continue
-                container = video_asset.get('format')
-                asset_type = video_asset.get('assetType') or ''
-                if container == 'ISM' or asset_type == 'FireTV-Once':
-                    continue
-                elif asset_type == 'OnceURL':
-                    tp_formats, tp_subtitles = self._extract_theplatform_smil(
-                        video_url, video_id)
-                    formats.extend(tp_formats)
-                    subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-                else:
-                    tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
-                    format_id = 'http%s' % ('-%d' % tbr if tbr else '')
-                    video_url = update_url_query(
-                        video_url, {'format': 'redirect'})
-                    # resolve the url so that we can check availability and detect the correct extension
-                    head = self._request_webpage(
-                        HEADRequest(video_url), video_id,
-                        'Checking %s url' % format_id,
-                        '%s is not available' % format_id,
-                        fatal=False)
-                    if head:
-                        video_url = head.geturl()
-                        formats.append({
-                            'format_id': format_id,
-                            'url': video_url,
-                            'width': int_or_none(video_asset.get('width')),
-                            'height': int_or_none(video_asset.get('height')),
-                            'tbr': tbr,
-                            'container': video_asset.get('format'),
-                        })
-            self._sort_formats(formats)
+            video_id = mobj.group('mpx_id')
+            if not video_id.isdigit():
+                webpage = self._download_webpage(url, video_id)
+                info = None
+                bootstrap_json = self._search_regex(
+                    [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
+                     r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
+                    webpage, 'bootstrap json', default=None)
+                bootstrap = self._parse_json(
+                    bootstrap_json, video_id, transform_source=unescapeHTML)
+                if 'results' in bootstrap:
+                    info = bootstrap['results'][0]['video']
+                elif 'video' in bootstrap:
+                    info = bootstrap['video']
+                else:
+                    info = bootstrap
+                video_id = info['mpxId']

             return {
+                '_type': 'url_transparent',
                 'id': video_id,
-                'title': title,
-                'description': info.get('description'),
-                'thumbnail': info.get('thumbnail'),
-                'duration': int_or_none(info.get('duration')),
-                'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
-                'formats': formats,
-                'subtitles': subtitles,
+                # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
+                'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
+                'ie_key': 'ThePlatformFeed',
             }
-
-
-class MSNBCIE(InfoExtractor):
-    # https URLs redirect to corresponding http ones
-    _VALID_URL = r'https?://www\.msnbc\.com/[^/]+/watch/(?P<id>[^/]+)'
-    _TEST = {
-        'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
-        'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
-        'info_dict': {
-            'id': 'n_hayes_Aimm_140801_272214',
-            'ext': 'mp4',
-            'title': 'The chaotic GOP immigration vote',
-            'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'timestamp': 1406937606,
-            'upload_date': '20140802',
-            'uploader': 'NBCU-NEWS',
-            'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        embed_url = self._html_search_meta('embedURL', webpage)
-        return self.url_result(embed_url)
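For context on the rewrite (a sketch of youtube-dl's result-dict convention, with the values copied from the hunk above): a url_transparent entry hands extraction to the named extractor, and any fields set beside it override what that extractor returns, which is how one ThePlatform feed URL template can now serve nbcnews.com, today.com and msnbc.com alike:

def build_result(video_id):
    # mirrors the return value introduced above
    return {
        '_type': 'url_transparent',
        'id': video_id,
        'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
        'ie_key': 'ThePlatformFeed',
    }

print(build_result('529953347624')['url'])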


@@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
int_or_none,
strip_or_none,
unified_timestamp,
)
class PolskieRadioIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?polskieradio\.pl/\d+/\d+/Artykul/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.polskieradio.pl/7/5102/Artykul/1587943,Prof-Andrzej-Nowak-o-historii-nie-da-sie-myslec-beznamietnie',
'info_dict': {
'id': '1587943',
'title': 'Prof. Andrzej Nowak: o historii nie da się myśleć beznamiętnie',
'description': 'md5:12f954edbf3120c5e7075e17bf9fc5c5',
},
'playlist': [{
'md5': '2984ee6ce9046d91fc233bc1a864a09a',
'info_dict': {
'id': '1540576',
'ext': 'mp3',
'title': 'md5:d4623290d4ac983bf924061c75c23a0d',
'timestamp': 1456594200,
'upload_date': '20160227',
'duration': 2364,
},
}],
}, {
'url': 'http://www.polskieradio.pl/265/5217/Artykul/1635803,Euro-2016-nie-ma-miejsca-na-blad-Polacy-graja-ze-Szwajcaria-o-cwiercfinal',
'info_dict': {
'id': '1635803',
'title': 'Euro 2016: nie ma miejsca na błąd. Polacy grają ze Szwajcarią o ćwierćfinał',
'description': 'md5:01cb7d0cad58664095d72b51a1ebada2',
},
'playlist_mincount': 12,
}, {
'url': 'http://polskieradio.pl/9/305/Artykul/1632955,Bardzo-popularne-slowo-remis',
'only_matching': True,
}, {
'url': 'http://www.polskieradio.pl/7/5102/Artykul/1587943',
'only_matching': True,
}, {
# with mp4 video
'url': 'http://www.polskieradio.pl/9/299/Artykul/1634903,Brexit-Leszek-Miller-swiat-sie-nie-zawali-Europa-bedzie-trwac-dalej',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
content = self._search_regex(
r'(?s)<div[^>]+class="audio atarticle"[^>]*>(.+?)<script>',
webpage, 'content')
timestamp = unified_timestamp(self._html_search_regex(
r'(?s)<span[^>]+id="datetime2"[^>]*>(.+?)</span>',
webpage, 'timestamp', fatal=False))
entries = []
media_urls = set()
for data_media in re.findall(r'<[^>]+data-media=({[^>]+})', content):
media = self._parse_json(data_media, playlist_id, fatal=False)
            if not media or not media.get('file') or not media.get('desc'):
continue
media_url = self._proto_relative_url(media['file'], 'http:')
if media_url in media_urls:
continue
media_urls.add(media_url)
entries.append({
'id': compat_str(media['id']),
'url': media_url,
'title': compat_urllib_parse_unquote(media['desc']),
'duration': int_or_none(media.get('length')),
'vcodec': 'none' if media.get('provider') == 'audio' else None,
'timestamp': timestamp,
})
title = self._og_search_title(webpage).strip()
description = strip_or_none(self._og_search_description(webpage))
return self.playlist_result(entries, playlist_id, title, description)
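The data-media loop can be exercised on a fabricated article snippet; each attribute carries a JSON blob describing one media file:

import json
import re

content = '<span class="play" data-media={"id":1540576,"file":"//static.prsa.pl/x.mp3","desc":"Audycja","length":2364,"provider":"audio"}></span>'
for data_media in re.findall(r'<[^>]+data-media=({[^>]+})', content):
    media = json.loads(data_media)
    print(media['id'], media['file'], media['length'])  # 1540576 //static.prsa.pl/x.mp3 2364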


@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..utils import(
+from ..utils import (
     unified_strdate,
     str_to_int,
 )


@@ -6,7 +6,6 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
-    sanitized_Request,
     urlencode_postdata,
 )
@@ -45,20 +44,26 @@ class StreamcloudIE(InfoExtractor):
                 (?:id="[^"]+"\s+)?
                 value="([^"]*)"
             ''', orig_webpage)
-        post = urlencode_postdata(fields)

         self._sleep(12, video_id)
-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = sanitized_Request(url, post, headers)

         webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
-        title = self._html_search_regex(
-            r'<h1[^>]*>([^<]+)<', webpage, 'title')
-        video_url = self._search_regex(
-            r'file:\s*"([^"]+)"', webpage, 'video URL')
+            url, video_id, data=urlencode_postdata(fields), headers={
+                b'Content-Type': b'application/x-www-form-urlencoded',
+            })
+
+        try:
+            title = self._html_search_regex(
+                r'<h1[^>]*>([^<]+)<', webpage, 'title')
+            video_url = self._search_regex(
+                r'file:\s*"([^"]+)"', webpage, 'video URL')
+        except ExtractorError:
+            message = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?msgboxinfo.*?\1[^>]*>(?P<message>.+?)</div>',
+                webpage, 'message', default=None, group='message')
+            if message:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+            raise
+
         thumbnail = self._search_regex(
             r'image:\s*"([^"]+)"', webpage, 'thumbnail URL', fatal=False)


@@ -6,17 +6,14 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     determine_ext,
+    dict_get,
+    int_or_none,
+    try_get,
 )


 class SVTBaseIE(InfoExtractor):
-    def _extract_video(self, url, video_id):
-        info = self._download_json(url, video_id)
-
-        title = info['context']['title']
-        thumbnail = info['context'].get('thumbnailImage')
-
-        video_info = info['video']
+    def _extract_video(self, video_info, video_id):
         formats = []
         for vr in video_info['videoReferences']:
             player_type = vr.get('playerType')
@@ -40,27 +37,49 @@ class SVTBaseIE(InfoExtractor):
                     'format_id': player_type,
                     'url': vurl,
                 })
+        if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
+            self.raise_geo_restricted('This video is only available in Sweden')
         self._sort_formats(formats)

         subtitles = {}
-        subtitle_references = video_info.get('subtitleReferences')
+        subtitle_references = dict_get(video_info, ('subtitles', 'subtitleReferences'))
         if isinstance(subtitle_references, list):
             for sr in subtitle_references:
                 subtitle_url = sr.get('url')
+                subtitle_lang = sr.get('language', 'sv')
                 if subtitle_url:
-                    subtitles.setdefault('sv', []).append({'url': subtitle_url})
+                    if determine_ext(subtitle_url) == 'm3u8':
+                        # TODO(yan12125): handle WebVTT in m3u8 manifests
+                        continue
+                    subtitles.setdefault(subtitle_lang, []).append({'url': subtitle_url})

-        duration = video_info.get('materialLength')
-        age_limit = 18 if video_info.get('inappropriateForChildren') else 0
+        title = video_info.get('title')
+
+        series = video_info.get('programTitle')
+        season_number = int_or_none(video_info.get('season'))
+        episode = video_info.get('episodeTitle')
+        episode_number = int_or_none(video_info.get('episodeNumber'))
+
+        duration = int_or_none(dict_get(video_info, ('materialLength', 'contentDuration')))
+        age_limit = None
+        adult = dict_get(
+            video_info, ('inappropriateForChildren', 'blockedForChildren'),
+            skip_false_values=False)
+        if adult is not None:
+            age_limit = 18 if adult else 0

         return {
             'id': video_id,
             'title': title,
             'formats': formats,
             'subtitles': subtitles,
-            'thumbnail': thumbnail,
             'duration': duration,
             'age_limit': age_limit,
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
         }
@@ -68,11 +87,11 @@ class SVTIE(SVTBaseIE):
     _VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
-        'md5': '9648197555fc1b49e3dc22db4af51d46',
+        'md5': '33e9a5d8f646523ce0868ecfb0eed77d',
         'info_dict': {
             'id': '2900353',
-            'ext': 'flv',
-            'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
+            'ext': 'mp4',
+            'title': 'Stjärnorna skojar till det - under SVT-intervjun',
             'duration': 27,
             'age_limit': 0,
         },
@@ -89,15 +108,20 @@ class SVTIE(SVTBaseIE):
         mobj = re.match(self._VALID_URL, url)
         widget_id = mobj.group('widget_id')
         article_id = mobj.group('id')
-        return self._extract_video(
+
+        info = self._download_json(
             'http://www.svt.se/wd?widgetId=%s&articleId=%s&format=json&type=embed&output=json' % (widget_id, article_id),
             article_id)

+        info_dict = self._extract_video(info['video'], article_id)
+        info_dict['title'] = info['context']['title']
+        return info_dict
+

 class SVTPlayIE(SVTBaseIE):
     IE_DESC = 'SVT Play and Öppet arkiv'
-    _VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp)/(?P<id>[0-9]+)'
+    _TESTS = [{
         'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
         'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
         'info_dict': {
@@ -113,12 +137,50 @@ class SVTPlayIE(SVTBaseIE):
             }]
         },
     },
-    }
+    }, {
+        # geo restricted to Sweden
+        'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        host = mobj.group('host')
-        return self._extract_video(
-            'http://www.%s.se/video/%s?output=json' % (host, video_id),
-            video_id)
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        data = self._parse_json(
+            self._search_regex(
+                r'root\["__svtplay"\]\s*=\s*([^;]+);',
+                webpage, 'embedded data', default='{}'),
+            video_id, fatal=False)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        if data:
+            video_info = try_get(
+                data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
+                dict)
+            if video_info:
+                info_dict = self._extract_video(video_info, video_id)
+                info_dict.update({
+                    'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
+                    'thumbnail': thumbnail,
+                })
+                return info_dict
+
+        video_id = self._search_regex(
+            r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+            webpage, 'video id', default=None)
+
+        if video_id:
+            data = self._download_json(
+                'http://www.svt.se/videoplayer-api/video/%s' % video_id, video_id)
+            info_dict = self._extract_video(data, video_id)
+            if not info_dict.get('title'):
+                info_dict['title'] = re.sub(
+                    r'\s*\|\s*.+?$', '',
+                    info_dict.get('episode') or self._og_search_title(webpage))
+            return info_dict
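try_get, newly used above, walks a getter over nested JSON and swallows the usual lookup errors instead of raising; a model of the utils helper:

def try_get(src, getter, expected_type=None):
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v

data = {'context': {'dispatcher': {'stores': {}}}}
print(try_get(data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'], dict))  # -> None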


@@ -48,6 +48,6 @@ class TF1IE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         wat_id = self._html_search_regex(
-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8}).*?\1',
+            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
             webpage, 'wat id', group='id')
         return self.url_result('wat:%s' % wat_id, 'Wat')


@@ -277,9 +277,9 @@ class ThePlatformIE(ThePlatformBaseIE):

 class ThePlatformFeedIE(ThePlatformBaseIE):
-    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&byGuid=%s'
-    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*byGuid=(?P<id>[a-zA-Z0-9_]+)'
-    _TEST = {
+    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&%s'
+    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))'
+    _TESTS = [{
         # From http://player.theplatform.com/p/7wvmTC/MSNBCEmbeddedOffSite?guid=n_hardball_5biden_140207
         'url': 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
         'md5': '6e32495b5073ab414471b615c5ded394',
@@ -295,32 +295,38 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
             'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
             'uploader': 'NBCU-NEWS',
         },
-    }
+    }]

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        video_id = mobj.group('id')
-        provider_id = mobj.group('provider_id')
-        feed_id = mobj.group('feed_id')
-
-        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, video_id)
-        feed = self._download_json(real_url, video_id)
-        entry = feed['entries'][0]
+    def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}):
+        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
+        entry = self._download_json(real_url, video_id)['entries'][0]

         formats = []
         subtitles = {}
         first_video_id = None
         duration = None
+        asset_types = []
         for item in entry['media$content']:
-            smil_url = item['plfile$url'] + '&mbr=true'
+            smil_url = item['plfile$url']
             cur_video_id = ThePlatformIE._match_id(smil_url)
             if first_video_id is None:
                 first_video_id = cur_video_id
                 duration = float_or_none(item.get('plfile$duration'))
-            cur_formats, cur_subtitles = self._extract_theplatform_smil(smil_url, video_id, 'Downloading SMIL data for %s' % cur_video_id)
-            formats.extend(cur_formats)
-            subtitles = self._merge_subtitles(subtitles, cur_subtitles)
+            for asset_type in item['plfile$assetTypes']:
+                if asset_type in asset_types:
+                    continue
+                asset_types.append(asset_type)
+                query = {
+                    'mbr': 'true',
+                    'formats': item['plfile$format'],
+                    'assetTypes': asset_type,
+                }
+                if asset_type in asset_types_query:
+                    query.update(asset_types_query[asset_type])
+                cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
+                    smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
+                formats.extend(cur_formats)
+                subtitles = self._merge_subtitles(subtitles, cur_subtitles)

         self._sort_formats(formats)
@@ -344,5 +350,17 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
             'timestamp': timestamp,
             'categories': categories,
         })
+        if custom_fields:
+            ret.update(custom_fields(entry))

         return ret
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('id')
+        provider_id = mobj.group('provider_id')
+        feed_id = mobj.group('feed_id')
+        filter_query = mobj.group('filter')
+
+        return self._extract_feed_info(provider_id, feed_id, filter_query, video_id)
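The relaxed _VALID_URL accepts both byGuid and byId filters and passes the captured filter string to the URL template verbatim; both feed URLs appearing in this patch match:

import re

_VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))'
for url in (
        'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
        'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=529953347624'):
    mobj = re.match(_VALID_URL, url)
    print(mobj.group('filter'), '->', mobj.group('id'))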


@@ -4,6 +4,12 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    clean_html,
+    get_element_by_attribute,
+    ExtractorError,
+)


 class TVPIE(InfoExtractor):
@@ -21,7 +27,7 @@ class TVPIE(InfoExtractor):
         },
     }, {
         'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
-        'md5': 'c3b15ed1af288131115ff17a17c19dda',
+        'md5': 'b0005b542e5b4de643a9690326ab1257',
         'info_dict': {
             'id': '17916176',
             'ext': 'mp4',
@@ -53,6 +59,11 @@ class TVPIE(InfoExtractor):
         webpage = self._download_webpage(
             'http://www.tvp.pl/sess/tvplayer.php?object_id=%s' % video_id, video_id)

+        error_message = get_element_by_attribute('class', 'msg error', webpage)
+        if error_message:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, clean_html(error_message)), expected=True)
+
         title = self._search_regex(
             r'name\s*:\s*([\'"])Title\1\s*,\s*value\s*:\s*\1(?P<title>.+?)\1',
             webpage, 'title', group='title')
@@ -66,24 +77,50 @@ class TVPIE(InfoExtractor):
             r"poster\s*:\s*'([^']+)'", webpage, 'thumbnail', default=None)

         video_url = self._search_regex(
-            r'0:{src:([\'"])(?P<url>.*?)\1', webpage, 'formats', group='url', default=None)
-        if not video_url:
+            r'0:{src:([\'"])(?P<url>.*?)\1', webpage,
+            'formats', group='url', default=None)
+        if not video_url or 'material_niedostepny.mp4' in video_url:
             video_url = self._download_json(
                 'http://www.tvp.pl/pub/stat/videofileinfo?video_id=%s' % video_id,
                 video_id)['video_url']

-        ext = video_url.rsplit('.', 1)[-1]
-        if ext != 'ism/manifest':
-            if '/' in ext:
-                ext = 'mp4'
+        formats = []
+        video_url_base = self._search_regex(
+            r'(https?://.+?/video)(?:\.(?:ism|f4m|m3u8)|-\d+\.mp4)',
+            video_url, 'video base url', default=None)
+        if video_url_base:
+            # TODO: Current DASH formats are broken - $Time$ pattern in
+            # <SegmentTemplate> not implemented yet
+            # formats.extend(self._extract_mpd_formats(
+            #     video_url_base + '.ism/video.mpd',
+            #     video_id, mpd_id='dash', fatal=False))
+            formats.extend(self._extract_f4m_formats(
+                video_url_base + '.ism/video.f4m',
+                video_id, f4m_id='hds', fatal=False))
+            m3u8_formats = self._extract_m3u8_formats(
+                video_url_base + '.ism/video.m3u8', video_id,
+                'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+            self._sort_formats(m3u8_formats)
+            m3u8_formats = list(filter(
+                lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                m3u8_formats))
+            formats.extend(m3u8_formats)
+            for i, m3u8_format in enumerate(m3u8_formats, 2):
+                http_url = '%s-%d.mp4' % (video_url_base, i)
+                if self._is_valid_url(http_url, video_id):
+                    f = m3u8_format.copy()
+                    f.update({
+                        'url': http_url,
+                        'format_id': f['format_id'].replace('hls', 'http'),
+                        'protocol': 'http',
+                    })
+                    formats.append(f)
+        else:
             formats = [{
                 'format_id': 'direct',
                 'url': video_url,
-                'ext': ext,
+                'ext': determine_ext(video_url, 'mp4'),
             }]
-        else:
-            m3u8_url = re.sub('([^/]*)\.ism/manifest', r'\1.ism/\1.m3u8', video_url)
-            formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')

         self._sort_formats(formats)
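The ladder above derives every delivery flavour from a single base URL; on a fabricated base it expands to HDS, HLS and the numbered progressive MP4s that are then probed with _is_valid_url:

base = 'http://example.tvp.pl/sess/video'  # fabricated
renditions = [base + '.ism/video.f4m', base + '.ism/video.m3u8']
renditions += ['%s-%d.mp4' % (base, i) for i in (2, 3, 4)]
print(renditions)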


@@ -0,0 +1,84 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
remove_end,
unified_strdate,
)
class VidbitIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidbit\.co/(?:watch|embed)\?.*?\bv=(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'http://www.vidbit.co/watch?v=jkL2yDOEq2',
'md5': '1a34b7f14defe3b8fafca9796892924d',
'info_dict': {
'id': 'jkL2yDOEq2',
'ext': 'mp4',
'title': 'Intro to VidBit',
'description': 'md5:5e0d6142eec00b766cbf114bfd3d16b7',
'thumbnail': 're:https?://.*\.jpg$',
'upload_date': '20160618',
'view_count': int,
'comment_count': int,
}
}, {
'url': 'http://www.vidbit.co/embed?v=jkL2yDOEq2&auto=0&water=0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
compat_urlparse.urljoin(url, '/watch?v=%s' % video_id), video_id)
video_url, title = [None] * 2
config = self._parse_json(self._search_regex(
r'(?s)\.setup\(({.+?})\);', webpage, 'setup', default='{}'),
video_id, transform_source=js_to_json)
if config:
if config.get('file'):
video_url = compat_urlparse.urljoin(url, config['file'])
title = config.get('title')
if not video_url:
video_url = compat_urlparse.urljoin(url, self._search_regex(
r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url'))
if not title:
title = remove_end(
self._html_search_regex(
(r'<h1>(.+?)</h1>', r'<title>(.+?)</title>'),
webpage, 'title', default=None) or self._og_search_title(webpage),
' - VidBit')
description = self._html_search_meta(
('description', 'og:description', 'twitter:description'),
webpage, 'description')
upload_date = unified_strdate(self._html_search_meta(
'datePublished', webpage, 'upload date'))
view_count = int_or_none(self._search_regex(
r'<strong>(\d+)</strong> views',
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'id=["\']cmt_num["\'][^>]*>\((\d+)\)',
webpage, 'comment count', fatal=False))
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': upload_date,
'view_count': view_count,
'comment_count': comment_count,
}


@@ -16,6 +16,7 @@ from ..utils import (
     ExtractorError,
     InAdvancePagedList,
     int_or_none,
+    NO_DEFAULT,
     RegexNotFoundError,
     sanitized_Request,
     smuggle_url,
@@ -56,6 +57,26 @@ class VimeoBaseInfoExtractor(InfoExtractor):
         self._set_vimeo_cookie('vuid', vuid)
         self._download_webpage(login_request, None, False, 'Wrong login info')

+    def _verify_video_password(self, url, video_id, webpage):
+        password = self._downloader.params.get('videopassword')
+        if password is None:
+            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
+        token, vuid = self._extract_xsrft_and_vuid(webpage)
+        data = urlencode_postdata({
+            'password': password,
+            'token': token,
+        })
+        if url.startswith('http://'):
+            # vimeo only supports https now, but the user can give an http url
+            url = url.replace('http://', 'https://')
+        password_request = sanitized_Request(url + '/password', data)
+        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        password_request.add_header('Referer', url)
+        self._set_vimeo_cookie('vuid', vuid)
+        return self._download_webpage(
+            password_request, video_id,
+            'Verifying the password', 'Wrong password')
+
     def _extract_xsrft_and_vuid(self, webpage):
         xsrft = self._search_regex(
             r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
@@ -146,7 +167,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                         \.
                     )?
                     vimeo(?P<pro>pro)?\.com/
-                    (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
+                    (?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
                     (?:.*?/)?
                     (?:
                         (?:
@@ -227,8 +248,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
         {
             'url': 'http://vimeo.com/channels/keypeele/75629013',
             'md5': '2f86a05afe9d7abc0b9126d229bbe15d',
-            'note': 'Video is freely available via original URL '
-                    'and protected with password when accessed via http://vimeo.com/75629013',
             'info_dict': {
                 'id': '75629013',
                 'ext': 'mp4',
@@ -272,7 +291,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
         {
             # contains original format
             'url': 'https://vimeo.com/33951933',
-            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
+            'md5': '2d9f5475e0537f013d0073e812ab89e6',
             'info_dict': {
                 'id': '33951933',
                 'ext': 'mp4',
@@ -284,6 +303,29 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'description': 'md5:ae23671e82d05415868f7ad1aec21147',
             },
         },
+        {
+            # only available via https://vimeo.com/channels/tributes/6213729 and
+            # not via https://vimeo.com/6213729
+            'url': 'https://vimeo.com/channels/tributes/6213729',
+            'info_dict': {
+                'id': '6213729',
+                'ext': 'mp4',
+                'title': 'Vimeo Tribute: The Shining',
+                'uploader': 'Casey Donahue',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue',
+                'uploader_id': 'caseydonahue',
+                'upload_date': '20090821',
+                'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'expected_warnings': ['Unable to download JSON metadata'],
+        },
+        {
+            'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
+            'only_matching': True,
+        },
         {
             'url': 'https://vimeo.com/109815029',
             'note': 'Video not completely processed, "failed" seed status',
@@ -293,6 +335,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
             'only_matching': True,
         },
+        {
+            'url': 'https://vimeo.com/album/2632481/video/79010983',
+            'only_matching': True,
+        },
         {
             # source file returns 403: Forbidden
             'url': 'https://vimeo.com/7809605',
@@ -319,26 +365,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
         if mobj:
             return mobj.group(1)

-    def _verify_video_password(self, url, video_id, webpage):
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        data = urlencode_postdata({
-            'password': password,
-            'token': token,
-        })
-        if url.startswith('http://'):
-            # vimeo only supports https now, but the user can give an http url
-            url = url.replace('http://', 'https://')
-        password_request = sanitized_Request(url + '/password', data)
-        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        password_request.add_header('Referer', url)
-        self._set_vimeo_cookie('vuid', vuid)
-        return self._download_webpage(
-            password_request, video_id,
-            'Verifying the password', 'Wrong password')
-
     def _verify_player_video_password(self, url, video_id):
         password = self._downloader.params.get('videopassword')
         if password is None:
@@ -369,7 +395,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
         orig_url = url
         if mobj.group('pro') or mobj.group('player'):
             url = 'https://player.vimeo.com/video/' + video_id
-        else:
+        elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
             url = 'https://vimeo.com/' + video_id

         # Retrieve video webpage to extract further information
@@ -630,8 +656,21 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
             webpage = self._login_list_password(page_url, list_id, webpage)
             yield self._extract_list_title(webpage)

-        for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
-            yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
+        # Try extracting href first since not all videos are available via
+        # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
+        clips = re.findall(
+            r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage)
+        if clips:
+            for video_id, video_url in clips:
+                yield self.url_result(
+                    compat_urlparse.urljoin(base_url, video_url),
+                    VimeoIE.ie_key(), video_id=video_id)
+        # More relaxed fallback
+        else:
+            for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):
+                yield self.url_result(
+                    'https://vimeo.com/%s' % video_id,
+                    VimeoIE.ie_key(), video_id=video_id)

         if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
             break
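The new href-based scrape, run on a fabricated channel-page snippet, captures both the clip id and its channel-scoped path, so videos that are unreachable via the short URL still resolve:

import re

webpage = '<li id="clip_6213729"> <a href="/channels/tributes/6213729">Vimeo Tribute</a>'
print(re.findall(r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage))
# -> [('6213729', '/channels/tributes/6213729')]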
@@ -668,7 +707,7 @@ class VimeoUserIE(VimeoChannelIE):

 class VimeoAlbumIE(VimeoChannelIE):
     IE_NAME = 'vimeo:album'
-    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)'
+    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))'
     _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
     _TESTS = [{
         'url': 'https://vimeo.com/album/2632481',
@@ -688,6 +727,13 @@ class VimeoAlbumIE(VimeoChannelIE):
         'params': {
             'videopassword': 'youtube-dl',
         }
+    }, {
+        'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
+        'only_matching': True,
+    }, {
+        # TODO: respect page number
+        'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
+        'only_matching': True,
     }]

     def _page_url(self, base_url, pagenum):
@@ -746,12 +792,39 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
             'thumbnail': 're:^https?://.*\.jpg$',
             'uploader_id': 'user22258446',
         }
+    }, {
+        'note': 'Password protected',
+        'url': 'https://vimeo.com/user37284429/review/138823582/c4d865efde',
+        'info_dict': {
+            'id': '138823582',
+            'ext': 'mp4',
+            'title': 'EFFICIENT PICKUP MASTERCLASS MODULE 1',
+            'uploader': 'TMB',
+            'uploader_id': 'user37284429',
+        },
+        'params': {
+            'videopassword': 'holygrail',
+        },
     }]

+    def _real_initialize(self):
+        self._login()
+
+    def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
+        webpage = self._download_webpage(webpage_url, video_id)
+        config_url = self._html_search_regex(
+            r'data-config-url="([^"]+)"', webpage, 'config URL',
+            default=NO_DEFAULT if video_password_verified else None)
+        if config_url is None:
+            self._verify_video_password(webpage_url, video_id, webpage)
+            config_url = self._get_config_url(
+                webpage_url, video_id, video_password_verified=True)
+        return config_url
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        config = self._download_json(
-            'https://player.vimeo.com/video/%s/config' % video_id, video_id)
+        config_url = self._get_config_url(url, video_id)
+        config = self._download_json(config_url, video_id)
         info_dict = self._parse_config(config, video_id)
         self._vimeo_sort_formats(info_dict['formats'])
         info_dict['id'] = video_id


@@ -24,6 +24,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20130519',
            'uploader': 'Jack Dorsey',
            'uploader_id': '76',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -39,6 +40,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20140815',
            'uploader': 'Mars Ruiz',
            'uploader_id': '1102363502380728320',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -54,6 +56,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20130430',
            'uploader': 'Z3k3',
            'uploader_id': '936470460173008896',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -71,6 +74,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20150705',
            'uploader': 'Pimry_zaa',
            'uploader_id': '1135760698325307392',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -109,6 +113,7 @@ class VineIE(InfoExtractor):
            'upload_date': unified_strdate(data.get('created')),
            'uploader': username,
            'uploader_id': data.get('userIdStr'),
+            'view_count': int_or_none(data.get('loops', {}).get('count')),
            'like_count': int_or_none(data.get('likes', {}).get('count')),
            'comment_count': int_or_none(data.get('comments', {}).get('count')),
            'repost_count': int_or_none(data.get('reposts', {}).get('count')),
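The data.get('loops', {}).get('count') chain keeps the new view-count lookup None-safe: a missing loops object falls through to None, which int_or_none passes along instead of raising. A quick sketch with made-up payloads and a simplified int_or_none:

def int_or_none(v):
    # simplified version of the youtube_dl.utils helper
    return int(v) if v is not None else None


assert int_or_none({'loops': {'count': '42'}}.get('loops', {}).get('count')) == 42
assert int_or_none({}.get('loops', {}).get('count')) is None  # no KeyError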

youtube_dl/extractor/vk.py

@@ -3,6 +3,7 @@ from __future__ import unicode_literals

 import re
 import json
+import sys

 from .common import InfoExtractor
 from ..compat import compat_str
@@ -10,7 +11,6 @@ from ..utils import (
     ExtractorError,
     int_or_none,
     orderedSet,
-    sanitized_Request,
     str_to_int,
     unescapeHTML,
     unified_strdate,
@@ -190,7 +190,7 @@ class VKIE(InfoExtractor):
         if username is None:
             return

-        login_page = self._download_webpage(
+        login_page, url_handle = self._download_webpage_handle(
             'https://vk.com', None, 'Downloading login page')

         login_form = self._hidden_inputs(login_page)
@@ -200,11 +200,26 @@ class VKIE(InfoExtractor):
             'pass': password.encode('cp1251'),
         })

-        request = sanitized_Request(
-            'https://login.vk.com/?act=login',
-            urlencode_postdata(login_form))
+        # https://new.vk.com/ serves two identical remixlhk cookies in the
+        # Set-Cookie header and expects the first one to be set rather than
+        # the second (see
+        # https://github.com/rg3/youtube-dl/issues/9841#issuecomment-227871201).
+        # Per RFC 6265 the newer cookie should be the one stored, and that is
+        # what actually happens. Work around this VK issue by resetting the
+        # remixlhk cookie to the first one manually.
+        cookies = url_handle.headers.get('Set-Cookie')
+        if sys.version_info[0] >= 3:
+            cookies = cookies.encode('iso-8859-1')
+        cookies = cookies.decode('utf-8')
+        remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
+        if remixlhk:
+            value, domain = remixlhk.groups()
+            self._set_cookie(domain, 'remixlhk', value)
+
         login_page = self._download_webpage(
-            request, None, note='Logging in as %s' % username)
+            'https://login.vk.com/?act=login', None,
+            note='Logging in as %s' % username,
+            data=urlencode_postdata(login_form))

         if re.search(r'onLoginFailed', login_page):
             raise ExtractorError(
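The workaround relies on the non-greedy remixlhk regex picking out the first of the two cookies folded into the combined Set-Cookie header. A rough demonstration against a fabricated header value:

import re

# Two remixlhk cookies folded into one header value, as new.vk.com serves
# them (the values here are fabricated for illustration).
cookies = ('remixlhk=first; domain=.vk.com; path=/, '
           'remixlhk=second; domain=.login.vk.com; path=/')
m = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
assert m.groups() == ('first', '.vk.com')  # the first cookie wins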

youtube_dl/extractor/xnxx.py

@@ -6,17 +6,23 @@ from ..compat import compat_urllib_parse_unquote

 class XNXXIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:video|www)\.xnxx\.com/video(?P<id>[0-9]+)/(.*)'
-    _TEST = {
-        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
-        'md5': '0831677e2b4761795f68d417e0b7b445',
+    _VALID_URL = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)/'
+    _TESTS = [{
+        'url': 'http://www.xnxx.com/video-55awb78/skyrim_test_video',
+        'md5': 'ef7ecee5af78f8b03dca2cf31341d3a0',
         'info_dict': {
-            'id': '1135332',
+            'id': '55awb78',
             'ext': 'flv',
-            'title': 'lida » Naked Funny Actress (5)',
+            'title': 'Skyrim Test Video',
             'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.xnxx.com/video-55awb78/',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
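The relaxed _VALID_URL accepts both the legacy all-digit IDs and the new "video-o8xa19" style, with the hyphen optional. Checking the three test URLs against it:

import re

_VALID_URL = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)/'

for url, expected in [
    ('http://www.xnxx.com/video-55awb78/skyrim_test_video', '55awb78'),
    ('http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_', '1135332'),
    ('http://www.xnxx.com/video-55awb78/', '55awb78'),
]:
    assert re.match(_VALID_URL, url).group('id') == expected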

youtube_dl/extractor/youtube.py

@@ -501,6 +501,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                'youtube_include_dash_manifest': True,
                'format': '141',
            },
+            'skip': 'format 141 not served anymore',
        },
        # DASH manifest with encrypted signature
        {
@@ -517,7 +518,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
            },
            'params': {
                'youtube_include_dash_manifest': True,
-                'format': '141',
+                'format': '141/bestaudio[ext=m4a]',
            },
        },
        # JS player signature function name containing $
@@ -537,7 +538,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
            },
            'params': {
                'youtube_include_dash_manifest': True,
-                'format': '141',
+                'format': '141/bestaudio[ext=m4a]',
            },
        },
        # Controversy video
@@ -618,7 +619,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
                'license': 'Standard YouTube License',
                'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
-                'uploader': 'Olympics',
+                'uploader': 'Olympic',
                'title': 'Hockey - Women - GER-AUS - London 2012 Olympic Games',
            },
            'params': {
@@ -671,7 +672,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                'uploader': 'dorappi2000',
                'license': 'Standard YouTube License',
-                'formats': 'mincount:33',
+                'formats': 'mincount:32',
            },
        },
        # DASH manifest with segment_list
@@ -691,7 +692,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
            'params': {
                'youtube_include_dash_manifest': True,
                'format': '135',  # bestvideo
-            }
+            },
+            'skip': 'This live event has ended.',
        },
        {
            # Multifeed videos (multiple cameras), URL is for Main Camera
@@ -762,6 +764,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
            },
            'playlist_count': 2,
+            'skip': 'Not multifeed anymore',
        },
        {
            'url': 'http://vid.plus/FlRa-iH7PGw',
@@ -814,6 +817,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
            'params': {
                'skip_download': True,
            },
+            'skip': 'This video does not exist.',
        },
        {
            # Video licensed under Creative Commons
@@ -1331,7 +1335,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                    (?:[a-zA-Z-]+="[^"]*"\s+)*?
                    (?:title|href)="([^"]+)"\s+
                    (?:[a-zA-Z-]+="[^"]*"\s+)*?
-                    class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
+                    class="[^"]*"[^>]*>
                [^<]+\.{3}\s*
                </a>
            ''', r'\1', video_description)
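The '141/bestaudio[ext=m4a]' values use youtube-dl's format-selection fallback syntax: alternatives separated by "/" are tried left to right, so the tests still prefer format 141 where available but degrade gracefully now that it is not always served. The same string works anywhere a format is accepted, for instance when embedding youtube-dl (a sketch; the video URL is just youtube-dl's usual test clip):

from youtube_dl import YoutubeDL

ydl_opts = {
    # Try format 141 first; fall back to the best m4a audio if it is gone.
    'format': '141/bestaudio[ext=m4a]',
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])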

youtube_dl/jsinterp.py

@@ -131,8 +131,9 @@ class JSInterpreter(object):
            if variable in local_vars:
                obj = local_vars[variable]
            else:
-                obj = self._objects.setdefault(
-                    variable, self.extract_object(variable))
+                if variable not in self._objects:
+                    self._objects[variable] = self.extract_object(variable)
+                obj = self._objects[variable]

            if arg_str is None:
                # Member access
@@ -203,7 +204,8 @@ class JSInterpreter(object):
            argvals = tuple([
                int(v) if v.isdigit() else local_vars[v]
                for v in m.group('args').split(',')])
-            self._functions.setdefault(fname, self.extract_function(fname))
+            if fname not in self._functions:
+                self._functions[fname] = self.extract_function(fname)
            return self._functions[fname](argvals)

        raise ExtractorError('Unsupported JS expression %r' % expr)
@@ -230,7 +232,7 @@ class JSInterpreter(object):
    def extract_function(self, funcname):
        func_m = re.search(
            r'''(?x)
-                (?:function\s+%s|[{;,]%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
+                (?:function\s+%s|[{;,]\s*%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
                \((?P<args>[^)]*)\)\s*
                \{(?P<code>[^}]+)\}''' % (
                re.escape(funcname), re.escape(funcname), re.escape(funcname)),
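Both jsinterp changes fix the same pitfall: dict.setdefault(key, expensive()) evaluates expensive() before the lookup, so the object or function was re-extracted on every call even when already cached. The explicit membership check only extracts on a genuine miss, as this small demonstration shows:

calls = []


def expensive(name):
    calls.append(name)  # record every (costly) extraction
    return name.upper()


cache = {'f': 'F'}

# setdefault evaluates its default argument unconditionally...
cache.setdefault('f', expensive('f'))
assert calls == ['f']  # extracted again despite the cache hit

# ...whereas an explicit check defers the work until it is needed.
if 'g' not in cache:
    cache['g'] = expensive('g')
assert calls == ['f', 'g']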

youtube_dl/socks.py

@@ -76,7 +76,7 @@ class Socks4Error(ProxyError):

    CODES = {
        91: 'request rejected or failed',
-        92: 'request rejected becasue SOCKS server cannot connect to identd on the client',
+        92: 'request rejected because SOCKS server cannot connect to identd on the client',
        93: 'request rejected because the client program and identd report different user-ids'
    }

youtube_dl/utils.py

@@ -110,6 +110,49 @@ ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙ
                        itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOOO', ['OE'], 'UUUUUYP', ['ss'],
                                        'aaaaaa', ['ae'], 'ceeeeiiiionooooooo', ['oe'], 'uuuuuypy')))

+DATE_FORMATS = (
+    '%d %B %Y',
+    '%d %b %Y',
+    '%B %d %Y',
+    '%b %d %Y',
+    '%b %dst %Y %I:%M',
+    '%b %dnd %Y %I:%M',
+    '%b %dth %Y %I:%M',
+    '%Y %m %d',
+    '%Y-%m-%d',
+    '%Y/%m/%d',
+    '%Y/%m/%d %H:%M:%S',
+    '%Y-%m-%d %H:%M:%S',
+    '%Y-%m-%d %H:%M:%S.%f',
+    '%d.%m.%Y %H:%M',
+    '%d.%m.%Y %H.%M',
+    '%Y-%m-%dT%H:%M:%SZ',
+    '%Y-%m-%dT%H:%M:%S.%fZ',
+    '%Y-%m-%dT%H:%M:%S.%f0Z',
+    '%Y-%m-%dT%H:%M:%S',
+    '%Y-%m-%dT%H:%M:%S.%f',
+    '%Y-%m-%dT%H:%M',
+)
+
+DATE_FORMATS_DAY_FIRST = list(DATE_FORMATS)
+DATE_FORMATS_DAY_FIRST.extend([
+    '%d-%m-%Y',
+    '%d.%m.%Y',
+    '%d.%m.%y',
+    '%d/%m/%Y',
+    '%d/%m/%y',
+    '%d/%m/%Y %H:%M:%S',
+])
+
+DATE_FORMATS_MONTH_FIRST = list(DATE_FORMATS)
+DATE_FORMATS_MONTH_FIRST.extend([
+    '%m-%d-%Y',
+    '%m.%d.%Y',
+    '%m/%d/%Y',
+    '%m/%d/%y',
+    '%m/%d/%Y %H:%M:%S',
+])
+

 def preferredencoding():
     """Get preferred encoding.
@@ -975,6 +1018,24 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):

     https_response = http_response


+def extract_timezone(date_str):
+    m = re.search(
+        r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
+        date_str)
+    if not m:
+        timezone = datetime.timedelta()
+    else:
+        date_str = date_str[:-len(m.group('tz'))]
+        if not m.group('sign'):
+            timezone = datetime.timedelta()
+        else:
+            sign = 1 if m.group('sign') == '+' else -1
+            timezone = datetime.timedelta(
+                hours=sign * int(m.group('hours')),
+                minutes=sign * int(m.group('minutes')))
+    return timezone, date_str
+
+
 def parse_iso8601(date_str, delimiter='T', timezone=None):
     """ Return a UNIX timestamp from the given date """
@@ -984,20 +1045,8 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
     date_str = re.sub(r'\.[0-9]+', '', date_str)

     if timezone is None:
-        m = re.search(
-            r'(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
-            date_str)
-        if not m:
-            timezone = datetime.timedelta()
-        else:
-            date_str = date_str[:-len(m.group(0))]
-            if not m.group('sign'):
-                timezone = datetime.timedelta()
-            else:
-                sign = 1 if m.group('sign') == '+' else -1
-                timezone = datetime.timedelta(
-                    hours=sign * int(m.group('hours')),
-                    minutes=sign * int(m.group('minutes')))
+        timezone, date_str = extract_timezone(date_str)
+
     try:
         date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
         dt = datetime.datetime.strptime(date_str, date_format) - timezone
@@ -1006,6 +1055,10 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
         pass


+def date_formats(day_first=True):
+    return DATE_FORMATS_DAY_FIRST if day_first else DATE_FORMATS_MONTH_FIRST
+
+
 def unified_strdate(date_str, day_first=True):
     """Return a string with the date in the format YYYYMMDD"""
@@ -1014,53 +1067,11 @@ def unified_strdate(date_str, day_first=True):
     upload_date = None
     # Replace commas
     date_str = date_str.replace(',', ' ')
-    # %z (UTC offset) is only supported in python>=3.2
-    if not re.match(r'^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$', date_str):
-        date_str = re.sub(r' ?(\+|-)[0-9]{2}:?[0-9]{2}$', '', date_str)
     # Remove AM/PM + timezone
     date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
+    _, date_str = extract_timezone(date_str)

-    format_expressions = [
-        '%d %B %Y',
-        '%d %b %Y',
-        '%B %d %Y',
-        '%b %d %Y',
-        '%b %dst %Y %I:%M',
-        '%b %dnd %Y %I:%M',
-        '%b %dth %Y %I:%M',
-        '%Y %m %d',
-        '%Y-%m-%d',
-        '%Y/%m/%d',
-        '%Y/%m/%d %H:%M:%S',
-        '%Y-%m-%d %H:%M:%S',
-        '%Y-%m-%d %H:%M:%S.%f',
-        '%d.%m.%Y %H:%M',
-        '%d.%m.%Y %H.%M',
-        '%Y-%m-%dT%H:%M:%SZ',
-        '%Y-%m-%dT%H:%M:%S.%fZ',
-        '%Y-%m-%dT%H:%M:%S.%f0Z',
-        '%Y-%m-%dT%H:%M:%S',
-        '%Y-%m-%dT%H:%M:%S.%f',
-        '%Y-%m-%dT%H:%M',
-    ]
-    if day_first:
-        format_expressions.extend([
-            '%d-%m-%Y',
-            '%d.%m.%Y',
-            '%d.%m.%y',
-            '%d/%m/%Y',
-            '%d/%m/%y',
-            '%d/%m/%Y %H:%M:%S',
-        ])
-    else:
-        format_expressions.extend([
-            '%m-%d-%Y',
-            '%m.%d.%Y',
-            '%m/%d/%Y',
-            '%m/%d/%y',
-            '%m/%d/%Y %H:%M:%S',
-        ])
-    for expression in format_expressions:
+    for expression in date_formats(day_first):
         try:
             upload_date = datetime.datetime.strptime(date_str, expression).strftime('%Y%m%d')
         except ValueError:
@@ -1076,6 +1087,29 @@ def unified_strdate(date_str, day_first=True):
     return compat_str(upload_date)


+def unified_timestamp(date_str, day_first=True):
+    if date_str is None:
+        return None
+
+    date_str = date_str.replace(',', ' ')
+
+    pm_delta = datetime.timedelta(hours=12 if re.search(r'(?i)PM', date_str) else 0)
+    timezone, date_str = extract_timezone(date_str)
+
+    # Remove AM/PM + timezone
+    date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
+
+    for expression in date_formats(day_first):
+        try:
+            dt = datetime.datetime.strptime(date_str, expression) - timezone + pm_delta
+            return calendar.timegm(dt.timetuple())
+        except ValueError:
+            pass
+
+    timetuple = email.utils.parsedate_tz(date_str)
+    if timetuple:
+        return calendar.timegm(timetuple)
+
+
 def determine_ext(url, default_ext='unknown_video'):
     if url is None:
         return default_ext
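The new unified_timestamp complements unified_strdate by returning a POSIX timestamp instead of a YYYYMMDD string, reusing the shared format tables and timezone extraction. Assuming the function behaves as defined above:

import calendar
import datetime

from youtube_dl.utils import unified_timestamp

# The +07:00 offset is subtracted before calendar.timegm().
assert unified_timestamp('2016-06-26T21:15:24+07:00') == calendar.timegm(
    datetime.datetime(2016, 6, 26, 14, 15, 24).timetuple())

# day_first resolves ambiguous numeric dates via the two format tables.
assert unified_timestamp('04/08/2016') == calendar.timegm(
    datetime.datetime(2016, 8, 4).timetuple())   # 4 August
assert unified_timestamp('04/08/2016', day_first=False) == calendar.timegm(
    datetime.datetime(2016, 4, 8).timetuple())   # April 8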
@@ -1626,6 +1660,10 @@ def float_or_none(v, scale=1, invscale=1, default=None):
     return default


+def strip_or_none(v):
+    return None if v is None else v.strip()
+
+
 def parse_duration(s):
     if not isinstance(s, compat_basestring):
         return None
@@ -2852,3 +2890,16 @@ def decode_packed_codes(code):
     return re.sub(
         r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
         obfucasted_code)
+
+
+def parse_m3u8_attributes(attrib):
+    info = {}
+    for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
+        if val.startswith('"'):
+            val = val[1:-1]
+        info[key] = val
+    return info
+
+
+def urshift(val, n):
+    return val >> n if val >= 0 else (val + 0x100000000) >> n
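parse_m3u8_attributes tokenises an M3U8 attribute list, unquoting quoted values so embedded commas survive, and urshift (used by the iqiyi and le extractors) emulates JavaScript's unsigned >>> shift on 32-bit values. A quick check of both, assuming they behave as defined above:

from youtube_dl.utils import parse_m3u8_attributes, urshift

attrs = parse_m3u8_attributes(
    'BANDWIDTH=1280000,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720')
assert attrs == {
    'BANDWIDTH': '1280000',
    'CODECS': 'avc1.4d401f,mp4a.40.2',  # comma kept, quotes stripped
    'RESOLUTION': '1280x720',
}

# In JavaScript, (-1) >>> 1 === 2147483647; plain Python >> would give -1.
assert urshift(-1, 1) == 0x7fffffff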

youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.06.19.1'
+__version__ = '2016.06.26'