Compare commits

...

83 Commits

Author SHA1 Message Date
Philipp Hagemeister
1f91bd15c3 release 2014.03.21.3 2014-03-21 02:10:35 +01:00
Philipp Hagemeister
11a15be4ce [cspan] Add support for newer videos (Fixes #2577) 2014-03-21 02:10:24 +01:00
Philipp Hagemeister
14e17e18cb release 2014.03.21.2 2014-03-21 01:42:45 +01:00
Philipp Hagemeister
1b124d1942 [parliamentliveuk] Add extractor 2014-03-21 01:42:28 +01:00
Philipp Hagemeister
747373d4ae release 2014.03.21.1 2014-03-21 01:00:27 +01:00
Philipp Hagemeister
18d367c0a5 Remove legacy InfoExtractors file 2014-03-21 01:00:06 +01:00
Philipp Hagemeister
a1a530b067 [pbs] Add support for video ratings 2014-03-21 00:59:51 +01:00
Philipp Hagemeister
cb9722cb3f [viki] Modernize 2014-03-21 00:53:18 +01:00
Philipp Hagemeister
773c0b4bb8 [pbs] Add support for widget URLs (Fixes #2594) 2014-03-21 00:46:32 +01:00
Philipp Hagemeister
23c322a531 release 2014.03.21 2014-03-21 00:37:23 +01:00
Philipp Hagemeister
7e8c0af004 Add --prefer-insecure option (Fixes #2364) 2014-03-21 00:37:10 +01:00
Philipp Hagemeister
d2983ccb25 [ninegag] Modernize and remove unused import 2014-03-21 00:37:10 +01:00
Philipp Hagemeister
f24e9833dc [youporn] Modernize 2014-03-21 00:37:10 +01:00
Sergey M․
bc2bdf5709 [kontrtube] Modernize 2014-03-20 23:05:57 +07:00
Philipp Hagemeister
627a209f74 release 2014.03.20 2014-03-20 16:35:54 +01:00
Philipp Hagemeister
1a4895453a [YoutubeDL] Improve error message 2014-03-20 16:33:46 +01:00
Philipp Hagemeister
aab74fa106 [ted] Simplify embed code (#2587) 2014-03-20 16:33:23 +01:00
Philipp Hagemeister
2bd9efd4c2 Merge remote-tracking branch 'anovicecodemonkey/TEDIEimprovements' 2014-03-20 16:24:34 +01:00
Jaime Marquínez Ferrándiz
39a743fb9b [arte] Modernize tests and fix _VALID_REGEX 2014-03-20 09:14:43 +01:00
Jaime Marquínez Ferrándiz
4966a0b22d [arte] Add extractor for concert.arte.tv (closes #2588) 2014-03-20 09:11:47 +01:00
anovicecodemonkey
fc26023120 [TEDIE] Add support for embeded TED video URLs 2014-03-20 01:04:21 +10:30
anovicecodemonkey
8d7c0cca13 [generic] Add support for embeded TED videos 2014-03-20 00:56:32 +10:30
Sergey M․
f66ede4328 [arte.tv:+7] Fix _VALID_URL 2014-03-19 21:23:55 +07:00
Philipp Hagemeister
cc88b90ec8 [desvscripts/release] Bump the number of password tries to accomodate stubby-fingered @phihag 2014-03-18 15:02:37 +01:00
Philipp Hagemeister
b6c5fa9a0b release 2014.03.18.1 2014-03-18 14:42:59 +01:00
Philipp Hagemeister
dff10eaa77 release 2014.03.18 2014-03-18 14:31:03 +01:00
Philipp Hagemeister
4e6f9aeca1 Fix typo 2014-03-18 14:28:53 +01:00
Philipp Hagemeister
e68301af21 Fix getpass on Windows (Fixes #2547) 2014-03-18 14:27:42 +01:00
Sergey M․
17286a96f2 [iprima] Fix permission check regex 2014-03-18 19:33:28 +07:00
Jaime Marquínez Ferrándiz
0892363e6d Merge pull request #2580 from ericpardee/patch-1
Update to comedycentral.py (cc.com)
2014-03-18 08:14:39 +01:00
ericpardee
f102372b5f Update to comedycentral.py (cc.com)
Added cc.com as it's same as comedycentral.com and used, i.e. http://www.cc.com/video-clips/fmyq0m/broad-city-a-beautiful-railroad-style-apartment
2014-03-17 18:01:26 -07:00
Jaime Marquínez Ferrándiz
ecbe1ad207 [generic] Fix access to removed function in python 3.4
The `Request.get_origin_req_host` method was deprecated in 3.3, use the
 `origin_req_host` property if it's not available, see http://docs.python.org/3.3/library/urllib.request.html#urllib.request.Request.get_origin_req_host.
2014-03-17 21:59:21 +01:00
Philipp Hagemeister
9d840c43b5 release 2014.03.17 2014-03-17 14:49:02 +01:00
Philipp Hagemeister
6f50f63382 Merge remote-tracking branch 'origin/wheels' 2014-03-17 14:31:22 +01:00
Philipp Hagemeister
ff14fc4964 [test] Rename get_testcases to gettestcases
Apparently, newer versions of nosetests are somewhat over-eager in their test discovery.
2014-03-17 14:30:13 +01:00
Sergey M․
e125c21531 [vesti] Restore vesti extractor 2014-03-17 02:01:01 +07:00
Sergey M․
93d020dd65 [generic] Add support for embedded rutv player 2014-03-17 02:00:31 +07:00
Sergey M․
a7515ec265 [rutv] Refactor vgtrk/rutv extractor 2014-03-17 01:59:40 +07:00
Jaime Marquínez Ferrándiz
b6c1ceccc2 [ted] Add 'http://' to the thumbnail url if it's missing 2014-03-16 11:24:11 +01:00
Jaime Marquínez Ferrándiz
4056ad8f36 Build and upload universal wheels to pypi 2014-03-16 10:22:41 +01:00
Philipp Hagemeister
6563837ee1 [udemy] Make sure test case is not inherited 2014-03-16 07:09:10 +01:00
Philipp Hagemeister
fd5e6f7ef2 [vevo] Mark all test timestamps as approximate 2014-03-16 07:05:48 +01:00
Sergey M․
15fd51b37c [generic] More generic support for embedded vimeo player (#1602) 2014-03-16 00:47:04 +07:00
Sergey M․
f1cef7a9ff [iprima] Skip test 2014-03-15 01:39:42 +07:00
Sergey M․
8264223511 [iprima] Add access permission check 2014-03-15 01:38:44 +07:00
Jaime Marquínez Ferrándiz
bc6d597828 Add bestvideo and worstvideo to special format names (#2163) 2014-03-14 17:01:47 +01:00
Philipp Hagemeister
aba77bbfc2 [vevo] Adapt test to constantly changing timestamp 2014-03-13 18:45:14 +01:00
Philipp Hagemeister
955c451456 Rename upload_timestamp to timestamp 2014-03-13 18:45:14 +01:00
Sergey M․
e5de3f6c89 [udemy] Initial support for free courses (#1617) 2014-03-14 00:36:39 +07:00
Philipp Hagemeister
2a1db721d4 [test_download] Move assertions before debugging output 2014-03-13 17:05:51 +01:00
Philipp Hagemeister
1e0eb60f1a [videobam] Fix empty title handling 2014-03-13 17:03:43 +01:00
Philipp Hagemeister
87a29e6f25 [wdr] Add description to tests 2014-03-13 17:01:58 +01:00
Philipp Hagemeister
c3d36f134f [googlesearch] Fix next page indicator check 2014-03-13 16:52:13 +01:00
Philipp Hagemeister
84769e708c [ninegag] Fix extraction 2014-03-13 16:40:53 +01:00
Philipp Hagemeister
9d2ecdbc71 [vevo] Centralize timestamp handling 2014-03-13 15:30:25 +01:00
Philipp Hagemeister
9b69af5342 Merge remote-tracking branch 'soult/br' 2014-03-13 14:35:34 +01:00
David Triendl
c21215b421 [br] Allow '/' in URL, allow empty author + broadcastDate fields
* Allow URLs that have a 'subdirectory' before the actual program name, e.g.
  'xyz/xyz-episode-1'.
* The author and broadcastDate fields in the XML file may be empty.
* Add test case for the two problems above.
2014-03-13 14:08:34 +01:00
Philipp Hagemeister
cddcfd90b4 [funnyordie] Correct JSON interpretation 2014-03-13 00:53:19 +01:00
Sergey M․
f36aacba0f [collegehumor] Fix one more test 2014-03-13 06:25:12 +07:00
Sergey M․
355271fb61 [collegehumor] Extract like count 2014-03-13 06:12:39 +07:00
Sergey M․
2a5b502364 [collegehumor] Fix test 2014-03-13 06:09:21 +07:00
Philipp Hagemeister
98ff9d82d4 release 2014.03.12 2014-03-12 14:50:14 +01:00
Jaime Marquínez Ferrándiz
b1ff87224c [vimeo] Now VimeoIE doesn't match urls of channels with a numeric id (fixes #2552) 2014-03-12 14:23:06 +01:00
Sergey M․
b461641fb9 [wdr] Add support for WDR sites (Closes #1367) 2014-03-12 04:20:47 +07:00
Sergey M․
b047de6f6e Add format to unified_strdate 2014-03-12 04:18:43 +07:00
Philipp Hagemeister
34ca5d9ba0 release 2014.03.11 2014-03-11 16:51:50 +01:00
Philipp Hagemeister
60cc4dc4b4 [generic/funnyordie] Add support for funnyordie embeds (Fixes #2546) 2014-03-11 16:51:36 +01:00
Philipp Hagemeister
db95dc13a1 [playvid] Simplify (#2539) 2014-03-10 20:55:47 +01:00
Philipp Hagemeister
777ac90791 Merge remote-tracking branch 'MikeCol/playvid_extract' 2014-03-10 20:45:45 +01:00
Philipp Hagemeister
04f9bebbcb Merge remote-tracking branch 'jaimeMF/remove_global_opener' 2014-03-10 20:42:54 +01:00
MikeCol
4ea3137e41 Playvid extractor 2014-03-10 20:16:49 +01:00
Jaime Marquínez Ferrándiz
a0792b738e Don't install the global url opener
All the code uses now the urlopen method of YoutubeDL
2014-03-10 19:04:51 +01:00
Jaime Marquínez Ferrándiz
19a41fc613 Don't set the global socket timeout
Use the timeout argument of the `OpenerDirector.open` method instead
2014-03-10 19:03:37 +01:00
Sergey M․
3ee52157fb [vgtrk] Rename vesti extractor 2014-03-11 00:58:05 +07:00
Sergey M․
c4d197ee2d [vesti] Fix _VALID_URL regex 2014-03-11 00:49:41 +07:00
Philipp Hagemeister
a33932cfe3 [vevo] Correct test value
The date is now interpreted as UTC for consistency.
2014-03-10 17:56:54 +01:00
Philipp Hagemeister
bcf89ce62c [generic] Suppress warning about doctypes in RSS parser 2014-03-10 17:31:32 +01:00
Philipp Hagemeister
e3899d0e00 Merge branch 'master' of github.com:rg3/youtube-dl 2014-03-10 16:42:22 +01:00
Philipp Hagemeister
dcb00da49c [depositfiles] Remove extractor
This site requires a CAPTCHA to download, supports arbitrary files and not only audio/video, and I can't find a single uncopyrighted video with a quick google search.
Closes #1255
2014-03-10 16:41:08 +01:00
Sergey M․
aa51d20d19 [vesti] Skip geo restricted test 2014-03-10 22:31:22 +07:00
Philipp Hagemeister
ae7ed92057 [youtube] Fix up invalid JSON 2014-03-10 13:35:45 +01:00
Philipp Hagemeister
e45b31d9bd [vevo] Interpret date as UTC instead of local time 2014-03-10 13:12:57 +01:00
Philipp Hagemeister
5a25f39653 Correct extractor documentation 2014-03-10 13:09:55 +01:00
41 changed files with 1053 additions and 402 deletions

View File

@@ -36,6 +36,9 @@ which means you can modify it, redistribute it or use it however you like.
an empty string (--proxy "") for direct
connection
--no-check-certificate Suppress HTTPS certificate validation.
--prefer-insecure Use an unencrypted connection to retrieve
information about the video. (Currently
supported only for YouTube)
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information
permanently. By default $XDG_CACHE_HOME
@@ -191,9 +194,9 @@ which means you can modify it, redistribute it or use it however you like.
preference using slashes: "-f 22/17/18".
"-f mp4" and "-f flv" are also supported.
You can also use the special names "best",
"bestaudio", "worst", and "worstaudio". By
default, youtube-dl will pick the best
quality.
"bestvideo", "bestaudio", "worst",
"worstvideo" and "worstaudio". By default,
youtube-dl will pick the best quality.
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific
one is requested

View File

@@ -70,7 +70,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
@@ -97,7 +97,7 @@ rm -rf build
make pypi-files
echo "Uploading to PyPi ..."
python setup.py sdist upload
python setup.py sdist bdist_wheel upload
make clean
/bin/echo -e "\n### DONE!"

2
setup.cfg Normal file
View File

@@ -0,0 +1,2 @@
[wheel]
universal = True

View File

@@ -71,7 +71,7 @@ class FakeYDL(YoutubeDL):
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
def gettestcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
if t:

View File

@@ -182,6 +182,24 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-high')
def test_format_selection_video(self):
formats = [
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
]
info_dict = {'formats': formats, 'extractor': 'test'}
ydl = YDL({'format': 'bestvideo'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-high')
ydl = YDL({'format': 'worstvideo'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
def test_youtube_format_selection(self):
order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',

View File

@@ -9,7 +9,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from test.helper import gettestcases
from youtube_dl.extractor import (
FacebookIE,
@@ -105,7 +105,7 @@ class TestAllURLsMatching(unittest.TestCase):
def test_no_duplicates(self):
ies = gen_extractors()
for tc in get_testcases():
for tc in gettestcases():
url = tc['url']
for ie in ies:
if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
@@ -124,6 +124,8 @@ class TestAllURLsMatching(unittest.TestCase):
def test_vimeo_matching(self):
self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])
self.assertMatch('http://vimeo.com/channels/31259', ['vimeo:channel'])
self.assertMatch('http://vimeo.com/channels/31259/53576664', ['vimeo'])
self.assertMatch('http://vimeo.com/user7108434', ['vimeo:user'])
self.assertMatch('http://vimeo.com/user7108434/videos', ['vimeo:user'])
self.assertMatch('https://vimeo.com/user21297594/review/75524534/3c257a1b5d', ['vimeo:review'])
@@ -139,6 +141,7 @@ class TestAllURLsMatching(unittest.TestCase):
def test_pbs(self):
# https://github.com/rg3/youtube-dl/issues/2350
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['PBS'])
if __name__ == '__main__':
unittest.main()

View File

@@ -8,7 +8,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
get_testcases,
gettestcases,
try_rm,
md5,
report_warning
@@ -51,7 +51,7 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
defs = get_testcases()
defs = gettestcases()
class TestDownload(unittest.TestCase):
@@ -144,6 +144,10 @@ def generator(test_case):
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, type):
got = info_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(info_dict.get(info_field))
@@ -152,19 +156,19 @@ def generator(test_case):
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
finally:
try_rm_tcs_files()

View File

@@ -249,7 +249,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'python language')
self.assertEqual(result['title'], 'python language')
self.assertTrue(len(result['entries']) == 15)
self.assertEqual(len(result['entries']), 15)
def test_generic_rss_feed(self):
dl = FakeYDL()

View File

@@ -1,4 +0,0 @@
# Legacy file for backwards compatibility, use youtube_dl.extractor instead!
from .extractor.common import InfoExtractor, SearchInfoExtractor
from .extractor import gen_extractors, get_info_extractor

View File

@@ -4,6 +4,7 @@
from __future__ import absolute_import, unicode_literals
import collections
import datetime
import errno
import io
import json
@@ -147,6 +148,8 @@ class YoutubeDL(object):
again.
cookiefile: File name where cookies should be read from and dumped to.
nocheckcertificate:Do not verify SSL certificates
prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use
socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text
@@ -532,7 +535,7 @@ class YoutubeDL(object):
else:
raise
else:
self.report_error('no suitable InfoExtractor: %s' % url)
self.report_error('no suitable InfoExtractor for URL %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
@@ -666,6 +669,18 @@ class YoutubeDL(object):
if f.get('vcodec') == 'none']
if audio_formats:
return audio_formats[0]
elif format_spec == 'bestvideo':
video_formats = [
f for f in available_formats
if f.get('acodec') == 'none']
if video_formats:
return video_formats[-1]
elif format_spec == 'worstvideo':
video_formats = [
f for f in available_formats
if f.get('acodec') == 'none']
if video_formats:
return video_formats[0]
else:
extensions = ['mp4', 'flv', 'webm', '3gp']
if format_spec in extensions:
@@ -688,6 +703,11 @@ class YoutubeDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
upload_date = datetime.datetime.utcfromtimestamp(
info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
# This extractors handle format selection themselves
if info_dict['extractor'] in ['Youku']:
if download:
@@ -701,8 +721,11 @@ class YoutubeDL(object):
else:
formats = info_dict['formats']
if not formats:
raise ExtractorError('No video formats found!')
# We check that all the formats have the format and format_id fields
for (i, format) in enumerate(formats):
for i, format in enumerate(formats):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
@@ -1167,7 +1190,7 @@ class YoutubeDL(object):
def urlopen(self, req):
""" Start an HTTP download """
return self._opener.open(req)
return self._opener.open(req, timeout=self._socket_timeout)
def print_debug_header(self):
if not self.params.get('verbose'):
@@ -1198,7 +1221,7 @@ class YoutubeDL(object):
def _setup_opener(self):
timeout_val = self.params.get('socket_timeout')
timeout = 600 if timeout_val is None else float(timeout_val)
self._socket_timeout = 600 if timeout_val is None else float(timeout_val)
opts_cookiefile = self.params.get('cookiefile')
opts_proxy = self.params.get('proxy')
@@ -1236,7 +1259,3 @@ class YoutubeDL(object):
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
# TODO remove this global modification
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)

View File

@@ -56,7 +56,6 @@ __authors__ = (
__license__ = 'Public Domain'
import codecs
import getpass
import io
import locale
import optparse
@@ -68,6 +67,7 @@ import sys
from .utils import (
compat_getpass,
compat_print,
DateRange,
decodeOption,
@@ -237,6 +237,9 @@ def parseOpts(overrideArguments=None):
'--proxy', dest='proxy', default=None, metavar='URL',
help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--prefer-insecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
help='Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
@@ -257,7 +260,6 @@ def parseOpts(overrideArguments=None):
action='store_true',
help='Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: do not read the user configuration in ~/.config/youtube-dl.conf (%APPDATA%/youtube-dl/config.txt on Windows)')
selection.add_option(
'--playlist-start',
dest='playliststart', metavar='NUMBER', default=1, type=int,
@@ -316,7 +318,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', default=None,
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio". By default, youtube-dl will pick the best quality.')
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestvideo", "bestaudio", "worst", "worstvideo" and "worstaudio". By default, youtube-dl will pick the best quality.')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@@ -611,7 +613,7 @@ def _real_main(argv=None):
if opts.usetitle and opts.useid:
parser.error(u'using title conflicts with using video ID')
if opts.username is not None and opts.password is None:
opts.password = getpass.getpass(u'Type account password and press return:')
opts.password = compat_getpass(u'Type account password and press [Return]: ')
if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None:
@@ -756,6 +758,7 @@ def _real_main(argv=None):
'download_archive': download_archive_fn,
'cookiefile': opts.cookiefile,
'nocheckcertificate': opts.no_check_certificate,
'prefer_insecure': opts.prefer_insecure,
'proxy': opts.proxy,
'socket_timeout': opts.socket_timeout,
'bidi_workaround': opts.bidi_workaround,

View File

@@ -10,6 +10,7 @@ from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVFutureIE,
ArteTVDDCIE,
)
@@ -53,7 +54,6 @@ from .dailymotion import (
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
@@ -174,8 +174,10 @@ from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .parliamentliveuk import ParliamentLiveUKIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .playvid import PlayvidIE
from .podomatic import PodomaticIE
from .pornhd import PornHdIE
from .pornhub import PornHubIE
@@ -196,6 +198,7 @@ from .rutube import (
RutubeMovieIE,
RutubePersonIE,
)
from .rutv import RUTVIE
from .savefrom import SaveFromIE
from .servingsys import ServingSysIE
from .sina import SinaIE
@@ -242,6 +245,10 @@ from .tumblr import TumblrIE
from .tutv import TutvIE
from .tvigle import TvigleIE
from .tvp import TvpIE
from .udemy import (
UdemyIE,
UdemyCourseIE
)
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
@@ -268,6 +275,7 @@ from .viki import VikiIE
from .vk import VKIE
from .vube import VubeIE
from .wat import WatIE
from .wdr import WDRIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE

View File

@@ -131,7 +131,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
@@ -202,6 +202,8 @@ class ArteTVPlus7IE(InfoExtractor):
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
# Prefer http downloads over m3u8
0 if f['url'].endswith('m3u8') else 1,
)
formats = sorted(formats, key=sort_key)
def _format(format_info):
@@ -242,8 +244,9 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
'file': '050489-002.mp4',
'info_dict': {
'id': '050489-002',
'ext': 'mp4',
'title': 'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
@@ -255,8 +258,9 @@ class ArteTVFutureIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
'file': '050940-003.mp4',
'info_dict': {
'id': '050940-003',
'ext': 'mp4',
'title': 'Les champignons au secours de la planète',
},
}
@@ -270,7 +274,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'http?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
@@ -284,3 +288,19 @@ class ArteTVDDCIE(ArteTVPlus7IE):
javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_TEST = {
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
'id': '186',
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
},
}

View File

@@ -9,21 +9,35 @@ from ..utils import ExtractorError
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?P<id>[a-z0-9\-]+)\.html$"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
_TEST = {
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
_TESTS = [
{
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
}
},
{
"url": "http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html",
"md5": "ab451b09d861dbed7d7cc9ab0be19ebe",
"info_dict": {
"id": "2c060e69-3a27-4e13-b0f0-668fac17d812",
"ext": "mp4",
"title": "Über den Pass",
"description": "Die Eroberung der Alpen: Über den Pass",
"uploader": None,
"upload_date": None
}
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -33,16 +47,21 @@ class BRIE(InfoExtractor):
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = [{
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"uploader": xml_video.find("author").text,
"upload_date": "".join(reversed(xml_video.find("broadcastDate").text.split("."))),
"webpage_url": xml_video.find("permalink").text,
} for xml_video in xml.findall("video")]
videos = []
for xml_video in xml.findall("video"):
video = {
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"webpage_url": xml_video.find("permalink").text
}
if xml_video.find("author").text:
video["uploader"] = xml_video.find("author").text
if xml_video.find("broadcastDate").text:
video["upload_date"] = "".join(reversed(xml_video.find("broadcastDate").text.split(".")))
videos.append(video)
if len(videos) > 1:
self._downloader.report_warning(

View File

@@ -17,8 +17,9 @@ class CollegeHumorIE(InfoExtractor):
'id': '6902724',
'ext': 'mp4',
'title': 'Comic-Con Cosplay Catastrophe',
'description': 'Fans get creative this year',
'description': "Fans get creative this year at San Diego. Too creative. And yes, that's really Joss Whedon.",
'age_limit': 13,
'duration': 187,
},
},
{
@@ -28,7 +29,7 @@ class CollegeHumorIE(InfoExtractor):
'id': '3505939',
'ext': 'mp4',
'title': 'Font Conference',
'description': 'This video wasn\'t long enough,',
'description': "This video wasn't long enough, so we made it double-spaced.",
'age_limit': 10,
'duration': 179,
},
@@ -87,6 +88,7 @@ class CollegeHumorIE(InfoExtractor):
self._sort_formats(formats)
duration = int_or_none(vdata.get('duration'), 1000)
like_count = int_or_none(vdata.get('likes'))
return {
'id': video_id,
@@ -96,4 +98,5 @@ class CollegeHumorIE(InfoExtractor):
'formats': formats,
'age_limit': age_limit,
'duration': duration,
'like_count': like_count,
}

View File

@@ -14,7 +14,7 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?comedycentral\.com/
_VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
(video-clips|episodes|cc-studios|video-collections)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'

View File

@@ -97,7 +97,9 @@ class InfoExtractor(object):
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
subtitles: The subtitle file contents as a dictionary in the format
@@ -118,9 +120,6 @@ class InfoExtractor(object):
_real_extract() methods and define a _VALID_URL regexp.
Probably, they should also be added to the list of extractors.
_real_extract() must return a *list* of information dictionaries as
described above.
Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests.
"""

View File

@@ -10,9 +10,9 @@ from ..utils import (
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>\d+)'
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
IE_DESC = 'C-SPAN'
_TEST = {
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
'md5': '8e44ce11f0f725527daccc453f553eb0',
'info_dict': {
@@ -22,13 +22,24 @@ class CSpanIE(InfoExtractor):
'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
},
'skip': 'Regularly fails on travis, for unknown reasons',
}
}, {
'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
# For whatever reason, the served vide oalternates between
# two different ones
#'md5': 'dbb0f047376d457f2ab8b3929cbb2d0c',
'info_dict': {
'id': '340723',
'ext': 'mp4',
'title': 'International Health Care Models',
'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('id')
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex(r'data-progid=\'(\d+)\'>', webpage, 'video id')
video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')
description = self._html_search_regex(
[

View File

@@ -1,60 +0,0 @@
import re
import os
import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
ExtractorError,
)
class DepositFilesIE(InfoExtractor):
"""Information extractor for depositfiles.com"""
_VALID_URL = r'(?:http://)?(?:\w+\.)?depositfiles\.com/(?:../(?#locale))?files/(.+)'
def _real_extract(self, url):
file_id = url.split('/')[-1]
# Rebuild url in english locale
url = 'http://depositfiles.com/en/files/' + file_id
# Retrieve file webpage with 'Free download' button pressed
free_download_indication = {'gateway_result' : '1'}
request = compat_urllib_request.Request(url, compat_urllib_parse.urlencode(free_download_indication))
try:
self.report_download_webpage(file_id)
webpage = compat_urllib_request.urlopen(request).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to retrieve file webpage: %s' % compat_str(err))
# Search for the real file URL
mobj = re.search(r'<form action="(http://fileshare.+?)"', webpage)
if (mobj is None) or (mobj.group(1) is None):
# Try to figure out reason of the error.
mobj = re.search(r'<strong>(Attention.*?)</strong>', webpage, re.DOTALL)
if (mobj is not None) and (mobj.group(1) is not None):
restriction_message = re.sub('\s+', ' ', mobj.group(1)).strip()
raise ExtractorError(u'%s' % restriction_message)
else:
raise ExtractorError(u'Unable to extract download URL from: %s' % url)
file_url = mobj.group(1)
file_extension = os.path.splitext(file_url)[1][1:]
# Search for file title
file_title = self._search_regex(r'<b title="(.*?)">', webpage, u'title')
return [{
'id': file_id.decode('utf-8'),
'url': file_url.decode('utf-8'),
'uploader': None,
'upload_date': None,
'title': file_title,
'ext': file_extension.decode('utf-8'),
}]

View File

@@ -1,12 +1,13 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?funnyordie\.com/videos/(?P<id>[0-9a-f]+)/.*$'
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TEST = {
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'file': '0732f586d7.mp4',
@@ -30,10 +31,23 @@ class FunnyOrDieIE(InfoExtractor):
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, 'video URL', flags=re.DOTALL)
if mobj.group('type') == 'embed':
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
title = post['name']
description = post.get('description')
thumbnail = post.get('picture')
else:
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = None
return {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@@ -4,7 +4,6 @@ from __future__ import unicode_literals
import os
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from .youtube import YoutubeIE
@@ -17,6 +16,7 @@ from ..utils import (
ExtractorError,
HEADRequest,
parse_xml,
smuggle_url,
unescapeHTML,
unified_strdate,
@@ -24,6 +24,7 @@ from ..utils import (
)
from .brightcove import BrightcoveIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
class GenericIE(InfoExtractor):
@@ -134,6 +135,43 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# funnyordie embed
{
'url': 'http://www.theguardian.com/world/2014/mar/11/obama-zach-galifianakis-between-two-ferns',
'md5': '7cf780be104d40fea7bae52eed4a470e',
'info_dict': {
'id': '18e820ec3f',
'ext': 'mp4',
'title': 'Between Two Ferns with Zach Galifianakis: President Barack Obama',
'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
},
},
# RUTV embed
{
'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
'info_dict': {
'id': '776940',
'ext': 'mp4',
'title': 'Охотское море стало целиком российским',
'description': 'md5:5ed62483b14663e2a95ebbe115eb8f43',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
# Embedded TED video
{
'url': 'http://en.support.wordpress.com/videos/ted-talks/',
'md5': 'deeeabcc1085eb2ba205474e7235a3d5',
'info_dict': {
'id': '981',
'ext': 'mp4',
'title': 'My web playroom',
'uploader': 'Ze Frank',
'description': 'md5:ddb2a40ecd6b6a147e400e535874947b',
}
}
]
def report_download_webpage(self, video_id):
@@ -159,9 +197,14 @@ class GenericIE(InfoExtractor):
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
try:
# This function was deprecated in python 3.3 and removed in 3.4
origin_req_host = req.get_origin_req_host()
except AttributeError:
origin_req_host = req.origin_req_host
return HEADRequest(newurl,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
origin_req_host=origin_req_host,
unverifiable=True)
else:
raise compat_urllib_error.HTTPError(req.get_full_url(), code, msg, headers, fp)
@@ -274,7 +317,7 @@ class GenericIE(InfoExtractor):
# Is it an RSS feed?
try:
doc = xml.etree.ElementTree.fromstring(webpage.encode('utf-8'))
doc = parse_xml(webpage)
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
except compat_xml_parse_error:
@@ -313,9 +356,9 @@ class GenericIE(InfoExtractor):
# Look for embedded (iframe) Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="((?:https?:)?//player\.vimeo\.com/video/.+?)"', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
player_url = unescapeHTML(mobj.group('url'))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
@@ -432,6 +475,19 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(compat_urllib_parse.unquote(mobj.group('url')))
# Look for funnyordie embed
matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(eurl), 'FunnyOrDie')
for eurl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for embedded RUTV player
rutv_url = RUTVIE._extract_url(webpage)
if rutv_url:
return self.url_result(rutv_url, 'RUTV')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@@ -443,6 +499,13 @@ class GenericIE(InfoExtractor):
if mobj is None:
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
if mobj is None:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)

View File

@@ -46,6 +46,6 @@ class GoogleSearchIE(SearchInfoExtractor):
'url': mobj.group(1)
})
if (len(entries) >= n) or not re.search(r'class="pn" id="pnnext"', webpage):
if (len(entries) >= n) or not re.search(r'id="pnnext"', webpage):
res['entries'] = entries[:n]
return res

View File

@@ -6,7 +6,10 @@ from random import random
from math import floor
from .common import InfoExtractor
from ..utils import compat_urllib_request
from ..utils import (
compat_urllib_request,
ExtractorError,
)
class IPrimaIE(InfoExtractor):
@@ -36,6 +39,7 @@ class IPrimaIE(InfoExtractor):
'params': {
'skip_download': True, # requires rtmpdump
},
'skip': 'Do not have permission to access this page',
}]
def _real_extract(self, url):
@@ -44,6 +48,10 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
raise ExtractorError(
'%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
player_url = (
'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
(floor(random()*1073741824), floor(random()*1073741824))

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class KontrTubeIE(InfoExtractor):
@@ -32,27 +33,26 @@ class KontrTubeIE(InfoExtractor):
video_url = self._html_search_regex(r"video_url: '(.+?)/?',", webpage, 'video URL')
thumbnail = self._html_search_regex(r"preview_url: '(.+?)/?',", webpage, 'video thumbnail', fatal=False)
title = self._html_search_regex(r'<title>(.+?) - Труба зовёт - Интересный видеохостинг</title>', webpage,
'video title')
title = self._html_search_regex(
r'<title>(.+?) - Труба зовёт - Интересный видеохостинг</title>', webpage, 'video title')
description = self._html_search_meta('description', webpage, 'video description')
mobj = re.search(r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>',
webpage)
mobj = re.search(
r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>', webpage)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
view_count = self._html_search_regex(r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage,
'view count', fatal=False)
view_count = int(view_count) if view_count is not None else None
view_count = self._html_search_regex(
r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage, 'view count', fatal=False)
comment_count = None
comment_str = self._html_search_regex(r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count',
fatal=False)
comment_str = self._html_search_regex(
r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count', fatal=False)
if comment_str.startswith('комментариев нет'):
comment_count = 0
else:
mobj = re.search(r'\d+ из (?P<total>\d+) комментариев', comment_str)
if mobj:
comment_count = int(mobj.group('total'))
comment_count = mobj.group('total')
return {
'id': video_id,
@@ -61,6 +61,6 @@ class KontrTubeIE(InfoExtractor):
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count),
}

View File

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -12,10 +11,13 @@ class NineGagIE(InfoExtractor):
_TEST = {
"url": "http://9gag.tv/v/1912",
"file": "1912.mp4",
"info_dict": {
"id": "1912",
"ext": "mp4",
"description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome"
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome",
"view_count": int,
"thumbnail": "re:^https?://",
},
'add_ie': ['Youtube']
}
@@ -25,21 +27,27 @@ class NineGagIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
data_json = self._html_search_regex(r'''(?x)
<div\s*id="tv-video"\s*data-video-source="youtube"\s*
data-video-meta="([^"]+)"''', webpage, 'video metadata')
data = json.loads(data_json)
youtube_id = self._html_search_regex(
r'(?s)id="jsid-video-post-container".*?data-external-id="([^"]+)"',
webpage, 'video ID')
description = self._html_search_regex(
r'(?s)<div class="video-caption">.*?<p>(.*?)</p>', webpage,
'description', fatal=False)
view_count_str = self._html_search_regex(
r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage, 'view count',
fatal=False)
view_count = (
None if view_count_str is None
else int(view_count_str.replace(',', '')))
return {
'_type': 'url_transparent',
'url': data['youtubeVideoId'],
'url': youtube_id,
'ie_key': 'Youtube',
'id': video_id,
'title': data['title'],
'description': data['description'],
'view_count': int(data['view_count']),
'like_count': int(data['statistic']['like']),
'dislike_count': int(data['statistic']['dislike']),
'thumbnail': data['thumbnail_url'],
'title': self._og_search_title(webpage),
'description': description,
'view_count': view_count,
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@@ -0,0 +1,57 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unified_strdate,
)
class ParliamentLiveUKIE(InfoExtractor):
IE_NAME = 'parliamentlive.tv'
IE_DESC = 'UK parliament videos'
_VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
'info_dict': {
'id': '15121',
'ext': 'asf',
'title': 'hoc home affairs committee, 18 mar 2014.pm',
'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
},
'params': {
'skip_download': True, # Requires mplayer (mms)
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
asx_url = self._html_search_regex(
r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
'metadata URL')
asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
video_url = asx.find('.//REF').attrib['HREF']
title = self._search_regex(
r'''(?x)player\.setClipDetails\(
(?:(?:[0-9]+|"[^"]+"),\s*){2}
"([^"]+",\s*"[^"]+)"
''',
webpage, 'title').replace('", "', ', ')
description = self._html_search_regex(
r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
webpage, 'description')
return {
'id': video_id,
'ext': 'asf',
'url': video_url,
'title': title,
'description': description,
}

View File

@@ -3,6 +3,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
US_RATINGS,
)
class PBSIE(InfoExtractor):
@@ -13,7 +16,7 @@ class PBSIE(InfoExtractor):
# Article with embedded player
(?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+)/?(?:$|[?\#]) |
# Player
video\.pbs\.org/partnerplayer/(?P<player_id>[^/]+)/
video\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
)
'''
@@ -57,6 +60,11 @@ class PBSIE(InfoExtractor):
info_url = 'http://video.pbs.org/videoInfo/%s?format=json' % video_id
info = self._download_json(info_url, display_id)
rating_str = info.get('rating')
if rating_str is not None:
rating_str = rating_str.rpartition('-')[2]
age_limit = US_RATINGS.get(rating_str)
return {
'id': video_id,
'title': info['title'],
@@ -65,4 +73,5 @@ class PBSIE(InfoExtractor):
'description': info['program'].get('description'),
'thumbnail': info.get('image_url'),
'duration': info.get('duration'),
'age_limit': age_limit,
}

View File

@@ -0,0 +1,80 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
)
class PlayvidIE(InfoExtractor):
_VALID_URL = r'^https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
_TEST = {
'url': 'http://www.playvid.com/watch/agbDDi7WZTV',
'md5': '44930f8afa616efdf9482daf4fe53e1e',
'info_dict': {
'id': 'agbDDi7WZTV',
'ext': 'mp4',
'title': 'Michelle Lewin in Miami Beach',
'duration': 240,
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_title = None
duration = None
video_thumbnail = None
formats = []
# most of the information is stored in the flashvars
flashvars = self._html_search_regex(
r'flashvars="(.+?)"', webpage, 'flashvars')
infos = compat_urllib_parse.unquote(flashvars).split(r'&')
for info in infos:
videovars_match = re.match(r'^video_vars\[(.+?)\]=(.+?)$', info)
if videovars_match:
key = videovars_match.group(1)
val = videovars_match.group(2)
if key == 'title':
video_title = compat_urllib_parse.unquote_plus(val)
if key == 'duration':
try:
duration = int(val)
except ValueError:
pass
if key == 'big_thumb':
video_thumbnail = val
videourl_match = re.match(
r'^video_urls\]\[(?P<resolution>[0-9]+)p', key)
if videourl_match:
height = int(videourl_match.group('resolution'))
formats.append({
'height': height,
'url': val,
})
self._sort_formats(formats)
# Extract title - should be in the flashvars; if not, look elsewhere
if video_title is None:
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
return {
'id': video_id,
'formats': formats,
'title': video_title,
'thumbnail': video_thumbnail,
'duration': duration,
'description': None,
'age_limit': 18
}

View File

@@ -0,0 +1,183 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none
)
class RUTVIE(InfoExtractor):
IE_DESC = 'RUTV.RU'
_VALID_URL = r'https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:flash2v/container\.swf\?id=|iframe/(?P<type>swf|video|live)/id/)(?P<id>\d+)'
_TESTS = [
{
'url': 'http://player.rutv.ru/flash2v/container.swf?id=774471&sid=kultura&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972347/video_id/978186/brand_id/31724',
'info_dict': {
'id': '774471',
'ext': 'mp4',
'title': 'Монологи на все времена',
'description': 'md5:18d8b5e6a41fb1faa53819471852d5d5',
'duration': 2906,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'https://player.vgtrk.com/flash2v/container.swf?id=774016&sid=russiatv&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972098/video_id/977760/brand_id/57638',
'info_dict': {
'id': '774016',
'ext': 'mp4',
'title': 'Чужой в семье Сталина',
'description': '',
'duration': 2539,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/swf/id/766888/sid/hitech/?acc_video_id=4000',
'info_dict': {
'id': '766888',
'ext': 'mp4',
'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
'duration': 279,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/video/id/771852/start_zoom/true/showZoomBtn/false/sid/russiatv/?acc_video_id=episode_id/970443/video_id/975648/brand_id/5169',
'info_dict': {
'id': '771852',
'ext': 'mp4',
'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
'duration': 3096,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/live/id/51499/showZoomBtn/false/isPlay/true/sid/sochi2014',
'info_dict': {
'id': '51499',
'ext': 'flv',
'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Translation has finished',
},
]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1', webpage)
if mobj:
return mobj.group('url')
mobj = re.search(
r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>http://player\.(?:rutv\.ru|vgtrk\.com)/flash2v/container\.swf\?id=.+?\2)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_type = mobj.group('type')
if not video_type or video_type == 'swf':
video_type = 'video'
json_data = self._download_json(
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),
video_id, 'Downloading JSON')
if json_data['errors']:
raise ExtractorError('%s said: %s' % (self.IE_NAME, json_data['errors']), expected=True)
playlist = json_data['data']['playlist']
medialist = playlist['medialist']
media = medialist[0]
if media['errors']:
raise ExtractorError('%s said: %s' % (self.IE_NAME, media['errors']), expected=True)
view_count = playlist.get('count_views')
priority_transport = playlist['priority_transport']
thumbnail = media['picture']
width = int_or_none(media['width'])
height = int_or_none(media['height'])
description = media['anons']
title = media['title']
duration = int_or_none(media.get('duration'))
formats = []
for transport, links in media['sources'].items():
for quality, url in links.items():
if transport == 'rtmp':
mobj = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>.+)$', url)
if not mobj:
continue
fmt = {
'url': mobj.group('url'),
'play_path': mobj.group('playpath'),
'app': mobj.group('app'),
'page_url': 'http://player.rutv.ru',
'player_url': 'http://player.rutv.ru/flash2v/osmf.swf?i=22',
'rtmp_live': True,
'ext': 'flv',
'vbr': int(quality),
}
elif transport == 'm3u8':
fmt = {
'url': url,
'ext': 'mp4',
}
else:
fmt = {
'url': url
}
fmt.update({
'width': width,
'height': height,
'format_id': '%s-%s' % (transport, quality),
'preference': -1 if priority_transport == transport else -2,
})
formats.append(fmt)
if not formats:
raise ExtractorError('No media links available for %s' % video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'duration': duration,
'formats': formats,
}

View File

@@ -11,7 +11,9 @@ from ..utils import (
class TEDIE(SubtitlesInfoExtractor):
_VALID_URL = r'''(?x)http://www\.ted\.com/
_VALID_URL = r'''(?x)
(?P<proto>https?://)
(?P<type>www|embed)(?P<urlmain>\.ted\.com/
(
(?P<type_playlist>playlists(?:/\d+)?) # We have a playlist
|
@@ -19,6 +21,7 @@ class TEDIE(SubtitlesInfoExtractor):
)
(/lang/(.*?))? # The url may contain the language
/(?P<name>\w+) # Here goes the name and then ".html"
.*)$
'''
_TEST = {
'url': 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html',
@@ -48,6 +51,9 @@ class TEDIE(SubtitlesInfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url, re.VERBOSE)
if m.group('type') == 'embed':
desktop_url = m.group('proto') + 'www' + m.group('urlmain')
return self.url_result(desktop_url, 'TED')
name = m.group('name')
if m.group('type_talk'):
return self._talk_info(url, name)
@@ -93,11 +99,14 @@ class TEDIE(SubtitlesInfoExtractor):
self._list_available_subtitles(video_id, talk_info)
return
thumbnail = talk_info['thumb']
if not thumbnail.startswith('http'):
thumbnail = 'http://' + thumbnail
return {
'id': video_id,
'title': talk_info['title'],
'uploader': talk_info['speaker'],
'thumbnail': talk_info['thumb'],
'thumbnail': thumbnail,
'description': self._og_search_description(webpage),
'subtitles': video_subtitles,
'formats': formats,

View File

@@ -0,0 +1,164 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
compat_urllib_request,
ExtractorError,
)
class UdemyIE(InfoExtractor):
IE_NAME = 'udemy'
_VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
_LOGIN_URL = 'https://www.udemy.com/join/login-submit/'
_NETRC_MACHINE = 'udemy'
_TESTS = [{
'url': 'https://www.udemy.com/java-tutorial/#/lecture/172757',
'md5': '98eda5b657e752cf945d8445e261b5c5',
'info_dict': {
'id': '160614',
'ext': 'mp4',
'title': 'Introduction and Installation',
'description': 'md5:c0d51f6f21ef4ec65f091055a5eef876',
'duration': 579.29,
},
'skip': 'Requires udemy account credentials',
}]
def _handle_error(self, response):
if not isinstance(response, dict):
return
error = response.get('error')
if error:
error_str = 'Udemy returned error #%s: %s' % (error.get('code'), error.get('message'))
error_data = error.get('data')
if error_data:
error_str += ' - %s' % error_data.get('formErrors')
raise ExtractorError(error_str, expected=True)
def _download_json(self, url, video_id, note='Downloading JSON metadata'):
response = super(UdemyIE, self)._download_json(url, video_id, note)
self._handle_error(response)
return response
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
raise ExtractorError(
'Udemy account is required, use --username and --password options to provide account credentials.',
expected=True)
login_popup = self._download_webpage(
'https://www.udemy.com/join/login-popup?displayType=ajax&showSkipButton=1', None,
'Downloading login popup')
if login_popup == '<div class="run-command close-popup redirect" data-url="https://www.udemy.com/"></div>':
return
csrf = self._html_search_regex(r'<input type="hidden" name="csrf" value="(.+?)"', login_popup, 'csrf token')
login_form = {
'email': username,
'password': password,
'csrf': csrf,
'displayType': 'json',
'isSubmitted': '1',
}
request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
response = self._download_json(request, None, 'Logging in as %s' % username)
if 'returnUrl' not in response:
raise ExtractorError('Unable to log in')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
lecture_id = mobj.group('id')
lecture = self._download_json(
'https://www.udemy.com/api-1.1/lectures/%s' % lecture_id, lecture_id, 'Downloading lecture JSON')
if lecture['assetType'] != 'Video':
raise ExtractorError('Lecture %s is not a video' % lecture_id, expected=True)
asset = lecture['asset']
stream_url = asset['streamUrl']
mobj = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url)
if mobj:
return self.url_result(mobj.group(1), 'Youtube')
video_id = asset['id']
thumbnail = asset['thumbnailUrl']
duration = asset['data']['duration']
download_url = asset['downloadUrl']
formats = [
{
'url': download_url['Video480p'][0],
'format_id': '360p',
},
{
'url': download_url['Video'][0],
'format_id': '720p',
},
]
title = lecture['title']
description = lecture['description']
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats
}
class UdemyCourseIE(UdemyIE):
IE_NAME = 'udemy:course'
_VALID_URL = r'https?://www\.udemy\.com/(?P<coursepath>[\da-z-]+)'
_SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
_ALREADY_ENROLLED = '>You are already taking this course.<'
_TESTS = []
@classmethod
def suitable(cls, url):
return False if UdemyIE.suitable(url) else super(UdemyCourseIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_path = mobj.group('coursepath')
response = self._download_json(
'https://www.udemy.com/api-1.1/courses/%s' % course_path, course_path, 'Downloading course JSON')
course_id = int(response['id'])
course_title = response['title']
webpage = self._download_webpage(
'https://www.udemy.com/course/subscribe/?courseId=%s' % course_id, course_id, 'Enrolling in the course')
if self._SUCCESSFULLY_ENROLLED in webpage:
self.to_screen('%s: Successfully enrolled in' % course_id)
elif self._ALREADY_ENROLLED in webpage:
self.to_screen('%s: Already enrolled in' % course_id)
response = self._download_json('https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
course_id, 'Downloading course curriculum')
entries = [
self.url_result('https://www.udemy.com/%s/#/lecture/%s' % (course_path, asset['id']), 'Udemy')
for asset in response if asset.get('assetType') == 'Video'
]
return self.playlist_result(entries, course_id, course_title)

View File

@@ -4,16 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none
)
from ..utils import ExtractorError
from .rutv import RUTVIE
class VestiIE(InfoExtractor):
IE_NAME = 'vesti'
IE_DESC = 'Вести.Ru'
_VALID_URL = r'http://(?:(?:.+?\.)?vesti\.ru|(?:2\.)?russia\.tv|tvkultura\.ru|rutv\.ru)/(?P<id>.+)'
_VALID_URL = r'http://(?:.+?\.)?vesti\.ru/(?P<id>.+)'
_TESTS = [
{
@@ -85,7 +82,7 @@ class VestiIE(InfoExtractor):
# m3u8 download
'skip_download': True,
},
'skip': 'Blocked outside Russia'
'skip': 'Blocked outside Russia',
},
{
'url': 'http://sochi2014.vesti.ru/live/play/live_id/301',
@@ -101,76 +98,6 @@ class VestiIE(InfoExtractor):
},
'skip': 'Translation has finished'
},
{
'url': 'http://russia.tv/video/show/brand_id/5169/episode_id/970443/video_id/975648',
'info_dict': {
'id': '771852',
'ext': 'mp4',
'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
'duration': 3096,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://russia.tv/brand/show/brand_id/57638',
'info_dict': {
'id': '774016',
'ext': 'mp4',
'title': 'Чужой в семье Сталина',
'description': '',
'duration': 2539,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://2.russia.tv/video/show/brand_id/48863/episode_id/972920/video_id/978667/viewtype/picture',
'info_dict': {
'id': '775081',
'ext': 'mp4',
'title': 'XXII зимние Олимпийские игры. Россияне заняли весь пьедестал в лыжных гонках',
'description': 'md5:15d3741dd8d04b203fbc031c6a47fb0f',
'duration': 101,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://tvkultura.ru/video/show/brand_id/31724/episode_id/972347/video_id/978186',
'info_dict': {
'id': '774471',
'ext': 'mp4',
'title': 'Монологи на все времена',
'description': 'md5:18d8b5e6a41fb1faa53819471852d5d5',
'duration': 2906,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://rutv.ru/brand/show/id/6792/channel/75',
'info_dict': {
'id': '125521',
'ext': 'mp4',
'title': 'Грустная дама червей. Х',
'description': '',
'duration': 4882,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
]
def _real_extract(self, url):
@@ -180,99 +107,15 @@ class VestiIE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
mobj = re.search(
r'<meta property="og:video" content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
r'<meta[^>]+?property="og:video"[^>]+?content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
page)
if mobj:
video_id = mobj.group('id')
page = self._download_webpage('http://www.vesti.ru/only_video.html?vid=%s' % video_id, video_id,
'Downloading video page')
mobj = re.search(
r'<meta property="og:video" content="http://player\.rutv\.ru/flash2v/container\.swf\?id=(?P<id>\d+)', page)
if mobj:
video_type = 'video'
video_id = mobj.group('id')
else:
mobj = re.search(
r'<iframe.+?src="http://player\.rutv\.ru/iframe/(?P<type>[^/]+)/id/(?P<id>\d+)[^"]*".*?></iframe>',
page)
rutv_url = RUTVIE._extract_url(page)
if rutv_url:
return self.url_result(rutv_url, 'RUTV')
if not mobj:
raise ExtractorError('No media found', expected=True)
video_type = mobj.group('type')
video_id = mobj.group('id')
json_data = self._download_json(
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),
video_id, 'Downloading JSON')
if json_data['errors']:
raise ExtractorError('vesti returned error: %s' % json_data['errors'], expected=True)
playlist = json_data['data']['playlist']
medialist = playlist['medialist']
media = medialist[0]
if media['errors']:
raise ExtractorError('vesti returned error: %s' % media['errors'], expected=True)
view_count = playlist.get('count_views')
priority_transport = playlist['priority_transport']
thumbnail = media['picture']
width = int_or_none(media['width'])
height = int_or_none(media['height'])
description = media['anons']
title = media['title']
duration = int_or_none(media.get('duration'))
formats = []
for transport, links in media['sources'].items():
for quality, url in links.items():
if transport == 'rtmp':
mobj = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>.+)$', url)
if not mobj:
continue
fmt = {
'url': mobj.group('url'),
'play_path': mobj.group('playpath'),
'app': mobj.group('app'),
'page_url': 'http://player.rutv.ru',
'player_url': 'http://player.rutv.ru/flash2v/osmf.swf?i=22',
'rtmp_live': True,
'ext': 'flv',
'vbr': int(quality),
}
elif transport == 'm3u8':
fmt = {
'url': url,
'ext': 'mp4',
}
else:
fmt = {
'url': url
}
fmt.update({
'width': width,
'height': height,
'format_id': '%s-%s' % (transport, quality),
'preference': -1 if priority_transport == transport else -2,
})
formats.append(fmt)
if not formats:
raise ExtractorError('No media links available for %s' % video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'duration': duration,
'formats': formats,
}
raise ExtractorError('No video found', expected=True)

View File

@@ -2,7 +2,6 @@ from __future__ import unicode_literals
import re
import xml.etree.ElementTree
import datetime
from .common import InfoExtractor
from ..utils import (
@@ -22,6 +21,7 @@ class VevoIE(InfoExtractor):
https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
vevo:)
(?P<id>[^&?#]+)'''
_TESTS = [{
'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
"md5": "06bea460acb744eab74a9d7dcb4bfd61",
@@ -34,6 +34,8 @@ class VevoIE(InfoExtractor):
"duration": 230.12,
"width": 1920,
"height": 1080,
# timestamp and upload_date are often incorrect; seem to change randomly
'timestamp': int,
}
}, {
'note': 'v3 SMIL format',
@@ -47,6 +49,7 @@ class VevoIE(InfoExtractor):
'title': 'I Wish I Could Break Your Heart',
'duration': 226.101,
'age_limit': 0,
'timestamp': int,
}
}, {
'note': 'Age-limited video',
@@ -57,7 +60,8 @@ class VevoIE(InfoExtractor):
'age_limit': 18,
'title': 'Tunnel Vision (Explicit)',
'uploader': 'Justin Timberlake',
'upload_date': '20130704',
'upload_date': 're:2013070[34]',
'timestamp': int,
},
'params': {
'skip_download': 'true',
@@ -169,13 +173,13 @@ class VevoIE(InfoExtractor):
timestamp_ms = int(self._search_regex(
r'/Date\((\d+)\)/', video_info['launchDate'], 'launch date'))
upload_date = datetime.datetime.fromtimestamp(timestamp_ms // 1000)
return {
'id': video_id,
'title': video_info['title'],
'formats': formats,
'thumbnail': video_info['imageUrl'],
'upload_date': upload_date.strftime('%Y%m%d'),
'timestamp': timestamp_ms // 1000,
'uploader': video_info['mainArtists'][0]['artistName'],
'duration': video_info['duration'],
'age_limit': age_limit,

View File

@@ -29,6 +29,7 @@ class VideoBamIE(InfoExtractor):
'info_dict': {
'id': 'pqLvq',
'ext': 'mp4',
'title': '_',
}
},
]
@@ -61,7 +62,7 @@ class VideoBamIE(InfoExtractor):
self._sort_formats(formats)
title = self._og_search_title(page, default='VideoBam', fatal=False)
title = self._og_search_title(page, default='_', fatal=False)
description = self._og_search_description(page, default=None)
thumbnail = self._og_search_thumbnail(page)
uploader = self._html_search_regex(r'Upload by ([^<]+)</a>', page, 'uploader', fatal=False, default=None)

View File

@@ -1,29 +1,33 @@
from __future__ import unicode_literals
import re
from ..utils import (
ExtractorError,
unescapeHTML,
unified_strdate,
US_RATINGS,
)
from .subtitles import SubtitlesInfoExtractor
class VikiIE(SubtitlesInfoExtractor):
IE_NAME = u'viki'
IE_NAME = 'viki'
_VALID_URL = r'^https?://(?:www\.)?viki\.com/videos/(?P<id>[0-9]+v)'
_TEST = {
u'url': u'http://www.viki.com/videos/1023585v-heirs-episode-14',
u'file': u'1023585v.mp4',
u'md5': u'a21454021c2646f5433514177e2caa5f',
u'info_dict': {
u'title': u'Heirs Episode 14',
u'uploader': u'SBS',
u'description': u'md5:c4b17b9626dd4b143dcc4d855ba3474e',
u'upload_date': u'20131121',
u'age_limit': 13,
'url': 'http://www.viki.com/videos/1023585v-heirs-episode-14',
'md5': 'a21454021c2646f5433514177e2caa5f',
'info_dict': {
'id': '1023585v',
'ext': 'mp4',
'title': 'Heirs Episode 14',
'uploader': 'SBS',
'description': 'md5:c4b17b9626dd4b143dcc4d855ba3474e',
'upload_date': '20131121',
'age_limit': 13,
},
u'skip': u'Blocked in the US',
'skip': 'Blocked in the US',
}
def _real_extract(self, url):
@@ -44,28 +48,21 @@ class VikiIE(SubtitlesInfoExtractor):
rating_str = self._html_search_regex(
r'<strong>Rating: </strong>\s*([^<]*)<', webpage,
u'rating information', default='').strip()
RATINGS = {
'G': 0,
'PG': 10,
'PG-13': 13,
'R': 16,
'NC': 18,
}
age_limit = RATINGS.get(rating_str)
'rating information', default='').strip()
age_limit = US_RATINGS.get(rating_str)
info_url = 'http://www.viki.com/player5_fragment/%s?action=show&controller=videos' % video_id
info_webpage = self._download_webpage(
info_url, video_id, note=u'Downloading info page')
info_url, video_id, note='Downloading info page')
if re.match(r'\s*<div\s+class="video-error', info_webpage):
raise ExtractorError(
u'Video %s is blocked from your location.' % video_id,
'Video %s is blocked from your location.' % video_id,
expected=True)
video_url = self._html_search_regex(
r'<source[^>]+src="([^"]+)"', info_webpage, u'video URL')
r'<source[^>]+src="([^"]+)"', info_webpage, 'video URL')
upload_date_str = self._html_search_regex(
r'"created_at":"([^"]+)"', info_webpage, u'upload date')
r'"created_at":"([^"]+)"', info_webpage, 'upload date')
upload_date = (
unified_strdate(upload_date_str)
if upload_date_str is not None

View File

@@ -102,6 +102,15 @@ class VimeoIE(SubtitlesInfoExtractor):
},
]
@classmethod
def suitable(cls, url):
if VimeoChannelIE.suitable(url):
# Otherwise channel urls like http://vimeo.com/channels/31259 would
# match
return False
else:
return super(VimeoIE, cls).suitable(url)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
@@ -332,7 +341,7 @@ class VimeoIE(SubtitlesInfoExtractor):
class VimeoChannelIE(InfoExtractor):
IE_NAME = 'vimeo:channel'
_VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)/?(\?.*)?$'
_MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
_TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'

114
youtube_dl/extractor/wdr.py Normal file
View File

@@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
compat_urlparse,
determine_ext,
)
class WDRIE(InfoExtractor):
_PLAYER_REGEX = '-(?:video|audio)player(?:_size-[LMS])?'
_VALID_URL = r'(?P<url>https?://www\d?\.(?:wdr\d?|funkhauseuropa)\.de/)(?P<id>.+?)(?P<player>%s)?\.html' % _PLAYER_REGEX
_TESTS = [
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/servicezeit/videoservicezeit560-videoplayer_size-L.html',
'info_dict': {
'id': 'mdb-362427',
'ext': 'flv',
'title': 'Servicezeit',
'description': 'md5:c8f43e5e815eeb54d0b96df2fba906cb',
'upload_date': '20140310',
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html',
'info_dict': {
'id': 'mdb-363194',
'ext': 'flv',
'title': 'Marga Spiegel ist tot',
'description': 'md5:2309992a6716c347891c045be50992e4',
'upload_date': '20140311',
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
'md5': '83e9e8fefad36f357278759870805898',
'info_dict': {
'id': 'mdb-194332',
'ext': 'mp3',
'title': 'Erlebte Geschichten: Marga Spiegel (29.11.2009)',
'description': 'md5:2309992a6716c347891c045be50992e4',
'upload_date': '20091129',
},
},
{
'url': 'http://www.funkhauseuropa.de/av/audiogrenzenlosleckerbaklava101-audioplayer.html',
'md5': 'cfff440d4ee64114083ac44676df5d15',
'info_dict': {
'id': 'mdb-363068',
'ext': 'mp3',
'title': 'Grenzenlos lecker - Baklava',
'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
'upload_date': '20140311',
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_url = mobj.group('url')
page_id = mobj.group('id')
webpage = self._download_webpage(url, page_id)
if mobj.group('player') is None:
entries = [
self.url_result(page_url + href, 'WDR')
for href in re.findall(r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, webpage)
]
return self.playlist_result(entries, page_id)
flashvars = compat_urlparse.parse_qs(
self._html_search_regex(r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))
page_id = flashvars['trackerClipId'][0]
video_url = flashvars['dslSrc'][0]
title = flashvars['trackerClipTitle'][0]
thumbnail = flashvars['startPicture'][0] if 'startPicture' in flashvars else None
if 'trackerClipAirTime' in flashvars:
upload_date = flashvars['trackerClipAirTime'][0]
else:
upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')
if upload_date:
upload_date = unified_strdate(upload_date)
if video_url.endswith('.f4m'):
video_url += '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
ext = 'flv'
else:
ext = determine_ext(video_url)
description = self._html_search_meta('Description', webpage, 'description')
return {
'id': page_id,
'url': video_url,
'ext': ext,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
}

View File

@@ -1,3 +1,6 @@
from __future__ import unicode_literals
import json
import re
import sys
@@ -17,24 +20,25 @@ from ..aes import (
class YouPornIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
_VALID_URL = r'^(?P<proto>https?://)(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
_TEST = {
u'url': u'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
u'file': u'505835.mp4',
u'md5': u'71ec5fcfddacf80f495efa8b6a8d9a89',
u'info_dict': {
u"upload_date": u"20101221",
u"description": u"Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?",
u"uploader": u"Ask Dan And Jennifer",
u"title": u"Sex Ed: Is It Safe To Masturbate Daily?",
u"age_limit": 18,
'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
'info_dict': {
'id': '505835',
'ext': 'mp4',
'upload_date': '20101221',
'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
'uploader': 'Ask Dan And Jennifer',
'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
url = mobj.group('proto') + 'www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
@@ -42,7 +46,7 @@ class YouPornIE(InfoExtractor):
age_limit = self._rta_search(webpage)
# Get JSON parameters
json_params = self._search_regex(r'var currentVideo = new Video\((.*)\);', webpage, u'JSON parameters')
json_params = self._search_regex(r'var currentVideo = new Video\((.*)\);', webpage, 'JSON parameters')
try:
params = json.loads(json_params)
except:
@@ -61,7 +65,7 @@ class YouPornIE(InfoExtractor):
# Get all of the links from the page
DOWNLOAD_LIST_RE = r'(?s)<ul class="downloadList">(?P<download_list>.*?)</ul>'
download_list_html = self._search_regex(DOWNLOAD_LIST_RE,
webpage, u'download list').strip()
webpage, 'download list').strip()
LINK_RE = r'<a href="([^"]+)">'
links = re.findall(LINK_RE, download_list_html)
@@ -86,7 +90,7 @@ class YouPornIE(InfoExtractor):
resolution = format_parts[0]
height = int(resolution[:-len('p')])
bitrate = int(format_parts[1][:-len('k')])
format = u'-'.join(format_parts) + u'-' + dn
format = '-'.join(format_parts) + '-' + dn
formats.append({
'url': video_url,

View File

@@ -194,14 +194,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'151': {'ext': 'mp4', 'height': 72, 'resolution': '72p', 'format_note': 'HLS', 'preference': -10},
# DASH mp4 video
'133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'preference': -40},
'134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'preference': -40},
'135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'preference': -40},
'136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'preference': -40},
'137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40},
'138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'preference': -40},
'160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'preference': -40},
'133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
# Dash mp4 audio
'139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50},
@@ -209,12 +209,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'141': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 256, 'preference': -50},
# Dash webm
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'242': {'ext': 'webm', 'height': 240, 'resolution': '240p', 'format_note': 'DASH webm', 'preference': -40},
'243': {'ext': 'webm', 'height': 360, 'resolution': '360p', 'format_note': 'DASH webm', 'preference': -40},
'244': {'ext': 'webm', 'height': 480, 'resolution': '480p', 'format_note': 'DASH webm', 'preference': -40},
@@ -1130,14 +1130,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
return self._download_webpage(url, video_id, note=u'Searching for annotations.', errnote=u'Unable to download video annotations.')
def _real_extract(self, url):
proto = (
u'http' if self._downloader.params.get('prefer_insecure', False)
else u'https')
# Extract original video URL from URL with redirection, like age verification, using next_url parameter
mobj = re.search(self._NEXT_URL_RE, url)
if mobj:
url = 'https://www.youtube.com/' + compat_urllib_parse.unquote(mobj.group(1)).lstrip('/')
url = proto + '://www.youtube.com/' + compat_urllib_parse.unquote(mobj.group(1)).lstrip('/')
video_id = self.extract_id(url)
# Get video webpage
url = 'https://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
video_webpage = self._download_webpage(url, video_id)
# Attempt to extract SWF player URL
@@ -1162,7 +1166,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'asv': 3,
'sts':'1588',
})
video_info_url = 'https://www.youtube.com/get_video_info?' + data
video_info_url = proto + '://www.youtube.com/get_video_info?' + data
video_info_webpage = self._download_webpage(video_info_url, video_id,
note=False,
errnote='unable to download video info webpage')
@@ -1170,7 +1174,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
else:
age_gate = False
for el_type in ['&el=embedded', '&el=detailpage', '&el=vevo', '']:
video_info_url = ('https://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
video_info_url = (proto + '://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
% (video_id, el_type))
video_info_webpage = self._download_webpage(video_info_url, video_id,
note=False,
@@ -1285,10 +1289,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
# Decide which formats to download
try:
mobj = re.search(r';ytplayer.config = ({.*?});', video_webpage)
mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
if not mobj:
raise ValueError('Could not find vevo ID')
ytplayer_config = json.loads(mobj.group(1))
json_code = uppercase_escape(mobj.group(1))
ytplayer_config = json.loads(json_code)
args = ytplayer_config['args']
# Easy way to know if the 's' value is in url_encoded_fmt_stream_map
# this signatures are encrypted
@@ -1444,7 +1449,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'duration': video_duration,
'age_limit': 18 if age_gate else 0,
'annotations': video_annotations,
'webpage_url': 'https://www.youtube.com/watch?v=%s' % video_id,
'webpage_url': proto + '://www.youtube.com/watch?v=%s' % video_id,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,

View File

@@ -6,6 +6,7 @@ import ctypes
import datetime
import email.utils
import errno
import getpass
import gzip
import itertools
import io
@@ -22,6 +23,7 @@ import struct
import subprocess
import sys
import traceback
import xml.etree.ElementTree
import zlib
try:
@@ -777,6 +779,7 @@ def unified_strdate(date_str):
'%Y/%m/%d %H:%M:%S',
'%Y-%m-%d %H:%M:%S',
'%d.%m.%Y %H:%M',
'%d.%m.%Y %H.%M',
'%Y-%m-%dT%H:%M:%SZ',
'%Y-%m-%dT%H:%M:%S.%fZ',
'%Y-%m-%dT%H:%M:%S.%f0Z',
@@ -1267,3 +1270,31 @@ def read_batch_urls(batch_fd):
def urlencode_postdata(*args, **kargs):
return compat_urllib_parse.urlencode(*args, **kargs).encode('ascii')
def parse_xml(s):
class TreeBuilder(xml.etree.ElementTree.TreeBuilder):
def doctype(self, name, pubid, system):
pass # Ignore doctypes
parser = xml.etree.ElementTree.XMLParser(target=TreeBuilder())
kwargs = {'parser': parser} if sys.version_info >= (2, 7) else {}
return xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
if sys.version_info < (3, 0) and sys.platform == 'win32':
def compat_getpass(prompt, *args, **kwargs):
if isinstance(prompt, compat_str):
prompt = prompt.encode(preferredencoding())
return getpass.getpass(prompt, *args, **kwargs)
else:
compat_getpass = getpass.getpass
US_RATINGS = {
'G': 0,
'PG': 10,
'PG-13': 13,
'R': 16,
'NC': 18,
}

View File

@@ -1,2 +1,2 @@
__version__ = '2014.03.10'
__version__ = '2014.03.21.3'